Stitching Video from Webcams

                             Mai Zheng, Xiaolin Chen, and Li Guo

           Department of Electronic Science and Technology, University of Science and
                                    Technology of China
          {zhengmai,myhgy}@mail.ustc.edu.cn, lguo@ustc.edu.cn



Abstract. This paper presents a technique for creating a wide field-of-view video from common webcams. Our system consists of two stages: the initialization stage and the real-time stage. In the first stage, we detect robust features in the initial frame of each webcam and find the corresponding points between them. The matched point pairs are then employed to compute the perspective matrix that describes the geometric relationship of the adjacent views. After the initialization stage, we register the frame sequences of the different webcams on the same plane using the perspective matrix and synthesize the overlapped region with a nonlinear blending method in real time. In this way, the narrow fields of the webcams are displayed together as one wide scene. We demonstrate the effectiveness of our method on a prototype consisting of two ordinary webcams and show that this is an interesting and inexpensive way to experience wide-angle observation.



1 Introduction

As we explore a scene, we turn our eyes and head around, capture information in a wide field-of-view, and thus get a comprehensive view. Similarly, a panoramic picture or video can provide much more information, as well as a richer experience, than a single narrow representation. These advantages, together with various application prospects such as teleconferencing and virtual reality, have motivated many researchers to develop techniques for creating panoramas.
   Typically, the term "panorama" refers to a single-viewpoint panoramic image [1], which can be created by rotating a camera around its optical center. Another main type of panorama is the strip panorama [2][3], which is created from a translating camera. But no matter which technical variant is used, creating a panorama starts with static images and requires that all of the frames to be stitched be prepared and organized as an image sequence before mosaicing. For a static mosaic, there is no time constraint on stitching all of the images into one.
   In this paper, we propose a novel method to create panoramic video from webcams. Unlike previous video mosaics [4], which move one camera to record a continuous image sequence and then create a static panoramic image, we capture two video streams and stitch each pair of frames from the different streams in real time. In other words, our panorama is a wide video displayed in real time instead of a static panoramic picture.

G. Bebis et al. (Eds.): ISVC 2008, Part II, LNCS 5359, pp. 420–429, 2008.
© Springer-Verlag Berlin Heidelberg 2008


2 Related Work

A tremendous amount of progress has been made in static image mosaicing. For
example, strip panorama techniques [2][3] capture the horizontal outdoor scenes
continuously and then stitch them into a long panoramic picture, which can be used
for digital tourism and the like. Many techniques such as plane-sweep [5] and
multi-view projection [6] have been developed for removing ghosting and blurring
artifacts.
   As for panoramic video, however, the technology is still not mature. One of the main difficulties is the real-time requirement. The common frame rate is 25–30 FPS, so to create a video panorama we need to produce each panoramic frame within at most 0.04 seconds, which means that the stitching algorithms for static image mosaicing cannot be applied directly to real-time frames. And due to the time-consuming computation involved, existing methods for improving static panoramas can hardly be applied to stitching videos. To sidestep these difficulties, some
researchers resort to hardware. For example, a carefully designed camera cluster
which guarantees an approximate common virtual COP (center of projection) [7]
can easily register the inputs and avoid parallax to some extent. But from another
perspective, this kind of algorithm is undesirable because it relies heavily on the
capturing device.
   Our approach does not need special hardware. Instead, it makes use of ordinary webcams, which means that the system is inexpensive and easily applicable. Besides this, the positions and directions of the webcams are flexible as long as they have some overlapped field-of-view. We design a two-stage solution to tackle this challenging situation. The whole system is discussed in Section 3 and the implementation of each stage is discussed in detail in Sections 4 and 5.


3 System Framework

As shown in Fig. 1, the inputs of our system are independent frame sequences from two common webcams and the output is the stitched video. To achieve real time, we separate the processing into two stages. The first one, called the initialization stage, only needs to be run once after the webcams are fixed. This stage includes several time-consuming procedures which are responsible for calculating the geometric relationship between the adjacent webcams. We first detect robust features in the initial frame of each webcam and then match them between the adjacent views. The correct matches are then employed to estimate the perspective matrix. The next stage runs in real time. In this stage, we use the matrix from the first stage to register the frames of the different webcams on the same plane and blend the overlapped region using a nonlinear weight mask. The implementation of the two stages is discussed below in detail.



[Fig. 1 flowchart: frames of narrow view from the webcams pass an "Initialized?" check. If not (N), the initialization path runs Feature Detection → Feature Matching → RANSAC → Projective Matrix; if so (Y), the real-time path performs Projection & Blending to produce frames of wide view for display.]

Fig. 1. Framework of our system. The initialization stage estimates the geometric relationship
between the webcams based on the initial frames. The real-time stage registers and blends the
frame sequences in real time.
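To make the two-stage flow concrete, here is a minimal control-loop sketch in Python with OpenCV. The helpers estimate_homography and blend_frames are hypothetical stand-ins for the procedures of Sections 4 and 5, not the paper's actual implementation:

```python
import cv2

def run_pipeline(cam_ids=(0, 1)):
    """Two-stage loop: initialize once, then project and blend in real time."""
    caps = [cv2.VideoCapture(i) for i in cam_ids]
    H = None                                  # perspective matrix, unknown at start
    while True:
        frames = [cap.read()[1] for cap in caps]
        if H is None:
            # Initialization stage (run once after the webcams are fixed):
            # feature detection, matching, RANSAC, matrix computation (Sec. 4)
            H = estimate_homography(frames[0], frames[1])   # hypothetical helper
        else:
            # Real-time stage: registration and nonlinear blending (Sec. 5)
            wide = blend_frames(frames[0], frames[1], H)    # hypothetical helper
            cv2.imshow("wide view", wide)
        if cv2.waitKey(1) == 27:              # Esc quits
            break
```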


4 Initialization Stage

Since the location and orientation of the webcams are flexible, the geometric relation-
ship between the adjacent views is unknown before registration. We choose one web-
cam as a base and use the full planar perspective motion model [8] to register the
other view on the same plane. The planar perspective transform warps an image into
another using 8 parameters:

\[
u' = \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \sim Hu = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{pmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}. \tag{1}
\]

where $u = (x, y, 1)^{T}$ and $u' = (x', y', 1)^{T}$ are homogeneous coordinates in the two views, and $\sim$ indicates equality up to scale since $H$ is itself homogeneous. The perspective transform is a superset of the translation, rigid, similarity, and affine transforms. We seek to compute an optimized matrix $H$ between the views so that they can be aligned well in the same plane.
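For illustration, Eq. (1) maps directly onto a perspective warp; a minimal sketch in Python with OpenCV and NumPy, where the matrix values and file name are arbitrary examples rather than numbers from the paper:

```python
import cv2
import numpy as np

# Example homography with h33 normalized to 1 (values are illustrative only)
H = np.array([[1.02, 0.01, 15.0],
              [0.00, 0.98,  3.0],
              [1e-5, 2e-5,  1.0]])

frame = cv2.imread("cam1_frame.png")         # a frame from the non-base webcam
h, w = frame.shape[:2]
# Warp onto the base webcam's plane; the canvas is widened to hold both views
warped = cv2.warpPerspective(frame, H, (2 * w, h))

# Point-wise, u' ~ H u holds only up to scale, so the scale is divided out:
u = np.array([50.0, 60.0, 1.0])              # u = (x, y, 1)^T
up = H @ u
x_p, y_p = up[0] / up[2], up[1] / up[2]      # (x', y') after normalization
```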
   To recover the 8 parameters, we first extract keypoints in each input frame and then match them between the adjacent views. Many classic detectors such as Canny [9] and Harris [10] can be employed to extract interest points. However, they are not robust enough for matching in our case, which involves rotation and some perspective distortion between the adjacent views. In this paper, we compute SIFT features [11][12], which were originally used in object recognition.


  Simply put, there are four extraction steps. In the first step, we filter the frame with a Gaussian kernel:

\[
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \tag{2}
\]

where $I(x, y)$ is the initial frame, $*$ denotes convolution, and $G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}}\, e^{-(x^{2}+y^{2})/(2\sigma^{2})}$. Then we construct a DoG (Difference of Gaussians) space as follows:

\[
D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma), \tag{3}
\]

where $k$ is the scaling factor between adjacent scales. The extrema in the DoG space are taken as candidate keypoints.
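A sketch of Eqs. (2)–(3) with OpenCV's Gaussian filtering; the base σ = 1.6 and k = 2^{1/3} follow Lowe's common choices and are assumptions, since the paper does not state its values:

```python
import cv2
import numpy as np

def dog_space(gray, sigma=1.6, k=2 ** (1 / 3), levels=5):
    """Build Difference-of-Gaussian images D(x, y, sigma) as in Eqs. (2)-(3)."""
    img = gray.astype(np.float32)
    # L(x, y, sigma_i): the frame filtered by Gaussians of increasing scale
    L = [cv2.GaussianBlur(img, (0, 0), sigma * k ** i) for i in range(levels)]
    # D(x, y, sigma_i) = L(x, y, k * sigma_i) - L(x, y, sigma_i)
    return [L[i + 1] - L[i] for i in range(levels - 1)]
```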
  In the second step, we compute the accurate localization of the keypoints through a Taylor expansion of the DoG function:

\[
D(v) = D + \frac{\partial D^{T}}{\partial v} v + \frac{1}{2} v^{T} \frac{\partial^{2} D}{\partial v^{2}} v, \tag{4}
\]

where $v = (x, y, \sigma)^{T}$. From formula (4), we get the sub-pixel and sub-scale coordinates as follows:

\[
\hat{v} = -\left(\frac{\partial^{2} D}{\partial v^{2}}\right)^{-1} \frac{\partial D}{\partial v}. \tag{5}
\]
A threshold on the value of $D(\hat{v})$ is used to discard unstable points. We also make use of the Hessian matrix to eliminate edge responses:

\[
\frac{\mathrm{Tr}(M_{Hes})^{2}}{\mathrm{Det}(M_{Hes})} < \frac{(r+1)^{2}}{r}, \tag{6}
\]

where $M_{Hes} = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}$ is the Hessian matrix, and $\mathrm{Tr}(M_{Hes})$ and $\mathrm{Det}(M_{Hes})$ are its trace and determinant. $r$ is an empirical threshold, and we set $r = 10$ in this study.
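The edge test of Eq. (6) can be sketched with central finite differences on a DoG image D (a simplified single-scale version of the check, not the paper's exact code):

```python
import numpy as np

def is_edge_like(D, x, y, r=10.0):
    """True if the keypoint at (x, y) fails the curvature-ratio test of Eq. (6)."""
    # Second derivatives Dxx, Dyy, Dxy via central finite differences
    dxx = D[y, x + 1] - 2 * D[y, x] + D[y, x - 1]
    dyy = D[y + 1, x] - 2 * D[y, x] + D[y - 1, x]
    dxy = (D[y + 1, x + 1] - D[y + 1, x - 1]
           - D[y - 1, x + 1] + D[y - 1, x - 1]) / 4.0
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    if det <= 0:                 # principal curvatures of opposite sign: reject
        return True
    return tr * tr / det >= (r + 1) ** 2 / r

# A keypoint is kept only if not is_edge_like(D, x, y)
```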
   In the third step, the gradient orientations and magnitudes of the sample pixels within a Gaussian window are used to build a histogram that assigns an orientation to the keypoint. Finally, a 128-D descriptor for every keypoint is obtained by concatenating the orientation histograms over a 16 × 16 region.
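In practice, all four extraction steps are bundled in library implementations; a sketch using OpenCV (assuming a build where cv2.SIFT_create is available, i.e. OpenCV ≥ 4.4, and placeholder file names for the initial frames):

```python
import cv2

# Initial frames of the two webcams (placeholder file names)
frame0 = cv2.imread("cam0_init.png", cv2.IMREAD_GRAYSCALE)
frame1 = cv2.imread("cam1_init.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp0, desc0 = sift.detectAndCompute(frame0, None)  # one 128-D descriptor per keypoint
kp1, desc1 = sift.detectAndCompute(frame1, None)
```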
   By comparing the Euclidean distances of the descriptors, we get an initial set of corresponding keypoints (Fig. 2(a)). The feature descriptors are invariant to translation, rotation, and scaling. However, they are only partially affine-invariant, so the initial matched pairs often contain outliers in our case. We prune the outliers
by fitting the candidate correspondences to a perspective motion model via RANSAC [13] iteration. Specifically, we randomly choose 4 pairs of matched points in each iteration and compute an initial projective matrix, then use the formula below to check whether the matrix is suitable for the other points:

\[
\left\| \begin{pmatrix} x'_{n} \\ y'_{n} \\ 1 \end{pmatrix} - H \cdot \begin{pmatrix} x_{n} \\ y_{n} \\ 1 \end{pmatrix} \right\| < \theta. \tag{7}
\]

Here $H$ is the initial projective matrix and $\theta$ is the distance threshold that separates inliers from outliers. In order to better tolerate parallax, a loose threshold is used. The matrix consistent with the most initial matched pairs is taken as the best initial matrix, and the pairs that fit it are taken as correct matches (Fig. 2(b)).
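OpenCV wraps the Euclidean-distance matching and the 4-point RANSAC loop described above into two calls; a sketch continuing from the SIFT variables of the previous snippet. The 0.8 ratio test follows Lowe [11], and the 5.0-pixel reprojection threshold is an assumed "loose" θ — the paper states neither value:

```python
import cv2
import numpy as np

# Initial correspondences by comparing Euclidean distances of the descriptors
bf = cv2.BFMatcher(cv2.NORM_L2)
matches = [m for m, n in bf.knnMatch(desc0, desc1, k=2)
           if m.distance < 0.8 * n.distance]        # ratio test, after Lowe [11]

pts0 = np.float32([kp0[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
pts1 = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC: repeatedly fit H to 4 random pairs and keep the model with the most
# inliers under Eq. (7); 5.0 px is an assumed loose threshold theta
H, mask = cv2.findHomography(pts1, pts0, cv2.RANSAC, 5.0)
correct = [m for m, ok in zip(matches, mask.ravel()) if ok]
```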




                                                (a)




                                                (b)

Fig. 2. (a) Two frames with large misregistration and the initial matched features between them. Note that there are mismatched pairs besides the correct ones. (b) Correct matches after RANSAC filtering


   After purifying the matched pairs, the final perspective matrix $H$ is estimated using a least squares method. In detail, we construct the error function below and minimize the sum of the squared distances between the coordinates of the corresponding features:

\[
F_{error} = \sum_{n=1}^{N} \left\| H u_{warp,n} - u_{base,n} \right\|^{2} = \sum_{n=1}^{N} \left\| u'_{warp,n} - u_{base,n} \right\|^{2}, \tag{8}
\]

where $u_{base,n}$ is the homogeneous coordinate of the $n$th feature in the image being projected onto, and $u_{warp,n}$ is its correspondence in the other view.
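A sketch of the least-squares step, using the standard linearization of the 8-parameter model over the RANSAC inliers. This algebraic formulation approximates the geometric error of Eq. (8); whether the paper minimizes the linearized or the exact nonlinear form is not stated:

```python
import numpy as np

def refine_homography(src, dst):
    """Fit the 8 parameters of H (h33 = 1) from (N, 2) point arrays src -> dst."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # xp = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), linearized; same for yp
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y]); b.append(yp)
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

# Example usage: H = refine_homography(inlier_warp_pts, inlier_base_pts)
```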


5 Real-Time Stage

After obtaining the perspective matrix between the adjacent webcams, we project the
frames of one webcam onto another and blend them in real-time. Since the webcams
are placed relatively freely, they may not have a common center of projection and
thus are likely to result in parallax. In other words, the frames of different webcams
cannot be registered strictly. Therefore, we designed a nonlinear blending strategy to
minimize the ghosting and blurring of the overlapped region. Essentially, this is a
kind of alpha-blending. The synthesized frame $F_{syn}$ can be expressed as follows:

\[
F_{syn}(x, y) = \alpha(x, y) \, F_{base}(x, y) + \left(1 - \alpha(x, y)\right) F_{proj}(x, y), \tag{9}
\]

where $F_{base}$ are the frames of the base webcam, $F_{proj}$ are the frames projected from the adjacent webcam, and $\alpha(x, y)$ is the weight at pixel $(x, y)$.
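Per pixel, Eq. (9) is an alpha composite; a minimal NumPy sketch, where alpha is a single-channel mask aligned with the frames (its construction per Eq. (10) is sketched below):

```python
import numpy as np

def alpha_blend(f_base, f_proj, alpha):
    """F_syn = alpha * F_base + (1 - alpha) * F_proj, per Eq. (9)."""
    a = alpha[..., None].astype(np.float32)    # broadcast the mask over channels
    out = a * f_base.astype(np.float32) + (1.0 - a) * f_proj.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```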
   In the conventional blending method, the weight of a pixel is a linear function of its distance to the image boundaries. This method treats the different views equally and performs well in normal cases. However, in the case of severe parallax, the linear combination results in blurring and ghosting over the whole overlapped region, as in Fig. 3(b), so we use a special $\alpha$ function that gives priority to one view to avoid conflicts in the overlapped region of the two webcams. Simply put, we construct a nonlinear $\alpha$ mask as below:

\[
\alpha(x, y) = \begin{cases} 1, & \text{if } \min(x,\ y,\ W - x,\ H - y) > T \\[4pt] \dfrac{\sin\!\left(\pi \cdot \left(\min(x,\ y,\ W - x,\ H - y)/T - 0.5\right)\right) + 1}{2}, & \text{otherwise} \end{cases} \tag{10}
\]
where $W$ and $H$ are the width and height of the frame, and $T$ is the width of the nonlinear transition border. The mask is registered with the frames and clipped according to the region to be blended. The $\alpha$ value stays constant in the central part of the base frame and begins to drop sharply as it comes close to the boundaries of the other layer. The gradual change is controlled by $T$: the transition between frames is smoother and more natural when $T$ is larger, but the clear central region is also smaller, and vice versa. We refer to this method as nonlinear mask blending. Through this nonlinear synthesis, we keep a balance between the smooth transition at the boundaries and the uniqueness and clarity of the interior.
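A NumPy sketch of the mask in Eq. (10); the border width T = 20 pixels is an assumed value for illustration, not the paper's setting:

```python
import numpy as np

def nonlinear_mask(W, H, T=20.0):
    """Weight mask per Eq. (10): 1 in the interior, sinusoidal falloff at borders."""
    xs, ys = np.meshgrid(np.arange(W, dtype=np.float32),
                         np.arange(H, dtype=np.float32))
    d = np.minimum.reduce([xs, ys, W - 1 - xs, H - 1 - ys])  # distance to boundary
    alpha = (np.sin(np.pi * (d / T - 0.5)) + 1.0) / 2.0
    alpha[d > T] = 1.0                                       # flat interior
    return alpha
```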




Fig. 3. Comparison between linear blending and our blending strategy on typical scenes with severe parallax: (a) a typical scene pair with strong parallax; (b) linear blending; (c) our blending


6 Results

In this section, we show the results of our method on different scenes. We built a prototype with two common webcams, as shown in Fig. 4. The webcams are placed together on a simple support, and the lenses are flexible and can be rotated and directed to different orientations freely. Each webcam has a QVGA resolution (320 × 240 pixels) with a frame rate of 30 FPS.




Fig. 4. Two common webcams fixed on a simple support. The lenses are flexible and can be
rotated and adjusted to different orientations freely.

               Table 1. Processing time of the main procedures of the system

       Stage            Procedure                  Time (seconds)
       Initialization   Feature detection          0.320 ~ 0.450
                        Feature matching           0.040 ~ 0.050
                        RANSAC filtering           0.000 ~ 0.015
                        Matrix computation         0.000 ~ 0.001
       Real time        Projection and blending    0.000 ~ 0.020

   The processing time of the main procedures is listed in Table 1. The system runs on a PC with an E4500 2.2 GHz CPU and 2 GB memory. The initialization stage usually takes about 0.7–1 second, depending on the content of the scene. The projection and blending usually take less than 0.02 seconds per pair of frames and thus can run in real time. Note that whenever the webcams are moved, the initialization stage must be re-run to recompute the geometric relationship between the webcams. Currently, this re-initialization is started by the user. After initialization, the system can process the video at a rate of 30 FPS.
   In our system, the positions and directions of the webcams are adjustable as long as they have some overlapped field-of-view. Typically, the overlapped region should be at least 20% of the original view; otherwise there may not be enough robust features to match between the webcams. Fig. 5 shows the stitching results for some typical frames. In these cases, the webcams are intentionally rotated to a certain angle or even turned upside down. As can be seen in the figures, the system can still register and blend the frames into a natural whole scene. Fig. 6 shows some typical stitched scenes from a real-time video. In (a), two static indoor views are stitched into a wide view. In (b) and (c), moving objects appear in the scene, either far away or close to the lens. As illustrated in the figures, the stitched views are as clear and natural as the original narrow views.




              (a) A pair of frames with 15° rotation and the stitching result




              (b) A pair of frames with 90° rotation and the stitching result




              (c) A pair of frames with 180° rotation and the stitching result
Fig. 5. Stitching frames of some typical scenes. The webcams are intentionally rotated to a
certain angle or turned upside down.




                                     (a) A static scene




                        (b) A far away object moving in the scene




                          (c) A close object moving in the scene
Fig. 6. Stitching results from a real-time video. Moving objects in the stitching scene are as
clear as in the original narrow view.


   Although our system is flexible and robust enough under normal conditions, the quality of the mosaic video does drop severely in the following two cases: first, when the scene lacks salient features, as in the case of a white wall, the geometric relationship of the webcams cannot be estimated correctly; second, when the parallax is too strong, there may be noticeable stitching traces at the frame border. These problems can be avoided by targeting the lenses at scenes with salient features and adjusting the orientation of the webcams.


7 Conclusions and Future Work

In this paper, we have presented a technique for stitching videos from webcams. The system receives frame sequences from common webcams and outputs a synthesized video with a wide field-of-view in real time. The positions and directions of the webcams are flexible as long as they have some overlapped field-of-view. There are two stages in the system. The initialization stage calculates the geometric relationship between frames from adjacent webcams. A nonlinear mask blending method, which avoids ghosting and blurring in the main part of the overlapped region, is proposed for synthesizing the frames in real time. As illustrated by the experimental results, this is an effective and inexpensive way to construct video with a wide field-of-view.
   Currently, we have focused on using only two webcams. As a natural extension of this work, we would like to scale the system up to more webcams. We also plan to explore the hard and interesting issues of eliminating the exposure differences between webcams in real time and of solving the problems mentioned at the end of the last section.


Acknowledgment
The financial support provided by the National Natural Science Foundation of China (Project ID: 60772032) and Microsoft (China) Co., Ltd. is gratefully acknowledged.


References

 1. Szeliski, R., Shum, H.Y.: Creating Full View Panoramic Mosaics and Environment Maps. In: Proc. of SIGGRAPH 1997, Computer Graphics Proceedings, Annual Conference Series, pp. 251–258 (1997)
 2. Agarwala, A., Agrawala, M., Cohen, M., Salesin, D., Szeliski, R.: Photographing Long Scenes with Multi-Viewpoint Panoramas. In: Proc. of SIGGRAPH 2006, pp. 853–861 (2006)
 3. Zheng, J.Y.: Digital Route Panoramas. IEEE MultiMedia 10(3), 57–67 (2003)
 4. Hsu, C.-T., Cheng, T.-H., Beukers, R.A., Horng, J.-K.: Feature-based Video Mosaic. In: Proc. of the IEEE International Conference on Image Processing, pp. 887–890 (2000)
 5. Kang, S.B., Szeliski, R., Uyttendaele, M.: Seamless Stitching Using Multi-Perspective Plane Sweep. Microsoft Research, Tech. Rep. MSR-TR-2004-48 (2004)
 6. Zelnik-Manor, L., Peters, G., Perona, P.: Squaring the Circle in Panoramas. In: Proc. of the 10th IEEE International Conference on Computer Vision (ICCV 2005), pp. 1292–1299 (2005)
 7. Majumder, A., Gopi, M., Seales, W.B., Fuchs, H.: Immersive Teleconferencing: A New Algorithm to Generate Seamless Panoramic Imagery. In: Proc. of ACM Multimedia, pp. 169–178 (1999)
 8. Szeliski, R.: Video Mosaics for Virtual Environments. IEEE Computer Graphics and Applications 16(2), 22–30 (1996)
 9. Canny, J.: A Computational Approach to Edge Detection. IEEE Trans. Pattern Analysis and Machine Intelligence 8(6), 679–698 (1986)
10. Harris, C., Stephens, M.: A Combined Corner and Edge Detector. In: Proc. of the 4th Alvey Vision Conference, pp. 147–151 (1988)
11. Lowe, D.G.: Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)
12. Winder, S., Brown, M.: Learning Local Image Descriptors. In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007), pp. 1–8 (2007)
13. Forsyth, D.A., Ponce, J.: Computer Vision: A Modern Approach. Prentice Hall, Englewood Cliffs (2003)

More Related Content

What's hot

Java3 d 1
Java3 d 1Java3 d 1
Java3 d 1Por Non
 
Simulations of Strong Lensing
Simulations of Strong LensingSimulations of Strong Lensing
Simulations of Strong LensingNan Li
 
Computer vision 3 4
Computer vision 3 4Computer vision 3 4
Computer vision 3 4sachinmore76
 
4 satellite image fusion using fast discrete
4 satellite image fusion using fast discrete4 satellite image fusion using fast discrete
4 satellite image fusion using fast discreteAlok Padole
 
Bidirectional bias correction for gradient-based shift estimation
Bidirectional bias correction for gradient-based shift estimationBidirectional bias correction for gradient-based shift estimation
Bidirectional bias correction for gradient-based shift estimationTuan Q. Pham
 
3 intensity transformations and spatial filtering slides
3 intensity transformations and spatial filtering slides3 intensity transformations and spatial filtering slides
3 intensity transformations and spatial filtering slidesBHAGYAPRASADBUGGE
 
改进的固定点图像复原算法_英文_阎雪飞
改进的固定点图像复原算法_英文_阎雪飞改进的固定点图像复原算法_英文_阎雪飞
改进的固定点图像复原算法_英文_阎雪飞alen yan
 
Robust Super-Resolution by minimizing a Gaussian-weighted L2 error norm
Robust Super-Resolution by minimizing a Gaussian-weighted L2 error normRobust Super-Resolution by minimizing a Gaussian-weighted L2 error norm
Robust Super-Resolution by minimizing a Gaussian-weighted L2 error normTuan Q. Pham
 
Hussain Learning Relevant Eye Movement Feature Spaces Across Users
Hussain Learning Relevant Eye Movement Feature Spaces Across UsersHussain Learning Relevant Eye Movement Feature Spaces Across Users
Hussain Learning Relevant Eye Movement Feature Spaces Across UsersKalle
 
Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)Jia-Bin Huang
 
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)Jia-Bin Huang
 
Automatic Classification Satellite images for weather Monitoring
Automatic Classification Satellite images for weather MonitoringAutomatic Classification Satellite images for weather Monitoring
Automatic Classification Satellite images for weather Monitoringguest7782414
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceresearchinventy
 
Kccsi 2012 a real-time robust object tracking-v2
Kccsi 2012   a real-time robust object tracking-v2Kccsi 2012   a real-time robust object tracking-v2
Kccsi 2012 a real-time robust object tracking-v2Prarinya Siritanawan
 
Object tracking by dtcwt feature vectors 2-3-4
Object tracking by dtcwt feature vectors 2-3-4Object tracking by dtcwt feature vectors 2-3-4
Object tracking by dtcwt feature vectors 2-3-4IAEME Publication
 

What's hot (20)

Java3 d 1
Java3 d 1Java3 d 1
Java3 d 1
 
Simulations of Strong Lensing
Simulations of Strong LensingSimulations of Strong Lensing
Simulations of Strong Lensing
 
Computer vision 3 4
Computer vision 3 4Computer vision 3 4
Computer vision 3 4
 
4 satellite image fusion using fast discrete
4 satellite image fusion using fast discrete4 satellite image fusion using fast discrete
4 satellite image fusion using fast discrete
 
CS 354 Lighting
CS 354 LightingCS 354 Lighting
CS 354 Lighting
 
Bidirectional bias correction for gradient-based shift estimation
Bidirectional bias correction for gradient-based shift estimationBidirectional bias correction for gradient-based shift estimation
Bidirectional bias correction for gradient-based shift estimation
 
3 intensity transformations and spatial filtering slides
3 intensity transformations and spatial filtering slides3 intensity transformations and spatial filtering slides
3 intensity transformations and spatial filtering slides
 
改进的固定点图像复原算法_英文_阎雪飞
改进的固定点图像复原算法_英文_阎雪飞改进的固定点图像复原算法_英文_阎雪飞
改进的固定点图像复原算法_英文_阎雪飞
 
Robust Super-Resolution by minimizing a Gaussian-weighted L2 error norm
Robust Super-Resolution by minimizing a Gaussian-weighted L2 error normRobust Super-Resolution by minimizing a Gaussian-weighted L2 error norm
Robust Super-Resolution by minimizing a Gaussian-weighted L2 error norm
 
Hussain Learning Relevant Eye Movement Feature Spaces Across Users
Hussain Learning Relevant Eye Movement Feature Spaces Across UsersHussain Learning Relevant Eye Movement Feature Spaces Across Users
Hussain Learning Relevant Eye Movement Feature Spaces Across Users
 
Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)Estimating Human Pose from Occluded Images (ACCV 2009)
Estimating Human Pose from Occluded Images (ACCV 2009)
 
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
 
Automatic Classification Satellite images for weather Monitoring
Automatic Classification Satellite images for weather MonitoringAutomatic Classification Satellite images for weather Monitoring
Automatic Classification Satellite images for weather Monitoring
 
Presnt3
Presnt3Presnt3
Presnt3
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
Ch3
Ch3Ch3
Ch3
 
Kccsi 2012 a real-time robust object tracking-v2
Kccsi 2012   a real-time robust object tracking-v2Kccsi 2012   a real-time robust object tracking-v2
Kccsi 2012 a real-time robust object tracking-v2
 
B04 07 0614
B04 07 0614B04 07 0614
B04 07 0614
 
Object tracking by dtcwt feature vectors 2-3-4
Object tracking by dtcwt feature vectors 2-3-4Object tracking by dtcwt feature vectors 2-3-4
Object tracking by dtcwt feature vectors 2-3-4
 
Nanotechnology
NanotechnologyNanotechnology
Nanotechnology
 

Similar to Isvc08

[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...
[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...
[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...Seiya Ito
 
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...Kitsukawa Yuki
 
Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...
Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...
Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...Tomohiro Fukuda
 
Class[4][19th jun] [three js-camera&amp;light]
Class[4][19th jun] [three js-camera&amp;light]Class[4][19th jun] [three js-camera&amp;light]
Class[4][19th jun] [three js-camera&amp;light]Saajid Akram
 
Intelligent Auto Horn System Using Artificial Intelligence
Intelligent Auto Horn System Using Artificial IntelligenceIntelligent Auto Horn System Using Artificial Intelligence
Intelligent Auto Horn System Using Artificial IntelligenceIRJET Journal
 
Fisheye Omnidirectional View in Autonomous Driving
Fisheye Omnidirectional View in Autonomous DrivingFisheye Omnidirectional View in Autonomous Driving
Fisheye Omnidirectional View in Autonomous DrivingYu Huang
 
Video Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFTVideo Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFTIRJET Journal
 
Dance With AI – An interactive dance learning platform
Dance With AI – An interactive dance learning platformDance With AI – An interactive dance learning platform
Dance With AI – An interactive dance learning platformIRJET Journal
 
Cj31365368
Cj31365368Cj31365368
Cj31365368IJMER
 
3D Reconstruction from Multiple uncalibrated 2D Images of an Object
3D Reconstruction from Multiple uncalibrated 2D Images of an Object3D Reconstruction from Multiple uncalibrated 2D Images of an Object
3D Reconstruction from Multiple uncalibrated 2D Images of an ObjectAnkur Tyagi
 
EE660_Report_YaxinLiu_8448347171
EE660_Report_YaxinLiu_8448347171EE660_Report_YaxinLiu_8448347171
EE660_Report_YaxinLiu_8448347171Yaxin Liu
 
Stixel based real time object detection for ADAS using surface normal
Stixel based real time object detection for ADAS using surface normalStixel based real time object detection for ADAS using surface normal
Stixel based real time object detection for ADAS using surface normalTaeKang Woo
 
ANISH_and_DR.DANIEL_augmented_reality_presentation
ANISH_and_DR.DANIEL_augmented_reality_presentationANISH_and_DR.DANIEL_augmented_reality_presentation
ANISH_and_DR.DANIEL_augmented_reality_presentationAnish Patel
 

Similar to Isvc08 (20)

[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...
[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...
[3D勉強会@関東] Deep Reinforcement Learning of Volume-guided Progressive View Inpa...
 
Video to Video Translation CGAN
Video to Video Translation CGANVideo to Video Translation CGAN
Video to Video Translation CGAN
 
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
 
I0343065072
I0343065072I0343065072
I0343065072
 
Log polar coordinates
Log polar coordinatesLog polar coordinates
Log polar coordinates
 
Oc2423022305
Oc2423022305Oc2423022305
Oc2423022305
 
Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...
Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...
Visual Environment by Semantic Segmentation Using Deep Learning: A Prototype ...
 
Class[4][19th jun] [three js-camera&amp;light]
Class[4][19th jun] [three js-camera&amp;light]Class[4][19th jun] [three js-camera&amp;light]
Class[4][19th jun] [three js-camera&amp;light]
 
Intelligent Auto Horn System Using Artificial Intelligence
Intelligent Auto Horn System Using Artificial IntelligenceIntelligent Auto Horn System Using Artificial Intelligence
Intelligent Auto Horn System Using Artificial Intelligence
 
iwvp11-vivet
iwvp11-vivetiwvp11-vivet
iwvp11-vivet
 
Fisheye Omnidirectional View in Autonomous Driving
Fisheye Omnidirectional View in Autonomous DrivingFisheye Omnidirectional View in Autonomous Driving
Fisheye Omnidirectional View in Autonomous Driving
 
Video Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFTVideo Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFT
 
Dance With AI – An interactive dance learning platform
Dance With AI – An interactive dance learning platformDance With AI – An interactive dance learning platform
Dance With AI – An interactive dance learning platform
 
Cj31365368
Cj31365368Cj31365368
Cj31365368
 
998-isvc16
998-isvc16998-isvc16
998-isvc16
 
3D Reconstruction from Multiple uncalibrated 2D Images of an Object
3D Reconstruction from Multiple uncalibrated 2D Images of an Object3D Reconstruction from Multiple uncalibrated 2D Images of an Object
3D Reconstruction from Multiple uncalibrated 2D Images of an Object
 
EE660_Report_YaxinLiu_8448347171
EE660_Report_YaxinLiu_8448347171EE660_Report_YaxinLiu_8448347171
EE660_Report_YaxinLiu_8448347171
 
DICTA 2017 poster
DICTA 2017 posterDICTA 2017 poster
DICTA 2017 poster
 
Stixel based real time object detection for ADAS using surface normal
Stixel based real time object detection for ADAS using surface normalStixel based real time object detection for ADAS using surface normal
Stixel based real time object detection for ADAS using surface normal
 
ANISH_and_DR.DANIEL_augmented_reality_presentation
ANISH_and_DR.DANIEL_augmented_reality_presentationANISH_and_DR.DANIEL_augmented_reality_presentation
ANISH_and_DR.DANIEL_augmented_reality_presentation
 

Recently uploaded

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxcallscotland1987
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 

Recently uploaded (20)

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 

Isvc08

  • 1. Stitching Video from Webcams Mai Zheng, Xiaolin Chen, and Li Guo Department of Electronic Science and Technology, University of Science and Technology of China {zhengmai,myhgy}@mail.ustc.edu.cn, lguo@ustc.edu.cn Abstract. This paper presents a technique to create wide field-of-view from common webcams. Our system consists of two stages: the initialization stage and the real-time stage. In the first stage, we detect robust features in the initial frame of each webcam and find the corresponding points between them. Then the matched point pairs are employed to compute the perspective matrix which describes the geometric relationship of the adjacent views. After the initializa- tion stage, we register the frame sequences of different webcams on the same plane using the perspective matrix and synthesize the overlapped region using a nonlinear blending method in real time. In this way, the narrow fields of each webcam are displayed together as one wide scene. We demonstrate the effec- tiveness of our method on a prototype that consists of two ordinary webcams and show that this is an interesting and inexpensive way to experience the wide- angle observation. 1 Introduction As we explore a scene, we turn our eyes and head around, capture information in a wide field-of-view, and then get a comprehensive view. Similarly, a panoramic pic- ture or video can always provide much more information as well as richer experience than a single narrow representation. These advantages, together with various applica- tion prospects, such as teleconferencing and virtual reality, have motivated many researchers to develop techniques for creating a panorama. Typically, the term “panorama” refers to single-viewpoint panoramic image [1], which can be created by rotating a camera around its optical center. Another main type of panorama is called the strip panorama [2][3], which is created from a translat- ing camera. But no matter which technical variants they use, creating a panorama starts with static images and requires that all of the frames to be stitched be prepared and organized as an image sequence before mosaicing. For a static mosaic, there is no time constraint on stitching all of the images into one. In this paper, we propose a novel method to create panoramic video from web- cams. Different from previous video mosaics[4] which move one camera to record a continuous image sequence and then create a static panoramic image, we capture two- pass videos and stitch each pair of frames from the different videos in real-time. In other words, our panorama is a wide video displayed in real-time instead of a static panoramic picture. G. Bebis et al. (Eds.): ISVC 2008, Part II, LNCS 5359, pp. 420–429, 2008. © Springer-Verlag Berlin Heidelberg 2008
  • 2. Stitching Video from Webcams 421 2 Related Work A tremendous amount of progress has been made in static image mosaicing. For example, strip panorama techniques [2][3] capture the horizontal outdoor scenes continuously and then stitch them into a long panoramic picture, which can be used for digital tourism and the like. Many techniques such as plane-sweep [5] and multi-view projection [6] have been developed for removing ghosting and blurring artifacts. As for panoramic video, however, the technology is still not mature. One of the main troubles is the real-time requirement. The common frame rate is 25~ 30 FPS, so if we want to create video panorama, we need to create each panoramic frames within at most 0.04 seconds, which means that the stitching algorithms for static image mosaicing cannot be applied to stitch real-time frames directly. And due to the time-consuming computation involved, existing methods for improving static panorama can hardly be applied for stitching videos. To skirt these troubles, some researchers resort to hardware. For example, a carefully designed camera cluster which guarantees an approximate common virtual COP (center of projection) [7] can easily register the inputs and avoid parallax to some extent. But from another perspective, this kind of algorithm is undesirable because it relies heavily on the capturing device. Our approach does not need special hardware. Instead, it makes use of the or- dinary webcams, which means that the system is inexpensive and easily applica- ble. Besides this, the positions and directions of the webcams are flexible as long as they have some overlapped field-of-view. We design a two-stage solution to tackle this challenging situation. The whole system will be discussed in section 3 and the implementation of each stage will be discussed in section 4 and 5 in detail. 3 System Framework As is shown in Fig. 1, the inputs of our system are independent frame sequences from two common webcams and the output is the stitching video. To achieve real time, we separate the processing into two stages. The first one, called the initialization stage, only needs to be run once after the webcams are fixed. This stage includes several time-consuming procedures which are responsible for calculating the geometric rela- tionship between the adjacent webcams. We firstly detect robust features in the initial frame of each webcams and then match them between the adjacent views. The correct matches are then employed to estimate the perspective matrix. The next stage runs in real time. In this stage, we make use of the matrix from the first stage to register the frames of different webcams on the same plane and blend the overlapped region using a nonlinear weight mask. The implementation of the two stages will be discussed later in detail.
  • 3. 422 M. Zheng, X. Chen, and L. Guo Real-time Stage Frames Projection Frames of Initialized Y & of Narrow Blending Wide View ? View Display Webcams N Feature Detection Projective Matrix Feature Matching RANSAC Initialization Fig. 1. Framework of our system. The initialization stage estimates the geometric relationship between the webcams based on the initial frames. The real-time stage registers and blends the frame sequences in real time. 4 Initialization Stage Since the location and orientation of the webcams are flexible, the geometric relation- ship between the adjacent views is unknown before registration. We choose one web- cam as a base and use the full planar perspective motion model [8] to register the other view on the same plane. The planar perspective transform warps an image into another using 8 parameters: ⎡x '⎤ ⎛ h11 h12 h13 ⎞ ⎡ x ⎤ ⎢ y '⎥ = u ' ~ Hu = ⎜ h h ⎟ h23 ⎟ ⎢ y ⎥ . ⎢ ⎥ ⎜ 21 22 ⎢ ⎥ (1) ⎢1 ⎥ ⎜ ⎟⎢ ⎥ ⎣ ⎦ ⎝ h31 h32 1 ⎠ ⎣1 ⎦ where u = ( x, y,1)T and u ' = ( x ', y ',1)T are homogeneous coordinates of the two views, and ~ indicates equality up to scale since H is itself homogeneous. The per- spective transform is a superset of translation, rigid, and similarity well as affine transforms. We seek to compute an optimized matrix H between the views so that they can be aligned well in the same plane. To recover the 8 parameters, we firstly extract keypoints in each input frame, and then match them between the adjacent two. Many classic detectors such as Canny [9] and Harris [10] can be employed to extract interesting points. However, they are not robust enough for matching in our case, which involves rotation and some perspective relation between the adjacent views. In this paper, we calculate the SIFT features [11] [12], which were originally used in object recognition.
  • 4. Stitching Video from Webcams 423 Simply put, there are 4 extraction steps. In the first step, we filter the frame with a Gaussian kernel: L( x, y,σ) =G( x, y,σ) ⋅ I ( x, y) . (2) (x +y ) 2 2 I ( x, y ) is the initial frame and G ( x, y,σ ) = 1 − where ⋅e 2σ 2 . Then we con- 2πσ 2 struct a DoG (Difference of Gaussian) space as follows: D ( x, y, σ ) = L ( x, y, kσ ) − L ( x, y, σ ) . (3) where k is the scaling factor. The extrema in the DoG space are taken as keypoints. In the second step, we calculate the accurate localization of the keypoints through Taylor expansion of the DoG function: ∂DT 1 ∂2D D( v) = D + v + vT v. (4) ∂v 2 ∂v 2 where v = ( x, y , σ ) . From formula (4), we get the sub-pixel and sub-scale coordi- T nates as follows: ^ ∂ 2 D −1 ∂DT v=− . . (5) ∂v 2 ∂v ^ A threshold is added on the D ( v ) value to discard unstable points. Also, we make use of the Hessian matrix to eliminate edge responses: Tr ( M Hes ) ( r + 1) 2 < . (6) Det ( M Hes ) r ⎛ Dxx Dxy ⎞ where M Hes = ⎜ ⎟ is the Hessian matrix, and Tr ( M Hes ) and Det ( M Hes ) are ⎝ Dxy Dyy ⎠ the trace and determinant of M Hes . r is an experience threshold and we set r =10 in this study. In the third step, the gradient orientations and magnitudes of the sample pixels within a Gaussian window are employed to calculate a histogram to assign the key- point with orientation. And finally, a 128D descriptor of every keypoint is obtained by concatenating the orientation histograms over a 16 × 16 region. By comparing the Euclidean distances of the descriptors, we get an initial set of corresponding keypoints (Fig. 2 (a)). The feature descriptors are invariant to transla- tion and rotation as well as scaling. However, they are only partially affine-invariant, so the initial matched pairs often contain outliers in our cases. We prune the outliers
  • 5. 424 M. Zheng, X. Chen, and L. Guo by fitting the candidate correspondences into a perspective motion model based on RANSAC [13] iteration. Specifically, we randomly choose 4 pairs of matched points in each iteration and calculate an initial projective matrix, then use the formula below to find out whether the matrix is suitable for other points: ⎛ xn ⎞ ' ⎛ xn ⎞ ⎜ '⎟ ⎜ ⎟ ⎜ yn ⎟ − H ⋅ ⎜ yn ⎟ < θ . (7) ⎜1⎟ ⎜1⎟ ⎝ ⎠ ⎝ ⎠ Here H is the initial projective matrix andθ is the threshold of outliers. In order to get a better toleration of parallax, a loose threshold of inliers is used. The matrix con- sistent with most initial matched pairs is considered as the best initial matrix, and the pairs fitting in it are considered as correct matches (Fig. 2 (b)). (a) (b) Fig. 2. (a) Two frames with large misregistration and initial matched features between them. Note that there are mismatched pairs besides the correct ones (b) Correct matches after RANSAC filtering After purifying the matched pairs, the ideal perspective matrix H is estimated us- ing use a least squares method. In detail, we construct the error function below and minimize the sum of the squared distances between the coordinates of the correspond- ing features: N 2 N 2 Ferror = ∑ Hu warp,n − u base, n = ∑ u warp ',n − u base,n (8) n =1 n =1 where ubase,n is the homogeneous coordinate of the nth feature in the image to be projected to, and uwarp,n is the correspondence of ubase,n in another view.
5 Real-Time Stage

After obtaining the perspective matrix between the adjacent webcams, we project the frames of one webcam onto the other and blend them in real time. Since the webcams are placed relatively freely, they may not share a common center of projection and are therefore likely to exhibit parallax. In other words, the frames of the different webcams cannot be registered exactly. We therefore designed a nonlinear blending strategy to minimize the ghosting and blurring in the overlapped region. Essentially, this is a kind of alpha-blending. The synthesized frames $F_{syn}$ can be expressed as follows:

$$F_{syn}(x, y) = \alpha(x, y) \cdot F_{base} + (1 - \alpha(x, y)) \cdot F_{proj} \tag{9}$$

where $F_{base}$ are the frames of the base webcam, $F_{proj}$ are the frames projected from the adjacent webcam, and $\alpha(x, y)$ is the weight at pixel $(x, y)$.

In the conventional blending method, the pixel weight is a linear function proportional to the distance to the image boundaries. This method treats the different views equally and performs well in normal cases. However, under severe parallax, the linear combination results in blurring and ghosting across the whole overlapped region, as in Fig. 3(b), so we use a special $\alpha$ function that gives priority to one view to avoid the conflict in the overlapped region of the two webcams.

Fig. 3. Comparison between linear blending and our blending strategy on a typical scene with severe parallax: (a) a typical pair of frames with strong parallax; (b) linear blending; (c) our blending.

Simply put, we construct a nonlinear $\alpha$ mask as below:

$$\alpha(x, y) = \begin{cases} 1, & \text{if } \min(x,\, y,\, W - x,\, H - y) > T \\[4pt] \dfrac{\sin\!\left( \pi \left( \min(x,\, y,\, W - x,\, H - y)/T - 0.5 \right) \right) + 1}{2}, & \text{otherwise} \end{cases} \tag{10}$$

where $W$ and $H$ are the width and height of the frame, and $T$ is the width of the nonlinearly decreasing border. The mask is registered with the frames and clipped according to the region to be blended. The $\alpha$ value remains constant in the central part of the base frame and begins to drop sharply once the pixel comes close enough to the boundaries of the other layer. The gradual change is controlled by $T$: the transition between the frames is smoother and more natural when $T$ is larger, but the clear central region is then smaller, and vice versa. We refer to this method as nonlinear mask blending. Through this nonlinear synthesis, we keep a balance between the smooth transition at the boundaries and the uniqueness and clarity of the interior.
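A minimal sketch of the nonlinear mask of Eq. (10) and the blend of Eq. (9), assuming the two frames have already been registered on a common canvas (variable and function names are ours):

```python
import numpy as np

def alpha_mask(width, height, T):
    """Nonlinear mask of Eq. (10): alpha = 1 in the interior, sinusoidal
    fall-off within T pixels of the frame border."""
    ys, xs = np.mgrid[0:height, 0:width]
    d = np.minimum.reduce([xs, ys, width - 1 - xs, height - 1 - ys])
    return np.where(d > T, 1.0,
                    (np.sin(np.pi * (d / T - 0.5)) + 1.0) / 2.0).astype(np.float32)

def blend(f_base, f_proj, alpha):
    """Eq. (9): per-pixel alpha blend of the two registered frames."""
    a = alpha[..., None]                   # broadcast over color channels
    return (a * f_base + (1.0 - a) * f_proj).astype(f_base.dtype)
```

In the full system the mask would additionally be clipped to the actual overlap region, as described above; the sketch only builds the raw per-frame mask.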
6 Results

In this section, we show the results of our method on different scenes. We built a prototype with two common webcams, as shown in Fig. 4. The webcams are placed together on a simple support, and the lenses can be rotated and directed to different orientations freely. Each webcam has a resolution of QVGA (320 × 240 pixels) and a frame rate of 30 FPS.

Fig. 4. Two common webcams fixed on a simple support. The lenses are flexible and can be rotated and adjusted to different orientations freely.

Table 1. Processing time of the main procedures of the system

Stage            Procedure                  Time (seconds)
Initialization   Feature detection          0.320 ~ 0.450
                 Feature matching           0.040 ~ 0.050
                 RANSAC filtering           0.000 ~ 0.015
                 Matrix computation         0.000 ~ 0.001
Real time        Projection and blending    0.000 ~ 0.020

The processing time of the main procedures is listed in Table 1. The system runs on a PC with an E4500 2.2 GHz CPU and 2 GB of memory. The initialization stage usually takes about 0.7 ~ 1 second, depending on the content of the scene. The projection and blending usually take less than 0.02 seconds per pair of frames and can therefore run in real time. Note that whenever the webcams are moved, the initialization stage must be re-run to re-compute the geometric relationship between the webcams. Currently, this re-initialization is started by the user. After initialization, the system processes the video at a rate of 30 FPS.

In our system, the positions and directions of the webcams are adjustable as long as they share some overlapped field-of-view. Typically, the overlapped region should be 20% of the original view or more; otherwise there may not be enough robust features to match between the webcams.

Fig. 5 shows the stitching results for some typical frames. In these cases, the webcams are intentionally rotated to a certain angle or even turned upside down. As can be seen in the figures, the system can still register and blend the frames into a natural whole scene. Fig. 6 shows some typical stitched scenes from a real-time video. In (a), two static indoor views are stitched into one wide view. In (b) and (c), moving objects appear in the scene, either far away or close to the lens. As illustrated in the figures, the stitched views are as clear and natural as the original narrow views.
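Putting the two stages together, the overall pipeline reduces to a short driver loop. The following is only a structural sketch: `estimate_homography` and `compose` are hypothetical wrappers around the initialization (Sec. 4) and blending (Sec. 5) procedures sketched earlier, and the canvas size and key bindings are our assumptions:

```python
import cv2

WIDE_W, WIDE_H = 640, 240                  # assumed canvas for two QVGA views

cap_left = cv2.VideoCapture(0)
cap_right = cv2.VideoCapture(1)

# Initialization stage (Sec. 4): run once on the first pair of frames.
H = estimate_homography(cap_left.read()[1], cap_right.read()[1])

# Real-time stage (Sec. 5): warp and blend every subsequent frame pair.
while True:
    _, f_base = cap_left.read()
    _, f_right = cap_right.read()
    f_proj = cv2.warpPerspective(f_right, H, (WIDE_W, WIDE_H))
    wide = compose(f_base, f_proj)         # Eq. (9)/(10) blending on the canvas
    cv2.imshow("stitched view", wide)

    key = cv2.waitKey(1) & 0xFF
    if key == ord('r'):                    # user-triggered re-initialization
        H = estimate_homography(cap_left.read()[1], cap_right.read()[1])
    elif key == 27:                        # Esc quits
        break
```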
Fig. 5. Stitching frames of some typical scenes. The webcams are intentionally rotated to a certain angle or turned upside down: (a) a pair of frames with 15° rotation and the stitching result; (b) a pair of frames with 90° rotation and the stitching result; (c) a pair of frames with 180° rotation and the stitching result.

Fig. 6. Stitching results from a real-time video. Moving objects in the stitched scene are as clear as in the original narrow view: (a) a static scene; (b) a far-away object moving in the scene; (c) a close object moving in the scene.
Although our system is flexible and robust enough under normal conditions, the quality of the stitched video does drop severely in two cases. First, when the scene lacks salient features, as in the case of a white wall, the geometric relationship of the webcams cannot be estimated correctly. Second, when the parallax is too strong, there may be noticeable stitching traces near the frame border. These problems can be avoided by aiming the lenses at scenes with salient features and adjusting the orientation of the webcams.

7 Conclusions and Future Work

In this paper, we have presented a technique for stitching videos from webcams. The system receives the frame sequences from common webcams and outputs a synthesized video with a wide field-of-view in real time. The positions and directions of the webcams are flexible as long as they share some overlapped field-of-view. There are two stages in the system. The initialization stage calculates the geometric relationship between the frames of adjacent webcams. For synthesizing the frames in real time, we propose a nonlinear mask blending method that avoids ghosting and blurring in the main part of the overlapped region. As illustrated by the experimental results, this is an effective and inexpensive way to construct video with a wide field-of-view.

Currently, we have focused on using only two webcams. As a natural extension of this work, we would like to scale the system up to more webcams. We also plan to explore the hard and interesting issues of eliminating the exposure differences between webcams in real time and solving the problems mentioned at the end of the previous section.

Acknowledgment

The financial support provided by the National Natural Science Foundation of China (Project ID: 60772032) and Microsoft (China) Co., Ltd. is gratefully acknowledged.

References

1. Szeliski, R., Shum, H.Y.: Creating Full View Panoramic Mosaics and Environment Maps. In: Proc. of SIGGRAPH 1997, Computer Graphics Proceedings, Annual Conference Series, pp. 251–258 (1997)
2. Agarwala, A., Agrawala, M., Chen, M., Salesin, D., Szeliski, R.: Photographing Long Scenes with Multi-Viewpoint Panoramas. In: Proc. of SIGGRAPH, pp. 853–861 (2006)
3. Zheng, J.Y.: Digital Route Panoramas. IEEE MultiMedia 10(3), 57–67 (2003)
4. Hsu, C.-T., Cheng, T.-H., Beukers, R.A., Horng, J.-K.: Feature-based Video Mosaic. In: Image Processing, pp. 887–890 (2000)
5. Kang, S.B., Szeliski, R., Uyttendaele, M.: Seamless Stitching Using Multi-Perspective Plane Sweep. Microsoft Research, Tech. Rep. MSR-TR-2004-48 (2004)
6. Zelnik-Manor, L., Peters, G., Perona, P.: Squaring the Circle in Panoramas. In: Proc. 10th IEEE Conf. on Computer Vision (ICCV 2005), pp. 1292–1299 (2005)
7. Majumder, A., Gopi, M., Seales, W.B., Fuchs, H.: Immersive Teleconferencing: A New Algorithm to Generate Seamless Panoramic Imagery. In: Proc. of ACM Multimedia, pp. 169–178 (1999)
8. Szeliski, R.: Video Mosaics for Virtual Environments. IEEE Computer Graphics and Applications, 22–23 (1996)
9. Canny, J.: A Computational Approach to Edge Detection. IEEE Trans. Pattern Analysis and Machine Intelligence 8, 679–698 (1986)
10. Harris, C., Stephens, M.: A Combined Corner and Edge Detector. In: Proc. of the 4th Alvey Vision Conference, pp. 147–151 (1988)
11. Lowe, D.G.: Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)
12. Winder, S., Brown, M.: Learning Local Image Descriptors. In: Proc. of the International Conference on Computer Vision and Pattern Recognition (CVPR 2007), pp. 1–8 (2007)
13. Forsyth, D.A., Ponce, J.: Computer Vision: A Modern Approach. Prentice-Hall, Englewood Cliffs (2003)