        EFFICIENT STEREO VIDEO ENCODING FOR MOBILE APPLICATIONS USING THE 3D+F CODEC

                    Fabio Maninchedda 1,2, Marc Pollefeys 1,2, Alain Fogel 2

            1 Computer Science Department, ETH Zürich, 8092 Zürich, Switzerland
            2 Nomad3D, Zürich, Switzerland and Nice, France

                           ABSTRACT

This paper presents a stereo video codec called 3D+F which has specifically been designed to meet the low complexity requirements imposed by mobile applications while trying to be competitive with state-of-the-art coders, such as the multi view video coding (MVC) extension of the H.264/AVC standard, in terms of rate-distortion (RD). By exploiting the stereo geometry of stereoscopic image pairs it is possible to transcode a 3D movie into a 2D movie and a 2-bit labeling. The 2D movie can then be compressed using any video coder while the labeling needs to be compressed losslessly. Preliminary subjective testing shows that the resulting quality has potential to be very competitive.

    Index Terms — Stereo image processing, Stereoscopic representation, Mobile computing

                        1. INTRODUCTION

As stereoscopic 3D is becoming popular on mobile devices, new efficient technologies for the 3D video processing chain are required. This is mainly due to the fact that for mobile applications the constraint of limited power has to be considered. In particular the decoding should be of low complexity to allow the playback of longer movies without draining the battery. The transition from 2D to stereoscopic 3D leads to a doubling of the amount of data to be processed. Efforts towards improved coding efficiency of 3D video have been made in the MVC extension of the H.264/AVC standard [1, 2]. However, this efficiency comes at the cost of a high complexity. To efficiently code multi-view videos it is necessary to additionally exploit the inter-view dependencies next to the temporal ones used in traditional video coding. For this purpose in MVC the prediction may choose among temporal and inter-view prediction and select the predictor which leads to the best coding efficiency on a block basis. Our approach is more ambitious and tries to remove the inter-view dependencies completely by understanding the scene geometry to detect shared regions in the image pairs. The necessary knowledge is acquired by performing a disparity estimation step for the stereo video. The disparity information is then used to go from the image pair to a so-called cyclopean representation. The cyclopean representation is composed of a cyclopean view, which is an image that contains the shared regions along with the occluded regions, and a visibility labeling that for each pixel in the cyclopean tells in which view it is visible. Given the cyclopean view and the labeling it is possible to reconstruct the original frames. Instead of having to encode a stereoscopic video one has to encode a 2D video and a labeling.

              2. OVERVIEW OF THE 3D+F PIPELINE

The encoding of a stereo movie using the proposed approach involves the following steps (see Figure 1). In an optional preprocessing step the input movie is rectified and color corrected. Rectification is required as it is assumed that corresponding pixels lie on the same scanline. Color equalization is recommended as it improves the disparity compensated prediction. Then, given the left and right input images L1 and R1 a disparity estimation step is performed to compute a disparity map D. From the input images and the disparity map one can compute the cyclopean view C and the corresponding 2-bit visibility labeling V. The cyclopean view is a 2D image and can therefore be encoded using any video coder while the visibility labeling needs to be encoded using a lossless entropy coder. Finally the cyclopean video and encoded labeling are multiplexed into an output stream which is denoted by S.

[Figure 1: Encoding pipeline — Preprocessing (L0, R0 -> geometric rectification, color correction -> L1, R1), Disparity Estimation (L1, R1 -> D), Cyclopean Generation (-> C, V), Encoder (C via video codec, V via entropy codec, multiplexed into S).]

The decoder works as follows (see Figure 2). The output stream S is demultiplexed and decoded to obtain the cyclopean frames C and the corresponding labeling V. Due to the lossy video coding the decoded cyclopean frames are in general not identical to C. Then the cyclopean frames are decoded to obtain the decoded left and right views L1 and R1. Note that the transition to a cyclopean representation and back is also a lossy process.

[Figure 2: Decoding pipeline — Decoder (S -> demultiplexer -> video codec / entropy codec -> C, V), 3D+F Decoder (C, V -> L1, R1).]
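
The pipeline of Figures 1 and 2 can be summarized as two small driver loops. The following is a minimal, hypothetical sketch of this flow; the helper functions (rectify_and_color_correct, estimate_disparity, build_cyclopean, reconstruct_views) and the codec and multiplexer objects are placeholders for the steps described in Sections 3-6, not the published implementation.

    # Hypothetical skeleton of the 3D+F encoding and decoding flow (Figures 1 and 2).
    # All helpers are placeholders for the steps of Sections 3-6.
    def encode_stereo_movie(frames_left, frames_right, video_enc, entropy_enc, mux):
        for L0, R0 in zip(frames_left, frames_right):
            L1, R1 = rectify_and_color_correct(L0, R0)   # Section 3 (optional)
            D = estimate_disparity(L1, R1)               # Section 4
            C, V = build_cyclopean(L1, R1, D)            # Section 5
            mux.write(video_enc.encode(C),               # lossy 2D video coder
                      entropy_enc.encode(V))             # lossless label coder

    def decode_stereo_movie(stream, video_dec, entropy_dec, demux):
        for packet_c, packet_v in demux.read(stream):
            C = video_dec.decode(packet_c)               # decoded cyclopean frame
            V = entropy_dec.decode(packet_v)             # 2-bit visibility labeling
            yield reconstruct_views(C, V)                # simple lookup, Section 6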
                      3. PREPROCESSING

The preprocessing steps are required for not properly calibrated movies and can be skipped whenever the data to be coded meets all the assumptions made in the 3D+F algorithm. Geometric rectification is required to obtain competitive results due to the assumption that corresponding pixels lie on the same scanline. Color correction should be applied whenever the mismatch of the pixel intensities is significant as it will negatively affect the PSNR of the encoded sequences. In the remainder of this section the geometric rectification and color equalization procedures are briefly explained.

3.1. Geometric Rectification

The geometric rectification is performed using the method presented in [3]. Instead of rectifying each frame independently it is assumed that the intrinsic camera parameters are constant during each scene. This model can be extended to handle scenes with time-varying camera intrinsic parameters, which are currently not handled. A possible way to handle such sequences would be to rectify each frame independently while enforcing some temporal consistency to avoid inconsistencies among successive frames. Since the method has not been specifically developed for data that is used for stereoscopic viewing purposes, possible improvements include the minimization of an error function which leads to the least possible amount of visual distortion. For almost rectified stereo pairs, however, it is expected that the introduced distortion is small. To obtain optimal results one should also perform a radial distortion estimation.
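
For illustration, if the rectifying transformations estimated once per scene with the method of [3] are available as homographies H_left and H_right, applying them to every frame of that scene is a simple warp. A minimal sketch, assuming OpenCV and constant intrinsics within the scene (the implementation details of the paper are not given here):

    import cv2

    def rectify_scene(frames_left, frames_right, H_left, H_right):
        # Warp every frame of a scene with the per-scene rectifying homographies.
        rectified = []
        for L, R in zip(frames_left, frames_right):
            h, w = L.shape[:2]
            L1 = cv2.warpPerspective(L, H_left, (w, h))
            R1 = cv2.warpPerspective(R, H_right, (w, h))
            rectified.append((L1, R1))
        return rectified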
3.2. Color Equalization

First, correspondences are computed using stereo matching (see Section 4). Then the brightness transfer function (BTF) is computed from the joint histogram using a robust maximum likelihood estimation for each color channel independently [4]. The BTF can be used to map the intensities of one image to the other. Example histograms before and after color correction using the proposed method are shown in Figure 3. This method does not account for inter-channel dependencies as every color channel is treated independently. More involved approaches that model those dependencies may lead to improved results.

[Figure 3: Error histograms before and after color correction for R, G and B color channels.]
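
A minimal sketch of the BTF idea, assuming 8-bit images and matched intensities taken from the stereo correspondences of Section 4. The robust maximum-likelihood fit of [4] is replaced here by a simple per-row argmax with a monotonicity constraint, so this is only an approximation of the actual estimator:

    import numpy as np

    def brightness_transfer_function(src_vals, dst_vals, bins=256):
        # src_vals, dst_vals: 1-D arrays of corresponding intensities (one channel).
        joint, _, _ = np.histogram2d(src_vals, dst_vals,
                                     bins=bins, range=[[0, bins], [0, bins]])
        # Simplified estimate: most likely target intensity for each source value.
        btf = np.argmax(joint, axis=1)
        # Enforce monotonicity so the result behaves like a transfer curve.
        btf = np.maximum.accumulate(btf).astype(np.uint8)
        return btf

    # Usage: corrected_channel = btf[channel]   (per-channel lookup table)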
                   4. DISPARITY ESTIMATION

The problem of estimating disparities has been studied extensively in the past [5]. In our application scenario the goal is slightly different: the goal is to recover disparity information which leads to the best possible prediction of the right view from the left one while minimizing the size of the cyclopean. The two goals are partially in conflict because in general it is not true that the image of corresponding pixels represents the best possible match in terms of color differences. This is due to variations in gain and bias and effects such as different sampling, specular highlights and others. In general, a piecewise constant disparity map will lead to a smaller cyclopean representation, which again implies that one cannot simply choose the disparity assignment which locally minimizes the color difference of the corresponding pixels because that would lead to a quite random result. The problem of estimating such a disparity map is less ambiguous than the traditional stereo goal. Indeed, untextured regions, which are difficult to match in the traditional stereo matching problem, are trivially handled in our case, since most disparity assignments will provide a result that fulfills our goals. Even though enforcing more smoothness can reduce the size of the cyclopean, it is essentially dominated by the amount of occlusions in the stereo pair. Currently the disparity estimation is performed using the method presented in [6]. In the remainder of this section an extension to improve the temporal consistency is presented. This aspect has mostly not been taken into account by currently available disparity estimation methods. The presented approach follows the lines of those methods which use the motion between adjacent frames to impose some additional constraints in the stereo matching process [7]. Assume that for two consecutive left views the motion field m^t that maps points at time t to corresponding points at time t+1 is known, in formulas p_{t+1} = p_t + m^t_p. Such a motion field can be computed using optical flow methods [8]. Then, to enforce some temporal continuity for the disparities of the corresponding pixels in time, D_t(p_t) should not differ substantially from D_{t+1}(p_{t+1}) whenever assigning the same disparity appears to be appropriate. In essence, if m^t_p ≈ 0, which is the condition for which the point p_t is stationary, the assignment of a different disparity should be penalized. However, if there is substantial motion the disparity could change by a few values and the penalty should therefore be much lower. To achieve this goal the energy functional that is minimized in the stereo matching is extended by a consistency term of the form

    P_T · (1 − exp(−|D_{t+1}(p_{t+1}) − D_t(p_t)| / (γ · ||m^t_p||)))        (1)

where P_T is a user-defined penalty term and γ influences the strength of the motion magnitude on the penalty. In particular, the larger γ, the smaller the penalty also for low motion. The penalty can never exceed P_T, such that if the motion estimation fails the stereo optimization still has some freedom to choose a better match. Furthermore, it is known that optical flow methods tend to become inaccurate for large motion. Therefore, the adaptation of the cost function to the motion magnitude seems to be a reasonable choice since it can handle this type of inaccuracy properly. Nonetheless, the choice of reasonable values for the parameters P_T and γ is crucial. Indeed, if too strong constraints are enforced the disparity map is essentially warped according to the motion field.
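
The consistency term (1) depends only on the previous disparity, the flow vector and the two parameters. A minimal per-pixel sketch, evaluated inside the stereo optimization for a candidate disparity (the small eps guards against division by zero for a perfectly stationary pixel and is an implementation detail, not part of the paper):

    import numpy as np

    def temporal_penalty(d_candidate, d_prev, motion_vec, P_T, gamma, eps=1e-6):
        # d_candidate: disparity hypothesis D_{t+1}(p_{t+1})
        # d_prev:      D_t(p_t)
        # motion_vec:  flow vector m^t_p carrying p_t to p_{t+1}
        mag = np.hypot(motion_vec[0], motion_vec[1]) + eps   # ||m^t_p||
        return P_T * (1.0 - np.exp(-abs(d_candidate - d_prev) / (gamma * mag)))

For a stationary pixel the exponent is large for any disparity change, so the penalty saturates near P_T, while for large motion the same change is penalized only mildly, as intended.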
[(a) Left  (b) Right  (c) Disparity  (d) Cyclopean  (e) Labeling]

Figure 4: Illustration of the cyclopean (d) and corresponding labeling (e) for input stereo pair (a,b) and disparity (c). Label VB is black, VL is dark gray and VR is light gray.


                5. CYCLOPEAN GENERATION

The fundamental insight is that both views share a considerable amount of information. Indeed, most parts of the image represent the same scene points. Scene points that are not visible from one viewpoint while being visible from the other are called occluded pixels. In the following discussion it is assumed, without loss of generality, that the left view is the reference view. The cyclopean view consists of an image which contains all pixels of the left view along with its occluded pixels, plus a labeling which stores the visibility of each pixel. It is constructed row by row and the occluded pixels are inserted at the position where the occlusion occurs. The labeling can assume one of the three values V ∈ {VB, VL, VR}, where VB denotes that the pixel is visible in both views, VL denotes that it is visible in the left view only and VR denotes that it is visible in the right view only. In Figure 4 it is possible to see a stereo pair and the corresponding cyclopean representation. In the cyclopean representation some of the spatial and temporal consistency present in the original imagery is lost due to the misalignment of adjacent scanlines. To recover, at least partially, the spatial and temporal consistency it is possible to perform an alignment step. Please note that the conversion to a cyclopean representation and back is a lossy process. Indeed, corresponding pixels are stored only once even though they might be slightly different due to color mismatches and different sampling.
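
To make the row-by-row construction concrete, the sketch below assembles one cyclopean scanline and its labeling from a left scanline, a right scanline and an integer, left-referenced disparity row (x_right = x_left - d). It is a simplified illustration relying on the ordering constraint; the exact insertion and alignment rules of the 3D+F implementation are not specified in the paper.

    import numpy as np

    V_BOTH, V_LEFT, V_RIGHT = 0, 1, 2   # 2-bit visibility labels VB, VL, VR

    def cyclopean_scanline(left_row, right_row, disp_row):
        cyc, lab = [], []
        prev_xr = -1
        for xl, d in enumerate(disp_row):
            xr = xl - int(d)                     # matching column in the right view
            if xr <= prev_xr or xr >= len(right_row):
                cyc.append(left_row[xl]); lab.append(V_LEFT)    # occluded in the right view
                continue
            for x in range(prev_xr + 1, xr):     # right-only pixels form a gap
                cyc.append(right_row[x]); lab.append(V_RIGHT)   # occluded in the left view
            cyc.append(left_row[xl]); lab.append(V_BOTH)        # visible in both views
            prev_xr = xr
        return np.array(cyc), np.array(lab)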
                 6. ENCODER AND DECODER

The cyclopean stream is a 2D movie and can therefore be encoded using any video coder. On the other hand the labeling needs to be encoded losslessly and for this purpose a CABAC [1] style encoder has been used. The prediction of the current label at time t is based on the labels in a small neighborhood that have already been encoded and the previously encoded label at the same position at time t − 1.
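
The exact context model is not spelled out in the paper; as a hedged illustration, a context index for the ternary label alphabet could combine the already coded left and upper neighbors of the current frame with the co-located label of the previous frame, yielding 27 contexts, each driving its own adaptive probability model in the arithmetic coder:

    def label_context(V_cur, V_prev, y, x):
        # Hypothetical context: causal neighbors at time t plus the label at time t-1.
        left = int(V_cur[y, x - 1]) if x > 0 else 0
        up   = int(V_cur[y - 1, x]) if y > 0 else 0
        prev = int(V_prev[y, x])
        return left * 9 + up * 3 + prev           # index into 27 probability models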
Decoding cyclopean frames is trivial as it requires a simple lookup of the visibility labeling.
6.1. Encoding Complexity

The most expensive step in the encoding pipeline is the disparity estimation with a complexity of O(|D| w h), where |D| denotes the disparity range and w and h the width and height of the views respectively. The motion estimation step used by video coders has a higher complexity than the proposed disparity estimation step. Due to the fact that the cyclopean movie is just slightly larger than one of the original views (10-20%), the encoding complexity is much cheaper than the one for encoding the full 3D movie.

6.2. Decoding Complexity

The decoding of the cyclopean movie is much cheaper due to the small size increase with respect to a single view. Furthermore, the complexity of the 3D+F decoder is linear in the size of the cyclopean. Therefore, the whole decoding process is very efficient.
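
The linear-time lookup can be sketched per scanline: a single pass over the cyclopean routes each sample to the left view, the right view or both, according to its label. This is a simplified illustration using the same label convention as the Section 5 sketch (VB = 0, VL = 1, VR = 2); matched pixels are emitted to both views with the single stored color.

    import numpy as np

    V_BOTH, V_LEFT, V_RIGHT = 0, 1, 2

    def reconstruct_scanline(cyc_row, lab_row):
        # One pass over the cyclopean scanline: complexity linear in its length.
        left  = [v for v, l in zip(cyc_row, lab_row) if l in (V_BOTH, V_LEFT)]
        right = [v for v, l in zip(cyc_row, lab_row) if l in (V_BOTH, V_RIGHT)]
        return np.array(left), np.array(right)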
                 7. EXPERIMENTAL RESULTS

All coding experiments are carried out with the H.264/AVC Reference Software JM 18.2. The encoding parameters are reported in Table 1. The test movies used for the evaluation are part of the MPEG 3DVC activity [9]. No preprocessing has been done as the sequences are properly rectified and the color mismatches are minor. The RD plots for the sequences Kendo, Dancer and Poznan Street are shown in Figure 5. The plots include the bitrates required for encoding the visibility labeling, which are reported in Table 2. Currently the labeling bitrate is independent of the bitrate at which the cyclopean movie is encoded.

    Parameter                  Value MVC                    Value 3D+F
    Profile                    Stereo High profile          High profile
    Temporal prediction        hierarchical bi-predictive   hierarchical bi-predictive
    Number views               2                            1
    Intra period / GOP Size    16                           16
    Symbol mode                CABAC                        CABAC
    Search range / mode        32                           EPZ search
    RD optimization            on                           on
    Subpel ME                  on                           on
    I/P/B modes                on                           on

Table 1: Encoder configurations.

    Sequence          Kendo    Dancer    Poznan Street
    Bitrate [kbps]    1210     1655      1696

Table 2: Bitrate of visibility labeling for various sequences.
[Figure 5 panels: Kendo (1024x768), Dancer (1920x1088), Poznan Street (1920x1088); Y-PSNR [dB] over Bitrate [Mbps]; curves: MVC, 3D+F, 3D+F DC.]

Figure 5: RD plots of MVC, 3D+F and 3D+F with PSNR evaluated on decoded cyclopean (DC) for the sequences Kendo, Dancer and Poznan Street. Both the 3D+F and 3D+F DC curves include the bitrates required for encoding the labeling, which are reported in Table 2.

Therefore, the overhead at low bitrates is substantial. In the future, support for RD optimization between cyclopean movie and labeling will be explored. The plots in Figure 5 also show the PSNR of the 3D+F algorithm when evaluated with respect to the decoded cyclopean (DC). Recall that the cyclopean conversion introduces some distortion and therefore the 3D+F DC curve should give a hint at what visual quality one can expect assuming that the conversion to the cyclopean representation does not introduce any visual artifacts. Distortions in the cyclopean conversion are mainly introduced by color differences and sub-pixel shifts as only integer valued disparities are used. Most of these distortions are not visually significant but negatively affect the PSNR. Even though the RD performance of 3D+F is inferior to MVC according to PSNR, the visual quality has potential to be very competitive. In Figure 6 the image quality at similar bitrates and similar distortions for the Poznan Street sequence is shown. It is possible to see that the quality of MVC at a similar bitrate to 3D+F is just slightly better even though there is a difference of 1.95 dB in PSNR. At similar distortion the visual quality of MVC is clearly worse than 3D+F. This can be explained by the 3D+F DC curve which at 3567 kbps has similar distortion values to MVC.

[(a)  (b)  (c)]

Figure 6: Visual comparison between MVC and 3D+F at similar bitrates or similar quality. Image (a) shows the coding results of MVC at 4925 kbps (38.89 dB), image (b) shows the coding results of 3D+F at 5263 kbps (36.94 dB) while image (c) shows the coding results of MVC at 1525 kbps (36.53 dB).

             8. CONCLUSION AND FUTURE WORK

A novel low complexity stereoscopic 3D encoder has been presented. Preliminary results show that, while inferior in terms of PSNR, the perceived visual quality is expected to be very competitive and comparable to that of current state-of-the-art encoders such as MVC. Future work will explore ways of optimizing the RD of cyclopean movies and labeling jointly.

                      9. REFERENCES

[1] ISO/IEC JTC1/SC29/WG11 and ITU-T SG16/Q.6, "Draft ITU-T Recommendation H.264 and Draft ISO/IEC 14496-10 AVC," Doc. JVT-G050, 2003.
[2] Ying Chen, Ye-Kui Wang, Kemal Ugur, Miska M. Hannuksela, Jani Lainema, and Moncef Gabbouj, "The emerging MVC standard for 3D video services," EURASIP Journal on Applied Signal Processing, 2008.
[3] A. Fusiello and L. Irsara, "Quasi-Euclidean Uncalibrated Epipolar Rectification," in International Conference on Pattern Recognition, 2008.
[4] Seon Joo Kim and M. Pollefeys, "Robust radiometric calibration and vignetting correction," Pattern Analysis and Machine Intelligence, 2008.
[5] D. Scharstein and R. Szeliski, "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms," International Journal of Computer Vision, 2001.
[6] M. Bleyer and M. Gelautz, "Simple but Effective Tree Structures for Dynamic Programming-based Stereo Matching," in International Conference on Computer Vision Theory and Applications, 2008.
[7] M. Dongbo, Y. Sehoon, and A. Vetro, "Temporally consistent stereo matching using coherence function," in 3DTV-Conference, 2010.
[8] S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black, and R. Szeliski, "A Database and Evaluation Methodology for Optical Flow," in International Conference on Computer Vision, 2007.
[9] ISO/IEC JTC1/SC29/WG11, "Call for Proposals on 3D Video Coding Technology," Doc. N12036, 2011.
