2008 IEEE/RSJ International Conference on Intelligent Robots and Systems
Acropolis Convention Center
Nice, France, Sept. 22-26, 2008



        Real-time 3D Object Pose Estimation and Tracking for Natural
                       Landmark Based Visual Servo
                       Changhyun Choi, Seung-Min Baek and Sukhan Lee, Fellow Member, IEEE


   Abstract— A real-time solution for estimating and tracking the 3D pose of a rigid object is presented for image-based visual servo with natural landmarks. The many state-of-the-art technologies that are available for recognizing the 3D pose of an object in a natural setting are not suitable for real-time servo due to their time lags. This paper demonstrates that a real-time solution for 3D pose estimation becomes feasible by combining a fast tracker such as KLT [7] [8] with a method of determining the 3D coordinates of tracking points on an object at the time of SIFT-based tracking point initiation, assuming that a 3D geometric model with a SIFT description of the object is known a priori. By keeping track of the tracking points with KLT, removing tracking point outliers automatically, and reinitiating the tracking points with SIFT once they deteriorate, the 3D pose of an object can be estimated and tracked in real time. The method can be applied to both mono and stereo camera based 3D pose estimation and tracking. The former guarantees higher frame rates with about 1 ms of local pose estimation, while the latter yields more precise pose results at about 16 ms of local pose estimation. The experimental investigations have shown the effectiveness of the proposed approach with real-time performance.

                      I. INTRODUCTION

   In visual servo control, real-time performance of object recognition with pose has been regarded as one of the most important issues for several decades. In particular, the literature has required three-dimensional (3D) pose estimation in order to handle a target object in 3D space. While modern recognition methods such as SIFT and structured light, which are applicable to accurate pose estimation, have recently been proposed, they are still inappropriate for vision-based manipulation due to their high computational burden. For natural control of robot manipulators, we propose a real-time solution for tracking 3D rigid objects using a mono or stereo camera that combines scale invariant feature based matching [1] [2] with the Kanade-Lucas-Tomasi (KLT) tracker [6] [7] [8] [9]. Since there is no need for a stereo matching process, mono mode using a mono camera promises better computational performance. In contrast, stereo mode using a stereo camera gives more accurate pose results. We expect that mono mode can be applied when coarse pose results are acceptable, for example while a robot approaches a target object; conversely, when accurate poses are necessary, such as when grasping a target object, stereo mode is available.

   This research was performed for the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Commerce, Industry and Energy of Korea. This work is also supported in part by the Science and Technology Program of Gyeong-Gi Province as well as in part by Sungkyunkwan University.
   Changhyun Choi, Seung-Min Baek and Sukhan Lee are with the School of Information and Communication Engineering of Sungkyunkwan University, Suwon, Korea {cchoi,smbaek,lsh}@ece.skku.ac.kr

                      II. RELATED WORK

   Papanikolopoulos and Kanade proposed algorithms for robotic real-time visual tracking using sum-of-squared differences (SSD) optical flow [5]. Combining vision and control, the system feeds discrete displacements calculated from SSD optical flow to controllers or a discrete Kalman filter. Considering that the vision process took about 100 ms in 1993, this work showed the potential of optical flow as a real-time tracker. However, the constraints used in the work, such as the assumption of given depth and the manual selection of tracking points, are not acceptable for modern robots. For dependable pose tracking, the depth calculation and the tracking point generation should be fully automatic processes. Moreover, the system only considered the roll angle of the target object; a visual servoing system should know not only the 3D position (x, y, z) but also the 3D orientation (roll, pitch, yaw) of the target.

   As CPUs have become faster over the last decade, tracking methods using local feature points have been proposed. Vacchetti and Lepetit suggested a real-time solution, free of jitter and drift, for tracking objects in 3D using standard corner features with a single camera [4]. Their approach is robust to large camera displacements, drastic aspect changes, and partial occlusions. A drawback of the system is that the camera must start close enough to one of the keyframes, which are a form of prior knowledge. This limited starting condition places practical limitations on pose tracking applications in visual servo control. For more robust tracking, tracking systems need a recognition scheme that is invariant to scale, rotation, affine distortion, and change in 3D viewpoint.

   The work of Panin and Knoll aptly complements the disadvantages of Vacchetti and Lepetit's work [3]. They proposed a fully automatic real-time solution for 3D object pose tracking with SIFT feature matching and contour-based object tracking. The solution is organized into two parts: a global initialization module and a local curve fitting algorithm. The former recognizes the initial pose of the object on the whole incoming image through SIFT feature matching. The latter estimates the pose of the target object through contour matching on an online color image stream, given the calculated initial pose and a contour model. The approach can track the pose of objects having insufficient texture on their surfaces since it uses the contour information of the objects. But when the colors of the target object and the background are similar, there is a potential risk of pose tracking failure, because the Contracting Curve Density (CCD) algorithm used for object contour tracking principally relies on color statistics near the boundary of the object. In addition, there is rotational ambiguity in the pose results depending on the shape of the contour model, for example with cylindrical or spherical objects.

[Fig. 1. System Architecture. Block diagram: model images (bmp) and 3D points form the prior knowledge of objects; the initial scene from the mono/stereo camera is processed by Initial Pose Estimation (scale invariant feature matching); the current scene is processed by Local Pose Estimation (optical flow based tracking) using the 3D reference points, followed by Removing Outliers (RANSAC); a good result is output, a bad result triggers re-initialization.]



                      III. PROPOSED APPROACH

   In order to overcome the disadvantages of the related work discussed above, we propose a method that combines scale invariant feature matching with optical flow based tracking, the KLT tracker, for real-time 3D pose tracking. Since SIFT features are invariant to scale, rotation, illumination, and partial change in 3D viewpoint, SIFT feature matching lets our system robustly estimate an initial pose of a target object. However, SIFT feature matching on a modern CPU is not fast enough to be utilized in robotic arm control.¹ To overcome this difficulty, our approach adopts the KLT tracker to calculate the 3D pose consecutively from the initially estimated pose at low computational cost.

   The KLT tracker is well established and its implementation is available in a public computer vision library [10] [9]. The advantages of the KLT tracker are that it is free from prior knowledge and computationally cheap. Like other area correlation-based methods, however, this tracker performs poorly under significant changes of scene, illumination, and occlusion. In addition, KLT tracking points are likely to drift due to the aperture problem of optical flow. Our system therefore automatically removes outliers to obtain accurate pose results.

   ¹ It takes about 300 ms in our test with a 640 × 480 RGB image to determine a 3D pose of an object.

A. Problem Definition: Mono and Stereo

   Our method is applicable to both a mono camera and a stereo camera. Fig. 2 illustrates the two camera settings and coordinate frames. {C} and {C'} represent camera coordinate frames, and {M} denotes the model coordinate frame. Large crosses depict 3D model points with respect to the model coordinate frame, and small crosses indicate 2D model points, i.e. the 3D model points projected onto the image plane S or S'. The problem of mono mode can be defined as:

      Given 2D-3D correspondences and a calibrated mono camera, find the pose of the object with respect to the camera.

Similarly, the problem of stereo mode can be defined as:

      Given 3D-3D correspondences and a calibrated stereo camera, find the pose of the object with respect to the camera.

B. System Overview

   As depicted in Fig. 1, the proposed method is mainly divided into two parts: initial pose estimation and local pose estimation. For SIFT feature matching, the system retains BMP images of the target object and 3D points calculated from a structured light system as prior knowledge. This off-line information is used in the initial pose estimation. Once an initial pose is determined, the system automatically generates KLT tracking points around the matched SIFT points, and it saves 3D reference points for further use after calculating the 3D points of the tracking points. In the local pose estimation, the system quickly tracks the 3D pose of the target object using the 3D reference points and the KLT tracking points. Outliers are eliminated by the RANSAC algorithm until the number of remaining tracking points falls below a certain threshold value. When the number of tracking points is below the threshold, the system returns to the initial pose estimation and restarts SIFT matching globally; otherwise the system repeats the local pose estimation.
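   As an illustration of the mono-mode problem defined in Sec. III-A, the following minimal sketch recovers the object pose from 2D-3D correspondences with a calibrated camera. The paper uses the POSIT algorithm [13]; here cv::solvePnP from OpenCV is used as a stand-in, and the function and variable names are illustrative rather than taken from the paper's implementation.

// Hedged sketch: pose from 2D-3D correspondences (mono mode).
// cv::solvePnP substitutes for POSIT [13]; assumes a recent OpenCV (3.x/4.x).
#include <opencv2/calib3d.hpp>
#include <vector>

struct Pose { cv::Mat rvec, tvec; };   // rotation (Rodrigues vector) and translation

Pose estimatePoseMono(const std::vector<cv::Point3f>& modelPts,   // P^M_{1:m}, model frame
                      const std::vector<cv::Point2f>& imagePts,   // p^C_{1:m}, pixels
                      const cv::Mat& K,                           // 3x3 camera matrix (fx, fy, cx, cy)
                      const cv::Mat& distCoeffs)                  // lens distortion coefficients
{
    Pose pose;
    // At least 4 correspondences are needed here, matching tau_mono >= 4.
    cv::solvePnP(modelPts, imagePts, K, distCoeffs,
                 pose.rvec, pose.tvec, false, cv::SOLVEPNP_ITERATIVE);
    return pose;
}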

[Fig. 2. Camera coordinate frame {C} and model coordinate frame {M} with 3D model points: 2D points p^C_{1:m} (or p^C'_{1:m}) on the image plane S (or S') and 3D points P^M_{1:m} in {M}. (a) Mono camera (b) Stereo camera]
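   The stereo-mode counterpart of the problem above is an absolute orientation problem: given 3D-3D correspondences between {M} and {C}, find the rigid transform between the two frames. The paper uses the closed-form unit-quaternion solution of Horn [14]; the sketch below uses the equivalent SVD-based closed form, which is compact to write with OpenCV matrices. It is only a sketch under that substitution; names are illustrative.

// Hedged sketch: closed-form rigid fit Q_i ≈ R * P_i + t from 3D-3D correspondences
// (SVD variant standing in for the unit-quaternion solution [14]).
#include <opencv2/core.hpp>
#include <vector>

static void rigidTransform3D(const std::vector<cv::Point3d>& P,   // model points, e.g. T^M_{1:n'}
                             const std::vector<cv::Point3d>& Q,   // camera points, e.g. T^C_{1:n'}
                             cv::Mat& R, cv::Mat& t)
{
    CV_Assert(P.size() == Q.size() && P.size() >= 3);             // cf. tau_stereo >= 3

    // Centroids of both point sets.
    cv::Point3d cp(0, 0, 0), cq(0, 0, 0);
    for (size_t i = 0; i < P.size(); ++i) { cp += P[i]; cq += Q[i]; }
    cp *= 1.0 / static_cast<double>(P.size());
    cq *= 1.0 / static_cast<double>(Q.size());

    // Cross-covariance of the centered point sets.
    cv::Mat H = cv::Mat::zeros(3, 3, CV_64F);
    for (size_t i = 0; i < P.size(); ++i) {
        cv::Mat p = (cv::Mat_<double>(3, 1) << P[i].x - cp.x, P[i].y - cp.y, P[i].z - cp.z);
        cv::Mat q = (cv::Mat_<double>(3, 1) << Q[i].x - cq.x, Q[i].y - cq.y, Q[i].z - cq.z);
        H += p * q.t();
    }

    // Rotation from the SVD of H, with a fix for the reflection case.
    cv::Mat w, U, Vt;
    cv::SVD::compute(H, w, U, Vt);
    R = Vt.t() * U.t();
    if (cv::determinant(R) < 0) {
        cv::Mat D = cv::Mat::eye(3, 3, CV_64F);
        D.at<double>(2, 2) = -1.0;
        R = Vt.t() * D * U.t();
    }

    // Translation aligns the centroids.
    cv::Mat cpM = (cv::Mat_<double>(3, 1) << cp.x, cp.y, cp.z);
    cv::Mat cqM = (cv::Mat_<double>(3, 1) << cq.x, cq.y, cq.z);
    t = cqM - R * cpM;
}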



C. Algorithm

   For clarity, Algorithms 1 and 2 are given to illustrate the core of our approach. In Algorithms 1 and 2, I_{1:k}(x, y) = {I_1(x, y), I_2(x, y), ..., I_k(x, y)} are k RGB model images in which the target object is captured, and J^M_{1:k}(x, y) = {J^M_1(x, y), J^M_2(x, y), ..., J^M_k(x, y)} represents the 3D points corresponding to each pixel (x, y) of the RGB model images with respect to the model coordinate frame {M}. S(x, y) and S'(x, y) denote the current input images. All images used are of 640 × 480 resolution. p^C_{1:m} = {p^C_1, p^C_2, ..., p^C_m}, t^C_{1:n} = {t^C_1, t^C_2, ..., t^C_n}, t^C_{1:n'} = {t^C_1, t^C_2, ..., t^C_{n'}}, and Φ^C are point vectors or the pose vector of the object with respect to the camera coordinate frame {C}. Similarly, P^M_{1:m}, T^M_{1:n}, and T^M_{1:n'} are point vectors in the model coordinate frame {M}.

   Although the minimum number of corresponding points is 4 in the POSIT algorithm [13] and 3 in the closed-form solution using unit quaternions [14], the thresholds for the minimum number of correspondences, τ_mono (≥ 4) and τ_stereo (≥ 3), should be adjusted according to the accuracy the application needs.

Algorithm 1 MONO 3D POSE TRACKING
Input: model images I_{1:k}(x, y), 3D model points J^M_{1:k}(x, y), a current scene image S(x, y)
Initialize:
  1) Extract m SIFT feature points p^C_{1:m} globally on S(x, y) and get m 3D model points P^M_{1:m} corresponding to the extracted SIFT feature points by looking up J^M_{1:k}(x, y)
  2) Determine the initial pose Φ^C_0 of the object by using the POSIT algorithm, in which p^C_{1:m} and P^M_{1:m} serve as parameters
  3) Generate n 2D KLT tracking points t^C_{1:n} within the convex hull of the set of p^C_{1:m}
  4) Calculate n 3D reference points T^M_{1:n} corresponding to t^C_{1:n} by using the plane approximation based on t^C_{1:n}, P^M_{1:m}, and Φ^C_0
Loop:
  1) Track t^C_{1:n} by using the KLT tracker
  2) Remove outliers by applying RANSAC to produce t^C_{1:n'} and T^M_{1:n'}
  3) If the number of remaining tracking points n' is fewer than τ_mono, go to Initialize
  4) Determine the current pose Φ^C of the object by using the POSIT algorithm, in which t^C_{1:n'} and T^M_{1:n'} serve as parameters
Output: the pose Φ^C of the object

Algorithm 2 STEREO 3D POSE TRACKING
Input: model images I_{1:k}(x, y), 3D model points J^M_{1:k}(x, y), current scene images S(x, y) and S'(x, y)
Initialize:
  1) Extract m SIFT feature points p^C_{1:m} globally on S(x, y)
  2) Generate n 2D KLT tracking points t^C_{1:n} within the convex hull of the set of p^C_{1:m}
  3) Determine n 3D reference points T^M_{1:n} corresponding to t^C_{1:n} by looking up J^M_{1:k}(x, y)
  4) Calculate n 3D points T^C_{1:n} corresponding to t^C_{1:n} by stereo matching on S(x, y) and S'(x, y)
Loop:
  1) Track t^C_{1:n} by using the KLT tracker
  2) Remove outliers by applying RANSAC to produce t^C_{1:n'} and T^M_{1:n'}
  3) If the number of remaining tracking points n' is fewer than τ_stereo, go to Initialize
  4) Calculate T^C_{1:n'} by stereo matching
  5) Determine the current pose Φ^C of the object by the closed-form solution using unit quaternions, in which T^C_{1:n'} and T^M_{1:n'} serve as parameters
Output: the pose Φ^C of the object
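   As a concrete illustration of the Loop of Algorithm 1, the sketch below tracks the points with pyramidal Lucas-Kanade optical flow and combines the RANSAC outlier removal with the pose update in one call to cv::solvePnPRansac. The paper uses RANSAC [11] followed by POSIT [13]; the OpenCV calls are stand-ins, and all names and thresholds are illustrative. Returning false corresponds to the "go to Initialize" step.

// Hedged sketch of one local pose estimation step (mono mode), assuming OpenCV 3.x/4.x.
#include <opencv2/video/tracking.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

bool trackPoseMono(const cv::Mat& prevGray, const cv::Mat& currGray,
                   std::vector<cv::Point2f>& trackPts,       // t^C_{1:n} from the previous frame
                   std::vector<cv::Point3f>& refPts3D,       // T^M_{1:n} 3D reference points
                   const cv::Mat& K, const cv::Mat& dist,    // camera matrix and distortion
                   cv::Mat& rvec, cv::Mat& tvec,             // pose Phi^C (in/out)
                   int tauMono = 10)
{
    // 1) Track t^C_{1:n} into the current frame with pyramidal KLT.
    std::vector<cv::Point2f> nextPts;
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, trackPts, nextPts, status, err);

    // Keep only the points the tracker could follow.
    std::vector<cv::Point2f> pts2d;
    std::vector<cv::Point3f> pts3d;
    for (size_t i = 0; i < status.size(); ++i)
        if (status[i]) { pts2d.push_back(nextPts[i]); pts3d.push_back(refPts3D[i]); }
    if (static_cast<int>(pts2d.size()) < tauMono) return false;   // too few points: re-initialize

    // 2)-4) RANSAC outlier rejection and pose update in one step.
    std::vector<int> inliers;
    cv::solvePnPRansac(pts3d, pts2d, K, dist, rvec, tvec,
                       true /*use previous pose as guess*/, 100, 3.0f, 0.99, inliers);
    if (static_cast<int>(inliers.size()) < tauMono) return false; // n' < tau_mono: re-initialize

    // Retain only the inliers (t^C_{1:n'}, T^M_{1:n'}) for the next frame.
    std::vector<cv::Point2f> in2d;
    std::vector<cv::Point3f> in3d;
    for (int idx : inliers) { in2d.push_back(pts2d[idx]); in3d.push_back(pts3d[idx]); }
    trackPts = in2d;
    refPts3D = in3d;
    return true;
}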
D. Generating Tracking Points

   To track a moving 3D object with the KLT tracker, we have to generate tracking points on the area of the target object. We employ the matched SIFT points used in the initial pose estimation to allocate tracking points on the region of the target object. As shown in Fig. 3 (b), our algorithm generates the tracking points only inside the convex hull of the set of matched SIFT points, using the criteria proposed by Jianbo Shi and Carlo Tomasi [8]. In addition to these criteria, in stereo mode we regard the tracking points that have 3D points from the stereo matching as good features to track in 3D.

[Fig. 3. The target object and the region for generating 2D KLT tracking points. The tracking points are only generated within the convex hull of the set of matched SIFT points. In mono mode, for calculating the 3D point of each tracking point the algorithm uses the three nearest neighboring matched SIFT points. (a) Target object (b) Matched SIFT points and their convex hull on the surface of the target object (c) 2D KLT tracking points (bold points) within the convex hull]
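   A minimal sketch of this tracking-point generation follows: Shi-Tomasi corners [8] are extracted only inside a mask filled with the convex hull of the matched SIFT points. cv::goodFeaturesToTrack implements the Shi-Tomasi criterion; the mask construction and all names are illustrative, not the paper's code.

// Hedged sketch: generate KLT tracking points inside the convex hull of the matched SIFT points.
#include <opencv2/imgproc.hpp>
#include <vector>

std::vector<cv::Point2f> generateTrackingPoints(
        const cv::Mat& gray,                          // current scene S(x, y), grayscale
        const std::vector<cv::Point2f>& siftPts,      // matched SIFT points p^C_{1:m}
        int maxPoints = 100)
{
    // Binary mask covering the convex hull of the matched SIFT points.
    std::vector<cv::Point> pts, hull;
    for (const auto& p : siftPts) pts.emplace_back(cvRound(p.x), cvRound(p.y));
    cv::convexHull(pts, hull);
    cv::Mat mask = cv::Mat::zeros(gray.size(), CV_8UC1);
    cv::fillConvexPoly(mask, hull, cv::Scalar(255));

    // Shi-Tomasi "good features to track" restricted to the object region.
    std::vector<cv::Point2f> trackPts;
    cv::goodFeaturesToTrack(gray, trackPts, maxPoints,
                            0.01 /*quality level*/, 5 /*min distance in px*/, mask);
    return trackPts;
}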
   When the algorithm generates tracking points, it also determines 3D reference points and saves them for the subsequent pose estimation. In stereo mode we directly calculate the 3D points by stereo matching, in which Sum of Absolute Differences (SAD) stereo correlation is used. Alternatively, in mono mode, we determine the 3D points from an approximation using the 3D points of the prior knowledge. The key idea of the approximation is to treat the surface as locally flat. Fig. 3 (c) and Fig. 4 illustrate the approximation. In Fig. 3 (c), bold points indicate 2D KLT tracking points, and crosses represent the matched SIFT points. Since we already know the 3D points of the matched SIFT points from the prior knowledge, each 3D point of a tracking point can be estimated under the assumption that the 3D tracking point lies on the 3D plane formed by the 3D points of the three nearest neighboring matched SIFT points.

[Fig. 4. Plane Approximation. A 3D point T^M_i corresponding to t^C_i on the image plane S is determined from the 3D plane Π that the three 3D points P^M_{i1}, P^M_{i2}, P^M_{i3} of the nearest neighboring matched SIFT points form.]

   Fig. 4 shows a 2D KLT tracking point t^C_i, its 3D point T^M_i, and the 3D points P^M_{i1}, P^M_{i2}, P^M_{i3} of the three nearest neighboring matched SIFT points with the coordinate frames, where the subscripts i1, i2, and i3 are the selected indices from 1, 2, ..., m. First, the equation of the plane Π can be written as

      ax + by + cz + d = 0                                            (1)

Since our goal is determining the 3D point T^M_i corresponding to t^C_i, we start by estimating the plane equation Eq. 1. The normal vector n of the plane can be determined as

      n = (P^M_{i2} − P^M_{i1}) × (P^M_{i3} − P^M_{i1})               (2)

where n = (a, b, c)^T. The remaining coefficient d of the plane equation is then expressed as

      d = −n · P^M_{i1}                                               (3)

To scrutinize the relationship between the 2D point t^C_i and the 3D point T^M_i of the KLT tracking point, consider the perspective projection. Note that T^M_i is in the model coordinate frame, so we have to transform T^M_i into T^C_i according to the initial pose Φ^C_0. Using the homogeneous representation of those points, a linear projection equation can be written as

      [u, v, 1]^T ≅ [wu, wv, w]^T = [fx 0 cx 0; 0 fy cy 0; 0 0 1 0] [x, y, z, 1]^T      (4)

where fx and fy represent the focal length (in pixels) in the x and y directions respectively, T^C_i = (x, y, z)^T, t^C_i = (u, v)^T, and cx and cy are the coordinates of the principal point. Since our current goal is determining the 3D point T^C_i corresponding to t^C_i, Eq. 4 can be expressed in terms of (x, y, z) with respect to (u, v):

      u = wu / w = (fx x + cx z) / z = fx (x/z) + cx
      v = wv / w = (fy y + cy z) / z = fy (y/z) + cy                  (5)

      x/z = (1/fx)(u − cx)
      y/z = (1/fy)(v − cy)                                            (6)

If Eq. 1 is divided by z, the equation becomes

      a (x/z) + b (y/z) + c + d/z = 0                                 (7)
Therefore, by applying Eq. 6 to Eq. 7 and developing, we can get

      z = d / ( (a/fx)(cx − u) + (b/fy)(cy − v) − c )
      x = (z/fx)(u − cx)                                              (8)
      y = (z/fy)(v − cy)

   With Eq. 8, we can determine T^C_{1:n} and reversely transform T^C_{1:n} into T^M_{1:n} according to Φ^C_0. Therefore, we can get the approximate 3D points T^M_{1:n} corresponding to the 2D KLT tracking points t^C_{1:n}.
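   The back-projection of Eqs. 1-3 and 8 can be written compactly as below. The sketch assumes the camera-from-model convention T^C = R_0 T^M + t_0 for the initial pose Φ^C_0: the three neighbouring model points are moved into the camera frame, the tracking pixel is intersected with their plane, and the result is mapped back into the model frame. Variable names are illustrative, not the paper's implementation.

// Hedged sketch of the plane approximation: 3D point of a KLT tracking pixel (u, v).
#include <opencv2/core.hpp>

cv::Point3d approximate3DPoint(const cv::Point2d& t,                       // t^C_i = (u, v)
                               const cv::Point3d& P1,                      // P^M_{i1}
                               const cv::Point3d& P2,                      // P^M_{i2}
                               const cv::Point3d& P3,                      // P^M_{i3}
                               const cv::Matx33d& R0, const cv::Vec3d& t0, // initial pose Phi^C_0
                               double fx, double fy, double cx, double cy) // intrinsics
{
    // Move the three neighbouring matched SIFT points into the camera frame.
    auto toCam = [&](const cv::Point3d& P) {
        cv::Vec3d q = R0 * cv::Vec3d(P.x, P.y, P.z) + t0;
        return cv::Point3d(q[0], q[1], q[2]);
    };
    const cv::Point3d C1 = toCam(P1), C2 = toCam(P2), C3 = toCam(P3);

    // Plane through the three points: n = (C2 - C1) x (C3 - C1), d = -n . C1   (Eqs. 2, 3)
    const cv::Point3d n = (C2 - C1).cross(C3 - C1);
    const double a = n.x, b = n.y, c = n.z;
    const double d = -(a * C1.x + b * C1.y + c * C1.z);

    // Intersect the viewing ray of (u, v) with the plane   (Eq. 8)
    const double z = d / ((a / fx) * (cx - t.x) + (b / fy) * (cy - t.y) - c);
    const double x = z / fx * (t.x - cx);
    const double y = z / fy * (t.y - cy);

    // Reverse transform T^C_i -> T^M_i with the initial pose.
    const cv::Vec3d TM = R0.t() * (cv::Vec3d(x, y, z) - t0);
    return cv::Point3d(TM[0], TM[1], TM[2]);
}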
                                                                                 describes RMS errors for all frames of input images. In mode
E. Outlier Handling

   The KLT tracking points have a tendency to drift due to abrupt illumination changes, occlusions, and the aperture problem. We regard drifting points as outliers and try to eliminate them because they influence the accuracy of the resulting poses. We adopt the RANSAC algorithm [11] to remove the outliers during tracking. If the number of remaining tracking points is fewer than a certain number, τ_mono and τ_stereo in mono mode and stereo mode respectively, the algorithm goes back to the initial pose estimation and restarts the SIFT feature matching.

                      IV. EXPERIMENT

   In this section, we present an experimental result that shows the ability of our method to track a 3D object. As shown in Fig. 3 (a), the target object was a white mug with partial texture on its side surface, and the prior knowledge of it was prepared in advance. The system has been implemented in C/C++ using the OpenCV and OpenGL libraries on a modern PC (Pentium IV 2.4 GHz) with a Bumblebee® stereo camera from Point Grey Research Inc. and the Triclops™ Stereo SDK.² We used the stereo camera in both mono mode and stereo mode. Since we needed ground truth, i.e. real trajectories of the 3D pose, to evaluate the tracking results, we used an ARToolkit marker.³ After estimating the pose of the marker we transformed the pose data from the marker coordinate system to the object coordinate system.

   ² http://www.ptgrey.com/products/stereo.asp
   ³ http://www.hitl.washington.edu/artoolkit/

A. Tracking Results

[Fig. 5. Pose tracking results of the experiment. The pose and the KLT tracking points are shown on the sequence of input images; the pose is depicted as a green rectangular parallelepiped. (a) Tracking results of mono mode (b) Tracking results of stereo mode (from left to right: 2nd, 69th, 136th, 203rd, and 270th frame)]

   Fig. 5 shows the pose tracking results for the object, and the trajectories are plotted against ground truth in Fig. 6. Note that the positions in x and y are well tracked in both modes, while the position in z has some errors with respect to the ground truth. In rotation, mono mode shows bigger errors than stereo mode, especially in the pitch and yaw angles. Table I lists the RMS errors over all frames of the input images. In mono mode the maximum positional errors are within about 3 cm, and the maximum rotational errors are within approximately 16 degrees. In stereo mode the maximum positional errors are within about 2 cm, and the maximum rotational errors are within around 6 degrees.

                             TABLE I
          RMS ERRORS OVER THE WHOLE SEQUENCE OF IMAGES

                        Mono mode    Stereo mode
      X (cm)               1.04          0.88
      Y (cm)               1.74          1.39
      Z (cm)               3.10          2.02
      Roll (deg)           5.50          4.78
      Pitch (deg)         15.83          5.23
      Yaw (deg)           16.22          5.87
[Fig. 6. Estimated trajectories of the object against ground truth (mono mode and stereo mode versus ground truth over the frame number). (a) X position (b) Y position (c) Z position (d) Roll angle (e) Pitch angle (f) Yaw angle]


                                                                                                    Computation Times
                                                                              350
                                                                                                                                 Real−time
                                                                                                                                                                                                            the initial pose estimation in mono mode and the initial
                                                                                                                                 Result (Mono)
                                                                              300                                                Result (Stereo)                                                            pose estimation with the stereo matching in stereo mode are
                                                     Computation Times (ms)




                                                                              250
                                                                                                                                                                                                            bottlenecks in our system. Since the current stereo matching
                                                                                                                                                                                                            calculates all 3D points on each input frame, it takes amount
                                                                              200
                                                                                                                                                                                                            of computation. We expect a significant improvement of
                                                                              150
                                                                                                                                                                                                            computation time in the stereo matching if it calculates 3D
                                                                              100
                                                                                                                                                                                                            points only on the 2D KLT tracking points.
                                                                               50
                                                                                                                                                                                                                          V. C ONCLUSION AND F UTURE W ORK
                                                                                0
                                                                                    0         50          100     150      200        250                                     300
                                                                                                            Frame Number                                                                                       In this paper, we proposed a method for tracking 3D
                                                                                                                                                                                                            roto-translation of rigid objects using scale invariant feature
Fig. 7.                                        Computation times of the proposed algorithm on the experiment.                                                                                               based matching and Kande-Lucas-Tomasi (KLT) tracker. The
                                                                                                                                                                                                            method has two mode, mono mode using a mono camera and
                                                                                                                                                                                                            stereo mode using a stereo camera. Mono mode guarantees
algorithm. The experiment takes below average 1 ms per                                                                                                                                                      higher frame rate performance, and stereo mode shows better
frame in mono mode and average 16 ms per frame in stereo                                                                                                                                                    pose results. In the initial pose estimation, our system used
mode on a modern PC. The peak at the initial frame is caused                                                                                                                                                prior knowledge of target objects for SIFT feature matching.
by SIFT feature extraction and matching.

                                TABLE II
   Computation times of our algorithm on a modern PC (Pentium IV 2.4 GHz)
                          on 640 × 480 images.

                                      Mono mode     Stereo mode
     Image(s) acquisition               35 ms          70 ms
     Stereo matching                      -           210 ms
     Initial pose estimation (SIFT)    300 ms         300 ms
     Local pose estimation (KLT)         1 ms          16 ms

   Table II describes the computational cost of each part of the algorithm. According to the table, the initial pose estimation in mono mode, and the initial pose estimation together with the stereo matching in stereo mode, are the bottlenecks of our system.
   In addition, to be used in the local pose estimation, 3D reference points are calculated by the plane approximation in mono mode and by the stereo matching in stereo mode.
   Future work will be mainly focused on decreasing the computational burden of the initial pose estimation. Our approach has to repeat the SIFT feature extraction and matching, which requires a large amount of computation, whenever the KLT tracking points disappear or drift. This can be improved by integrating contour-based tracking so that new KLT tracking points are generated inside the tracked contour without SIFT feature matching. Moreover, recent GPU-based implementations of the KLT tracker and SIFT, GPU KLT and SiftGPU respectively, will enhance the frame rate of our method significantly.
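   To make the division of labor behind Table II concrete, the following is a minimal sketch of how such a two-stage loop can be organized and timed with the modern OpenCV C++ API (4.4 or later). It is not the authors' implementation: the camera index, the re-initialization threshold, the use of plain SIFT detection in place of the paper's model matching, and the omission of RANSAC and pose computation are all assumptions made for brevity.

    #include <opencv2/opencv.hpp>
    #include <cstdio>
    #include <vector>

    int main() {
        cv::VideoCapture cap(0);                       // assumed camera index
        cv::Mat frame, gray, prevGray;
        std::vector<cv::Point2f> pts, nextPts;
        const std::size_t tauMono = 8;                 // assumed re-initialization threshold

        while (cap.read(frame)) {
            cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
            const double t0 = static_cast<double>(cv::getTickCount());
            const bool reinit = pts.size() < tauMono;

            if (reinit) {
                // "Initial pose estimation" stage: expensive SIFT detection stands in
                // for the paper's global SIFT matching against the model images.
                cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
                std::vector<cv::KeyPoint> keypoints;
                cv::Mat descriptors;
                sift->detectAndCompute(gray, cv::noArray(), keypoints, descriptors);
                cv::KeyPoint::convert(keypoints, pts);
            } else {
                // "Local pose estimation" stage: cheap pyramidal Lucas-Kanade (KLT)
                // tracking of the existing points from the previous frame.
                std::vector<uchar> status;
                std::vector<float> err;
                cv::calcOpticalFlowPyrLK(prevGray, gray, pts, nextPts, status, err);
                std::vector<cv::Point2f> kept;
                for (std::size_t i = 0; i < status.size(); ++i)
                    if (status[i]) kept.push_back(nextPts[i]);
                pts.swap(kept);                        // RANSAC outlier removal omitted
            }

            const double ms = (static_cast<double>(cv::getTickCount()) - t0)
                              / cv::getTickFrequency() * 1000.0;
            std::printf("%s: %.1f ms, %zu points\n",
                        reinit ? "SIFT re-initialization" : "KLT update", ms, pts.size());
            prevGray = gray.clone();                   // keep the frame for the next KLT step
        }
        return 0;
    }

   The sketch only illustrates where the roughly 300 ms cost re-enters the loop: every time the tracked point set falls below the threshold, the expensive detection branch runs again.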

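   As a rough illustration of the contour-based re-initialization idea mentioned above, the helper below regenerates tracking points only inside a given object contour using the Shi-Tomasi criterion, so no SIFT matching is required. The contour argument, the corner count, and the quality parameters are hypothetical values used only to sketch the intended behavior.

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Hypothetical helper (not from the paper): select new KLT tracking points
    // only inside a tracked object contour, avoiding a fresh SIFT matching pass.
    std::vector<cv::Point2f> regenerateInsideContour(const cv::Mat& gray,
                                                     const std::vector<cv::Point>& contour) {
        // Rasterize the contour into a binary mask of the object region.
        cv::Mat mask = cv::Mat::zeros(gray.size(), CV_8UC1);
        std::vector<std::vector<cv::Point>> polygons{contour};
        cv::fillPoly(mask, polygons, cv::Scalar(255));

        // Pick "good features to track" (Shi-Tomasi criterion) within the mask only;
        // the corner count and quality parameters are illustrative values.
        std::vector<cv::Point2f> points;
        cv::goodFeaturesToTrack(gray, points, 200, 0.01, 5.0, mask);
        return points;
    }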
VI. ACKNOWLEDGMENTS
   The authors would like to thank KyeongKeun Baek for his kind help in making the 3D mug models, as well as Daesik Kim for his useful comments in devising the algorithm presented in this paper.
                              REFERENCES
 [1] D. G. Lowe, Object recognition from local scale-invariant features, in
     Proc. 7th International Conf. on Computer Vision (ICCV'99), Kerkyra,
     Greece, 1999, pp. 1150–1157.
 [2] D. G. Lowe, Distinctive image features from scale-invariant keypoints,
     Int. J. Comput. Vision, vol. 60, no. 2, pp. 91-110, 2004.
 [3] G. Panin and A. Knoll, Fully Automatic Real-Time 3D Object Tracking
     using Active Contour and Appearance Models, Journal of Multimedia,
     Academy Publisher, vol. 1, no. 7, pp. 62–70, 2006.
 [4] L. Vacchetti, V. Lepetit, and P. Fua, Stable real-time 3D tracking using
     online and offline information, IEEE Transactions on Pattern Analysis
     and Machine Intelligence, vol. 26, no. 10, pp. 1385–1391, Oct. 2004.
 [5] N. P. Papanikolopoulos, P. K. Khosla, and T. Kanade, Visual tracking of
     a moving target by a camera mounted on a robot: a combination of
     control and vision, IEEE Transactions on Robotics and Automation,
     vol. 9, no. 1, pp. 14–35, Feb. 1993.
 [6] B. Lucas and T. Kanade, An iterative image registration technique
     with an application to stereo vision, in International Joint Conference
     on Artificial Intelligence, pp. 674–679, 1981.
 [7] C. Tomasi and T. Kanade, Detection and Tracking of Point Features,
     Technical Report CMU-CS-91-132, Carnegie Mellon University, April
     1991.
 [8] J. Shi and C. Tomasi, Good features to track, in Proc. 1994 IEEE
     Computer Society Conference on Computer Vision and Pattern
     Recognition (CVPR '94), pp. 593–600, Jun. 1994.
 [9] J.-Y. Bouguet, Pyramidal Implementation of the Lucas Kanade Feature
     Tracker: Description of the algorithm, Technical Report, Intel
     Corporation, Microprocessor Research Labs, 2000, OpenCV documentation.
[10] Intel, Open Source Computer Vision Library,
     http://www.intel.com/research/mrl/research/opencv/.
[11] M. Fischler and R. Bolles, Random sample consensus: a paradigm
     for model fitting with applications to image analysis and automated
     cartography, Commun. Assoc. Comp. Mach., vol. 24, pp. 381–395, 1981.
[12] Sudipta N Sinha, Jan-Michael Frahm, Marc Pollefeys and Yakup
     Genc, GPU-Based Video Feature Tracking and Matching, EDGE 2006,
     workshop on Edge Computing Using New Commodity Architectures,
     Chapel Hill, May 2006.
[13] D. DeMenthon and L.S. Davis, Model-Based Object Pose in 25 Lines
     of Code, International Journal of Computer Vision, 15, pp. 123-141,
     June 1995.
[14] Horn, Berthold K. P., Closed-Form Solution of Absolute Orientation
     Using Unit Quaternions, Journal of the Optical Society of America A
     4, No. 4, April 1987, pp. 629-642.




