
SLAM, Visual Odometry, Structure from Motion, Multiple View Stereo

Keywords: structure from motion, multiple view stereo, visual hull, PMVS, free viewpoint, visual SLAM, relocalization, stereo, depth fusion, MobileFusion, KinectFusion, DynamicFusion, 360 spherical view, cylindrical view, cube map, little planet.

Published in: Engineering

  1. 1. SLAM, Visual Odometry, Structure from Motion and Multiple View Stereo Yu Huang yu.huang07@gmail.com Sunnyvale, California
  2. 2. Outline • Calibration • Camera pose estimation • Triangulation • Bundle adjustment • Filtering-based SLAM • Key frame-based SLAM • PTAM/DTAM • LSD-SLAM/ORB-SLAM • Direct Sparse Odometry • MonoFusion/MobileFusion • Image-based Relocalization • MVS • Planar/Polar rectification • Stereo matching • Plane sweep stereo • Multiple baseline stereo • Volumetric Stereo/Voxel Coloring • Space carving • (Carved) Visual Hulls • Region growing/depth fusion • Patch-based MVS • Stereo initialization, SfM and MVS • SDF for surface representation • Marching cubes • Poisson surface reconstruction • Texture mapping • Appendix: Immersive Panoramic View
  3. 3. Calibration • Find the intrinsic and extrinsic parameters of a camera • Extrinsic parameters: the camera’s location and orientation in the world. • Intrinsic parameters: the relationships between pixel coordinates and camera coordinates. • The work of Roger Tsai (3D calibration rigs) and of Zhengyou Zhang (2D planar patterns) is influential. • Basic idea: • Given a set of world points Pi and their image coordinates (ui,vi) • find the projection matrix M • And then find intrinsic and extrinsic parameters. • Calibration Techniques • Calibration using a 3D calibration object • Calibration using a 2D planar pattern • Calibration using a 1D object (line-based calibration) • Self-calibration: no calibration objects • Vanishing points from orthogonal directions
  4. 4. Calibration • Calibration using 3D calibration object: • Calibration is performed by observing a calibration object whose geometry in 3D space is known with very good precision. • Calibration object usually consists of two or three planes orthogonal to each other, e.g. calibration cube • Calibration can also be done with a plane undergoing a precisely known translation (Tsai approach) • (+) most accurate calibration, simple theory • (-) more expensive, more elaborate setup • 2D plane-based calibration (Zhang approach) • Require observation of a planar pattern at few different orientations • No need to know the plane motion • Set up is easy, most popular approach • Seems to be a good compromise. • 1D line-based calibration: • Relatively new technique. • Calibration object is a set of collinear points, e.g., two points with known distance, three collinear points with known distances, four or more… • Camera can be calibrated by observing a moving line around a fixed point, e.g. a string of balls hanging from the ceiling! • Can be used to calibrate multiple cameras at once. Good for network of cameras.
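The planar-pattern calibration above can be exercised end to end with OpenCV. A minimal sketch, assuming a 9x6 checkerboard, a 25 mm square size, and an image folder that are illustrative rather than values from the slides:

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)          # assumed number of inner corners of the checkerboard
square = 0.025            # assumed square size in meters

# 3D points of the planar pattern in its own coordinate frame (Z = 0)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, img_pts = [], []
for path in glob.glob("calib_images/*.png"):   # hypothetical image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_pts.append(objp)
        img_pts.append(corners)

# Intrinsics K, distortion coefficients, and per-view extrinsics (rvecs, tvecs);
# assumes at least one image with a detected pattern
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
```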
  5. 5. Fundamental Matrix • Let p be a point in the left image, p’ in the right image • Epipolar relation • p maps to epipolar line l’ • p’ maps to epipolar line l • The epipolar mapping is described by a 3x3 matrix F • This matrix F is called • the “Essential Matrix” – when the image intrinsic parameters are known • the “Fundamental Matrix” – more generally (uncalibrated case) • Can solve for F from point correspondences • Each (p, p’) pair gives one linear equation in the entries of F • 8/5 points give enough equations to solve for F/E (the 8-/5-point algorithms)
  6. 6. Camera Pose Estimation • 2D-to-2D matching; • Fundamental matrix estimation: 8-point, 7-point; • Outlier removal: RANSAC (other methods like M-estimation, LMedS, Hough transform); • Essential matrix E if the camera intrinsic matrix is known: 5-point. • Extraction of R, t from E: translation t is known only up to a scaling factor; given the 1st camera matrix, the decomposition yields 4 possible choices for the 2nd camera matrix, and the correct one is the choice for which the reconstructed points lie in front of both cameras (direction from camera center to point, and the angle between it and the view direction).
  7. 7. Fundamental Matrix Estimation • Construct A (n x 9) from correspondences • Compute the SVD of A: A = U S V^T • The unit-norm solution of A f = 0 is given by v_n (the right-most singular vector of A) • Reshape f into F1 • Compute the SVD of F1: F1 = U_F S_F V_F^T • Set S_F(3,3) = 0 to get S_F’ (this enforces rank(F) = 2) • Reconstruct F = U_F S_F’ V_F^T • Now x^T F is the epipolar line in image 2, and F x’ is the epipolar line in image 1
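A minimal numpy sketch of the SVD-based steps listed above (the usual Hartley point normalization is omitted for brevity but should be added in practice):

```python
import numpy as np

def fundamental_8pt(x1, x2):
    """x1, x2: (n, 2) arrays of matched pixel coordinates, n >= 8."""
    n = x1.shape[0]
    A = np.zeros((n, 9))
    for i in range(n):
        u, v = x1[i]
        up, vp = x2[i]
        # each correspondence gives one row of A, from x2^T F x1 = 0
        A[i] = [up * u, up * v, up, vp * u, vp * v, vp, u, v, 1]

    # unit-norm least-squares solution of A f = 0: last right singular vector
    _, _, Vt = np.linalg.svd(A)
    F1 = Vt[-1].reshape(3, 3)

    # enforce rank 2 by zeroing the smallest singular value
    U, S, Vt = np.linalg.svd(F1)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```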
  8. 8. RANSAC: RANdom SAmple Consensus Goal: Robustly fit a model to a data set which contains outliers Algorithm: 1. Randomly select a (minimal) subset of data points and instantiate the model; 2. Using this model, classify all data points as inliers or outliers; 3. Repeat 1 & 2 for a fixed number of iterations; 4. Select the largest inlier set, and re-estimate the model from all points in it.
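The four steps translate directly into a short generic loop; the `fit_model` and `residual` callbacks below are placeholders for any concrete model, e.g. a fundamental matrix fitted from 8-point minimal samples:

```python
import numpy as np

def ransac(data, fit_model, residual, min_samples, thresh, iters=1000, rng=None):
    """data: (n, ...) array; fit_model(subset) -> model; residual(model, data) -> (n,) errors."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(data), dtype=bool)
    for _ in range(iters):
        # 1. instantiate the model from a random minimal subset
        idx = rng.choice(len(data), size=min_samples, replace=False)
        model = fit_model(data[idx])
        # 2. classify all data points as inliers or outliers
        inliers = residual(model, data) < thresh
        # 3./4. keep the largest consensus set seen so far
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # re-estimate the model from all points of the best inlier set
    return fit_model(data[best_inliers]), best_inliers
```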
  9. 9. Triangulation: A Linear Solution • Generally, rays Cx and C’x’ will not exactly intersect • Can solve via SVD, finding a least squares solution to a system of equations • Given P, P’ for the cameras and matched homogeneous points x = (u, v, 1)^T, x’ = (u’, v’, 1)^T, the constraints x × (P X) = 0 and x’ × (P’ X) = 0 stack into A X = 0, with A = [ u p3^T − p1^T ; v p3^T − p2^T ; u’ p’3^T − p’1^T ; v’ p’3^T − p’2^T ] (pi^T is the i-th row of P) • Precondition points and projection matrices • Create matrix A • [U, S, V] = svd(A) • X = V(:, end) • Pros and Cons • Works for any number of corresponding images • Not projectively invariant
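The same linear solution, written as a small DLT routine (a sketch that skips the preconditioning step mentioned above):

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """P1, P2: (3,4) projection matrices; x1, x2: (2,) matched pixel coordinates."""
    u, v = x1
    up, vp = x2
    # two rows per view from x × (P X) = 0
    A = np.vstack([
        u  * P1[2] - P1[0],
        v  * P1[2] - P1[1],
        up * P2[2] - P2[0],
        vp * P2[2] - P2[1],
    ])
    # least-squares solution of A X = 0: last right singular vector
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]       # dehomogenize
```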
  10. 10. Triangulation: Non-Linear Solution • Minimize the reprojection error while satisfying the epipolar constraint x’^T F x = 0 • Parameterizing the pencil of epipolar lines by t, the solution reduces to finding the roots of a degree-6 polynomial in t that minimizes the error
  11. 11. PnP (Perspective n Points): 3D-to-2D Matching • Motion estimation from 3-D-to-2-D correspondences is more accurate than from 3-D-to-3-D correspondences because it minimizes the image reprojection error instead of the 3-D-to-3-D feature position error [Nist’04]. • Efficient PnP (EPnP): a non-iterative method. First, the 3D points are expressed as a weighted sum of 4 control points, then 1. The control point coordinates are the (12) unknowns. 2. Project the 3D points onto the image and build a linear system in the control point coordinates. 3. Express the control point coordinates as a linear combination of the null eigenvectors. 4. The weights (the βi) are the new unknowns (not more than 4). 5. Add rigidity constraints to get quadratic equations in the βi. 6. Solve for the βi depending on their number (linearization or re-linearization).
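For reference, OpenCV exposes EPnP directly through solvePnP; the points and intrinsics below are dummy placeholders, not data from the slides:

```python
import cv2
import numpy as np

pts3d = np.random.rand(20, 3).astype(np.float32)   # known 3D map points (placeholder)
pts2d = np.random.rand(20, 2).astype(np.float32)   # their 2D image observations (placeholder)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], np.float64)  # assumed intrinsics

ok, rvec, tvec = cv2.solvePnP(pts3d, pts2d, K, None, flags=cv2.SOLVEPNP_EPNP)
R, _ = cv2.Rodrigues(rvec)     # rotation as a 3x3 matrix
# (R, tvec) map map-frame points into the camera frame: x_cam = R @ X + tvec
```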
  12. 12. PnP (Perspective n Points): 3D-to-2D Matching
  13. 13. Bundle Adjustment or Pose Graph Optimization: Refinement of Both Structure and Camera Pose • Pose graph optimization: optimize the camera poses; • BA: optimize both the 6DOF camera poses and the 3D (feature) points; • Nonlinear; • Sparse; • Optimization. • Algorithms: • Schur complement; • Levenberg-Marquardt iteration; • Cost: E(P, X) = Σ_{i=1..m} Σ_{j=1..n} D(x_ij, P_i X_j)^2
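A hedged sketch of minimizing this reprojection cost with Levenberg-Marquardt-style least squares via SciPy; the camera parameterization (angle-axis rotation, a single focal length, no distortion) is a simplifying assumption and the data layout is hypothetical:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reproject(cam, X):
    """cam = [rx, ry, rz, tx, ty, tz, f]; X: (n,3) points -> (n,2) pixel projections."""
    R = Rotation.from_rotvec(cam[:3]).as_matrix()
    Xc = X @ R.T + cam[3:6]
    return cam[6] * Xc[:, :2] / Xc[:, 2:3]

def residuals(params, n_cams, n_pts, cam_idx, pt_idx, obs):
    """Stacked residuals x_ij - P_i X_j over all observations (cam_idx[k], pt_idx[k])."""
    cams = params[:n_cams * 7].reshape(n_cams, 7)
    pts = params[n_cams * 7:].reshape(n_pts, 3)
    proj = np.vstack([reproject(cams[c], pts[p][None])
                      for c, p in zip(cam_idx, pt_idx)])
    return (proj - obs).ravel()

# params: initial cameras and points stacked into one vector (e.g. from tracking/SfM)
# res = least_squares(residuals, params, method="trf", loss="huber",
#                     args=(n_cams, n_pts, cam_idx, pt_idx, obs))
```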
  14. 14. Filtering-based SLAM • SLAM: A process by which a mobile robot can build a map of an environment and at the same time use this map to compute its own location; • Different scenarios: indoor, outdoor, underwater, and airborne systems; • A State Space Method (SSM): both observation and motion (state transition) models • The observation model describes the prob. of making an observation (landmarks) when the robot location/orientation and landmark locations (map) are known; • The motion model is a prob. distrib. on state (location and orientation of the robot) transitions with applied control, i.e. a Markov process; • By recursive Bayes estimation, update the robot state and environment map simultaneously; • the map building problem is solved by fusing observations from different locations; • the localization problem is to compute the robot location by estimating its state; • Convergence: the correlation btw landmark estimates increases monotonically as more observations are made; • Typical methods: Kalman filter/Particle filter; • Robot relative location accuracy = localization accuracy with a given map.
  15. 15. Filtering-based SLAM • EKF-based loop: filter initialization → prediction → measurement acquisition → data association → update → map management (generate & delete features) • State vector x = (r^W, q^WR, v^W, ω^R, y_1, y_2, …): r^W is the 3D position vector, q^WR the orientation quaternion, v^W the linear velocity vector, ω^R the angular velocity vector, and y_i the landmark position vectors • Dynamic system model (constant velocity model), prediction: r^W(t+1) = r^W(t) + (v^W(t) + V(t)) Δt, q^WR(t+1) = q^WR(t) × q((ω^R(t) + Ω(t)) Δt), v^W(t+1) = v^W(t) + V(t), ω^R(t+1) = ω^R(t) + Ω(t) • Update: x(t) = x̂(t) + K(t) (z(t) − h(x̂(t))), where h_i(x̂) = (u_i, v_i) is the predicted measurement of landmark y_i projected at the predicted pose, measurements are found by patch matching or tracking around the prediction, and S_i is the covariance matrix of the 2D position of the i-th landmark used to bound the search region
  16. 16. Comparison of SfM and Filtering-based Methods • Initialization: SfM-based — 8-point algorithm; filtering-based — 5-point algorithm; feature initialization is either delayed (SfM-style) or undelayed (inverse depth parameterization) • Measurement: feature tracking and feature matching in both cases • Estimation: SfM-based — camera pose + triangulation refined by local/global BA (LBA, GBA); filtering-based — Kalman filtering or particle filter • Performance: SfM-based tracks 300–400 points; filtering-based handles about 100 points in real time
  17. 17. Key frame-based SLAM • Bootstrapping • Compute an initial 3D map • Mostly based on concepts from two-view geometry • Normal mode • Assumes a 3D map is available and incremental camera motion • Track points and use PnP for camera pose estimation • Recovery mode • Assumes a 3D map is available, but tracking failed: no incremental motion • Relocalize camera pose w.r.t. previously reconstructed map • Key-Frame BA: keep subset of frames as keyframes • State: 3D landmarks + camera poses for all key frames • BA can refine state in a thread separate from tracking component
  18. 18. Real Time Urban 3D Reconstruction from Video • Automatic georegistered real time reconstruction of urban scenes; • 2D tracking: salient features, KLT; • 3D tracking: Kalman filter for 2D and GPS/INS, then pose using SfM; • Sparse scene analysis: 3 orthogonal sweeping directions (one for ground, two for facades); • Stereo depth estimation: plane sweep stereo; • Depth map fusion: handle conflicts and errors by visibility, stability and confidence-based; • Model generation: triangular mesh and texture mapped.
  19. 19. MonoFusion: Real-time 3D Reconstruction of Small Scene • Dense 3D reconstruction by single camera; • Sparse feature tracking for 6DoF camera pose estimation; • Key frame based BA; • Dense stereo matching btw two key frames; • Patchmatch stereo. • Depth fusion by SDF-based volumetric integration.
  20. 20. PTAM (Parallel Tracking and Mapping) Use a large number (thousands) of points in the map; Don’t update the map every frame: keyframes; Split the tracking and mapping into two threads; Make pyramid levels for tracking and mapping; • Detect FAST corners; • Use a motion model to update the camera pose; Key frames/new map points are added conditionally; • At least sufficient baseline and a good epipolar search; Map optimization by a batch method: Bundle Adjustment. Stereo initialization: two clicks to select the key frames • Use the five-point-pose algorithm; Map maintained with data association and outlier retest. (Diagram: per frame, the tracking thread #1 finds features, updates only the camera pose and draws graphics – simple & easy; the mapping thread #2 updates the map.)
  21. 21. PTAM (Parallel Tracking and Mapping) Tracking thread: • Responsible for estimation of the camera pose and rendering augmented graphics • Must run at 30 Hz • Make it as robust and accurate as possible Mapping thread: • Responsible for providing the map • Can take lots of time per key frame • Make it as rich and accurate as possible (Diagram: mapping thread – stereo initialization → wait for new key frame → add new map points → optimize map → map maintenance; tracking thread – pre-process frame → project points, measure points, update camera pose in a coarse stage and then a fine stage → draw graphics.)
  22. 22. PTAM (Parallel Tracking and Mapping) • The initial triangulation has an arbitrary scale; • Assume the camera translated 10 cm for the initial stereo pair; • Add key frames when: • Tracking quality is good; • More than 20 frames have passed since the last key frame was added; • The camera is a minimum distance away from the nearest keypoint in the map; • Stereo for triangulation • Select the closest keyframe in the map for the new keyframe; • Homography for a planar scene, with (R,t) and plane normal n: H = R + t n^T / d; • Mapping: • Key frames ranked by a voting system
  23. 23. DTAM (Dense Tracking and Mapping) Inputs: • 3D texture model of the scene • Pose at the previous frame Tracking as a registration problem • First rotation estimation: the previous frame is aligned to the current one to estimate a coarse inter-frame rotation; • The estimated pose is used to project the 3D model into a 2.5D image; • The 2.5D image is registered with the current frame to find the current camera pose; • Two template matching problems to minimize SSD: direct and inverse compositional. Mapping principle: • S depth hypotheses are considered for each pixel of the reference image Ir • Each corresponding 3D point is projected onto a bundle of images Im • Keep the depth hypothesis that best respects the color consistency from the reference to the bundle of images Formulation: for each pixel position and depth hypothesis, the cost averages the photometric error between the reference image and each image of the bundle over the number of valid reprojections of the pixel in the bundle
  24. 24. DTAM (Dense Tracking and Mapping) Formulation of a variational approach: • The first term is a regularizer with the Huber norm, the second term is the photometric error; • Problem: optimizing this equation directly requires linearizing the cost volume, which is expensive, and the cost volume has many local minima. Approximation: • Introduce an auxiliary variable that can be optimized with an exhaustive search • The second term brings the original and auxiliary variables together Problem (depth estimation): uniform regions in the reference image do not give a discriminative enough photometric error; Solution (primal dual): assume that depth is smooth on uniform regions, and use a total variation approach where the depth map is the functional to optimize, the photometric error defines the data term and the smoothness constraint defines the regularization. Problem in SSD minimization: align a template T(x) with the input I(x). Formulation: find the transform that best maps the pixels of the template onto the ones of the current image, minimizing the SSD. Reformulation of the regularization with the primal dual method: • A dual variable p is introduced to compute the TV norm
  25. 25. DTAM (Dense Tracking and Mapping) • A dense model consists of one or more key frames; • Initialized using a point feature-based stereo like PTAM; • Synthesize realistic novel view over wide baselines by projecting the entire model into a virtual camera; • Key frame is added when the #pixels in the previous predicted image without visible surface is less than threshold; • Reconstruction from a large number of frames taken from very close viewpoints.
  26. 26. REMODE: Regularized Monocular Depth Estimation • Combine Bayesian estimation and convex optimization for processing; • Probabilistic depth map handles the uncertainty and allows sensor fusion. • Depth estimated by triangulating a reference view and the last view; • Smoothness of depth map enforced by a regularization term (TV). • Camera pose estimation: Semi-direct monocular visual Odometry (SVO); • Particle/Kalman filter or LS optimization (such as key frames-based or batch Bundle Adjustment) solution; • SVO: Feature correspondence from direct motion estimation with sparse depth map estimate (motion estimation and mapping together), still a sparse model-based alignment method, related to model-based dense method; • Initial guess by direct method and then followed by feature-based refinement with reprojection error; • A prob. depth filter initialized by sparse features, then posterior depth estimate by Bayesian rule and update sequentially. • Scene reconstruction: update after initial estimation with two frames. • Region growing (low textured, sparse data sets) or occlusion-robust photo consistency (dense stereo, 3-d segment.); • Inverse depth: A 6-d vector for parallax, a relation btw disparity and depth. • The ray from the 1st camera position, the current camera orientation and the inverse depth; • Switch from inverse depth to XYZ (3-d point) for high parallax features. • Extend SVO with dense map estimate within Bayesian framework.
  27. 27. REMODE: Regularized Monocular Depth Estimation • Key frames are used as reference for feature alignment and structural refinement. • A key frame is selected if the Euclidean distance of the new frame relative to all key frames is > 12% of the average scene depth; • The key frame farthest from the current camera position is removed when a new key frame is added; • New depth filters are initialized for each image feature for which the corresponding 3D point is to be estimated, with a large uncertainty in depth, when a new key frame is selected; • When the depth uncertainty becomes small enough, a new 3D point is inserted in the map; • The system is bootstrapped to get the pose of the first two key frames and the initial map.
  28. 28. SVO: Semi-direct Monocular Visual Odometry
  29. 29. Live Metric 3D Reconstruction with A Mobile Camera • Absolute scale on site for user interactive feedback; • Inertial sensors for scale estimation; • Feature based tracking and mapping + sensor (gyro, accelero); • Two view initialization; • BA for drifting correction; • Automatically select keyframes; • Dense 3D modeling by mobile dev.; • Image mask estimation; • Dense stereo matching; • Depth map filtering.
  30. 30. 3D Modeling on the Go: Interactive 3D Reconstruction of Large-Scale Scenes on Mobile Devices • Reconstruct scenes “on the go” by simply walking around them; • Use monochrome fisheye images (not active depth cameras); • Compute depth maps using plane sweep stereo (GPUs); • Fuse the depth maps into a global model of the environment represented as a truncated signed distance function in a spatially hashed voxel grid; • Importance of filtering outliers in the depth maps;
  31. 31. Semi-Dense SLAM • Continuously estimate a semi-dense inverse depth map for the current frame, which in turn is used to track the motion of the camera using dense image alignment. • Estimate the depth of all pixels which have a non-negligible image gradient. • Use the oldest frame, where the disparity range and the view angle within a certain threshold; • If a disparity search is unsuccessful, the pixel’s “age” is increased, such that subsequent disparity searches use newer frames where the pixel is likely to be still visible. • Each estimate is represented as a Gaussian probability distribution over the inverse depth; • Minimize photometric and geometric disparity errors with respect to pixel-to-inverse depth ratio. • Propagate this information over time, update with new measurements as new images arrive. • Based on the inverse depth estimate for a pixel, the corresponding 3D point is calculated and projected into the new frame, providing an inverse depth estimate in the new frame; • The hypothesis is assigned to the closest integer pixel position – to eliminate discretization errors, the sub-pixel image location of the projected point is kept, and re-used for the next propagation step; • Assign inverse depth value the average of the surrounding inverse depths, weighted by respective inverse variance; • Keep track of validity of each inverse depth hypothesis to handle outliers (removed if needed).
  32. 32. Semi-Dense SLAM
  33. 33. Semi-Dense SLAM • The reference frame is chosen such that it maximizes the stereo accuracy, while keeping the disparity search range as well as the observation angle small; • Key point-based method to get relative camera pose btw initial two frames, then used to initialize the inverse depth map; • A probabilistic method to generate depth update, which take advantage of the fact that small baseline frames are available before the large baseline frames; • Based on inverse depth, 3d point is calculated and projected to the new frame, providing the new inverse depth estimate.
  34. 34. Large Scale Direct SLAM (LSD-SLAM) • Goal: to build large-scale, consistent maps of the environment; • Initialization: a first keyframe with a random depth map and large variance; given sufficient translational camera movement in the first seconds, it converges to a correct depth configuration after a couple of keyframe propagations; • Pose estimation based on direct scale-aware image alignment on a similarity transform; • Keyframes are replaced when the camera moves too far away from the existing map; • Once a frame is chosen as a keyframe, its depth map is initialized by projecting points from the previous keyframe into it, followed by regularization and outlier removal; • The depth map is scaled to have a mean inverse depth of one, then refined by filtering over a large number of pixelwise small-baseline stereo comparisons; • 3D reconstruction in real-time as a pose-graph of keyframes with semi-dense (gradient) depth; • Probabilistically consistent incorporation of the uncertainty of the estimated depth; • Loop closure is detected by appearance-based mapping (BoVW with co-occurrence statistics).
  35. 35. Large Scale Direct SLAM (LSD-SLAM)
  36. 36. Large Scale Direct SLAM (LSD-SLAM) • Similarity transform for camera pose (scale, rotation and translation); • Global map is a pose graph with key frames as vertices and similarity transform as edges; • If the camera moves too far, a new key frame is initialized by projecting from existing close-by key frames; • The 1st key frame with a random depth map and large variance is initialized; • After a sufficient translation moving, converge to a correct depth configuration; • A new key frame is created from the most recent tracked image if camera moves too far away from the existing map; • Inverse depth is scaled to one: mean inverse depth.
  37. 37. Stereo Large Scale Direct SLAM (S-LSD-SLAM) • Aligns images based on photo consistency of all high contrast pixels, i.e. corners, edges and high texture areas; • Concurrently estimates the depth from two types of stereo cues: Static stereo through the fixed-baseline stereo camera setup as well as temporal multi-view stereo exploiting the camera motion; • Incorporating both disparity sources, estimate depth of pixels under-constrained using fixed-baseline stereo. • Using a fixed baseline, avoids scale-drift that typically occurs in pure monocular SLAM.
  38. 38. MobileFusion • Real-time volumetric surface reconstruction and dense 6DoF camera tracking running on a mobile phone; • Using the RGB camera, scan objects of varying shape, size, and appearance in seconds; • Produces point-based 3D models and a connected 3D surface model; • Dense 6DoF tracking, key-frame selection, and dense per-frame stereo matching; • Depth maps are fused volumetrically using a method akin to KinectFusion.
  39. 39. MobileFusion • A scale factor is applied initially for the triangulation; • PnP for 3d-2d matching (initial 3d map is updated); • Keep a pool of about 40 key frames; • Key frame selection for stereo: • A histogram of ratio of all landmarks originally detected in each key frame to those that can be matched successfully in the new frame; • The farthest key frame with a score > threshold, is selected; • The new key frame is created when # landmarks visible in the current frame < threshold;
  40. 40. StereoScan: Dense 3d Reconstruction in Real-time • A sparse feature matcher in conjunction with an efficient and robust visual odometry algorithm; • The reconstruction pipeline combines both techniques with efficient stereo matching and a multi-view linking scheme for generating consistent 3d point clouds; • Dense 3d reconstructions from stereo sequences.
  41. 41. ORB-SLAM: Real-Time Monocular SLAM • Feature-based real-time mono SLAM in small/large, indoor/outdoor environments; • Use of the same features for all tasks: tracking, mapping, relocalization and loop closing; • Robust to motion clutter, with wide-baseline loop closing, relocalization, and automatic initialization; • Real-time loop closing based on optimization of a pose graph called the Essential Graph; • Real-time camera relocalization with significant invariance to viewpoint and illumination; • Automatic initialization based on model selection that permits creating an initial map of planar and non-planar scenes; • A survival-of-the-fittest strategy selects the points and keyframes of the reconstruction.
  42. 42. ORB-SLAM: Real-Time Monocular SLAM
  43. 43. ORB-SLAM vs. LSD-SLAM
  44. 44. ORB-SLAM: Adding Semi-Dense Mapping
  45. 45. ORB-SLAM: Adding Semi-Dense Mapping • Key frames can be added or discarded; • 3d-2d matching: PNP; • Relocalization for tracking failure; • key frame database query; • New key frame selection: • > 20 frames since the last relocalization; • Frame tracking: >50 points; • < 90% map points in the current frame tracking; • >10 frames passed since the last key frame insertion; • >20 frames since last local bundle adjustment. • Prune redundant key frames • >90% points visible in > 3 key frames
  46. 46. ORB-SLAM2: Stereo Cameras (RGB-D) • Three main parallel threads: • 1) Tracking to localize the camera with every frame by finding feature matches to local map and minimizing the reprojection error applying motion-only BA; • 2) Local Mapping to manage the local map and optimize it, performing local BA; • 3) Loop Closing to detect large loops and correct the accumulated drift by performing a pose-graph optimization; • This thread launches a 4th thread to perform full BA after the pose-graph optimization, to compute the optimal structure and motion solution.
  47. 47. Direct Sparse Odometry • Combinations: Sparse+Indirect, Dense+Indirect, Dense+Direct, Sparse+Direct (proposed); • Continuous optimization of the photometric error over a window of recent frames, including geometry (inverse depth in a reference frame) and camera motion (pose); • Integrates a calibrated photometric camera model: lens attenuation (vignetting), gamma correction, and known exposure times; • Runs in real-time on a laptop CPU.
  48. 48. Direct Sparse Odometry • Photometric error defined as a weighted SSD over a small neighborhood; • Windowed optimization with the Gauss-Newton algorithm; • Marginalization of old variables using the Schur complement; • Keep a window of up to 7 key frames; • Keep a fixed number of 2000 active points equally distributed across space/frames; • Initialization: 2-frame image alignment; • Key frame creation: scene change, camera translation with occlusion and dis-occlusion, exposure time changes; • Key frame marginalization: keep the latest two, remove frames with a) <5% points visible, b) maximizing a “distance score”. (Figure: affine brightness transfer function with exposure times; inverse depth; factor graph.)
  49. 49. Direct Sparse Odometry • Keyframe management: • Candidate selection: region-adaptive gradient threshold; • Candidate point tracking: discrete search along the epipolar line; • Candidate point activation: maintain a uniform spatial distribution; • Outlier and occlusion detection: points that are not distinct or have a bigger photometric error are dropped.
  50. 50. Monocular Visual-Inertial State Estimation • A tightly-coupled, optimization-based, monocular visual-inertial state estimator for robust camera localization in complex indoor and outdoor environments. • No artificial markers; the metric scale is recovered using the monocular camera setup. • The whole system is capable of online initialization without relying on any assumptions about the environment. • A lightweight loop closure module that is tightly integrated with the state estimator to eliminate drift.
  51. 51. Monocular Visual-Inertial State Estimation Block diagram for the pipeline of the visual-inertial AR system
  52. 52. Monocular Visual-Inertial State Estimation Illustration of the sliding window formulation. All measurements, including pre- integrated IMU quantities, visual observations, and feature correspondences from loop closures, are fused in a tightly-coupled manner.
  53. 53. Monocular Visual-Inertial State Estimation • IMU pre-integration is a technique to summarize multiple IMU measurements into a single measurement “block”. • It avoids excessive evaluation of IMU measurements and saves a significant amount of computation power. • It also enables the use of IMU measurements without knowing the initial velocity and global orientation. • It can consider on-manifold uncertainty propagation. • It can incorporate IMU biases and integrate them with a full SLAM framework. (Figure: pre-integrated quantities in the local frame, from the measured angular velocity and acceleration.)
  54. 54. Monocular Visual-Inertial State Estimation Illustration of the sliding window formulation with loop fusion. xn - keyframes. At t0, query the loop closure database to search loop for the newest keyframe xn. If loop yi is found in the database, it will be put into the window and optimized together at t1. Then go for t2, and yi slides backward along with the window.
  55. 55. Monocular Visual-Inertial State Estimation Illustration of the visual-inertial alignment process. Both metric scale and gravity direction are aligned.
  56. 56. Monocular Visual-Inertial State Estimation
  57. 57. SLAM in VR/AR •Project Tango (Google) • Visual-inertial odometry • Area learning (SLAM) • RGB-D sensor •Hololens (Microsoft) • RGB-D sensor •Magic Leap • Unknown • Vuforia (Qualcomm) • Extended tracking (monocular SLAM) •Apple • Metaio •Oculus • SurrealVision
  58. 58. Visual Odometry Visual Odometry (VO): The process of estimating the ego-motion of an agent using only the input of a single or multiple cameras; VO incrementally estimates the pose of the agent through examination of the changes that motion induces on the images of its onboard cameras; VO: a special case of SfM (relative camera pose and 3-d structure); • VO focuses on the 3-d motion of the camera sequentially to recover the full trajectory; Visual SLAM vs. VO: • VO only recovers the path incrementally, pose after pose, potentially optimizing only over the last n poses of the path (windowed bundle adjustment); • VO only needs local consistency of the trajectory, while SLAM requires global consistency; • Visual SLAM needs to understand when a loop closure occurs and to efficiently integrate this new constraint into the current map; • VO can be used as a building block for a complete SLAM system.
  59. 59. Real-time Quadrifocal Visual Odometry • Image-based approach to tracking 6 DOF trajectory of a stereo camera pair; • Estimates pose and subsequently dense pixel matching btw temporal image pairs by dense spatial matching btw images of a stereo pair; • Metric 3d structure constraints are imposed by consistently warping corresponding stereo images to generate novel viewpoints in acquisition; • An iterative non-linear trajectory estimation approach is formulated based on a quadrifocal relationship btw the image intensities within adjacent views of the stereo pair; • Reject outliers by M-estimator corresponding to moving objects within the scene or other outliers such as occlusions and illumination changes.
  60. 60. Real-time Quadrifocal Visual Odometry The quadrifocal geometry of a stereo pair at two subsequent time instants. Two points p and p′ are initialised only once at the beginning of the tracking process to be in correspondence. The central pose T is estimated via a non-linear warping function which warps all points in the reference stereo pair to the current image points p′′ and p′′′. The quadrifocal warping function is defined by choosing two lines l and l′ passing through corresponding points in the first image. The extrinsic parameters T′ are assumed known a-priori.
  61. 61. Image-Based Localization Pipeline • Key Frame and its Camera Pose Stored; • Frame id, image feature and camera pose. • Extract Local Features; • Feature original or coded. • Establish 2D-2D or 3D-to-2D Matches; • Single camera: 3d-to-2d; • Stereo or depth camera: 2d-to-2d; • Camera Pose Estimation. • Refine by 3-d point estimation; • 2-d features along with their depths.
  62. 62. Image-based 3d-to-2d Relocalization
  63. 63. Feature Matching-based Relocalization • Feature detection and descriptor for key frames or new frames which require relocalization: • Harris corner or FAST detection; • SURF (64-d vector), SIFT, BRIEF or ORB; • Feature matching for pose recovery: • Brute force or ANN search; • Pose difference is small enough. • Key frame selection: • Pose change enough big;
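A hedged sketch of this matching-based relocalization loop with OpenCV: ORB descriptors matched against a stored keyframe whose descriptors and associated 3D landmarks are assumed to exist, followed by RANSAC-PnP pose recovery. The thresholds are illustrative:

```python
import cv2
import numpy as np

orb = cv2.ORB_create(1000)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def relocalize(frame_gray, kf_desc, kf_points3d, K):
    """kf_desc: keyframe ORB descriptors; kf_points3d[i]: 3D landmark of descriptor i."""
    kps, desc = orb.detectAndCompute(frame_gray, None)
    matches = bf.match(desc, kf_desc)
    if len(matches) < 15:                      # assumed minimum for a stable pose
        return None
    pts2d = np.float32([kps[m.queryIdx].pt for m in matches])
    pts3d = np.float32([kf_points3d[m.trainIdx] for m in matches])
    # robust pose recovery from the putative 3D-to-2D matches
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    return (rvec, tvec) if ok else None
```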
  64. 64. Randomized Tree-based ANN Search for Relocalization • The kd-tree data structure is based on a recursive subdivision of space into disjoint hyper-rectangular regions called cells; • ANN (approximate NN) search: Limit number of neighboring k-d tree bins to explore; • Randomised kd-tree forest: Split by picking from multiple (5) top dimensions with the highest variance in randomized trees, to increase the chances of finding nearby points; • Best Bin First in K-d tree: a variant of normal K-d tree • A branch-and-bound technique that maintains an estimate of the smallest distance from the query point to any of the data points down all of the open paths (priority queue). • Priority queue-based k-means tree: partition the data at each level into K distinct regions using k-means, then applying it also recursively to the points in each region; • The priority queue is sorted in increasing distance from the query point to the boundary of the branch being added to the queue.
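As an illustration of approximate kd-tree search, SciPy's cKDTree (a single kd-tree rather than a randomized forest) exposes an eps parameter that limits how exhaustively neighboring bins are explored; the descriptor dimensions and ratio-test threshold below are assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

db_desc = np.random.rand(10000, 64).astype(np.float32)   # e.g. 64-d SURF database (placeholder)
query = np.random.rand(500, 64).astype(np.float32)        # query descriptors (placeholder)

tree = cKDTree(db_desc)
# eps > 0 makes the search approximate: returned neighbours are within (1 + eps)
# of the true nearest distance, trading accuracy for fewer explored bins
dist, idx = tree.query(query, k=2, eps=0.5)

# Lowe-style ratio test to keep only distinctive matches
good = dist[:, 0] < 0.8 * dist[:, 1]
matches = np.flatnonzero(good)
```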
  65. 65. Bag of Visual Words for Image-based Relocalization
  66. 66. BoVW-based Relocalization in FAB-MAP • Codebook building from training data: • Feature extraction: FAST corner detection + SURF/ORB feature descriptor; • Feature clustering for visual vocabulary: incremental K-means; • Construct co-occurrence statistics of visual words by Chow Liu tree. • A naïve Bayes approximation by a maximum weight spanning tree. • Visual pattern from key frames or new frames for relocalization: • Feature extraction: same as above; • Mapping descriptors to bags of words by ANN-based (kd-tree) methods; • Pattern matching for pose recovery: descriptor mapped to codebook; • Sampling in training data or mean field optimization to find the best match.
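An illustrative bag-of-visual-words pipeline in the spirit of the slide: ORB descriptors, a k-means vocabulary, and per-image word histograms. Casting binary ORB descriptors to floats for k-means is a simplification, and FAB-MAP itself adds the Chow-Liu co-occurrence model on top of this representation:

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

orb = cv2.ORB_create(500)

def describe(img_gray):
    _, desc = orb.detectAndCompute(img_gray, None)
    return desc.astype(np.float32)            # simplification: binary descriptors as floats

def build_vocabulary(train_images, k=256):
    """Cluster all training descriptors into k visual words."""
    all_desc = np.vstack([describe(im) for im in train_images])
    return MiniBatchKMeans(n_clusters=k).fit(all_desc)

def bovw_histogram(img_gray, vocab):
    """Map an image's descriptors to words and return a normalized word histogram."""
    words = vocab.predict(describe(img_gray))
    hist, _ = np.histogram(words, bins=np.arange(vocab.n_clusters + 1))
    return hist / max(hist.sum(), 1)

# Relocalization query: compare the new frame's histogram to stored keyframe
# histograms (e.g. cosine similarity) and verify the best candidates geometrically.
```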
  67. 67. MVS (Multiple View Stereo) • Multiple view stereo (MVS): reconstruct a 3D object model from a collection of images taken from known camera viewpoints; • 1. Global methods: • A volumetric reconstruction to extract a surface; • Surface evolving iteratively, like space carving; • 2. Local methods: • Depth map merging with consistency constraints; • Sparse feature matching, region expansion and dense reconstruction for surface fitting.
  68. 68. Multiple View Stereo Properties (Table comparing the properties of global MVS and local MVS approaches.)
  69. 69. MVS Pipeline (Diagram: input sequence → 2D features → 2D tracks → camera poses → 3D points; i.e. image acquisition, camera pose estimation, then 3D reconstruction.)
  70. 70. Stratified Reconstruction • With no constraints on the calibration matrix or the scene: projective reconstruction (preserves intersection and tangency); • Additional information upgrades the reconstruction to affine (preserves parallelism, volume ratios); • Then to similarity (preserves angles, ratios of length); • Finally to Euclidean (preserves angles, lengths).
  71. 71. Stratified Reconstruction • Projective reconstruction: from a set of matched points x, x’, find the fundamental matrix F, set the projection matrices P, P’, triangulate to get X, and refine the reconstruction; • Metric reconstruction: use prior knowledge (calibration), the plane at infinity and the image of the absolute conic to upgrade the result.
  72. 72. Planar Rectification • Bring two views to the standard stereo setup (moves the epipole to infinity) • Not possible when the epipole lies in, or close to, the image • Rectified images stay about the original image size (calibrated case) • Distortion minimization (uncalibrated case)
  73. 73. Polar Rectification • Polar re-parameterization around the epipoles • Requires only (oriented) epipolar geometry • Preserves the length of epipolar lines • Choose the angular step so that no pixels are compressed • Works for all relative motions • Guarantees minimal image size (Figure: original image and rectified image.)
  74. 74. Stereo Matching • Feature-based (sparse) and correlation-based (dense); • Constraints • epipolar • ordering • uniqueness • disparity limit • The disparity map D(x,y) relates the two images: a pixel (x,y) in image I(x,y) corresponds to (x´,y´) = (x + D(x,y), y) in image I´(x´,y´)
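A hedged example of dense correlation-based matching on an already rectified pair using OpenCV's semi-global block matcher; the file names and disparity range are assumptions:

```python
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)     # hypothetical rectified pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,      # must be a multiple of 16; encodes the disparity limit
    blockSize=5,
    P1=8 * 5 * 5,            # smoothness penalties for small / large disparity jumps
    P2=32 * 5 * 5,
    uniquenessRatio=10,      # enforces the uniqueness constraint
)
# SGBM returns fixed-point disparities scaled by 16
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0
```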
  75. 75. PatchMatch Stereo • Idea: First a random initialization of disparities and plane parameters for each pixel, then update the estimates by propagating information from the neighboring pixels; • Spatial propagation: Check for each pixel the disparities and plane parameters of the left and upper neighbors, and replace the current estimates if the matching costs are smaller; • View propagation: Warp the point into the other view and check the corresponding estimates in the other image. Replace if the matching costs are lower; • Temporal propagation: Propagate the information analogously by considering the estimates for the same pixel at the preceding and consecutive video frames; • Plane refinement: Disparity and plane parameters for each pixel are refined by generating random samples within an interval and updating the estimates if the matching costs are reduced; • Post-processing: Remove outliers with left/right consistency checking and a weighted median filter; gaps are filled by propagating information from the neighborhood.
  76. 76. PatchMatch Stereo
  77. 77. Plane Sweep Stereo • Choose a virtual view • Sweep a family of planes at different depths w.r.t. the virtual camera • Each plane defines a homography from every input image to the virtual camera; warp the input images onto the plane and composite them to test photo-consistency (see the sketch below)
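A sketch of the plane-induced homography used in such a sweep, under the assumptions that the sweep plane is n·X = d in the reference (virtual) camera frame and that (R, t) map reference-camera coordinates into the source camera:

```python
import cv2
import numpy as np

def plane_sweep_homography(K_ref, K_src, R, t, d, n=np.array([0.0, 0.0, 1.0])):
    """H maps reference-image pixels onto the source image for the plane n.X = d."""
    H = K_src @ (R - np.outer(t, n) / d) @ np.linalg.inv(K_ref)
    return H / H[2, 2]

# For each candidate depth: warp the source image into the reference view and
# score photo-consistency against the reference image. With H mapping reference
# pixels to source pixels, WARP_INVERSE_MAP samples the source at H(x_ref):
# H = plane_sweep_homography(K, K, R, t, depth)
# warped = cv2.warpPerspective(src_img, H, (w, h), flags=cv2.WARP_INVERSE_MAP)
```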
  78. 78. Multiple-baseline Stereo • For a reference image, search along the corresponding epipolar lines of all other images (e.g. I1 … I10), using inverse depth relative to the first image as the search parameter; • For larger baselines, search a larger area; • Pros • Using multiple images reduces the ambiguity of matching • Cons • Must choose a reference view; • Occlusions become an issue for large baselines. • Possible solution: use a virtual view
  79. 79. Volumetric Stereo/Voxel Coloring • Use a voxel volume to get a view-independent representation; • Goal: Assign RGB values to voxels in the discretized scene volume V that are photo-consistent with the (calibrated) input images
  80. 80. Space Carving • Initialize to a volume V containing the true scene; • Bounding box volume subdivided into voxels • Voxel color hypotheses collected from all visible images • Elimination of inconsistent hypotheses • Repeat until convergence (slow); • Choose a voxel on the current surface • Project to visible input images • Carve if not photo-consistent
  81. 81. Visual Hull • Shape-from-silhouette: • Each silhouette defines a generalized cone; • Pre-step: silhouette segmentation • Background subtraction • Tracking • Active illumination • Visual hull reconstruction • Voxel-based: find the intersection of the silhouette cones by dividing the volume into voxels and cutting voxels that project outside any silhouette (see the sketch below) • Polyhedral • Image-based visual hull rendering
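A minimal voxel-carving sketch of the shape-from-silhouette idea: a voxel is kept only if it projects inside every silhouette. The projection matrices and boolean silhouette masks are assumed to come from calibrated cameras and background subtraction:

```python
import numpy as np

def visual_hull(voxels, projections, silhouettes):
    """voxels: (n,3) centres; projections: list of (3,4) P; silhouettes: list of bool (H,W)."""
    hom = np.hstack([voxels, np.ones((len(voxels), 1))])      # homogeneous coordinates
    keep = np.ones(len(voxels), dtype=bool)
    for P, sil in zip(projections, silhouettes):
        x = hom @ P.T
        u = np.round(x[:, 0] / x[:, 2]).astype(int)
        v = np.round(x[:, 1] / x[:, 2]).astype(int)
        h, w = sil.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        in_sil = np.zeros(len(voxels), dtype=bool)
        in_sil[inside] = sil[v[inside], u[inside]]
        keep &= in_sil               # carve voxels that fall outside any silhouette
    return voxels[keep]
```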
  82. 82. Visual Hull - An Example
  83. 83. Visual Hull • Polygon-based visual hull • Silhouette contours approximated as 2D polygons • Precision adjustment: variable # polygon vertices • Contours define polyhedral 3D cones • Intersection of cones: Polygonal Visual Hull
  84. 84. Visual Hull • Image-based visual hull • From all silhouettes • Find the tightest depth range for each pixel • From the closest image • Determine the pixel color from the near end of the depth range • Visibility check: • Project all pixels’ depth ranges into the reference image • Build a z-buffer in the reference image plane • The desired pixel location stays on top. • Implicit depth. • Issues: • The background is segmented manually. Algorithm: 1) the desired ray is projected onto a reference image; 2) the intervals where the ray crosses the silhouette are determined; 3) these intervals are lifted back onto the desired ray where they can be intersected with intervals from other reference images.
  85. 85. Carved Visual Hulls • Compute visual hull • Use dynamic programming to find rims and constrain them to be fixed • Carve the visual hull to optimize photo-consistency
  86. 86. Region Growing for MVS • 1. Fitting step: a local surface patch is fitted, iterating visibility • 2. Filter step: visibility is explicitly enforced • 3. Expand step: successful patches are used to initialise the active boundary
  87. 87. Depth-Map Fusion for MVS • Compute depth hypotheses from each view using neighboring views • Compute completely un-regularised depth-maps independently for each viewpoint; • Merge depth-maps into single volume with use of redundancy • Compute one correlation curve per image, then find its local maximum • Utilize the pixel neighborhood in a 2D discrete MRF: peaks as hypothesis and “unknown depth” label against spatially inconsistent peaks; • Model sensor probabilistically as a Gaussian + Uniform mixture, evolution of estimates in sequential inference. • Extract 3d surface from this volume • Regularise at final stage; • Graph cut-based segmentation with photo consistency; • Surface extracted by marching cubes from binary map.
  88. 88. Comparison of Region Growing and Depth Map Fusion • Summary: region growing starts from a cloud of 3d points and grows small flat patches maximizing photo-consistency; depth-map fusion fuses a set of depth-maps computed using occlusion-robust photo-consistency • Pros: region growing provides the best overall results due to a plane-based photo-consistency; depth-map fusion is an elegant pipeline with plug-n-play, easily parallelizable blocks • Cons: region growing has many tunable parameters, i.e. it is difficult to tune to get the optimal results; depth-map fusion’s photo-consistency metric is simple but not optimal, and the metric suffers when images are not well textured or low resolution
  89. 89. Patch-based MVS • Feature extraction, matching, patch expansion/propagation, filtering with visibility consistency; • Works well for various objects and scenes • Surfaces must be Lambertian and well-textured • Problematic for architectural scenes • Limitation of MVS: make use of planarity and orthogonality, geometric priors for interpolation • Manhattan world stereo (piecewise axis-aligned planar scenes)?
  90. 90. Photo-Consistency in MVS • A photo-consistent scene is a scene that exactly reproduces input images from the same camera viewpoints; • Can’t use input cameras and images to tell the difference between a photo-consistent scene and the true scene. Photo-consistent 3-d point/patch 
  91. 91. Regularization for Textureless Regions in MVS • More information is needed, besides photo-consistency. • The user provides a priori knowledge. • Classical assumption: Objects are “smooth.” • Also known as regularizing the problem. • Optimization problem: • Find the “best” smooth consistent object. • Popular methods: • Level set; • Snakes; • Graph cut; • Exploiting silhouettes.
  92. 92. Initial Stereo, Camera Pose by SfM and Depth by MVS (Figure: stereo initialization, SfM, and MVS stages.)
  93. 93. Fast Multi-frame Stereo Scene Flow with Motion Segmentation • A multi-frame method for computing scene flow (dense depth and optical flow) and camera ego-motion for a dynamic scene observed from a moving stereo camera rig. • Also it segments out moving objects from the rigid scene. • First estimate the disparity map and the 6-DOF camera motion using stereo matching and visual odometry. • Then identify regions inconsistent with the estimated camera motion and compute per-pixel optical flow only at these regions. • This flow proposal is fused with the camera motion-based flow proposal using fusion moves to obtain the final optical flow and motion segmentation. • This unified framework benefits all four tasks – stereo, optical flow, visual odometry and motion segmentation leading to overall higher accuracy and efficiency.
  94. 94. Fast Multi-frame Stereo Scene Flow with Motion Segmentation In the 1st three steps, estimate the disparity D and camera motion P using stereo matching and visual odometry techniques. Then detect moving object regions by using the rigid flow Frig computed from D and P. Optical flow is performed only for the detected regions, and the resulting non-rigid flow Fnon is fused with Frig to obtain final flow F and segmentation S.
  95. 95. Fast Multi-frame Stereo Scene Flow with Motion Segmentation Binocular and epipolar stereo: (a) initial disparity map, (b) occlusion map, (c) uncertainty map, (d) final disparity estimate by epipolar stereo. Initial segmentation: detect moving object regions using clues from (a) image residuals weighted by (b) patch intensity variance and (c) prior flow; also use (d) depth edge and (e) image edge information to obtain (f) the initial segmentation.
  96. 96. Fast Multi-frame Stereo Scene Flow with Motion Segmentation Optical flow and flow fusion: obtain the non-rigid flow proposal by (a) performing SGM followed by (b) consistency filtering and (c) hole filling by weighted median filtering. This flow proposal is fused with (d) the rigid flow proposal to obtain (e) the final flow estimate and (f) the motion segmentation. Results on KITTI test sequences 002, 006. Black pixels in the error heat maps indicate missing ground truth.
  97. 97. Signed Distance Function for Surface Representation • Volumetric integration; • Curless & Levoy (1996)’s method for fusing depth maps into a global surface using the SDF representation; • Importantly, the minimiser of the L2 cost is the average of the SDF measurements, which can be computed recursively, enabling massive data fusion; • Samples are also weighted to take into account the uncertainty of each measurement.
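A sketch of the running weighted-average SDF update described above, for one depth map and a set of voxel centres already transformed into the camera frame; the truncation distance and unit weight are assumptions:

```python
import numpy as np

def integrate_tsdf(tsdf, weight, voxel_pts_cam, depth_map, K, trunc=0.05):
    """tsdf, weight: (n,) per-voxel state; voxel_pts_cam: (n,3) voxel centres in the camera frame."""
    z = voxel_pts_cam[:, 2]
    u = np.round(K[0, 0] * voxel_pts_cam[:, 0] / z + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * voxel_pts_cam[:, 1] / z + K[1, 2]).astype(int)
    h, w = depth_map.shape
    valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    sdf = np.zeros_like(z)
    sdf[valid] = depth_map[v[valid], u[valid]] - z[valid]   # signed distance along the ray
    valid &= sdf > -trunc                                   # ignore voxels far behind the surface
    d = np.clip(sdf, -trunc, trunc) / trunc                 # truncate and normalize

    # recursive weighted average: D <- (W*D + w_new*d) / (W + w_new)
    w_new = 1.0
    tsdf[valid] = (weight[valid] * tsdf[valid] + w_new * d[valid]) / (weight[valid] + w_new)
    weight[valid] += w_new
    return tsdf, weight
```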
  98. 98. Marching Cubes • Transform volume data sets to a surface model • Use polygon rendering methods • Use polygon manipulation methods • Deformation • Morphing • Approximate iso-surface with a triangle mesh gen. by comparing iso-value and voxel densities; • Look at each cell of volume 1-by-1, generate triangles inside cell by configuration of voxel values;
  99. 99. Marching Cubes • Imagine all voxels above the iso-value are 1 and the others are 0; • Find a triangular surface that includes all 1s and excludes all 0s; • Each of the 8 corners of one cell can be 0 or 1, so there are 2^8 = 256 cases in total; • Can be reduced to 15 cases using symmetry. • Exact positions of the triangle vertices are determined by linear interpolation of the iso-value; • Generally can be computed quickly with a lookup table • Has some ambiguous cases
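A hedged usage example: extracting the zero iso-surface of a TSDF volume with scikit-image's marching cubes implementation (which uses a lookup-table treatment of the ambiguous cases); the file name and voxel spacing are placeholders:

```python
import numpy as np
from skimage import measure

volume = np.load("tsdf.npy")          # hypothetical (X, Y, Z) scalar field
verts, faces, normals, values = measure.marching_cubes(
    volume, level=0.0, spacing=(0.01, 0.01, 0.01))
# verts: (V,3) vertex positions in metric units; faces: (F,3) triangle indices
```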
  100. 100. Poisson Surface Reconstruction • Reconstruct the surface of the model by solving for the indicator function of the shape: χ_M(p) = 1 if p ∈ M, 0 if p ∉ M; • There is a relationship btw the normal field (oriented points) and the gradient of the indicator function; • Integration: represent the points by a vector field V, and find the function χ whose gradient best approximates V: min_χ || ∇χ − V ||; • Applying the divergence operator, we transform this into a Poisson problem: Δχ = ∇ · V
  101. 101. Poisson Surface Reconstruction • Given the Points: • Set octree • Compute vector field • Define a function space • Splat the samples • Compute indicator function • Compute divergence • Solve Poisson equation • Extract iso-surface
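For completeness, a hedged example of running screened Poisson reconstruction on an oriented point cloud through Open3D; the file names and octree depth are illustrative:

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("points.ply")           # hypothetical point cloud
if not pcd.has_normals():                             # Poisson needs oriented normals
    pcd.estimate_normals()

mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=8)                                     # depth controls the octree resolution
o3d.io.write_triangle_mesh("poisson_mesh.ply", mesh)
```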
  102. 102. Surface Rendering • Splatting (pixel-based) [Mark & Bishop 98] • Splat: the pixel shape projected to reconstruct the new view in the image space. • Footprint: reconstruction kernel. • A depth map represents a cloud of 3d points to be splatted. • Splatting broadens the individual 3d points. • Triangulation (mesh-based) [Kanade et al. 97] • Pixel plus depth as a vertex of a triangular mesh • Interpolation within meshes. • Feature-based reconstruction • Depth discontinuity (phantom/rubber sheet): • Breaking all edges with a large depth difference • Removing the corresponding triangles.
  103. 103. Folding/Visibility Problem • An occlusion issue: surfaces overlap when multiple points project to the same pixel. • An alternative: mapping the pixels in a back-to-front specific order [McMillan & Bishop 95]. • Unavailable for rendering with multiple reference images. • Z-buffering [Chen & Williams 93] • Winner-take-all; • Soft Z-buffering: the top k values (k=3~4) are blended. • Multiple views: the views close to the new view are emphasized more than the views away from the new view (angular distance).
  104. 104. Hole-Filling Problem • A disocclusion issue: holes from depth discontinuities at the boundary between FG and BG. • Popular methods: • Searching backwards along the epipolar lines [McMillan & Bishop 95] • Complicated for non-convex foreground objects; • Inpainting or texture synthesis [Tauber et al 07]; • Risk of filling the holes with foreground textures -> depth-aware; • Layered depth image (LDI) [Shade, Gortler & Szeliski 98]: • How to generate a compact LDI is a segmentation task; • Low-pass filtering of the depth image: • Prone to geometric distortions [Fehn04]; • Layered representation: • Boundary matting [Zitnick et al. 04] • Alpha matting for the boundary layer; • Background layer [Criminisi et al. 07]: • Bi-modal disparity histogram-based segmentation.
  105. 105. Texture Mapping
  106. 106. Texture Mapping • Texture stitching • View-based texture mapping • View interpolation and warping
  107. 107. MVE: Multi-View Reconstruction Environment • A complete end-to-end pipeline for image-based geometry reconstruction: • Structure-from-Motion; • Multi-View Stereo; • Surface Reconstruction. • Open MVG library; • Open Bundler: SfM; • Dense depth map estimation: • Get from uncontrolled images; • MVS methods: • Region growing, depth merging.
  108. 108. Evaluation of MVS’ Performance • Evaluation @vision.middlebury.edu; • DTU Robot Image Data Sets;
  109. 109. Appendix: Immersive Panoramic View
  110. 110. Generalized Camera • A general imaging model used to represent an imaging system. • All imaging systems perform a mapping from incoming scene rays to photo-sensitive elements on the image detector: conveniently using a set of virtual sensing elements called raxels. • Raxels include geometric, radiometric and optical properties. • A calibration method that uses structured light patterns to extract the raxel parameters of an arbitrary imaging system. (a) catadioptric system, (b) dioptric wide-angle system, (c) imaging system made of a camera cluster, and (d) compound camera made of individual sensing elements, each including a receptor and a lens.
  111. 111. Generalized Camera (a) A raxel is a virtual replacement for a real photosensitive element. A raxel may have radiometric and optical parameters. (b)The notation for a raxel. A raxel is a virtual photo-sensitive element that measures the light energy of a compact bundle of rays which can be represented as a single principle incoming ray. An imaging system modeled as a set of raxels on a sphere surrounding the imaging system. Each raxel i has a position pi on the sphere, and an orientation qi; aligned with an incoming ray. Multiple raxels may be located at the same point (p1 = p2 = p3), but have different directions.
  112. 112. Generalized Camera (a) A non-perspective imaging system: a catadioptric system consisting of a perspective camera and a parabolic mirror. The imaging system is mounted on a translating stage; the laptop displays 26 patterns. (b) A sample bit pattern as seen through the parabolic catadioptric system. (a) A perspective imaging system and the calibration setup, consisting of a laptop and a translating stage; the axis of the perspective camera is normal to the plane of the screen. (b) A sample bit pattern as seen through the perspective system.
  113. 113. Structure from Motion in Generalized Camera • Use a network of cameras as if they were a single imaging device, even when they do not share a common center of projection. A schematic of a multi-camera system for an autonomous vehicle, including a forward facing stereo pair, two cameras facing towards the rear, and a panoramic multi-camera cluster in the middle. The lines represent the rays of light sampled by these cameras.
  114. 114. Structure from Motion in Generalized Camera The generalized imaging model expresses how each pixel samples the light field. This sampling is assumed to be centered around a ray starting at a point (X, Y, Z), with a direction parameterized by (φ, θ), relative to a coordinate system attached to the camera. The simplified model captures only the ray itself, parameterized by its Pluecker vectors q, q'. Note: the Pluecker vectors of a line are a pair of 3-vectors q, q', called the direction vector and the moment vector; q is a vector of any length in the direction of the line.
  115. 115. Structure from Motion in Generalized Camera • Generalized epipolar constraint between the Pluecker vectors of two views; • Generalized point reconstruction in the fiducial coordinate system: solve the constraint for α1; • Note: the parameters α1, α2 play the role of depth in ordinary cameras; • Generalized optic flow equation; • Generalized differential epipolar constraint.
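For reference, a commonly cited form of these relations; the notation below, with R, t mapping camera-2 coordinates into camera-1 coordinates, is an assumption and may differ from the equations on the original slide.

```latex
% Pluecker coordinates of a ray through point p with direction q:
%   L = (q, q'),   q' = p \times q,   q \cdot q' = 0.

% Generalized epipolar constraint between matching rays (q_1, q_1') and (q_2, q_2'):
\[
  \mathbf{q}_1^{\top} [\mathbf{t}]_{\times} R \,\mathbf{q}_2
  + \mathbf{q}_1^{\top} R \,\mathbf{q}_2'
  + \mathbf{q}_1'^{\top} R \,\mathbf{q}_2 = 0 .
\]

% Generalized point reconstruction: a 3D point on ray i is
\[
  \mathbf{X} = \mathbf{p}_i + \alpha_i \,\mathbf{q}_i ,
\]
% where \alpha_i plays the role of depth in an ordinary (pinhole) camera.
```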
  116. 116. Multiple View Geometry in Generalized Camera • Covers most existing camera types: pinhole cameras, sensors with radial or more general distortions, catadioptric cameras (central or non-central), etc. • A hierarchy of general camera models: • The most general model has unconstrained projection rays, whereas the most constrained model considered here is the central model, where all rays pass through a single point. • Intermediate models are the axial cameras (all rays touch a single line) and the x-slit cameras (all rays touch two lines). • A multi-view geometry of completely non-central cameras, leading to multi-view matching tensors analogous to the fundamental/essential matrices and the trifocal and quadrifocal tensors of perspective cameras.
  117. 117. Multiple View Geometry in Generalized Camera Examples of imaging systems; (c)–(e) are non-central devices. (a) Catadioptric system. (b) Central camera (e.g. perspective, with or without radial distortion). (c) Camera looking at reflective sphere. (d) Omnivergent imaging system. (e) Stereo system.
  118. 118. Multiple View Geometry in Generalized Camera Camera models, defined by 3D points and lines that have an intersection with all projection rays of a camera.
  119. 119. Multiple View Geometry in Generalized Camera Parameterization of projection rays for different camera models: In central cameras, all rays go through a single point, the optical center: a finite and infinite optical center. In axial cameras, all rays touch a line, the camera axis: a finite and an infinite camera axis. In x-slit cameras, there exist two lines – camera axes – that cut all projection rays: (i) both axes are finite lines or (ii) one of the two axes is a line at infinity.
  120. 120. Multiple View Geometry in Generalized Camera Cases of multi-view matching constraints for central and non-central cameras. Columns “useful” contain entries of the form x-y-z etc. that correspond to sub-matrices of M that give rise to matching constraints linking all views: x-y-z refers to submatrices containing x rows from one camera, y from another etc.
  121. 121. Multiple View Geometry in Generalized Camera Essential matrices for different camera models
  122. 122. Image Mosaic • The mosaic has a natural interpretation in 3D: • the images are reprojected onto a common plane; • the mosaic is formed on this plane; • the mosaic acts as a synthetic wide-angle camera. • Motion models: translation, affine, perspective, 3D rotation.
  123. 123. Perspective Stitching • Pick one image, typically the central view (red outline); • Warp the others to its plane: • Pixel-based or feature-based registration; • Global alignment: bundle adjustment. • Blend (global and local): alpha blending/Poisson blending/multi-band blending.
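A compact OpenCV sketch of the feature-based registration and warping step (ORB features, RANSAC homography). The naive 50/50 blend and the fixed-size output canvas are simplifications standing in for global alignment and the blending methods listed above.

```python
import cv2
import numpy as np

def stitch_to_reference(ref, other):
    """Warp 'other' onto the image plane of the reference (central) view."""
    orb = cv2.ORB_create(4000)
    k1, d1 = orb.detectAndCompute(other, None)
    k2, d2 = orb.detectAndCompute(ref, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    matches = sorted(matches, key=lambda m: m.distance)[:500]

    # Robustly estimate the plane-to-plane warp (homography) with RANSAC.
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    h, w = ref.shape[:2]
    warped = cv2.warpPerspective(other, H, (w, h))
    mask = (warped.sum(axis=2) > 0)[..., None]
    return np.where(mask, warped // 2 + ref // 2, ref)   # naive 50/50 blend in the overlap
```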
  124. 124. What about a 360° Field of View? • Mapping to non-planar (e.g., cylindrical or spherical) surfaces; • Lens distortion: radial or tangential; • Fisheye lens: requires a different model than the polynomial (radial) ones; • Local registration for de-ghosting: • optic flow or spline-based; • plane-plus-parallax model.
  125. 125. Cylindrical Panoramas • Map 3D point (X,Y,Z) onto cylinder; • Convert to coordinates on unit cylinder; • Convert to image coordinates on unwrapped cylinder. • A cylindrical image is a rectangular array.
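A minimal numpy sketch of these steps; the axis conventions (camera looks down +Z, Y points down) and the use of the focal length f as the cylinder radius are assumptions.

```python
import numpy as np

def cylindrical_coords(X, Y, Z, f, cx, cy):
    """Map a 3D point to unwrapped-cylinder image coordinates."""
    theta = np.arctan2(X, Z)              # angle around the cylinder axis
    h = Y / np.sqrt(X**2 + Z**2)          # height on the unit cylinder
    u = f * theta + cx                    # unwrapped horizontal coordinate
    v = f * h + cy                        # vertical coordinate
    return u, v
```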
  126. 126. Spherical Projection • Map 3D point (X,Y,Z) onto sphere; • Convert to spherical coordinates; • Convert to spherical image coordinates; • Map onto the unwrapped sphere.
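The spherical counterpart, mapping a 3D point onto an equirectangular (unwrapped-sphere) image; the output spans the full 360x180 panorama, and the axis conventions are again assumptions.

```python
import numpy as np

def spherical_coords(X, Y, Z, width, height):
    """Map a 3D point to equirectangular pixel coordinates."""
    theta = np.arctan2(X, Z)                          # longitude in [-pi, pi]
    phi = np.arctan2(Y, np.sqrt(X**2 + Z**2))         # latitude  in [-pi/2, pi/2]
    u = (theta / (2 * np.pi) + 0.5) * width
    v = (phi / np.pi + 0.5) * height
    return u, v
```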
  127. 127. Spherical Projection
  128. 128. Image Blending • Directly averaging the overlapping pixels causes ghosting artifacts: • moving objects, registration errors, parallax, etc. • Assign one input image to each output pixel: • the optimal assignment can be found by graph cut; • Inconsistency between pixels taken from different input images: • different exposure/white-balance settings; • photometric distortions (e.g., vignetting). • Poisson blending: • copy the gradient field from the input image; • reconstruct the final image by solving a Poisson equation. • Laplacian pyramid (multi-band) blending. • Color correction and HDR for seam finding.
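A short OpenCV/numpy sketch of Laplacian-pyramid (multi-band) blending for two already-aligned images: low frequencies are blended over wide regions, high frequencies over narrow ones. The 5-level depth and the soft mask are illustrative choices.

```python
import cv2
import numpy as np

def laplacian_blend(img1, img2, mask, levels=5):
    """Multi-band blend of two aligned images; mask in [0, 1] selects img1."""
    img1 = img1.astype(np.float32)
    img2 = img2.astype(np.float32)
    mask = mask.astype(np.float32)
    if mask.ndim == 2:
        mask = mask[..., None].repeat(3, axis=2)

    # Gaussian pyramids of both images and of the mask.
    gp1, gp2, gpm = [img1], [img2], [mask]
    for _ in range(levels):
        gp1.append(cv2.pyrDown(gp1[-1]))
        gp2.append(cv2.pyrDown(gp2[-1]))
        gpm.append(cv2.pyrDown(gpm[-1]))

    # Laplacian pyramids: band-pass detail at each level, residual at the top.
    def laplacian(gp):
        lp = []
        for i in range(levels):
            up = cv2.pyrUp(gp[i + 1], dstsize=(gp[i].shape[1], gp[i].shape[0]))
            lp.append(gp[i] - up)
        lp.append(gp[-1])
        return lp

    lp1, lp2 = laplacian(gp1), laplacian(gp2)

    # Blend each band with the blurred mask, then collapse the pyramid.
    blended = [m * a + (1 - m) * b for a, b, m in zip(lp1, lp2, gpm)]
    out = blended[-1]
    for i in range(levels - 1, -1, -1):
        out = cv2.pyrUp(out, dstsize=(blended[i].shape[1], blended[i].shape[0])) + blended[i]
    return np.clip(out, 0, 255).astype(np.uint8)
```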
  129. 129. Image Blending (examples: color correction, Poisson blending, graph cut)
  130. 130. Examples of Panoramic View Planar Cylindrical Spherical
  131. 131. Examples of Panoramic View Planar Cylindrical Spherical
  132. 132. Examples of Panoramic View Cylindrical Spherical
  133. 133. Examples of Panoramic View Cylindrical Spherical
  134. 134. View Navigation/Browsing • Moviemaps, omnidirectional camera: registered to a street map; • Object movies: QuickTime VR; • View interpolation: image-based modeling and rendering; • View morphing in the Photo Tourism system (used for 'Photosynth'); • Navigation with 6 DOF: path planning and fitting; • Camera parameters: SfM; • View scoring: angular deviation, field of view, resolution; • Navigation control: free viewpoint, scene-specific or optimized transitions between controls; • Discovery of orbits, panoramas, and canonical images; • Scene viewer: choosing and warping the appropriate images in interaction with the user; • Warping with proxy planes or a 3D model; • Render by drawing the background first, then the selected image; • Path planning: optimized smooth transitions; • Appearance stabilization: geometric/photometric alignment.
  135. 135. 360 Spherical View • Saved as an equi-rectangular (spherical) image: a 360x180 flat projection of a sphere; • HTML5/WebGL supports viewing a spherical image via vertex + fragment shaders; • A 360 viewer/player can map the equi-rectangular image as a texture onto a spherical surface mesh for interactive viewing; • YouTube supports 360 spherical video playback with 2 DoF (no zooming).
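A numpy sketch of what such a viewer computes per frame: a perspective view extracted from an equirectangular panorama for a given yaw/pitch/FOV. The axis conventions and nearest-neighbour sampling are simplifying assumptions.

```python
import numpy as np

def equirect_to_perspective(pano, fov_deg, yaw_deg, pitch_deg, out_w, out_h):
    """Render a pinhole view from an equirectangular panorama."""
    H, W = pano.shape[:2]
    f = 0.5 * out_w / np.tan(0.5 * np.radians(fov_deg))

    # Build one viewing ray per output pixel.
    x, y = np.meshgrid(np.arange(out_w) - out_w / 2, np.arange(out_h) - out_h / 2)
    dirs = np.stack([x, y, np.full_like(x, f, dtype=np.float64)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate the rays by the current yaw (around Y) and pitch (around X).
    cy, sy = np.cos(np.radians(yaw_deg)), np.sin(np.radians(yaw_deg))
    cp, sp = np.cos(np.radians(pitch_deg)), np.sin(np.radians(pitch_deg))
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    d = dirs @ (Ry @ Rx).T

    # Ray direction -> longitude/latitude -> equirectangular pixel.
    lon = np.arctan2(d[..., 0], d[..., 2])
    lat = np.arcsin(np.clip(d[..., 1], -1, 1))
    u = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    v = np.clip(((lat / np.pi + 0.5) * H).astype(int), 0, H - 1)
    return pano[v, u]
```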
  136. 136. Spherical Video Player or Rendering
  137. 137. Spherical Stereo Video Playback Format
  138. 138. Little Planet • Wrap the equi-rectangular image (360x180 spherical panorama) onto a sphere, producing a miniature model of a planet; • Output: hyperbolic, stereographic, or squoculus. Equi-rectangular panorama: each major compass point is indicated by a separate colored square.
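A numpy sketch of the stereographic ("little planet") re-projection, with the nadir mapped to the planet centre. The scale parameter and the orientation conventions are illustrative; actual tools differ in their exact conventions.

```python
import numpy as np

def little_planet(pano, out_size, scale=0.4):
    """Stereographic 'little planet' re-projection of an equirectangular panorama."""
    H, W = pano.shape[:2]
    N = out_size
    # Output plane coordinates centred on the planet.
    x, y = np.meshgrid(np.linspace(-1, 1, N), np.linspace(-1, 1, N))
    r = np.sqrt(x**2 + y**2) / scale
    theta = np.arctan2(y, x)                 # angle around the planet -> longitude
    lat = 2 * np.arctan(r) - np.pi / 2       # inverse stereographic; nadir at the centre

    # Sample the panorama (v = 0 corresponds to the zenith row).
    u = ((theta / (2 * np.pi) + 0.5) * W).astype(int) % W
    v = np.clip(((0.5 - lat / np.pi) * H).astype(int), 0, H - 1)
    return pano[v, u]
```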
  139. 139. Thanks!
