Fisheye/Omnidirectional View
in Autonomous Driving II
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• Artistic style transfer for videos and spherical images
• SalNet360: Saliency Maps for omnidirectional images with CNN
• Restricted Deformable Convolution based Road Scene Semantic Segmentation Using
Surround View Cameras
• Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images
• Appendix:
• Spatial Transform Network
• Active Convolution: Learning the Shape of Convolution for Image Classification
• Warped Convolutions: Efficient Invariance to Spatial Transformations
• Deformable Convolutional Networks
Artistic style transfer for videos and spherical images
• Manually re-drawing an image in a certain artistic style takes a professional artist a long time.
• Doing this for a video sequence single-handedly is beyond imagination.
• Two computational approaches transfer the style from one image (for example, a painting)
to a whole video sequence.
• The first approach adapts the original image style transfer technique by CNN (CVPR’16),
based on energy minimization, to videos.
• It tries other ways of initialization and loss functions to generate consistent and stable
stylized video sequences, even in cases with large motion and strong occlusion.
• The second approach formulates video stylization as a learning problem.
• It uses a deep network architecture and training procedures that allow stylizing
arbitrary-length videos in a consistent and stable way, and nearly in real time.
Artistic style transfer for videos and spherical images
Basic training procedure for a style transfer network with a prior image (e.g., the warped previous frame).
The goal of the network is to produce a new stylized image, where the combination of
perceptual loss and deviation from the prior image (in non-occluded regions) is minimized.
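As a hedged sketch of this objective (the notation below is assumed, not taken from the slide): with stylized frame x, content target p, style target a, the warped previously stylized frame ω, and a per-pixel mask c that is 1 in non-occluded regions, the training loss has roughly the form

```latex
\mathcal{L}(x) \;=\; \alpha\,\mathcal{L}_{\mathrm{content}}(x, p)
             \;+\; \beta\,\mathcal{L}_{\mathrm{style}}(x, a)
             \;+\; \lambda \sum_{i} c_i \,\left(x_i - \omega_i\right)^2 ,
```

where the last term penalizes deviation from the prior image only where the optical-flow consistency check marks pixels as non-occluded.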
Artistic style transfer for videos and spherical images
Training procedure for the multi-frame approach, shown with three frames. Back-propagating only one frame
already improves the quality of generated videos a lot. Back-propagating more frames would have required
decreasing the size of the network due to memory restrictions.
Artistic style transfer for videos and spherical images
• As virtual reality (VR) applications become increasingly popular, the demand for
image processing methods applicable to spherical images and videos rises.
• Spherical reality media is typically distributed via a 2D projection.
• The most common format is the equirectangular projection.
• However, this format does not preserve shapes: the distortion of the projection
becomes very large towards the poles of the sphere.
• Such non-uniform distortions are problematic for style transfer.
• Therefore, it works with subdivided spherical images that consist of multiple
rectilinear projections.
• In particular, it uses cubic projection, which represents a spherical image with six
non-distorted square images.
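As a rough illustration of how one cube face can be extracted from an equirectangular image, here is a minimal NumPy sketch; the face orientation and axis conventions are assumptions and vary between implementations:

```python
import numpy as np

def equirect_to_cube_face(equi, face_size, face="front"):
    """Extract one face of a cube map from an equirectangular image.

    Assumed conventions (hypothetical, for illustration): 'front' looks along +z,
    longitude in [-pi, pi], latitude in [-pi/2, pi/2].
    """
    H, W = equi.shape[:2]
    # Pixel grid on the face, normalized to [-1, 1] (tangent plane at distance 1).
    a = np.linspace(-1, 1, face_size)
    xs, ys = np.meshgrid(a, -a)          # image y grows downward, so flip

    # Direction of each ray for the chosen face.
    if face == "front":
        dirs = np.stack([xs, ys, np.ones_like(xs)], axis=-1)
    elif face == "right":
        dirs = np.stack([np.ones_like(xs), ys, -xs], axis=-1)
    elif face == "back":
        dirs = np.stack([-xs, ys, -np.ones_like(xs)], axis=-1)
    elif face == "left":
        dirs = np.stack([-np.ones_like(xs), ys, xs], axis=-1)
    elif face == "up":
        dirs = np.stack([xs, np.ones_like(xs), -ys], axis=-1)
    else:  # "down"
        dirs = np.stack([xs, -np.ones_like(xs), ys], axis=-1)

    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])     # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))    # [-pi/2, pi/2]

    # Map spherical coordinates to equirectangular pixel coordinates.
    u = (lon / (2 * np.pi) + 0.5) * (W - 1)
    v = (0.5 - lat / np.pi) * (H - 1)

    # Nearest-neighbour sampling keeps the sketch short; bilinear is typical.
    return equi[np.round(v).astype(int), np.round(u).astype(int)]
```

Calling this for the six faces yields the non-distorted square images mentioned above; the inverse mapping stitches stylized faces back into an equirectangular panorama.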
Artistic style transfer for videos and spherical images
Cubemap projection used for stylizing
spherical images. The generated
images must be consistent along the
boundaries of neighboring cube faces.
Every cube face has four neighbors.
For style transfer in this regime, the six cube faces must
be stylized such that their cut edges are consistent, i.e.,
the style transfer must not introduce false discontinuities
along the edges of the cube in the final projection. Since
applications in VR environments must run in real time, only the
fast, network-based approach is considered here.
Artistic style transfer for videos and spherical images
Training data generation process for a network to
adapt to perspective transformed border regions.
The extensions for video style transfer and for spherical
images can be combined to process spherical videos.
This yields two constraints: (1) each cube face should be consistent along the motion
trajectory; (2) neighboring cube faces must have consistent boundaries.
For (1), calculate optical flow for each cube face separately, then warp the stylized cube
faces in the same way as for regular planar videos. For (2), blend both the warped image
from the last frame and the transformed border of the already stylized neighboring cube faces.
Artistic style transfer for videos and spherical images
The left image shows the overlap region of a
cube face from a panoramic image. The right
shows close-ups for two networks. Left: Not
fine-tuned. Right: Fine-tuned. In regions with
little structure (top and middle), the fine-tuning
strategy reduced unnatural artifacts along the
inner edge of the prior image. It sometimes uses
stylistic features to mask the transition (middle).
In regions with more structure (bottom), both
networks adapted well to the given prior.
SalNet360: Saliency Maps for omnidirectional images with CNN
• With the current trend in the Virtual Reality (VR) field, adapting known techniques to this
new kind of media is starting to gain momentum.
• One of the applications for VR headsets is the display of Omni-directional Images (ODIs).
• These images portray an entire scene as seen from a static point of view, and when viewed
through a VR headset, allow for an immersive user experience.
• The most common method for storing ODIs is by applying equirectangular, cylindrical or
cubic projections and saving them as standard two-dimensional images.
• The prediction of visual attention data from any kind of media is of value to content
creators and can be used to efficiently drive encoding algorithms.
• The proposed method is an architectural extension to any CNN that fine-tunes traditional 2D
saliency prediction to ODIs in an end-to-end manner.
• To address the projection distortions, the method relies on:
• Subdividing the ODI into undistorted patches.
• Providing the CNN with the spherical coordinates for each pixel in the patches.
SalNet360: Saliency Maps for omnidirectional images with CNN
ODI Saliency Detection Pipeline.
This method takes an ODI as input and splits it into six patches using the pre-processing steps. Each of
these six patches is sent through the CNN. The outputs of the CNN for all patches are then combined
using the post-processing technique.
SalNet360: Saliency Maps for omnidirectional images with CNN
Spherical coordinates definition and sliding frustum used to create the patches.
By specifying the field of view per
patch and its resolution, it is
possible to calculate the spherical
coordinates of each pixel in the
patch. These are then used to find
the corresponding pixels in the
ODI by applying the following
equations:
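The equations themselves are rendered as an image in the slide. A hedged reconstruction of the standard mapping: a patch pixel with unit-sphere coordinates (x, y, z) is converted to longitude θ and latitude φ, which index the equirectangular ODI of width W and height H as

```latex
\theta = \operatorname{atan2}(x, z), \qquad
\varphi = \arcsin(y), \qquad
u = \Big(\tfrac{\theta}{2\pi} + \tfrac{1}{2}\Big) W, \qquad
v = \Big(\tfrac{1}{2} - \tfrac{\varphi}{\pi}\Big) H .
```

The exact axis and sign conventions used in the paper may differ.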
SalNet360: Saliency Maps for omnidirectional images with CNN
Network Architecture
Patches extracted from the ODI.
SalNet360: Saliency Maps for omnidirectional images with CNN
Comparison of the three experimental scenarios. Top row: On the left the input ODI, on the right the
ground truth saliency map blended with the image. Bottom row: From left to right, the result of the three
experimental scenarios: Base CNN, Base CNN + Patches, Base CNN + Patches + Spherical Coords.
Restricted Deformable Convolution based Road Scene
Semantic Segmentation Using Surround View Cameras
• Understanding the surrounding environment of the vehicle is still one of the challenges for
autonomous driving.
• This work performs 360-degree road scene semantic segmentation using surround view cameras,
which are widely equipped on existing production cars.
• First, to address the large-distortion problem in fisheye images, Restricted Deformable
Convolution (RDC) is proposed for semantic segmentation, which can effectively model
geometric transformations by learning the shapes of convolutional filters conditioned on the
input feature map.
• Second, to obtain a large-scale training set of surround view images, a method called zoom
augmentation is proposed to transform conventional images to fisheye images.
• Finally, an RDC based semantic segmentation model is built; the model is trained for real-
world surround view images through a multi-task learning architecture by combining real-
world images with transformed images.
• It takes ERFNet as the baseline model for segmentation.
Restricted Deformable Convolution based Road Scene
Semantic Segmentation Using Surround View Cameras
The center of the undistorted image is clear, but the boundaries of the image are very
blurred, and some information is lost when transferring the pixels of the raw fisheye image
into the undistorted image.
“ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation”, 2017
Restricted Deformable Convolution based Road Scene
Semantic Segmentation Using Surround View Cameras
Surround view cameras consist of four fisheye cameras mounted on each side of the vehicle.
Cameras facing different directions capture images with different composition.
Restricted Deformable Convolution based Road Scene
Semantic Segmentation Using Surround View Cameras
RDC is a restricted version of deformable convolution. The sampling locations of 3x3 convolutions: (a)
Standard convolution. (b) Dilated convolution with dilation 2. (c) Deformable convolution. (d)
Restricted deformable convolution. The dark points are the actual sampling locations, and the hollow
circles in (c) and (d) are the initial sampling locations. (a) and (b) employ a fixed grid of sampling
locations. (c) and (d) augment the sampling locations with learned 2D offsets (red arrows). The primary
difference between (c) and (d) is that restricted deformable convolution employs a fixed central
sampling location; no offsets need to be learned for the central sampling location in (d).
Restricted Deformable Convolution based Road Scene
Semantic Segmentation Using Surround View Cameras
A 3 × 3 restricted deformable convolution. The module is initialized as a 3×3 filter with dilation. Offset
fields are learned from the input feature map by a regular convolutional layer. The channel dimension 2(N − 1)
corresponds to N − 1 2D offsets (the red arrows). The actual sampling positions (dark points) are obtained
by adding the 2D offsets. The value at a new position is obtained by using bilinear interpolation to
weight the four nearest points. The yellow arrows denote the backpropagation paths of the gradients.
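A minimal PyTorch-style sketch of this module (not the authors' code; it assumes torchvision's deform_conv2d and pins the central offset of a 3×3 kernel to zero, as the RDC description above requires):

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class RestrictedDeformableConv3x3(nn.Module):
    """3x3 deformable convolution whose central sampling location is fixed.

    Only 2*(9-1)=16 offset channels are predicted; zeros are inserted for
    the central kernel position, mirroring the RDC idea described above.
    """
    def __init__(self, in_ch, out_ch, dilation=1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.dilation = dilation
        # Regular conv layer that predicts offsets for the 8 non-central taps.
        self.offset_conv = nn.Conv2d(in_ch, 2 * 8, 3,
                                     padding=dilation, dilation=dilation)
        nn.init.zeros_(self.offset_conv.weight)
        nn.init.zeros_(self.offset_conv.bias)

    def forward(self, x):
        off = self.offset_conv(x)                    # (B, 16, H, W)
        b, _, h, w = off.shape
        zeros = off.new_zeros(b, 2, h, w)            # frozen central offset
        # Kernel taps are ordered row-major; the centre of a 3x3 kernel is tap 4.
        offset = torch.cat([off[:, :8], zeros, off[:, 8:]], dim=1)  # (B, 18, H, W)
        return deform_conv2d(x, offset, self.weight, self.bias,
                             padding=self.dilation, dilation=self.dilation)

# Usage sketch:
# y = RestrictedDeformableConv3x3(64, 64, dilation=2)(torch.randn(1, 64, 32, 32))
```

Initializing the offset-predicting layer to zero makes the module start as a plain dilated convolution, which matches the initialization described above.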
Restricted Deformable Convolution based Road Scene
Semantic Segmentation Using Surround View Cameras
(a) 3 × 3 regular convolution. (b) Factorized convolutions. (c) Factorized restricted
deformable convolution. The nonlinearities in (b) and (c) are omitted here.
2D filters can be approximated as a combination of 1D filters, for the sake of reducing memory
and computational cost. A basic decomposed layer consists of vertical kernels followed by
horizontal ones, and a nonlinearity is inserted in between 1D convolutions.
For the 2D RDC, each learned offset has two components: a vertical direction and a horizontal direction.
With the 2D kernel decomposed into a vertical kernel and a horizontal kernel, the offsets can also be
decomposed into two components in the same directions.
Restricted Deformable Convolution based Road Scene
Semantic Segmentation Using Surround View Cameras
• Training of deep networks requires a huge number of training images, but training datasets
are always limited.
• Data augmentation methods are adopted to enlarge the training data using label-preserving
transformations.
• Many forms are employed to do data augmentation for semantic segmentation, such as
horizontally flipping, scaling, rotation, cropping and color jittering.
• The operation of warping conventional images to fisheye-style images is generally called
zoom augmentation.
• The zoom augmentation can adopt a fixed focal length or a randomly changing focal length.
• Via the zoom augmentation method, an existing conventional image dataset for semantic
segmentation can be transformed into a fisheye-style image dataset.
• The smaller the focal length, the larger the degree of distortions.
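A rough sketch of such a warp under an assumed equidistant fisheye model (r = f·θ); this is an illustration, not the paper's exact formulation:

```python
import numpy as np

def zoom_augment(img, f_fisheye, f_pinhole=None):
    """Warp a conventional (pinhole) image into a fisheye-style image.

    Assumes an equidistant fisheye model r_d = f * theta; a smaller
    f_fisheye yields stronger distortion, matching the remark above.
    """
    H, W = img.shape[:2]
    if f_pinhole is None:
        f_pinhole = W / 2.0                       # assumed pinhole focal length
    cx, cy = W / 2.0, H / 2.0

    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    dx, dy = xs - cx, ys - cy
    r_d = np.sqrt(dx**2 + dy**2) + 1e-9           # radius in the fisheye image

    theta = r_d / f_fisheye                       # equidistant model
    r_u = f_pinhole * np.tan(np.clip(theta, 0, np.pi / 2 - 1e-3))

    # Source coordinates in the original pinhole image.
    src_x = cx + dx / r_d * r_u
    src_y = cy + dy / r_d * r_u
    valid = (src_x >= 0) & (src_x < W) & (src_y >= 0) & (src_y < H)

    out = np.zeros_like(img)
    out[valid] = img[np.round(src_y[valid]).astype(int),
                     np.round(src_x[valid]).astype(int)]
    return out

# Applying the same warp (nearest-neighbour) to the annotation map keeps labels aligned.
```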
Restricted Deformable Convolution based Road Scene
Semantic Segmentation Using Surround View Cameras
Zoom augmentation. On the left are the original color image and annotation. On the right are the images
and annotations transformed by zoom augmentation with a focal length varying from 200 to 800.
Restricted Deformable Convolution based Road Scene
Semantic Segmentation Using Surround View Cameras
The multi-task learning architecture for road scene semantic segmentation. The data are fed into
three shared-weight sub-networks (the blue blocks). The total loss is the weighted sum of main losses and
auxiliary losses. γ is the auxiliary loss weighting to balance the contribution of the auxiliary losses. α is the task
weighting of the main branch to balance the main losses of different tasks. Similarly, β is the task weighting of
the auxiliary branch to balance the auxiliary losses of different tasks.
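Under this notation, the total loss presumably takes a form like the following hedged reconstruction (with T tasks):

```latex
L_{\mathrm{total}} \;=\; \sum_{t=1}^{T} \alpha_t \, L^{t}_{\mathrm{main}}
                   \;+\; \gamma \sum_{t=1}^{T} \beta_t \, L^{t}_{\mathrm{aux}} .
```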
Restricted Deformable Convolution based Road Scene
Semantic Segmentation Using Surround View Cameras
ERFNet-RDC-λ. (a) Non-bt-1D block in ERFNet. (b) Reconstructed non-bt-1D block. The first two
convolutional layers are replaced with RDC layers. (c) The encoder of ERFNet-RDC-λ.
Restricted Deformable Convolution based Road Scene
Semantic Segmentation Using Surround View Cameras
One example of the segmented results produced by ERFNet, ERFNet-DC-8,
ERFNet-FRDC-8, and ERFNet-RDC-8. The red pixels denote false recognitions of the
bus. ERFNet-RDC-8 nearly detects the whole bus in the image.
Restricted Deformable Convolution based Road Scene
Semantic Segmentation Using Surround View Cameras
(a) Results from different models
(b) List of the 18 class names and the corresponding colors used for labeling.
Examples of results on the test set of SVScape. The results for the front,
rear, left and right views are displayed in (a). The first two rows
show the raw image and ground truth, and the following four rows
show the results produced by different models. The last row shows
the improvement/error map, which denotes the pixels
misclassified by this method in red and the pixels that are
misclassified by the base model ERFNet but correctly predicted by
the proposed method in green. The color code is listed in (b).
Restricted Deformable Convolution based Road Scene
Semantic Segmentation Using Surround View Cameras
The bird’s eye view image semantic segmentation by mapping segmentation
results of raw surround view images to bird’s eye view plane.
Distortion-Aware Convolutional Filters for Dense
Prediction in Panoramic Images
• There is a high demand for 3D data for 360° panoramic images and videos, pushed by the
growing availability on the market of specialized hardware both for capturing (e.g., omni-
directional cameras) and for visualizing in 3D (e.g., head-mounted displays) panoramic
images and videos.
• At the same time, 3D sensors able to capture 3D panoramic data are expensive and/or
hardly available.
• To fill this gap, here is a learning approach for panoramic depth map estimation from a
single image.
• Thanks to a specifically developed distortion-aware deformable convolution filter, this
method can be trained by means of conventional perspective images, then used to regress
depth for panoramic images, thus bypassing the effort needed to create an annotated
panoramic training dataset.
• It is demonstrated on emerging tasks such as panoramic monocular SLAM, panoramic
semantic segmentation and panoramic style transfer.
Distortion-Aware Convolutional Filters for Dense
Prediction in Panoramic Images
From a single input equirectangular image (top left), this method exploits distortion-aware convolutions to
notably reduce distortions in depth prediction that affect conventional CNNs (bottom row). Top right: the same
idea used to predict semantic labels, to obtain panoramic 3D semantic segmentation from a single image.
Distortion-Aware Convolutional Filters for Dense
Prediction in Panoramic Images
The key concept behind the distortion-aware convolution is that the sampling grid is
deformed according to the image distortion model, so that the receptive field is rectified.
Distortion-Aware Convolutional Filters for Dense
Prediction in Panoramic Images
Computation of the adaptive sampling grid for an equirectangular image. Each pixel p in the equirectangular
image is transformed into unit sphere coordinates, then the sampling grid is computed on the tangent plane
in unit sphere coordinates, and finally the sampling grid is back-projected into the equirectangular image to
determine the location of the distorted sampling grid.
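A minimal NumPy sketch of this computation for a single pixel (the longitude/latitude parameterization and the tangent-plane step size are assumptions for illustration):

```python
import numpy as np

def distorted_sampling_grid(u, v, W, H, k=3, dilation=1.0):
    """Return k*k (u, v) sampling locations for pixel (u, v) of an
    equirectangular image, rectified on the tangent plane of the sphere."""
    # 1) Pixel -> spherical coordinates (longitude lon, latitude lat).
    lon = (u / W - 0.5) * 2 * np.pi
    lat = (0.5 - v / H) * np.pi

    # Angular step corresponding to one pixel at the equator (assumption).
    step = dilation * 2 * np.pi / W
    offs = (np.arange(k) - k // 2) * step
    gx, gy = np.meshgrid(offs, offs)              # regular grid on the tangent plane

    # 2) Points on the tangent plane at (lon, lat), expressed in 3D.
    # Basis: r = viewing direction, t_u ~ east, t_v ~ north.
    r = np.array([np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)])
    t_u = np.array([np.cos(lon), 0.0, -np.sin(lon)])
    t_v = np.cross(r, t_u)
    pts = (r[None, None, :] + gx[..., None] * t_u[None, None, :]
                            + gy[..., None] * t_v[None, None, :])
    pts /= np.linalg.norm(pts, axis=-1, keepdims=True)

    # 3) Back-project onto the equirectangular image.
    lon_s = np.arctan2(pts[..., 0], pts[..., 2])
    lat_s = np.arcsin(np.clip(pts[..., 1], -1.0, 1.0))
    u_s = (lon_s / (2 * np.pi) + 0.5) * W
    v_s = (0.5 - lat_s / np.pi) * H
    return u_s, v_s   # distorted grid used in place of the regular sampling grid
```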
Distortion-Aware Convolutional Filters for Dense
Prediction in Panoramic Images
A major advantage of the approach is that standard convolutional architectures can be
used with common datasets of perspective images to train the weights. At test time, the
weights are transferred to the same architecture with distortion-aware convolutional
filters so as to process equirectangular images. Although the figure reports the case of depth
prediction, the same strategy applies to the semantic segmentation task.
Distortion-Aware Convolutional Filters for Dense
Prediction in Panoramic Images
Compared methods in experimental evaluation: (a) Standard convolution on equirectangular
image, (b) Standard convolution on 6 rectified images via cube map projection, (c) Distortion-
aware convolution on equirectangular image.
Distortion-Aware Convolutional Filters for Dense
Prediction in Panoramic Images
Example of an equirectangular image with/without inpainting and the extracted rectified perspective images.
Since the images in this dataset lack color near the polar regions, those regions are filled with zeros. To avoid biasing
the network during training, an inpainting algorithm is applied. To create perspective images for training, first
extract images with a limited field of view along different directions from the original 360° panoramic image.
Directions are sampled at a 20° interval along the vertical axis (yaw rotation) and at a 15° interval along the
horizontal axis (pitch rotation). Then, rectify them into a standard perspective view. These rectified perspective
images are created by mapping pixels from the equirectangular projection to the perspective projection.
Distortion-Aware Convolutional Filters for Dense
Prediction in Panoramic Images
Depth prediction on Stanford 2D-3D-S dataset. Red circles highlight artifacts due to distortions induced by the
standard convolutional model (a) and by the CubeMap representation (b) that are instead solved by this approach (c).
Distortion-Aware Convolutional Filters for Dense
Prediction in Panoramic Images
Qualitative comparison of semantic segmentation on Stanford 2D-3D-S dataset. Red circles highlight errors
on polar regions and borders of the CubeMap model that are not present in our distortion-aware approach.
Distortion-Aware Convolutional Filters for Dense
Prediction in Panoramic Images
Application of our distortion-aware convolution for panoramic style transfer.
Spatial Transform Network
• CNNs are still limited by the lack of ability to be spatially invariant to the input data in a computationally efficient manner.
• A learnable module, the Spatial Transformer, allows spatial manipulation of data within the network.
• This differentiable module can be inserted into existing convolutional architectures, giving
NNs the ability to actively spatially transform feature maps, conditional on the feature map
itself, without any extra training supervision or modification to the optimization process.
• The use of spatial transformers results in models which learn invariance to translation, scale,
rotation and more generic warping for a number of classes of transformations.
• (i) image classification: a spatial transformer that crops out and scale-normalizes the
appropriate region can simplify the subsequent classification task, and lead to superior
classification performance;
• (ii) co-localization: given a set of images containing different instances of the same (but
unknown) class, a spatial transformer can be used to localize them in each image;
• (iii) spatial attention: a spatial transformer can be used for tasks requiring an attention
mechanism, and can be trained purely with backpropagation without reinforcement learning.
Spatial Transform Network
The result of using a spatial transformer as the 1st layer of a fully-connected network trained for distorted
MNIST digit classification. (a) The input to the spatial transformer network is an image of an MNIST digit
that is distorted with random translation, scale, rotation, and clutter. (b) The localization network of the
spatial transformer predicts a transformation to apply to the input image. (c) The output of the spatial
transformer, after applying the transformation. (d) The classification prediction produced by the subsequent
fully-connected network on the output of the spatial transformer. The spatial transformer network (a CNN
including a spatial transformer module) is trained end-to-end with only class labels – no knowledge of the
ground truth transformations is given to the system.
Spatial Transform Network
The spatial transformer mechanism is split into three parts. In order of computation, first a localization
network takes the input feature map, and through a number of hidden layers outputs the parameters
of the spatial transformation that should be applied to the feature map – this gives a transformation
conditional on the input. Then, the predicted transformation parameters are used to create a sampling
grid, which is a set of points where the input map should be sampled to produce the transformed
output. This is done by the grid generator. Finally, the feature map and the sampling grid are taken as
inputs to the sampler, producing the output map sampled from the input at the grid points.
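A compact PyTorch sketch of these three parts for the affine case (layer sizes are illustrative assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Localization network + grid generator + sampler (affine case)."""
    def __init__(self):
        super().__init__()
        # Localization network: predicts the 6 affine parameters theta.
        self.loc = nn.Sequential(
            nn.Conv2d(1, 8, 7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, 5), nn.MaxPool2d(2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(32), nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Initialize to the identity transform so training starts as a no-op.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)                          # transformation params
        grid = F.affine_grid(theta, x.size(), align_corners=False)  # grid generator
        return F.grid_sample(x, grid, align_corners=False)          # sampler

# Usage: warped = SpatialTransformer()(torch.randn(4, 1, 28, 28))
```

Initializing the final localization layer to the identity transform is the usual trick so that training starts from an unwarped feature map.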
Spatial Transform Network
Two examples of applying the parameterized sampling grid to an image U
producing the output V. (a) The sampling grid is the regular grid G = T_I(G),
where I is the identity transformation parameters. (b) The sampling grid is
the result of warping the regular grid with an affine transformation T_θ(G).
Active Convolution: Learning the Shape of
Convolution for Image Classification
• A new convolution unit, the active convolution unit (ACU), has no fixed shape, so it can define any form of convolution.
• Its shape can be learned through backpropagation during training.
• This unit has a few advantages.
• First, the ACU is a generalization of convolution; it can define not only all conventional
convolutions, but also convolutions with fractional pixel coordinates; it can freely change
the shape of the convolution, which provides greater freedom to form CNN structures.
• Second, the shape of the convolution is learned while training and there is no need to
tune it by hand.
• Third, the ACU can learn better than a conventional unit, simply by changing the
conventional convolution to an ACU.
• Code is available at https://github.com/jyh2986/Active-Convolution.
Active Convolution: Learning the Shape of
Convolution for Image Classification
Concept of the ACU. Black dots represent each synapse. The
ACU's output is the summation of the values at all positions pk,
multiplied by the weights. The position is parameterized by pk.
The ACU can define more diverse forms of receptive
fields for convolutions with learnable position parameters.
Inspired by the nervous system, one acceptor of the
ACU is called a synapse. The position parameters can be differentiated,
and the shape can be learned through backpropagation.
Active Convolution: Learning the Shape of
Convolution for Image Classification
• ACU is considered a generalization of the convolution unit.
• Any conventional convolution is represented with ACU by
setting positions of synapses properly and fixing all positions.
• Dilated convolution can also be represented by multiplying
the dilation factor with the position parameters.
• Compared to a conventional convolution, the ACU can
generate fractional dilated convolutions and be used to
directly calculate the results of the interpolated convolution.
• It can also be used to define K synapses without any
restriction (e.g., cross-shaped convolution with five synapses,
or a circular convolution with many synapses).
Comparison of a conventional convolution
unit with the ACU. (a) Conventional
convolution unit with 4 input neurons and
two output neurons. (b) Unlike the
convolution unit, the synapses of the ACU
can be connected at inter-neuron
positions and are movable.
Active Convolution: Learning the Shape of
Convolution for Image Classification
• At the network level, ACU converts a discrete input space to a
continuous one.
• Since the ACU uses bilinear interpolation between adjacent
neurons, synapses can connect inter-neuron spaces.
• This lends greater representational power to convolution units.
• The position parameters control the synapses that connect
neuron spaces, and the synapses can move around the neuron
space to reduce error.
• A convolution unit has a number of learnable filters, and each
filter is convolved with its receptive field.
• ACU has a learnable position parameter θp, which is the set of
positions of the synapses.
Coordinate system of interpolation.
m, n represent the base position of
the convolution, and (αk, βk) is the
displacement of the k-th synapse.
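A hedged restatement of the ACU output with bilinear interpolation, using the notation of the captions above:

```latex
y(m, n) \;=\; \sum_{c}\sum_{k} w_{c,k}\; x_c\!\left(m + \alpha_k,\; n + \beta_k\right),
```

where a value at a fractional position (u, v), with u0 = ⌊u⌋, v0 = ⌊v⌋, du = u − u0, dv = v − v0, is obtained from the four nearest neurons as

```latex
x_c(u, v) = (1-du)(1-dv)\,x_c(u_0, v_0) + du\,(1-dv)\,x_c(u_0{+}1, v_0)
          + (1-du)\,dv\,x_c(u_0, v_0{+}1) + du\,dv\,x_c(u_0{+}1, v_0{+}1).
```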
Warped Convolutions: Efficient Invariance to
Spatial Transformations
• Warped convolutions are a simple and exact construction, yet have the same computational
complexity that standard convolutions enjoy.
• It consists of a constant image warp followed by a simple convolution, which are standard
blocks in deep learning toolboxes.
• With a carefully crafted warp, the resulting architecture can be made equivariant to a wide
range of two-parameter spatial transformations.
• Continuous convolution
• Group convolutions over the image plane
• Warped convolutions via the exponential map (a hedged reconstruction of the equations is sketched below)
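The corresponding equations appear as images in the slide; a hedged reconstruction of the standard definitions (cross-correlation form; G is the transformation group acting on the image plane through a pivot point x0):

```latex
% Continuous (planar) cross-correlation:
(I \star F)(x) \;=\; \int I(x + u)\, F(u)\, du

% Group cross-correlation, with the group G acting on the image plane through a pivot point x_0:
(I \star F)(q) \;=\; \int_{G} I(q\,p\,x_0)\, F(p\,x_0)\, dp

% Warped convolution: parameterizing the group by the exponential map,
% w(u) = \exp\big(\textstyle\sum_i u_i A_i\big)\, x_0, the group operation reduces to a
% plain convolution of the pre-warped images I_w = I \circ w and F_w = F \circ w:
(I \star F)\big(q(v)\big) \;=\; \int I\big(w(u + v)\big)\, F\big(w(u)\big)\, du \;=\; (I_w \star F_w)(v)
```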
Warped Convolutions: Efficient Invariance to
Spatial Transformations
Warped Convolutions: Efficient Invariance to
Spatial Transformations
First row: Sampling grids that
define the warps associated with
different spatial transformations.
Second row: An example image (a)
after warping with each grid (b-d).
Third row: A small translation is
applied to each warped image,
which is then mapped back to the
original space (by an inverse warp).
Translation in one axis of the
appropriate warped space is
equivalent to (b) horizontal scaling;
(c) planar rotation; (d) 3D rotation
around the vertical axis.
Deformable Convolutional Networks
• Two new modules to enhance the transformation modeling capability of CNNs, namely,
deformable convolution and deformable RoI pooling.
• Both are based on the idea of augmenting the spatial sampling locations in the modules
with additional offsets and learning the offsets from the target tasks, without additional
supervision.
• The modules can replace their counterparts in existing CNNs and can be easily trained end-
to-end by standard back-propagation, giving rise to deformable convolutional networks.
• Learning dense spatial transformation in deep CNNs is effective for sophisticated vision tasks
such as object detection and semantic segmentation.
• The code is released at https://github.com/msracver/Deformable-ConvNets.
Deformable Convolutional Networks
Illustration of the sampling locations in 3 × 3 standard and deformable
convolutions. (a) regular sampling grid (green points) of standard convolution.
(b) deformed sampling locations (dark blue points) with augmented offsets
(light blue arrows) in deformable convolution. (c)-(d) are special cases of (b),
showing that the deformable convolution generalizes various transformations
for scale, (anisotropic) aspect ratio, and rotation.
Deformable Convolutional Networks
• Both deformable convolution and RoI pooling modules operate on the 2D spatial domain.
• The operation remains the same across the channel dimension.
• Without loss of generality, the modules are described in 2D here for notation clarity.
• The 2D convolution consists of two steps: 1) sampling using a regular grid R over the input
feature map x; 2) summation of the sampled values weighted by w (the formulas are sketched after this list).
• RoI pooling converts an input rectangular region of arbitrary size into fixed size features.
• Both deformable convolution and RoI pooling modules have the same input and output.
• First, a deep fully convolutional network generates feature maps over the whole input image.
• Second, a shallow task specific network generates results from the feature maps.
• The DCN idea is augmenting the spatial sampling locations in convolution and RoI pooling
with additional offsets and learning the offsets from target tasks.
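In symbols (a restatement consistent with the description above): for each output location p0 on the output feature map y,

```latex
% Standard 2D convolution, sampling on the regular grid \mathcal{R}:
y(p_0) \;=\; \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n)

% Deformable convolution augments each sampling location with a learned offset \Delta p_n:
y(p_0) \;=\; \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)
```

with values at fractional positions obtained by bilinear interpolation.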
Deformable Convolutional Networks
Illustration of 3 × 3 deformable convolution (left) and 3 × 3 deformable RoI pooling (right).
Deformable Convolutional Networks
Illustration of 3 × 3 deformable Position-Sensitive (PS) RoI pooling
Deformable Convolutional Networks
Illustration of the fixed receptive field in standard convolution (a) and the adaptive receptive field in deformable
convolution (b), using two layers. Top: two activation units on the top feature map, on two objects of different scales
and shapes. The activation is from a 3 × 3 filter. Middle: the sampling locations of the 3 × 3 filter on the preceding
feature map. Another two activation units are highlighted. Bottom: the sampling locations of two levels of 3 × 3 filters
on the preceding feature map. Two sets of locations are highlighted, corresponding to the highlighted units above.
AKTU Computer Networks notes --- Unit 3.pdf
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 

Artistic style transfer for videos and spherical images
Training data generation process for a network to adapt to perspective-transformed border regions. The extensions for video style transfer and for spherical images can be combined to process spherical videos. This yields two constraints: (1) each cube face should be consistent along the motion trajectory; (2) neighboring cube faces must have consistent boundaries. For (1), optical flow is calculated for each cube face separately, and the stylized cube faces are warped in the same way as for regular planar videos. For (2), the warped image from the last frame is blended with the transformed borders of the already stylized neighboring cube faces.
Artistic style transfer for videos and spherical images
The left image shows the overlap region of a cube face from a panoramic image. The right shows close-ups for two networks (left: not fine-tuned; right: fine-tuned). In regions with little structure (top and middle), the fine-tuning strategy reduces unnatural artifacts along the inner edge of the prior image; it sometimes uses stylistic features to mask the transition (middle). In regions with more structure (bottom), both networks adapt well to the given prior.
SalNet360: Saliency Maps for omnidirectional images with CNN
• With the current trend in the Virtual Reality (VR) field, adapting known techniques to this new kind of media is starting to gain momentum.
• One of the applications of VR headsets is the display of Omnidirectional Images (ODIs).
• These images portray an entire scene as seen from a static point of view and, when viewed through a VR headset, allow for an immersive user experience.
• The most common method for storing ODIs is to apply an equirectangular, cylindrical or cubic projection and save them as standard two-dimensional images.
• The prediction of visual attention data from any kind of media is valuable to content creators and can be used to drive encoding algorithms efficiently.
• This is an architectural extension to any CNN that fine-tunes traditional 2D saliency prediction to ODIs in an end-to-end manner.
• To address these issues:
• Subdividing the ODI into undistorted patches.
• Providing the CNN with the spherical coordinates for each pixel in the patches.
SalNet360: Saliency Maps for omnidirectional images with CNN
ODI saliency detection pipeline. The method takes an ODI as input and splits it into six patches in the pre-processing step. Each of the six patches is sent through the CNN, and the CNN outputs for all patches are then combined in the post-processing step.
SalNet360: Saliency Maps for omnidirectional images with CNN
Spherical coordinates definition and the sliding frustum used to create the patches. By specifying the field of view per patch and its resolution, it is possible to calculate the spherical coordinates of each pixel in the patch. These are then used to find the corresponding pixels in the ODI through the spherical-to-equirectangular mapping (the equations are given on the slide; a sketch of the lookup follows).
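To make the patch extraction concrete, below is a minimal NumPy sketch of the standard spherical-to-equirectangular lookup. The helper names (`patch_rays`, `sphere_to_equirect`) and the conventions (longitude measured from the z-axis, latitude positive upwards, rows of the equirectangular image spanning latitude) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def rotation(yaw, pitch):
    """Rotation that turns the patch's viewing direction by yaw (about the
    vertical axis) and pitch (about the horizontal axis)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    return Ry @ Rx

def patch_rays(fov_deg, size, yaw=0.0, pitch=0.0):
    """Unit viewing rays for every pixel of a rectilinear patch with the given
    field of view, looking along (yaw, pitch)."""
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2.0)       # focal length in pixels
    xs, ys = np.meshgrid(np.arange(size) - (size - 1) / 2.0,
                         np.arange(size) - (size - 1) / 2.0)
    rays = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)  # camera looks along +z
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    return rays @ rotation(yaw, pitch).T

def sphere_to_equirect(rays, width, height):
    """Equirectangular pixel coordinates (u, v) for unit direction vectors."""
    theta = np.arctan2(rays[..., 0], rays[..., 2])           # longitude in [-pi, pi]
    phi = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))        # latitude in [-pi/2, pi/2]
    u = (theta / (2 * np.pi) + 0.5) * (width - 1)
    v = (phi / np.pi + 0.5) * (height - 1)
    return u, v

# Example: per-pixel lookup table for one 90-degree patch of a 2048x1024 ODI.
u, v = sphere_to_equirect(patch_rays(90.0, 256, yaw=np.pi / 2), 2048, 1024)
```

The resulting (u, v) table can then be sampled bilinearly to cut an undistorted patch out of the ODI, and the same coordinates can be fed to the network as the per-pixel spherical-coordinate input.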
SalNet360: Saliency Maps for omnidirectional images with CNN
Network architecture, and the patches extracted from the ODI.
SalNet360: Saliency Maps for omnidirectional images with CNN
Comparison of the three experimental scenarios. Top row: on the left the input ODI, on the right the ground-truth saliency map blended with the image. Bottom row: from left to right, the results of the three experimental scenarios: Base CNN, Base CNN + Patches, Base CNN + Patches + Spherical Coords.
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
• Understanding the surrounding environment of the vehicle is still one of the challenges for autonomous driving.
• This work performs 360-degree road scene semantic segmentation using surround view cameras, which are widely equipped in existing production cars.
• First, to address the large distortion problem in fisheye images, Restricted Deformable Convolution (RDC) is proposed for semantic segmentation; it can effectively model geometric transformations by learning the shapes of convolutional filters conditioned on the input feature map.
• Second, to obtain a large-scale training set of surround view images, a method called zoom augmentation is proposed to transform conventional images into fisheye images.
• Finally, an RDC-based semantic segmentation model is built; the model is trained for real-world surround view images through a multi-task learning architecture that combines real-world images with transformed images.
• It takes ERFNet as the baseline model for segmentation.
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
The center of the undistorted image is clear, but the boundaries of the image are very blurred, and some information is lost when transferring the pixels of the raw fisheye image into the undistorted image.
"ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation", 2017
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
Surround view cameras consist of four fisheye cameras mounted on each side of the vehicle. Cameras in different directions capture images with different image composition.
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
RDC is a restricted version of deformable convolution. The sampling locations of 3x3 convolutions: (a) standard convolution; (b) dilated convolution with dilation 2; (c) deformable convolution; (d) restricted deformable convolution. The dark points are the actual sampling locations, and the hollow circles in (c) and (d) are the initial sampling locations. (a) and (b) employ a fixed grid of sampling locations, while (c) and (d) augment the sampling locations with learned 2D offsets (red arrows). The primary difference between (c) and (d) is that restricted deformable convolution employs a fixed central sampling location, so no offset needs to be learned for the central sampling location in (d).
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
A 3 × 3 restricted deformable convolution. The module is initialized with a 3 × 3 filter with dilation. Offset fields are learned from the input feature map by a regular convolutional layer; the channel dimension 2(N − 1) corresponds to the N − 1 2D offsets (the red arrows). The actual sampling positions (dark points) are obtained by adding the 2D offsets, and the value at each new position is computed by bilinear interpolation over the four nearest points. The yellow arrows denote the back-propagation paths of the gradients. A minimal sketch of this construction follows.
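Below is one way such a restricted deformable convolution could be written on top of `torchvision.ops.deform_conv2d`. It is an illustrative sketch, assuming that pinning the centre tap's two offset channels to zero reproduces the fixed central sampling location; it is not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class RestrictedDeformableConv2d(nn.Module):
    """3x3 restricted deformable convolution (sketch): offsets are predicted
    for the eight non-central taps only; the central tap stays on the grid."""

    def __init__(self, in_ch, out_ch, dilation=1):
        super().__init__()
        self.dilation = dilation
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, 3, 3))
        nn.init.kaiming_uniform_(self.weight, a=1)
        # 2 * (9 - 1) = 16 offset channels for the eight movable synapses.
        self.offset_conv = nn.Conv2d(in_ch, 16, 3, padding=dilation, dilation=dilation)
        nn.init.zeros_(self.offset_conv.weight)   # start from the regular sampling grid
        nn.init.zeros_(self.offset_conv.bias)

    def forward(self, x):
        o = self.offset_conv(x)                   # (B, 16, H, W)
        zeros = o.new_zeros(o.size(0), 2, o.size(2), o.size(3))
        # deform_conv2d expects two offset channels per kernel tap, in row-major
        # tap order; tap 4 is the centre of a 3x3 kernel, so its two channels
        # are forced to zero here.
        offset = torch.cat([o[:, :8], zeros, o[:, 8:]], dim=1)
        return deform_conv2d(x, offset, self.weight,
                             padding=self.dilation, dilation=self.dilation)
```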
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
(a) 3 × 3 regular convolution. (b) Factorized convolutions. (c) Factorized restricted deformable convolution. The nonlinearities in (b) and (c) are omitted here. 2D filters can be approximated as a combination of 1D filters to reduce memory and computational cost: a basic decomposed layer consists of vertical kernels followed by horizontal ones, with a nonlinearity inserted between the 1D convolutions. For the 2D RDC, each learned offset has a vertical and a horizontal component; with the 2D kernel decomposed into a vertical and a horizontal kernel, the offsets can likewise be decomposed into two components along the same directions.
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
• Training deep networks requires a huge number of training images, but training datasets are always limited.
• Data augmentation methods are adopted to enlarge the training data using label-preserving transformations.
• Many forms of data augmentation are employed for semantic segmentation, such as horizontal flipping, scaling, rotation, cropping and color jittering.
• The operation of warping conventional images into fisheye-style images is generally called zoom augmentation.
• Zoom augmentation can adopt a fixed focal length or a randomly changing focal length.
• Via zoom augmentation, an existing conventional image dataset for semantic segmentation can be transformed into a fisheye-style image dataset.
• The smaller the focal length, the larger the distortion (a sketch of the warp follows this list).
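As a concrete illustration, here is a minimal zoom-augmentation sketch assuming an equidistant fisheye model (r = f·θ) and OpenCV's `cv2.remap`; the paper's exact fisheye projection model and implementation may differ, and the function name `zoom_augment` is made up for this example.

```python
import numpy as np
import cv2

def zoom_augment(image, f, interpolation=cv2.INTER_LINEAR):
    """Warp a conventional (pinhole) image into a fisheye-style image.

    Assumes the equidistant fisheye model r_fish = f * theta and a pinhole
    image taken with the same focal length f (in pixels); a smaller f gives
    stronger distortion. Annotations can be warped with the same maps using
    cv2.INTER_NEAREST so that class ids are preserved.
    """
    h, w = image.shape[:2]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0

    # For every output (fisheye) pixel, locate the source pixel in the pinhole image.
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    dx, dy = xs - cx, ys - cy
    r_fish = np.sqrt(dx ** 2 + dy ** 2) + 1e-8
    theta = r_fish / f                                         # equidistant model
    r_pin = f * np.tan(np.clip(theta, 0.0, np.pi / 2 - 1e-3))  # pinhole radius
    map_x = (cx + dx / r_fish * r_pin).astype(np.float32)
    map_y = (cy + dy / r_fish * r_pin).astype(np.float32)
    return cv2.remap(image, map_x, map_y, interpolation,
                     borderMode=cv2.BORDER_CONSTANT)

# Example usage (color_image / label_image are placeholders):
# for f in (800, 500, 200):            # progressively stronger distortion
#     fisheye = zoom_augment(color_image, f)
#     fisheye_labels = zoom_augment(label_image, f, cv2.INTER_NEAREST)
```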
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
Zoom augmentation. On the left are the original color image and annotation; on the right are the images and annotations transformed by zoom augmentation with a focal length changing from 200 to 800.
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
The multi-task learning architecture for road scene semantic segmentation. The data are fed into three shared-weight sub-networks (the blue blocks). The total loss is the weighted sum of the main losses and the auxiliary losses: γ is the auxiliary-loss weighting that balances the contribution of the auxiliary losses, α is the task weighting of the main branch that balances the main losses of the different tasks, and β is the task weighting of the auxiliary branch that balances the auxiliary losses of the different tasks.
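Read literally, that weighting scheme amounts to the small combination below; this is a sketch of the sentence above, with per-task weight lists `alpha` and `beta` and a scalar `gamma` as described on the slide (the variable names are only illustrative).

```python
def total_loss(main_losses, aux_losses, alpha, beta, gamma):
    """Weighted sum of per-task main and auxiliary losses.

    main_losses, aux_losses: lists of per-task loss values (one per task).
    alpha, beta: per-task weights for the main and auxiliary branches.
    gamma: global weight of the auxiliary branch.
    """
    main = sum(a * l for a, l in zip(alpha, main_losses))
    aux = sum(b * l for b, l in zip(beta, aux_losses))
    return main + gamma * aux
```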
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
ERFNet-RDC-λ. (a) Non-bt-1D block in ERFNet. (b) Reconstructed non-bt-1D block, in which the first two convolutional layers are replaced with RDC layers. (c) The encoder of ERFNet-RDC-λ.
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
One example of the segmentation results produced by ERFNet, ERFNet-DC-8, ERFNet-FRDC-8 and ERFNet-RDC-8. The red pixels denote false recognitions of the bus; ERFNet-RDC-8 detects nearly the whole bus in the image.
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
(a) Results from different models. (b) List of the 18 class names and corresponding colors used for labeling. Examples of results on the test set of SVScape: the results for the front, rear, left and right views are displayed in (a). The first two rows show the raw image and ground truth, and the following four rows show the results produced by the different models. The last row shows the improvement/error map, which marks in red the pixels misclassified by this method and in green the pixels misclassified by the base model ERFNet but correctly predicted by the proposed method. The color code is listed in (b).
Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras
Bird's-eye-view image semantic segmentation, obtained by mapping the segmentation results of the raw surround view images onto the bird's-eye-view plane.
Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images
• There is a high demand for 3D data for 360° panoramic images and videos, pushed by the growing availability on the market of specialized hardware both for capturing (e.g., omnidirectional cameras) and for visualizing in 3D (e.g., head-mounted displays) panoramic images and videos.
• At the same time, 3D sensors able to capture 3D panoramic data are expensive and/or hardly available.
• To fill this gap, this is a learning approach for panoramic depth map estimation from a single image.
• Thanks to a specifically developed distortion-aware deformable convolution filter, the method can be trained on conventional perspective images and then used to regress depth for panoramic images, thus bypassing the effort needed to create an annotated panoramic training dataset.
• It is demonstrated on emerging tasks such as panoramic monocular SLAM, panoramic semantic segmentation and panoramic style transfer.
Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images
From a single input equirectangular image (top left), this method exploits distortion-aware convolutions to notably reduce the distortions in depth prediction that affect conventional CNNs (bottom row). Top right: the same idea used to predict semantic labels, to obtain panoramic 3D semantic segmentation from a single image.
Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images
The key concept behind the distortion-aware convolution is that the sampling grid is deformed according to the image distortion model, so that the receptive field is rectified.
Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images
Computation of the adaptive sampling grid for an equirectangular image. Each pixel p of the equirectangular image is transformed into unit-sphere coordinates, the sampling grid is computed on the tangent plane at that point, and the grid is finally back-projected into the equirectangular image to determine the locations of the distorted sampling grid. A sketch of this computation is given below.
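The following NumPy sketch spells out that three-step grid computation using a gnomonic (tangent-plane) projection and its inverse. The angular step per kernel tap and the coordinate conventions are assumptions made for illustration, not the paper's exact parameterization.

```python
import numpy as np

def distorted_grid(u, v, width, height, kernel=3, dilation=1):
    """Sampling locations of a kernel x kernel filter centred on equirectangular
    pixel (u, v), rectified on the tangent plane of the unit sphere."""
    # Step 1: pixel -> spherical coordinates (longitude lon0, latitude lat0).
    lon0 = (u / width - 0.5) * 2.0 * np.pi
    lat0 = (0.5 - v / height) * np.pi

    # Step 2: regular grid on the tangent plane, one pixel converted to angles.
    d = (np.arange(kernel) - kernel // 2) * dilation
    xx, yy = np.meshgrid(d * 2.0 * np.pi / width, d * np.pi / height)

    # Step 3: inverse gnomonic projection back onto the sphere ...
    rho = np.sqrt(xx ** 2 + yy ** 2) + 1e-12
    c = np.arctan(rho)
    lat = np.arcsin(np.clip(np.cos(c) * np.sin(lat0)
                            + yy * np.sin(c) * np.cos(lat0) / rho, -1.0, 1.0))
    lon = lon0 + np.arctan2(xx * np.sin(c),
                            rho * np.cos(lat0) * np.cos(c) - yy * np.sin(lat0) * np.sin(c))

    # ... and back into equirectangular pixel coordinates
    # (longitude wrap-around at the image border omitted for brevity).
    us = (lon / (2.0 * np.pi) + 0.5) * width
    vs = (0.5 - lat / np.pi) * height
    return us, vs   # (kernel, kernel) locations; sample them bilinearly
```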
Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images
A major advantage of the approach is that standard convolutional architectures can be trained with common perspective-image datasets. At test time, the weights are transferred to the same architecture with distortion-aware convolutional filters so as to process equirectangular images. Although the figure reports the case of depth prediction, the same strategy applies to the semantic segmentation task.
Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images
Compared methods in the experimental evaluation: (a) standard convolution on the equirectangular image, (b) standard convolution on six rectified images via cube-map projection, (c) distortion-aware convolution on the equirectangular image.
Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images
Example of an equirectangular image with and without inpainting, and of the extracted rectified perspective images. Since the images in this dataset lack color near the polar regions, those regions are filled with zeros; to avoid biasing the network during training, an inpainting algorithm is applied. To create perspective images for training, images with a limited field of view are first extracted along different directions from the original 360° panoramic image: directions are sampled at a 20° interval along the vertical axis (yaw rotation) and at a 15° interval along the horizontal axis (pitch rotation). They are then rectified into a standard perspective view; these rectified perspective images are created by mapping pixels from the equirectangular projection to the perspective projection.
Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images
Depth prediction on the Stanford 2D-3D-S dataset. Red circles highlight artifacts due to distortions induced by the standard convolutional model (a) and by the CubeMap representation (b) that are instead solved by this approach (c).
Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images
Qualitative comparison of semantic segmentation on the Stanford 2D-3D-S dataset. Red circles highlight errors in the polar regions and at the borders of the CubeMap model that are not present in the distortion-aware approach.
Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images
Application of the distortion-aware convolution to panoramic style transfer.
Spatial Transform Network
• CNNs are still limited in that they are not spatially invariant to the input in a computationally efficient manner.
• A learnable spatial transformer allows spatial manipulation of data within the network.
• This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps, conditioned on the feature map itself, without any extra training supervision or modification to the optimization process.
• The use of spatial transformers results in models that learn invariance to translation, scale, rotation and more generic warping, for a number of classes of transformations:
• (i) image classification: a spatial transformer that crops out and scale-normalizes the appropriate region can simplify the subsequent classification task and lead to superior classification performance;
• (ii) co-localization: given a set of images containing different instances of the same (but unknown) class, a spatial transformer can be used to localize them in each image;
• (iii) spatial attention: a spatial transformer can be used for tasks requiring an attention mechanism, and can be trained purely with backpropagation, without reinforcement learning.
Spatial Transform Network
The result of using a spatial transformer as the first layer of a fully-connected network trained for distorted MNIST digit classification. (a) The input to the spatial transformer network is an MNIST digit distorted with random translation, scale, rotation and clutter. (b) The localization network of the spatial transformer predicts a transformation to apply to the input image. (c) The output of the spatial transformer, after applying the transformation. (d) The classification prediction produced by the subsequent fully-connected network on the output of the spatial transformer. The spatial transformer network (a CNN including a spatial transformer module) is trained end-to-end with only class labels; no knowledge of the ground-truth transformations is given to the system.
Spatial Transform Network
The spatial transformer mechanism is split into three parts. In order of computation, a localization network first takes the input feature map and, through a number of hidden layers, outputs the parameters of the spatial transformation to be applied to the feature map; this gives a transformation conditioned on the input. The predicted transformation parameters are then used to create a sampling grid, a set of points where the input map should be sampled to produce the transformed output; this is done by the grid generator. Finally, the feature map and the sampling grid are taken as inputs to the sampler, which produces the output map sampled from the input at the grid points.
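For reference, here is a minimal PyTorch sketch of those three parts for a single-channel 28×28 input, using `F.affine_grid` as the grid generator and `F.grid_sample` as the sampler. The localization-network layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Affine spatial transformer (sketch): localization network, grid
    generator and sampler, for 1-channel 28x28 inputs."""

    def __init__(self):
        super().__init__()
        # Localization network: predicts the 6 parameters of a 2D affine map.
        self.loc = nn.Sequential(
            nn.Conv2d(1, 8, 7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, 5), nn.MaxPool2d(2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(10 * 3 * 3, 32), nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Initialize the last layer to output the identity transformation.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)                            # transformation parameters
        grid = F.affine_grid(theta, x.size(), align_corners=False)    # grid generator
        return F.grid_sample(x, grid, align_corners=False)            # sampler
```

A classification network can then simply be stacked on the transformer's output and the whole model trained end-to-end with the class loss only.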
Spatial Transform Network
Two examples of applying the parameterized sampling grid to an image U, producing the output V. (a) The sampling grid is the regular grid G = T_I(G), where I is the identity transformation parameters. (b) The sampling grid is the result of warping the regular grid with an affine transformation T_θ(G).
Active Convolution: Learning the Shape of Convolution for Image Classification
• The active convolution unit (ACU) is a convolution unit with no fixed shape, so it can define any form of convolution.
• Its shape can be learned through backpropagation during training.
• This unit has a few advantages.
• First, the ACU is a generalization of convolution: it can define not only all conventional convolutions but also convolutions with fractional pixel coordinates, and it can freely change the shape of the convolution, which provides greater freedom when forming CNN structures.
• Second, the shape of the convolution is learned during training, and there is no need to tune it by hand.
• Third, the ACU can learn better than a conventional unit, simply by changing the conventional convolution to an ACU.
• Code is available at https://github.com/jyh2986/Active-Convolution.
Active Convolution: Learning the Shape of Convolution for Image Classification
Concept of the ACU. Black dots represent the synapses. The ACU's output is the summation of the values at all positions p_k multiplied by the weights, where the positions are parameterized by p_k. The ACU can thus define more diverse forms of receptive fields for convolutions, with learnable position parameters. Inspired by the nervous system, one acceptor of the ACU is called a synapse. The position parameters are differentiable, so the shape can be learned through backpropagation.
Active Convolution: Learning the Shape of Convolution for Image Classification
• The ACU is a generalization of the convolution unit.
• Any conventional convolution can be represented by an ACU by setting the positions of the synapses appropriately and fixing all positions.
• Dilated convolution can also be represented by multiplying the position parameters by the dilation factor.
• Compared with a conventional convolution, the ACU can generate fractionally dilated convolutions and can directly compute the results of the interpolated convolution.
• It can also be used to define K synapses without any restriction (e.g., a cross-shaped convolution with five synapses, or a circular convolution with many synapses).
Comparison of a conventional convolution unit with the ACU. (a) Conventional convolution unit with four input neurons and two output neurons. (b) Unlike the convolution unit, the synapses of the ACU can be connected at inter-neuron positions and are movable.
Active Convolution: Learning the Shape of Convolution for Image Classification
• At the network level, the ACU converts a discrete input space into a continuous one.
• Since the ACU uses bilinear interpolation between adjacent neurons, synapses can connect inter-neuron spaces.
• This lends greater representational power to convolution units.
• The position parameters control the synapses that connect neuron spaces, and the synapses can move around the neuron space to reduce error.
• A convolution unit has a number of learnable filters, and each filter is convolved with its receptive field.
• The ACU additionally has a learnable position parameter θp, the set of positions of the synapses (a sketch follows this list).
Coordinate system of the interpolation: m and n represent the base position of the convolution, and α_k, β_k are the displacements of the k-th synapse.
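The sketch below is one way to realise an ACU in PyTorch: each synapse carries a learnable fractional offset shared over the whole feature map, the input is bilinearly shifted by that offset with `grid_sample`, and a per-synapse 1×1 convolution supplies the weights. It is an illustrative re-implementation under these assumptions, not the authors' Caffe code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActiveConv2d(nn.Module):
    """Active convolution unit (sketch): K synapses with learnable positions."""

    def __init__(self, in_ch, out_ch, num_synapses=9):
        super().__init__()
        # One set of 1x1 weights per synapse.
        self.weight = nn.Parameter(torch.empty(num_synapses, out_ch, in_ch, 1, 1))
        nn.init.kaiming_uniform_(self.weight, a=1)
        # Initial positions: the regular 3x3 grid, stored as (dx, dy) in pixels.
        d = torch.tensor([-1.0, 0.0, 1.0])
        grid = torch.stack(torch.meshgrid(d, d, indexing="ij"), dim=-1).reshape(-1, 2)
        self.pos = nn.Parameter(grid[:num_synapses].clone())

    def forward(self, x):
        b, _, h, w = x.shape
        # Identity sampling grid in normalised [-1, 1] coordinates, ordered (x, y).
        theta = torch.eye(2, 3, device=x.device, dtype=x.dtype).expand(b, 2, 3)
        base = F.affine_grid(theta, x.size(), align_corners=True)
        out = 0.0
        for k in range(self.pos.size(0)):
            dx, dy = self.pos[k]
            shift = torch.stack([2.0 * dx / max(w - 1, 1), 2.0 * dy / max(h - 1, 1)])
            # Bilinearly sample the input at the synapse position ...
            shifted = F.grid_sample(x, base + shift, align_corners=True)
            # ... and accumulate its 1x1-weighted contribution.
            out = out + F.conv2d(shifted, self.weight[k])
        return out
```

Because the shifts enter through differentiable bilinear sampling, gradients flow back into `self.pos`, so the synapse positions are learned together with the weights.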
Warped Convolutions: Efficient Invariance to Spatial Transformations
• Warped convolutions are a simple and exact construction, yet have the same computational complexity that standard convolutions enjoy.
• A warped convolution consists of a constant image warp followed by a simple convolution, both of which are standard blocks in deep learning toolboxes (a sketch follows this list).
• With a carefully crafted warp, the resulting architecture can be made equivariant to a wide range of two-parameter spatial transformations.
• The slide sketches the underlying formulations: continuous convolution, group convolution over the image plane, and the warped convolution obtained via the exponential map (the equations themselves are shown on the slide).
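The construction can be sketched in a few lines of PyTorch: precompute a fixed sampling grid, warp the input with `grid_sample`, then apply an ordinary convolution. The log-polar grid used below, which turns rotation and scaling about the image centre into translations, is one concrete example of such a warp chosen here for illustration; it is not necessarily the warp used in the paper's experiments.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class WarpedConv2d(nn.Module):
    """Warped convolution (sketch): a fixed, precomputed warp followed by a
    standard convolution, so that the translation equivariance of the
    convolution becomes equivariance to another two-parameter transformation."""

    def __init__(self, in_ch, out_ch, size, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
        self.register_buffer("grid", self._log_polar_grid(size))

    @staticmethod
    def _log_polar_grid(size):
        # Rows index log-radius, columns index angle, so scaling/rotation of the
        # input about its centre becomes a translation of the warped image.
        log_r = torch.linspace(-3.0, 0.0, size)
        angle = torch.linspace(-math.pi, math.pi, size)
        r = torch.exp(log_r)[:, None]
        x = r * torch.cos(angle)[None, :]
        y = r * torch.sin(angle)[None, :]
        return torch.stack([x, y], dim=-1).unsqueeze(0)   # (1, size, size, 2) in [-1, 1]

    def forward(self, x):
        grid = self.grid.expand(x.size(0), -1, -1, -1)
        warped = F.grid_sample(x, grid, align_corners=True)   # constant image warp
        return self.conv(warped)                              # standard convolution
```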
Warped Convolutions: Efficient Invariance to Spatial Transformations
First row: sampling grids that define the warps associated with different spatial transformations. Second row: an example image (a) after warping with each grid (b-d). Third row: a small translation is applied to each warped image, which is then mapped back to the original space (by an inverse warp). Translation along one axis of the appropriate warped space is equivalent to (b) horizontal scaling, (c) planar rotation, or (d) 3D rotation around the vertical axis.
Deformable Convolutional Networks
• Two new modules enhance the transformation modeling capability of CNNs: deformable convolution and deformable RoI pooling.
• Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from the target tasks, without additional supervision.
• The modules can replace their counterparts in existing CNNs and can easily be trained end-to-end by standard back-propagation, giving rise to deformable convolutional networks.
• Learning dense spatial transformations in deep CNNs is effective for sophisticated vision tasks such as object detection and semantic segmentation.
• The code is released at https://github.com/msracver/Deformable-ConvNets.
Deformable Convolutional Networks
Illustration of the sampling locations in 3 × 3 standard and deformable convolutions. (a) Regular sampling grid (green points) of standard convolution. (b) Deformed sampling locations (dark blue points) with augmented offsets (light blue arrows) in deformable convolution. (c)-(d) Special cases of (b), showing that deformable convolution generalizes various transformations for scale, (anisotropic) aspect ratio and rotation.
Deformable Convolutional Networks
• Both the deformable convolution and the deformable RoI pooling modules operate on the 2D spatial domain; the operation remains the same across the channel dimension.
• Without loss of generality, the modules are described in 2D here for notational clarity.
• A 2D convolution consists of two steps: 1) sampling using a regular grid R over the input feature map x; 2) summation of the sampled values weighted by w.
• RoI pooling converts an input rectangular region of arbitrary size into fixed-size features.
• Both deformable modules have the same input and output as their plain counterparts.
• First, a deep fully convolutional network generates feature maps over the whole input image; second, a shallow task-specific network generates results from the feature maps.
• The DCN idea is to augment the spatial sampling locations in convolution and RoI pooling with additional offsets and to learn the offsets from the target tasks (a minimal sketch of deformable convolution follows this list).
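As a reference point, the sketch below shows the standard construction on top of `torchvision.ops.deform_conv2d`: a regular convolution predicts 2·K·K offsets per location (initialized to zero, i.e. to the regular grid R), and the deformable convolution samples the input at the offset positions before the weighted summation. It mirrors the restricted variant sketched earlier but learns offsets for every tap; it is an illustrative sketch, not the released MXNet code.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableConv2d(nn.Module):
    """3x3 deformable convolution (sketch): a regular convolution predicts 2D
    offsets for every sampling location; deform_conv2d bilinearly samples the
    input at the offset positions before the weighted summation."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.offset_conv = nn.Conv2d(in_ch, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=padding)
        nn.init.zeros_(self.offset_conv.weight)   # start from the regular grid R
        nn.init.zeros_(self.offset_conv.bias)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, x):
        offset = self.offset_conv(x)              # (B, 2*K*K, H, W) learned offsets
        return deform_conv2d(x, offset, self.conv.weight, self.conv.bias,
                             padding=self.conv.padding[0])
```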
Deformable Convolutional Networks
Illustration of 3 × 3 deformable convolution, and illustration of 3 × 3 deformable RoI pooling.
Deformable Convolutional Networks
Illustration of 3 × 3 deformable position-sensitive (PS) RoI pooling.
Deformable Convolutional Networks
Illustration of the fixed receptive field in standard convolution (a) and the adaptive receptive field in deformable convolution (b), using two layers. Top: two activation units on the top feature map, on two objects of different scales and shapes; the activation is from a 3 × 3 filter. Middle: the sampling locations of the 3 × 3 filter on the preceding feature map, where another two activation units are highlighted. Bottom: the sampling locations of two levels of 3 × 3 filters on the preceding feature map; two sets of locations are highlighted, corresponding to the highlighted units above.