Camera-Based Road Lane
Detection by Deep Learning II
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• Ultra Fast Structure-aware Deep Lane Detection
• Learning Lightweight Lane Detection CNNs by
Self Attention Distillation
• Key Points Estimation and Point Instance
Segmentation Approach for Lane Detection
• Learning to Cluster for Proposal-Free Instance
Segmentation
• Lane Detection and Classification using
Cascaded CNNs
• PolyLaneNet: Lane Estimation via Deep
Polynomial Regression
• Lane Detection: Light Conditions Style Transfer
• End-to-end Lane Detection through
Differentiable Least-Squares Fitting
• Robust Lane Detection from Continuous Driving
Scenes Using Deep Neural Networks
• LineNet: a Zoomable CNN for Crowdsourced
HD Maps Modeling in Urban Environments
• Efficient Road Lane Marking Detection with
Deep Learning
• End to End Video Segmentation for Driving:
Lane Detection For Autonomous Car
• 3D-LaneNet: E2E 3D multiple lane detection
• End-to-End Lane Marker Detection via Row-
wise Classification
Ultra Fast Structure-aware Deep Lane Detection
• arXiv 2004.11757
• Inspired by human perception, the recognition of lanes under severe occlusion and extreme
lighting conditions is mainly based on contextual and global information.
• Motivated by this observation, propose a novel, simple, yet effective formulation aiming at
extremely fast speed and challenging scenarios.
• Specifically, treat the process of lane detection as a row-based selecting problem using
global features.
• With the help of row-based selecting, the formulation significantly reduces the
computational cost.
• Using a large receptive field on global features, it can also handle challenging scenarios.
• Moreover, based on the formulation, also propose a structural loss to explicitly model the
structure of lanes.
• A light-weight version could even achieve 300+ frames per second with the same resolution,
which is at least 4x faster than previous state-of-the-art methods.
• code is available at https://github.com/cfzd/Ultra-Fast-Lane-Detection.
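To make the row-based selecting concrete, below is a minimal PyTorch sketch of such a classification head; the layer sizes (feature dimension, 100 gridding cells, 18 row anchors, 4 lanes) are illustrative assumptions, not the paper's exact configuration. For each lane and each row anchor, the head picks one gridding cell, with an extra class for "no lane in this row":

import torch
import torch.nn as nn
import torch.nn.functional as F

class RowSelectionHead(nn.Module):
    # For each of num_lanes lanes and num_rows row anchors, classify which of
    # num_cells horizontal grid cells the lane passes through; one extra class
    # means "no lane in this row" (the background gridding cell).
    def __init__(self, feat_dim=1800, num_cells=100, num_rows=18, num_lanes=4):
        super().__init__()
        self.shape = (num_cells + 1, num_rows, num_lanes)
        self.fc = nn.Linear(feat_dim, (num_cells + 1) * num_rows * num_lanes)

    def forward(self, global_feat):                  # (B, feat_dim) pooled feature
        return self.fc(global_feat).view(-1, *self.shape)

head = RowSelectionHead()
logits = head(torch.randn(2, 1800))                  # (B, cells+1, rows, lanes)
labels = torch.randint(0, 101, (2, 18, 4))           # ground-truth cell per row/lane
loss = F.cross_entropy(logits, labels)               # row-wise classification loss
cells = logits.argmax(dim=1)                         # selected cell per row anchor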
Ultra Fast Structure-aware Deep Lane Detection
Illustration of selecting on the left and right lane. In the right
part, the selecting of a row is shown in detail. Row anchors
are the predefined row locations, and the formulation is defined
as horizontally selecting on each row anchor. On the
right of the image, a background gridding cell is introduced
to indicate no lane in this row.
Ultra Fast Structure-aware Deep Lane Detection
Illustration of our formulation and conventional segmentation. Our formulation is
selecting locations (grids) on rows, while segmentation is classifying every pixel. The
dimensions used for classifying are also different, which is marked with red. The
proposed formulation significantly reduces the computational cost. Besides, the
proposed formulation uses the global feature as input, which has a larger receptive field
than segmentation, thus addressing the no-visual-clue problem.
(Left panel: formulation; right panel: segmentation.)
Ultra Fast Structure-aware Deep Lane Detection
Overall architecture. The auxiliary branch is shown in the upper part, which is only valid when
training. The feature extractor is shown in the blue box. The classification-based prediction and
auxiliary segmentation task are illustrated in the green and orange boxes, respectively. The group
classification is conducted on each row anchor.
Ultra Fast Structure-aware Deep Lane Detection
Learning Lightweight Lane Detection CNNs by
Self Attention Distillation
• https://github.com/cardwing/Codes-for-Lane-Detection
• Without learning from much richer context, lane detection models often fail in challenging
scenarios, e.g., severe occlusion, ambiguous lanes, and poor lighting conditions.
• present a novel knowledge distillation approach, i.e., Self Attention Distillation (SAD), which
allows a model to learn from itself and gains substantial improvement without any additional
supervision or labels.
• Specifically, observe that attention maps extracted from a model trained to a reasonable
level would encode rich contextual information.
• The valuable contextual information can be used as a form of ‘free’ supervision for
further representation learning through performing top down and layer-wise attention
distillation within the network itself.
• SAD can be easily incorporated in any feedforward convolutional neural network (CNN) and
does not increase the inference time.
• validate SAD on three popular lane detection benchmarks (TuSimple, CULane and BDD100K)
using lightweight models such as ENet, ResNet-18 and ResNet-34.
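A sketch of the distillation step, assuming one common choice of attention mapping function (the channel-wise mean of squared activations); the layer-wise, top-down scheme trains each block to mimic the attention map of the deeper block that follows it:

import torch
import torch.nn.functional as F

def attention_map(feat, out_size):
    # AT-GEN sketch: channel-wise mean of squared activations, resized to a
    # common resolution and L2-normalized over the spatial dimensions.
    amap = feat.pow(2).mean(dim=1, keepdim=True)                   # (B,1,H,W)
    amap = F.interpolate(amap, size=out_size, mode='bilinear',
                         align_corners=False)
    return F.normalize(amap.flatten(1), p=2, dim=1)

def sad_loss(block_feats, out_size=(36, 100)):
    # Each block mimics the attention of its successor (targets detached),
    # so the loss needs no extra labels and adds nothing at inference time.
    loss = 0.0
    for lo, hi in zip(block_feats[:-1], block_feats[1:]):
        loss = loss + F.mse_loss(attention_map(lo, out_size),
                                 attention_map(hi, out_size).detach())
    return loss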
Learning Lightweight Lane Detection CNNs by
Self Attention Distillation
Attention maps of the ENet [17] before and after applying self attention distillation. Here,
we extract the attention maps from the four stages/blocks following the design of the ENet
model. Note that self attention distillation is added at 40K episodes.
Learning Lightweight Lane Detection CNNs by
Self Attention Distillation
Attention maps of block 4 of the ENet model
using different mapping functions.
An instantiation of using SAD. E1∼E4 comprise the encoder of ENet; D1 and D2 comprise the decoder of
ENet. A small network, denoted P1, is added to predict the existence of lanes. AT-GEN is the attention generator.
Learning Lightweight Lane Detection CNNs by
Self Attention Distillation
Attention maps of ENet with and without self attention distillation. Both networks with and
without SAD are trained up to 60K episodes. SAD is applied to ENet at 40K training episodes.
Learning Lightweight Lane Detection CNNs by
Self Attention Distillation
Key Points Estimation and Point Instance
Segmentation Approach for Lane Detection
• https://github.com/koyeongmin/PINet
• Current methods have critical deficiencies such as the limited number of detectable lanes
and high false positive rates.
• In particular, high false positives can cause wrong and dangerous control decisions.
• In this paper, propose a lane detection method for an arbitrary number of lanes using
deep learning, which has a lower number of false positives than other recent lane
detection methods.
• The architecture of the proposed method has the shared feature extraction layers and
several branches for detection and embedding to cluster lanes.
• The proposed method can generate exact points on the lanes, and cast a clustering problem
for the generated points as a point cloud instance segmentation problem.
• The proposed method is more compact because it generates fewer points than the original
image pixel size.
• The proposed post processing method eliminates outliers successfully and increases the
performance notably.
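A rough sketch of how the three outputs could be decoded into lane key points (the grid stride and tensor layout are assumptions for illustration):

import torch

def decode_points(confidence, offset, feature, thresh=0.5, cell=8):
    # confidence: (H, W)    per-cell probability that a lane point is present
    # offset:     (2, H, W) sub-cell (x, y) offsets
    # feature:    (D, H, W) embedding used to cluster points into instances
    # cell: grid-to-pixel stride (hypothetical value)
    ys, xs = torch.nonzero(confidence > thresh, as_tuple=True)
    px = (xs.float() + offset[0, ys, xs]) * cell           # pixel x coordinate
    py = (ys.float() + offset[1, ys, xs]) * cell           # pixel y coordinate
    emb = feature[:, ys, xs].t()                           # (N, D) for clustering
    return torch.stack([px, py], dim=1), emb

conf = torch.rand(32, 64); off = torch.rand(2, 32, 64); feat = torch.randn(4, 32, 64)
points, embeddings = decode_points(conf, off, feat)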
Key Points Estimation and Point Instance
Segmentation Approach for Lane Detection
The proposed framework. Given an input image, PINet
predicts three values: confidence, offset, and feature. From
the confidence and offset outputs, exact points on the lanes can
be predicted, and the feature output distinguishes the
predicted points into instances. Finally, the post
processing module is applied, and it generates smooth lanes.
Key Points Estimation and Point Instance
Segmentation Approach for Lane Detection
The detailed network training procedure. It has three main parts. The 512x256 input is compressed
by the resizing layer, and the compressed input is passed to the feature extraction layers. Three output
branches are applied at the end of each hourglass block, and they predict the confidence, offset, and instance
feature for each grid. The loss function can be calculated from the outputs of each hourglass block.
Key Points Estimation and Point Instance
Segmentation Approach for Lane Detection
The hourglass block and
bottleneck layer architecture.
The hourglass block consists of
three types of bottleneck
layers: same, up-sampling, and
down-sampling. Output
branches are applied at the end
of each hourglass layer, and the
confidence output is
forwarded to the next block.
Key Points Estimation and Point Instance
Segmentation Approach for Lane Detection
The result of the post processing. (a)
is the input image, and (b) is the raw output
of PINet. In (b), the blue lane contains
some outliers, and the other lanes can
be distinguished. In (c), the result of
the proposed post processing
method, outliers are eliminated and
only the smooth, longest lanes remain.
Key Points Estimation and Point Instance
Segmentation Approach for Lane Detection
The explanation of the post processing. There is no
other point in the margin made by the straight line
connecting points S and A, but the margin of points S and B
contains 2 other points. As a result, point B is selected.
Key Points Estimation and Point Instance
Segmentation Approach for Lane Detection
Learning to Cluster for Proposal-Free Instance Segmentation
• https://github.com/GT-RIPL/L2C
• This work proposed a novel learning objective to train a deep neural network to perform
end-to-end image pixel clustering.
• applied the approach to instance segmentation, which is at the intersection of image
semantic segmentation and object detection.
• utilize the most fundamental property of instance labeling – the pairwise relationship
between pixels – as the supervision to formulate the learning objective, then apply it to train
a fully convolutional network (FCN) for learning to perform pixel-wise clustering.
• The resulting clusters can be used as the instance labeling directly.
• To support labeling of an unlimited number of instances, further formulate ideas from graph
coloring theory into the proposed learning objective.
• The evaluation on the Cityscapes dataset demonstrates strong performance and thus provides
a proof of concept.
• Moreover, approach won the second place in the lane detection competition of 2017 CVPR
Autonomous Driving Challenge and was the top performer without using external data.
Learning to Cluster for Proposal-Free Instance Segmentation
address the labeling problem by formulating a novel learning objective. It guides the
fully convolutional networks to learn to perform instance labeling
Learning to Cluster for Proposal-Free Instance Segmentation
The example outputs of lane detection.
The colors represent different instance IDs.
The output for each pixel is a (6 + 1)-
dimensional vector, which represents the
probability distribution of this pixel being
assigned to a certain ID. The learning
objective guides the distribution function to
output similar distributions for the pixels
on the same lane line, and vice versa.
At testing time, each pixel is assigned
to the ID with the highest probability.
Given a pair of pixels p_i and p_j, their corresponding
output distributions are denoted as P_i = f(p_i) =
[t_{i,1}, ..., t_{i,n}] and P_j = f(p_j) = [t_{j,1}, ..., t_{j,n}], where n is the
number of indices available for labeling.
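The slides do not spell out the exact objective; one simple instantiation of pairwise-supervised clustering, sketched below under that assumption, uses the inner product P_i · P_j as the probability that both pixels receive the same ID and pushes it toward the pairwise label with binary cross-entropy:

import torch
import torch.nn.functional as F

def pairwise_cluster_loss(P, pair_idx, same):
    # P:        (N, n) per-pixel distributions over n instance IDs
    # pair_idx: (M, 2) sampled pixel pairs
    # same:     (M,)   1 if both pixels lie on the same instance, else 0
    Pi, Pj = P[pair_idx[:, 0]], P[pair_idx[:, 1]]
    s = (Pi * Pj).sum(dim=1).clamp(1e-7, 1 - 1e-7)   # P(same ID) for the pair
    return F.binary_cross_entropy(s, same.float())

P = F.softmax(torch.randn(1000, 7), dim=1)           # 6 + 1 IDs as in the slides
idx = torch.randint(0, 1000, (256, 2))
labels = torch.randint(0, 2, (256,))
loss = pairwise_cluster_loss(P, idx, labels)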
Learning to Cluster for Proposal-Free Instance Segmentation
The concept of how graph coloring is related to instance ID assignment
Learning to Cluster for Proposal-Free Instance Segmentation
The network architecture
Learning to Cluster for Proposal-Free Instance Segmentation
The visualization of the lane detection on the TuSimple dataset (validation split). The red lines in the top row are
predictions, while the green lines are the ground-truth. The second row shows the raw outputs from the
network. The colors represent the assigned IDs.
Lane Detection and Classification using
Cascaded CNNs
• https://github.com/fabvio/Cascade-LD
• https://github.com/fabvio/TuSimple-lane-classes
• As in many other computer vision tasks, convolutional neural networks
(CNNs) represent the state-of-the-art technology to identify lane
boundaries.
• However, the position of the lane boundaries w.r.t. the vehicle may not
suffice for reliable positioning: for path planning or localization,
information regarding lane types may also be needed.
• In this work, present an end-to-end system for lane boundary identification,
clustering and classification, based on two cascaded neural networks, that
runs in real-time.
• To build the system, 14,336 lane boundary instances of the TuSimple
dataset for lane detection have been labelled using 8 different classes.
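A structural sketch of the cascade (the two sub-networks are placeholders; the real architectures are described in the paper): the first network produces instance masks of lane boundaries, and the second classifies each masked boundary into one of the 8 lane-type classes:

import torch
import torch.nn as nn

class CascadedLaneSystem(nn.Module):
    def __init__(self, seg_net: nn.Module, cls_net: nn.Module):
        super().__init__()
        self.seg_net = seg_net      # instance segmentation of boundaries
        self.cls_net = cls_net      # per-boundary 8-class lane-type classifier

    def forward(self, image):                        # (B, 3, H, W)
        inst_masks = self.seg_net(image)             # (B, K, H, W) instances
        lane_types = []
        for k in range(inst_masks.size(1)):
            mask = inst_masks[:, k:k + 1]            # isolate one boundary
            lane_types.append(self.cls_net(image * mask))
        return inst_masks, torch.stack(lane_types, dim=1)   # (B, K, 8)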
Lane Detection and Classification using
Cascaded CNNs
Lane Detection and Classification using
Cascaded CNNs
From top to bottom: original image, instance segmentation,
classification. For instance segmentation, different colors
represent different boundaries. For classification, green
represents dashed lanes, yellow double-dashed, red
continuous.
PolyLaneNet: Lane Estimation via Deep
Polynomial Regression
• https://github.com/lucastabelini/PolyLaneNet
• Since methods for lane detection have to work in real time (30+ FPS), they
not only have to be effective (i.e., have high accuracy) but they also have to
be efficient (i.e., fast).
• In this work, present a novel method for lane detection that uses as input an
image from a forward-looking camera mounted in the vehicle and outputs
polynomials representing each lane marking in the image, via deep
polynomial regression.
• The proposed method is shown to be competitive with existing state-of-the-
art methods in the TuSimple dataset, while maintaining its efficiency (115
FPS).
• Additionally, extensive qualitative results on two additional public datasets
are presented, along with a discussion of limitations in the evaluation metrics used by
recent works for lane detection.
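Decoding a PolyLaneNet-style prediction is just polynomial evaluation; the sketch below samples one lane from a set of coefficients plus the predicted vertical start/end limits (this output parameterization is a simplified assumption):

import numpy as np

def decode_lane(coeffs, s, e, img_h, n_pts=50):
    # coeffs: polynomial coefficients, highest order first (e.g. a cubic)
    # s, e:   normalized vertical start/end of the lane marking
    ys = np.linspace(s * img_h, e * img_h, n_pts)    # evenly spaced image rows
    xs = np.polyval(coeffs, ys / img_h)              # x as a polynomial of y
    return np.stack([xs, ys], axis=1)                # (n_pts, 2) lane points

pts = decode_lane(coeffs=[0.2, -0.1, 0.05, 0.4], s=0.55, e=1.0, img_h=720)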
PolyLaneNet: Lane Estimation via Deep
Polynomial Regression
Overview of the proposed method. From left to right: the model
receives as input an image from a forward-looking camera and
outputs information about each lane marking in the image
PolyLaneNet: Lane Estimation via Deep
Polynomial Regression
Lane Detection in Low-light Conditions Using an Efficient
Data Enhancement: Light Conditions Style Transfer
• https://github.com/Chenzhaowei13/Light-Condition-Style-Transfer
• Although multi-task learning and contextual-information-based methods have
been proposed to solve lane detection, they either require additional manual
annotations or introduce extra inference overhead.
• In this paper, propose a style-transfer-based data enhancement method, which
uses Generative Adversarial Networks (GANs) to generate images in low-light
conditions, increasing the environmental adaptability of the lane detector.
• solution consists of three parts: the proposed SIM-CycleGAN, light conditions style
transfer and lane detection network.
• It does not require additional manual annotations nor extra inference overhead.
• validated the method on the lane detection benchmark CULane using ERFNet.
• Empirically, the lane detection model trained with this method demonstrates adaptability
in low-light conditions and robustness in complex scenarios.
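The enhancement plugs in as ordinary data augmentation; a minimal sketch, assuming a trained generator GA (suitable-light to low-light), translates a random subset of training images while reusing the original lane labels:

import random
import torch

def augment_batch(images, labels, generator_GA, p=0.5):
    # Translate each image to low-light with probability p; lane labels are
    # unchanged, so no extra annotation or inference overhead is introduced.
    with torch.no_grad():
        for i in range(images.size(0)):
            if random.random() < p:
                images[i] = generator_GA(images[i].unsqueeze(0)).squeeze(0)
    return images, labels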
Lane Detection in Low-light Conditions Using an Efficient
Data Enhancement: Light Conditions Style Transfer
The main framework of our method. The proposed SIM-CycleGAN is shown on the left.
The generator GA transfers images from suitable light conditions to low-light conditions,
while the generator GB transfers in the opposite way. The discriminators DA and DB feed
a single scalar value (real or fake) back to the generators. The middle section shows light
condition style transfer from suitable light conditions to low-light conditions by the
trained SIM-CycleGAN. The lane detection model is shown on the right, whose baseline is
ERFNet. We add a lane-existence branch for better performance.
Lane Detection in Low-light Conditions Using an Efficient
Data Enhancement: Light Conditions Style Transfer
Generator architecture, composed of convolution layers,
residual blocks and deconvolution layers. Convolution layers
record the changing scale information in the encoding
process and map it to the corresponding operation in the
decoding process.
Lane Detection in Low-light Conditions Using an Efficient
Data Enhancement: Light Conditions Style Transfer
The lane detection model architecture. The decoder outputs
probability maps of different lane markings, and the second
branch predicts the existence of lanes.
Lane Detection in Low-light Conditions Using an Efficient
Data Enhancement: Light Conditions Style Transfer
The probability maps from our method and other methods. The brightness of a pixel indicates
the probability of this pixel belonging to lanes. It can be clearly seen from this figure that, in low-light
conditions, the probability maps generated by our method are more pronounced and more accurate.
End-to-end Lane Detection
through Differentiable Least-Squares Fitting
• A method to train a lane detector in an e2e manner, directly regressing the lane parameters.
• The architecture consists of two components: a deep network that predicts a segmentation-like
weight map for each lane line, and a differentiable least-squares fitting module that returns for
each map the parameters of the best-fitting curve in the weighted least-squares sense.
• These parameters can subsequently be supervised with a loss function of choice.
• It relies on the fact that it is possible to backpropagate through a least-squares fitting procedure.
• This leads to an end-to-end method where the features are optimized for the true task of
interest: the network implicitly learns to generate features that prevent instabilities during the
model fitting step, as opposed to two-step pipelines that need to handle outliers with heuristics.
• Additionally, the system is not just a black box but offers a degree of interpretability because the
intermediately generated segmentation-like weight maps can be inspected and visualized.
• Code: http://github.com/wvangansbeke/LaneDetection_End2End.
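The core of the method is that the weighted least-squares solution is a closed-form, differentiable function of the weights; a minimal sketch (normalized coordinates and a small ridge term are implementation assumptions):

import torch

def fit_curve_from_weight_map(wmap, deg=2):
    # wmap: (H, W) nonnegative weights for one lane line (network output).
    # Fit x = beta_0 + beta_1*y + ... + beta_deg*y^deg over all pixels,
    # weighting each pixel by wmap. The closed form
    #     beta = (Y^T W Y)^{-1} Y^T W x
    # is differentiable in wmap, so a loss on beta reaches the backbone.
    H, W = wmap.shape
    ys, xs = torch.meshgrid(torch.linspace(0, 1, H),
                            torch.linspace(0, 1, W), indexing='ij')
    y, x, w = ys.reshape(-1), xs.reshape(-1), wmap.reshape(-1)
    Y = torch.stack([y ** d for d in range(deg + 1)], dim=1)   # design matrix
    Yw = Y * w.unsqueeze(1)
    beta = torch.linalg.solve(Yw.t() @ Y + 1e-6 * torch.eye(deg + 1),
                              Yw.t() @ x)
    return beta                                     # curve parameters

wmap = torch.rand(32, 64, requires_grad=True)
beta = fit_curve_from_weight_map(wmap)
beta.sum().backward()        # gradients flow back into the weight map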
End-to-end Lane Detection
through Differentiable Least-Squares Fitting
• Lane detection is typically tackled with a two-step pipeline in which a segmentation mask of
the lane markings is predicted first, and a lane line model (like a parabola or spline) is fitted
to the post-processed mask next.
• The problem with such a two-step approach is that the parameters of the network are not
optimized for the true task of interest (estimating the lane curvature parameters) but for a
proxy task (segmenting the lane markings), resulting in sub-optimal performance.
Overview of the architecture
End-to-end Lane Detection
through Differentiable Least-Squares Fitting
Least-squares fitting.
Weighted least-squares fitting
End-to-end Lane Detection
through Differentiable Least-Squares Fitting
Robust Lane Detection from Continuous Driving
Scenes Using Deep Neural Networks
• https://github.com/qinnzou/Robust-Lane-Detection
• Most methods focus on detecting the lane from one single image, and often lead to
unsatisfactory performance in handling some extremely bad situations such as
heavy shadow, severe mark degradation, serious vehicle occlusion, and so on.
• In fact, lanes are continuous line structures on the road.
• Consequently, a lane that cannot be accurately detected in the current frame may
potentially be inferred by incorporating information from previous frames.
• To this end, investigate lane detection by using multiple frames of a continuous
driving scene, and propose a hybrid deep architecture by combining the
convolutional neural network (CNN) and the recurrent neural network (RNN).
• Specifically, information of each frame is abstracted by a CNN block, and the CNN
features of multiple continuous frames, holding the property of time-series, are then
fed into the RNN block for feature learning and lane prediction.
• Extensive experiments on two large-scale datasets demonstrate that the proposed
method outperforms the competing methods in lane detection.
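A compact sketch of the CNN + ConvLSTM hybrid (the cell and the wiring are simplified relative to the paper's UNet/SegNet variants):

import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    # Minimal ConvLSTM cell: all four gates from one 3x3 convolution.
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 4 * ch, kernel_size=3, padding=1)

    def forward(self, x, h, c):
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

def detect_from_sequence(encoder, decoder, cell, frames):
    # frames: (B, N, 3, H, W) continuous driving frames; the ConvLSTM fuses
    # the per-frame CNN features, and the decoder predicts lanes for the
    # current (last) frame from the final hidden state.
    feats = [encoder(frames[:, t]) for t in range(frames.size(1))]
    h = torch.zeros_like(feats[0])
    c = torch.zeros_like(feats[0])
    for f in feats:
        h, c = cell(f, h, c)
    return decoder(h)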
Robust Lane Detection from Continuous Driving
Scenes Using Deep Neural Networks
Architecture
Robust Lane Detection from Continuous Driving
Scenes Using Deep Neural Networks
Encoder network in (a) UNet-ConvLSTM and
(b) SegNet-ConvLSTM. Skip connections exist
between convolutional layers in the encoder and
their matching layers in the decoder.
Robust Lane Detection from Continuous Driving
Scenes Using Deep Neural Networks
Visual comparison of the lane-detection results. Row 1: ground truth. Row 2: SegNet. Row
3: U-Net. Row 4: SegNet-ConvLSTM. Row 5: U-Net-ConvLSTM. Row 6: original image.
LineNet: a Zoomable CNN for Crowdsourced High
Definition Maps Modeling in Urban Environments
• HD maps play an important role in modern traffic scenes.
• HD map coverage grows slowly
because of cost limitations.
• To model HD maps, a CNN with a prediction layer and a
zoom module, called LineNet, is designed for SoA lane
detection in an unordered crowdsourced image dataset.
• TTLane is a dataset for efficient lane detection in urban
road modeling applications.
• Combining LineNet and TTLane yields a pipeline to model HD
maps with crowdsourced data.
• The maps can be constructed precisely even with
inaccurate crowdsourced data. Annotation of (a) dashed lanes, (b) double lanes,
(c) occlusion segments, (d) road boundaries.
LineNet: a Zoomable CNN for Crowdsourced High
Definition Maps Modeling in Urban Environments
• Use a pre-trained ResNet model
with dilated convolution as the
feature extractor.
• A dilated convolution strategy
helps to increase the receptive field,
which is essential when
detecting dashed lanes.
• The Line Prediction (LP) layer is
designed for accurate lane
positioning and classification.
LineNet: a Zoomable CNN for Crowdsourced High
Definition Maps Modeling in Urban Environments
Different branches’ outputs of the LP layer, with two samples.
LineNet: a Zoomable CNN for Crowdsourced High
Definition Maps Modeling in Urban Environments
• The Zoom Module is the second feature of LineNet.
• With this, LineNet can alter the FoV to an arbitrary
size without changing the network structure.
• It splits the data flow through the CNN into two
streams: (i) a thumbnail CNN; and (ii) a high-
resolution cropped CNN.
This figure illustrates the zooming process. Three
columns represent three different zoom levels
(more zoom levels can be added if necessary).
LineNet: a Zoomable CNN for Crowdsourced High
Definition Maps Modeling in Urban Environments
LineNet: a Zoomable CNN for Crowdsourced High
Definition Maps Modeling in Urban Environments
• To achieve smooth lines, points were clustered together and fitted into lines.
• The clustering algorithm DBSCAN was used with a hierarchical distance (HDis).
• The line position from the LP layer was collected and combined with a zoom level.
• The combination is denoted as a tuple a = (x, y, z), where (x, y) is the image coordinate from
line position outputs, and z is the stage’s zoom ratio used to predict the line position.
Line points are gradually clustered
together from near to far.
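The paper's exact HDis definition is not reproduced on the slides; the sketch below assumes a zoom-aware distance (image distance rescaled by the coarser zoom ratio of the pair) plugged into scikit-learn's DBSCAN as a custom metric:

import numpy as np
from sklearn.cluster import DBSCAN

def hdis(a, b):
    # Hypothetical stand-in for the hierarchical distance: each point is a
    # tuple (x, y, z) of image coordinates plus the zoom ratio of the stage
    # that predicted it; spatial distance is rescaled by the coarser zoom.
    spatial = np.hypot(a[0] - b[0], a[1] - b[1])
    return spatial / max(a[2], b[2])

pts = np.array([[120., 400., 1.], [124., 405., 1.],
                [300., 80., 4.], [302., 82., 4.]])
labels = DBSCAN(eps=10.0, min_samples=1, metric=hdis).fit_predict(pts)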
LineNet: a Zoomable CNN for Crowdsourced High
Definition Maps Modeling in Urban Environments
road modeling
LineNet: a Zoomable CNN for Crowdsourced High
Definition Maps Modeling in Urban Environments
road modeling
LineNet: a Zoomable CNN for Crowdsourced High
Definition Maps Modeling in Urban Environments
road modeling
Efficient Road Lane Marking Detection with
Deep Learning
• A Lane Marking Detector (LMD) using a deep CNN to extract robust lane marking features.
• To improve its performance at lower complexity, dilated convolution is adopted.
• A shallower and thinner structure is designed to decrease the computational cost.
• A post-processing algo constructs 3rd-order polynomial models to fit the curved lanes.
Flowchart of the proposed LMD system.
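The curve-fitting step of the post-processing reduces to an ordinary least-squares polynomial fit; a sketch with NumPy (fitting x as a function of y, which suits near-vertical lanes in the image):

import numpy as np

def fit_lane_poly(points):
    # points: (N, 2) marking pixels of one lane as (x, y); fit a 3rd-order
    # polynomial x = f(y) to model the curved lane.
    ys, xs = points[:, 1], points[:, 0]
    return np.poly1d(np.polyfit(ys, xs, deg=3))

lane = fit_lane_poly(np.array([[200, 719], [230, 600],
                               [270, 480], [320, 360]], dtype=float))
x_at_500 = lane(500)     # lane x-position at image row 500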
Efficient Road Lane Marking Detection with
Deep Learning
Efficient Road Lane Marking Detection with
Deep Learning
End to End Video Segmentation for Driving:
Lane Detection For Autonomous Car
• Statistics show that unintended lane departure is a leading cause of worldwide motor vehicle
collisions, making lane detection a promising and challenging task for self-driving.
• People are combining deep learning with computer vision to solve self-driving problems.
• A Global Convolution Network (GCN) model is used to address both classification and
localization issues for semantic segmentation of lanes.
• Color-based segmentation is presented, and the usability of the model is evaluated.
• A residual-based boundary refinement and Adam optimization are also used to achieve state-
of-the-art performance.
• As normal cars cannot afford on-board GPUs, the training session for a particular road
can be shared by several cars.
• A real-time video transfer system gets video from the car, trains the model on an edge
server (equipped with GPUs), and sends the trained model back to the car.
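The GCN and boundary-refinement blocks named above follow the large-kernel design of Peng et al.; below is a sketch of both (channel counts are illustrative):

import torch
import torch.nn as nn

class GCNBlock(nn.Module):
    # A large k x k kernel approximated by two separable branches
    # (k x 1 then 1 x k, and 1 x k then k x 1), summed; this gives a wide
    # receptive field for joint classification and localization.
    def __init__(self, in_ch, out_ch, k=7):
        super().__init__()
        p = k // 2
        self.branch_a = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(p, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, p)))
        self.branch_b = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, p)),
            nn.Conv2d(out_ch, out_ch, (k, 1), padding=(p, 0)))

    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)

class BoundaryRefine(nn.Module):
    # Residual-based boundary refinement: a small residual block that
    # sharpens lane boundaries without changing the feature resolution.
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)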
End to End Video Segmentation for Driving:
Lane Detection For Autonomous Car
An overview of the whole pipeline.
End to End Video Segmentation for Driving:
Lane Detection For Autonomous Car
3D-LaneNet: E2E 3D multiple lane detection
• This network directly predicts the 3D layout of lanes in a road scene from a single image.
• It is a first attempt to address this task with on-board sensing instead of relying on pre-
mapped environments.
• 3D-LaneNet applies two new concepts: intra-network inverse-perspective mapping (IPM)
and anchor-based lane representation.
• The intra-network IPM projection facilitates a dual-representation info. flow in both regular
image-view and top-view.
• An anchor-per-column output representation enables an e2e approach, replacing common heuristics
such as clustering and outlier rejection.
• It outputs, for each longitudinal road slice, the confidence that a lane passes through the slice
and its 3D curve in camera coordinates.
• Each output is associated to an anchor in analogy to single-shot, anchor-based object
detection methods such as SSD and YOLO.
• It explicitly handles complex situations such as lane merges and splits.
3D-LaneNet: E2E 3D multiple lane detection
(a) Schematic illustration of
the end-to-end approach
and lane detection result
example on top-view. (b)
Projection of the result on
the original image.
3D-LaneNet: E2E 3D multiple lane detection
Camera position and road projection plane.
Assume known intrinsic camera parameters
(e.g. focal length, center of projection).
Also assume that the camera is installed at zero degrees roll
relative to the local ground plane.
Lane centerlines are marked in blue and
delimiters in yellow dotted curves.
The task is defined as detecting the set
of lane centerlines and/or lane delimiters
given the image.
3D-LaneNet: E2E 3D multiple lane detection
The dual context module.
A main building block in the architecture is the projective
transformation layer. This layer is a specific realization,
with slight variations, of the spatial transformer module.
It performs a differentiable sampling of an input feature
map UI, corresponding spatially to the image plane, to an
output feature map UT corresponding spatially to a virtual
top view of the scene. The differentiable sampling is achieved
through a grid for transforming an image to top-view.
The dual context module uses the projective transformation
layer to create highly descriptive feature maps. Info. flows
from multi-channel feature maps UI and VT corresponding to
image-view and top-view respectively.
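A sketch of such a projective transformation layer using grid_sample; the homography handling (normalized coordinates, a known top-view-to-image mapping from camera height and pitch) is an assumption for illustration:

import torch
import torch.nn.functional as F

def ipm_sample(feat, H_top2img, out_hw):
    # feat:      (B, C, Hi, Wi) image-view feature map
    # H_top2img: (3, 3) homography mapping normalized top-view coordinates
    #            to normalized image coordinates (assumed known)
    # out_hw:    (Ht, Wt) top-view resolution
    Ht, Wt = out_hw
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, Ht),
                            torch.linspace(-1, 1, Wt), indexing='ij')
    tv = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)
    img = tv @ H_top2img.t()                        # project to image plane
    img = img[:, :2] / img[:, 2:3].clamp(min=1e-6)  # perspective divide
    grid = img.reshape(1, Ht, Wt, 2).expand(feat.size(0), -1, -1, -1)
    # bilinear sampling is differentiable, so gradients flow through the IPM
    return F.grid_sample(feat, grid, align_corners=False)   # (B, C, Ht, Wt)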
3D-LaneNet: E2E 3D multiple lane detection
3D-LaneNet network architecture.
VGG16
3D-LaneNet: E2E 3D multiple lane detection
Output representation. Note that the number of
anchors (N) equals the output layer width.
Note: per anchor, the network outputs 3 types (t) of lane
descriptors (confidence and geometry); the first two (c1, c2)
represent lane centerlines and the third type (d) a lane
delimiter. Assigning 2 possible centerlines per anchor gives
the network support for merges and splits, which often
result in the centerlines of two lanes coinciding at Yref
and separating at different road positions. The topology of
lane delimiters is generally more complicated compared to
centerlines, and it cannot capture all situations.
The anchors are defined by equally spaced vertical (longitudinal)
lines at fixed x-positions.
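A decoding sketch for the anchor-per-column outputs; the tensor layout (confidence followed by K lateral offsets and K heights at fixed longitudinal positions) is an assumed simplification of the paper's three descriptor types:

import torch

def decode_anchors(net_out, anchor_x, y_steps, conf_thresh=0.5):
    # net_out:  (N, 1 + 2K) per-anchor outputs: confidence, K lateral
    #           offsets dx, and K heights z at longitudinal positions y_steps
    # anchor_x: (N,) x-position of each vertical anchor line
    K = len(y_steps)
    y = torch.as_tensor(y_steps, dtype=net_out.dtype)
    conf = torch.sigmoid(net_out[:, 0])
    lanes = []
    for i in torch.nonzero(conf > conf_thresh).flatten():
        dx = net_out[i, 1:1 + K]
        z = net_out[i, 1 + K:1 + 2 * K]
        lanes.append(torch.stack([anchor_x[i] + dx, y, z], dim=1))
    return lanes      # list of (K, 3) 3D lane curves in camera coordinates

out = torch.randn(16, 1 + 2 * 6)
lanes = decode_anchors(out, anchor_x=torch.linspace(-8, 8, 16),
                       y_steps=[5., 10., 20., 40., 60., 80.])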
3D-LaneNet: E2E 3D multiple lane detection
Random synthetic data generation. (a) Surface (b) Road topology and curvature
(c) Road on surface (d) Rendered scenes.
3D-LaneNet: E2E 3D multiple lane detection
Examples of 3D lane centerline estimation results (with confidence > 0.5) on test images from the synthetic-3D-
lanes dataset. Ground truth (blue) and method result (red) shown in each image alongside a 3D visualization.
End-to-End Lane Marker Detection via Row-wise Classification
• The conventional approaches for the lane marker detection problem perform a pixel-
level dense prediction task followed by sophisticated post-processing that is inevitable
since lane markers are typically represented by a collection of line segments without
thickness.
• In this paper, propose a method performing direct lane marker vertex prediction in an
end-to-end manner, i.e., without any post-processing step that is required in the pixel-
level dense prediction task.
• Specifically, translate the lane marker detection problem into a row-wise classification
task, which takes advantage of the innate shape of lane markers but, surprisingly, has
not been explored well.
• In order to compactly extract sufficient information about lane markers, which spread
from the left to the right of an image, devise a novel layer that successively
compresses horizontal components, enabling an end-to-end lane marker detection
system where the final lane marker positions are simply obtained via argmax operations
at testing time.
• Experimental results are demonstrated on two popular lane marker detection benchmarks,
i.e., TuSimple and CULane.
End-to-End Lane Marker Detection via Row-wise Classification
The E2E-LMD framework for lane marker detection
End-to-End Lane Marker Detection via Row-wise Classification
The E2E-LMD architecture for lane marker detection. We extend general encoder-decoder architectures by adding successive
horizontal reduction modules for end-to-end lane marker detection. Numbers under each block denote spatial resolution and
channels. (a) Arrows with HRM denote a horizontal reduction module of (b). Arrows with Conv are output convolutions with 1 × 1
kernels. Dashed arrows denote global average pooling with a fully connected layer. (b) HRM is utilized to compress the horizontal
representation. r denotes the pooling ratio for the width. The conv kernel size k is set as 3, except in the last HRM layer, where it is set as 1.
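A sketch of one HRM and the final argmax readout (the block internals are simplified relative to the paper):

import torch
import torch.nn as nn

class HRM(nn.Module):
    # Horizontal reduction module sketch: a conv layer followed by
    # width-only pooling with ratio r, compressing the horizontal axis
    # while the height (row) axis is preserved.
    def __init__(self, in_ch, out_ch, r=2, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.pool = nn.AvgPool2d(kernel_size=(1, r), stride=(1, r))

    def forward(self, x):           # (B, C, H, W) -> (B, C', H, W // r)
        return self.pool(torch.relu(self.conv(x)))

# Stacking HRMs squeezes the width down; a 1x1 conv then yields per-row
# logits over the original image columns, and at test time the lane
# position in each row is simply an argmax (one branch per lane marker
# in the real network; a single branch is shown here).
x = torch.randn(1, 64, 72, 8)
x = HRM(64, 128)(x); x = HRM(128, 256)(x); x = HRM(256, 256)(x)  # width 8 -> 1
logits = nn.Conv2d(256, 128, 1)(x).squeeze(3)   # (B, 128 column bins, 72 rows)
cols = logits.argmax(dim=1)                     # lane column index per row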
End-to-End Lane Marker Detection via Row-wise Classification
End-to-End Lane Marker Detection via Row-wise Classification
Camera-Based Road Lane Detection by Deep Learning II

Weitere ähnliche Inhalte

Was ist angesagt?

Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
Sanghamitra Deb
 
[G4]image deblurring, seeing the invisible
[G4]image deblurring, seeing the invisible[G4]image deblurring, seeing the invisible
[G4]image deblurring, seeing the invisible
NAVER D2
 

Was ist angesagt? (20)

YOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewYOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection review
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
 
U-Net (1).pptx
U-Net (1).pptxU-Net (1).pptx
U-Net (1).pptx
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 
CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2CVPR 2018 Paper Reading MobileNet V2
CVPR 2018 Paper Reading MobileNet V2
 
Stereo Matching by Deep Learning
Stereo Matching by Deep LearningStereo Matching by Deep Learning
Stereo Matching by Deep Learning
 
Deep learning based object detection basics
Deep learning based object detection basicsDeep learning based object detection basics
Deep learning based object detection basics
 
camera-based Lane detection by deep learning
camera-based Lane detection by deep learningcamera-based Lane detection by deep learning
camera-based Lane detection by deep learning
 
Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)
 
Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep Learning
 
[G4]image deblurring, seeing the invisible
[G4]image deblurring, seeing the invisible[G4]image deblurring, seeing the invisible
[G4]image deblurring, seeing the invisible
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
Hough Transform By Md.Nazmul Islam
Hough Transform By Md.Nazmul IslamHough Transform By Md.Nazmul Islam
Hough Transform By Md.Nazmul Islam
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
Object detection
Object detectionObject detection
Object detection
 
Mobilenetv1 v2 slide
Mobilenetv1 v2 slideMobilenetv1 v2 slide
Mobilenetv1 v2 slide
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and Classification
 
Object tracking presentation
Object tracking  presentationObject tracking  presentation
Object tracking presentation
 
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
Semantic Segmentation - Fully Convolutional Networks for Semantic SegmentationSemantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
 

Ähnlich wie Camera-Based Road Lane Detection by Deep Learning II

Ähnlich wie Camera-Based Road Lane Detection by Deep Learning II (20)

Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
Lane Detection and Traffic Sign Recognition using OpenCV and Deep Learning fo...
Lane Detection and Traffic Sign Recognition using OpenCV and Deep Learning fo...Lane Detection and Traffic Sign Recognition using OpenCV and Deep Learning fo...
Lane Detection and Traffic Sign Recognition using OpenCV and Deep Learning fo...
 
Unsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingUnsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object tracking
 
IRJET- Automatic Traffic Sign Detection and Recognition using CNN
IRJET- Automatic Traffic Sign Detection and Recognition using CNNIRJET- Automatic Traffic Sign Detection and Recognition using CNN
IRJET- Automatic Traffic Sign Detection and Recognition using CNN
 
TRAFFIC MANAGEMENT THROUGH SATELLITE IMAGING-- Part 3
TRAFFIC MANAGEMENT THROUGH SATELLITE IMAGING-- Part 3TRAFFIC MANAGEMENT THROUGH SATELLITE IMAGING-- Part 3
TRAFFIC MANAGEMENT THROUGH SATELLITE IMAGING-- Part 3
 
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
 
Identifying Parking Spots from Surveillance Cameras using CNN
Identifying Parking Spots from Surveillance Cameras using CNNIdentifying Parking Spots from Surveillance Cameras using CNN
Identifying Parking Spots from Surveillance Cameras using CNN
 
ANALYSIS OF INSTANCE SEGMENTATION APPROACH FOR LANE DETECTION
ANALYSIS OF INSTANCE SEGMENTATION APPROACH FOR LANE DETECTIONANALYSIS OF INSTANCE SEGMENTATION APPROACH FOR LANE DETECTION
ANALYSIS OF INSTANCE SEGMENTATION APPROACH FOR LANE DETECTION
 
Automatism System Using Faster R-CNN and SVM
Automatism System Using Faster R-CNN and SVMAutomatism System Using Faster R-CNN and SVM
Automatism System Using Faster R-CNN and SVM
 
project final ppt.pptx
project final ppt.pptxproject final ppt.pptx
project final ppt.pptx
 
Computer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureComputer Vision Landscape : Present and Future
Computer Vision Landscape : Present and Future
 
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...
 
深度學習在AOI的應用
深度學習在AOI的應用深度學習在AOI的應用
深度學習在AOI的應用
 
Intelligent Traffic light detection for individuals with CVD
Intelligent Traffic light detection for individuals with CVDIntelligent Traffic light detection for individuals with CVD
Intelligent Traffic light detection for individuals with CVD
 
IEEE 2014 Matlab Projects
IEEE 2014 Matlab ProjectsIEEE 2014 Matlab Projects
IEEE 2014 Matlab Projects
 
IEEE 2014 Matlab Projects
IEEE 2014 Matlab ProjectsIEEE 2014 Matlab Projects
IEEE 2014 Matlab Projects
 
slide-171212080528.pptx
slide-171212080528.pptxslide-171212080528.pptx
slide-171212080528.pptx
 
20191107 deeplearningapproachesfornetworks
20191107 deeplearningapproachesfornetworks20191107 deeplearningapproachesfornetworks
20191107 deeplearningapproachesfornetworks
 
kanimozhi2019.pdf
kanimozhi2019.pdfkanimozhi2019.pdf
kanimozhi2019.pdf
 
AaSeminar_Template.pptx
AaSeminar_Template.pptxAaSeminar_Template.pptx
AaSeminar_Template.pptx
 

Mehr von Yu Huang

Mehr von Yu Huang (20)

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous Driving
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous Driving
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous Driving
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and Segmentation
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and Prediction
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VI
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving V
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IV
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at Baidu
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the Hood
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous Driving
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atg
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learning
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymo
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planning
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
 

Kürzlich hochgeladen

AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 

Kürzlich hochgeladen (20)

Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 

Camera-Based Road Lane Detection by Deep Learning II

  • 1. Camera-Based Road Lane Detection by Deep Learning II Yu Huang Yu.huang07@gmail.com Sunnyvale, California
  • 2. Outline • Ultra Fast Structure-aware Deep Lane Detection • Learning Lightweight Lane Detection CNNs by Self Attention Distillation • Key Points Estimation and Point Instance Segmentation Approach for Lane Detection • Learning to Cluster for Proposal-Free Instance Segmentation • Lane Detection and Classification using Cascaded CNNs • PolyLaneNet: Lane Estimation via Deep Polynomial Regression • Lane Detection: Light Conditions Style Transfer • End-to-end Lane Detection through Differentiable Least-Squares Fitting • Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks • LineNet: a Zoomable CNN for Crowdsourced HD Maps Modeling in Urban Environments • Efficient Road Lane Marking Detection with Deep Learning • End to End Video Segmentation for Driving: Lane Detection For Autonomous Car • 3D-LaneNet: E2E 3D multiple lane detection • End-to-End Lane Marker Detection via Row- wise Classification
  • 3. Ultra Fast Structure-aware Deep Lane Detection • ArXiv 2004.11757 • Inspired by human perception, the recognition of lanes under severe occlusion and extreme lighting conditions is mainly based on contextual and global information. • Motivated by this observation, propose a novel, simple, yet effective formulation aiming at extremely fast speed and challenging scenarios. • Specifically, treat the process of lane detection as a row-based selecting problem using global features. • With the help of row-based selecting, formulation could significantly reduce the computational cost. • Using a large receptive field on global features, could also handle the challenging scenarios. • Moreover, based on the formulation, also propose a structural loss to explicitly model the structure of lanes. • A light-weight version could even achieve 300+ frames per second with the same resolution, which is at least 4x faster than previous state-of-the-art methods. • code is available at https://github.com/cfzd/Ultra-Fast-Lane-Detection.
  • 4. Ultra Fast Structure-aware Deep Lane Detection Illustration of selecting on the left and right lane. In the right part, the selecting of a row is shown in detail. Row anchors are the predefined row locations, and formulation is defined as horizontally selecting on each of row anchor. On the right of the image, a background gridding cell is introduced to indicate no lane in this row.
  • 5. Ultra Fast Structure-aware Deep Lane Detection Illustration of our formulation and conventional segmentation. Our formulation is selecting locations (grids) on rows, while segmentation is classifying every pixel. The dimensions used for classifying are also different, which is marked with red. The proposed formulation significantly reduces the computational cost. Besides, the proposed formulation uses global feature as input, which has larger receptive field than segmentation, thus addressing the no-visual-clue problem formulation Segmentation
  • 6. Ultra Fast Structure-aware Deep Lane Detection Overall architecture. The auxiliary branch is shown in the upper part, which is only valid when training. The feature extractor is shown in the blue box. The classification-based prediction and auxiliary segmentation task are illustrated in the green and orange boxes, respectively. The group classification is conducted on each row anchor.
  • 7. Ultra Fast Structure-aware Deep Lane Detection
  • 8. Learning Lightweight Lane Detection CNNs by Self Attention Distillation • https://github.com/cardwing/Codes-for-Lane-Detection • Without learning from much richer context, lane detection models often fail in challenging scenarios, e.g., severe occlusion, ambiguous lanes, and poor lighting conditions. • present a novel knowledge distillation approach, i.e., Self Attention Distillation (SAD), which allows a model to learn from itself and gains substantial improvement without any additional supervision or labels. • Specifically, observe that attention maps extracted from a model trained to a reasonable level would encode rich contextual information. • The valuable contextual information can be used as a form of ‘free’ supervision for further representation learning through performing top down and layer-wise attention distillation within the network itself. • SAD can be easily incorporated in any feedforward convolutional neural networks (CNN) and does not increase the inference time. • validate SAD on three popular lane detection benchmarks (TuSimple, CULane and BDD100K) using lightweight models such as ENet, ResNet18 and ResNet-34.
  • 9. Learning Lightweight Lane Detection CNNs by Self Attention Distillation Attention maps of the ENet [17] before and after applying self attention distillation. Here, we extract the attention maps from the four stages/blocks following the design of ENet model. Note that self attention distillation is added in the 40 K episodes.
  • 10. Learning Lightweight Lane Detection CNNs by Self Attention Distillation Attention maps of the block 4 of the ENet model using different mapping functions. An instantiation of using SAD. E1∼E4 comprise the encoder of ENet, D1 and D2 comprise the decoder of ENet. add a small network to predict the existence of lanes, denoted as P1. AT-GEN is the attention generator.
  • 11. Learning Lightweight Lane Detection CNNs by Self Attention Distillation Attention maps of ENet with and without self attention distillation. Both networks with and without SAD are trained up to 60K episodes. SAD is applied to ENet at 40K training episodes.
  • 12. Learning Lightweight Lane Detection CNNs by Self Attention Distillation
  • 13. Key Points Estimation and Point Instance Segmentation Approach for Lane Detection • https://github.com/koyeongmin/PINet • Current methods have critical deficiencies such as the limited number of detectable lanes and high false positive. • In especial, high false positive can cause wrong and dangerous control. • In this paper, propose a lane detection method for the arbitrary number of lanes using the deep learning method, which has the lower number of false positives than other recent lane detection methods. • The architecture of the proposed method has the shared feature extraction layers and several branches for detection and embedding to cluster lanes. • The proposed method can generate exact points on the lanes, and cast a clustering problem for the generated points as a point cloud instance segmentation problem. • The proposed method is more compact because it generates fewer points than the original image pixel size. • proposed post processing method eliminates outliers successfully and increases the performance notably.
  • 14. Key Points Estimation and Point Instance Segmentation Approach for Lane Detection The proposed framework. Given an input image, PINet predict three value, confidence, offset, and feature. From confidence and offset outputs, exact points on the lanes can be predicted, and the feature output distinguishes the predicted points into each instance. Finally, the post processing module is applied, and it generates smooth lane.
  • 15. Key Points Estimation and Point Instance Segmentation Approach for Lane Detection The detailed network training procedure. It has three main parts. 512x256 size input data is compressed by the resizing layer, and the compressed input is passed to feature extraction layer. Three output branches are applied at end of each hourglass block, and they predict confidence, offset, and instance feature for each grid. Loss function can be calculated from outputs of each hourglass block.
  • 16. Key Points Estimation and Point Instance Segmentation Approach for Lane Detection The hourglass block and bottleneck layer architecture. The hourglass block consist three types of bottleneck layers, same, up-sampling, down-sampling. Output branches are applied at end of hourglass layer, and the confidence output is forwarded to next block.
  • 17. Key Points Estimation and Point Instance Segmentation Approach for Lane Detection The result of the post processing. (a) is input image, and (b) is raw output out PINet. In (b), the blue lane consist of some outliers and other lane can be distinguished. In (c), the result of the proposed post processing method, outliers are eliminated, and only smooth longest lanes remain.
  • 18. Key Points Estimation and Point Instance Segmentation Approach for Lane Detection The explanation about the post processing. There are no other point in the margin that is made by the straight line connecting point S and A, but margin of point S and B consist of 2 other points. As a result, point B is selected.
  • 19. Key Points Estimation and Point Instance Segmentation Approach for Lane Detection
• 20. Learning to Cluster for Proposal-Free Instance Segmentation • https://github.com/GT-RIPL/L2C • This work proposes a novel learning objective to train a deep neural network to perform end-to-end image pixel clustering. • The approach is applied to instance segmentation, which lies at the intersection of image semantic segmentation and object detection. • It utilizes the most fundamental property of instance labeling – the pairwise relationship between pixels – as the supervision to formulate the learning objective, then applies it to train a fully convolutional network (FCN) to perform pixel-wise clustering. • The resulting clusters can be used directly as the instance labeling. • To support labeling of an unlimited number of instances, ideas from graph coloring theory are further formulated into the proposed learning objective. • The evaluation on the Cityscapes dataset demonstrates strong performance and thereby a proof of concept. • Moreover, the approach won second place in the lane detection competition of the 2017 CVPR Autonomous Driving Challenge and was the top performer without using external data.
• 21. Learning to Cluster for Proposal-Free Instance Segmentation The labeling problem is addressed by formulating a novel learning objective that guides fully convolutional networks to learn to perform instance labeling.
• 22. Learning to Cluster for Proposal-Free Instance Segmentation Example outputs of lane detection. The colors represent different instance IDs. The output for each pixel is a (6 + 1)-dimensional vector representing the probability distribution of this pixel being assigned to each ID. The learning objective guides the distribution function to output similar distributions for pixels on the same lane line, and vice versa. At test time, each pixel is assigned the ID with the highest probability. Given a pair of pixels pi and pj, their corresponding output distributions are denoted Pi = f(pi) = [ti,1, ..., ti,n] and Pj = f(pj) = [tj,1, ..., tj,n], where n is the number of indices available for labeling.
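The pairwise supervision can be sketched as follows: pairs on the same instance are pulled toward identical ID distributions, pairs on different instances are pushed apart. The symmetric-KL pull term and hinge push term below are one plausible instantiation of that objective, assumed for illustration rather than taken verbatim from the paper.

```python
# Pairwise pixel-clustering objective sketch (form of the loss is assumed).
import torch
import torch.nn.functional as F

def pairwise_cluster_loss(P_i, P_j, same_instance, margin=2.0):
    """P_i, P_j: (N, n) probability distributions for N sampled pixel pairs.
    same_instance: (N,) bool, True if the pair lies on the same instance."""
    eps = 1e-8
    kl_ij = (P_i * (torch.log(P_i + eps) - torch.log(P_j + eps))).sum(dim=1)
    kl_ji = (P_j * (torch.log(P_j + eps) - torch.log(P_i + eps))).sum(dim=1)
    sym_kl = kl_ij + kl_ji
    pull = sym_kl                       # same instance: minimize divergence
    push = F.relu(margin - sym_kl)      # different instance: hinge on divergence
    return torch.where(same_instance, pull, push).mean()
```

Because only pairwise relations are supervised, the network is free to assign any of the n IDs to an instance, which is exactly where the graph-coloring view of the next slide comes in.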
  • 23. Learning to Cluster for Proposal-Free Instance Segmentation The concept of how graph coloring is related to instance ID assignment
  • 24. Learning to Cluster for Proposal-Free Instance Segmentation The network architecture
• 25. Learning to Cluster for Proposal-Free Instance Segmentation Visualization of lane detection on the TuSimple dataset (validation split). The red lines in the top row are predictions, while the green lines are the ground truth. The second row shows the raw outputs from the network. The colors represent the assigned IDs.
• 26. Lane Detection and Classification using Cascaded CNNs • https://github.com/fabvio/Cascade-LD • https://github.com/fabvio/TuSimple-lane-classes • As in many other computer vision tasks, convolutional neural networks (CNNs) represent the state-of-the-art technology to identify lane boundaries. • However, the position of the lane boundaries w.r.t. the vehicle may not suffice for reliable positioning: for path planning or localization, information regarding lane types may also be needed. • This work presents an end-to-end system for lane boundary identification, clustering and classification, based on two cascaded neural networks, that runs in real time. • To build the system, 14336 lane boundary instances of the TuSimple lane detection dataset have been labelled using 8 different classes.
  • 27. Lane Detection and Classification using Cascaded CNNs
  • 28. Lane Detection and Classification using Cascaded CNNs From top to bottom: original image, instance segmentation, classification. For instance segmentation, different colors represent different boundaries. For classification, green represents dashed lanes, yellow double-dashed, red continuous.
• 29. PolyLaneNet: Lane Estimation via Deep Polynomial Regression • https://github.com/lucastabelini/PolyLaneNet • Since methods for lane detection have to work in real time (30+ FPS), they not only have to be effective (i.e., have high accuracy) but also efficient (i.e., fast). • This work presents a novel method for lane detection that takes as input an image from a forward-looking camera mounted in the vehicle and outputs polynomials representing each lane marking in the image, via deep polynomial regression. • The proposed method is shown to be competitive with existing state-of-the-art methods on the TuSimple dataset, while maintaining its efficiency (115 FPS). • Additionally, extensive qualitative results on two additional public datasets are presented, along with a discussion of limitations in the evaluation metrics used by recent works on lane detection.
• 30. PolyLaneNet: Lane Estimation via Deep Polynomial Regression Overview of the proposed method. From left to right: the model receives as input an image from a forward-looking camera and outputs information about each lane marking in the image.
  • 31. PolyLaneNet: Lane Estimation via Deep Polynomial Regression
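A minimal sketch of how such a polynomial-regression head can be decoded at test time: each lane is assumed to be a cubic in normalized image coordinates plus a confidence score and a vertical extent. The field layout and names are assumptions for illustration, not the exact PolyLaneNet output format.

```python
# Decoding one regressed lane into pixel coordinates (layout assumed).
import numpy as np

def decode_lane(coeffs, v_top, v_bottom, conf,
                img_h, img_w, conf_thr=0.5, n_pts=32):
    """coeffs: (a, b, c, d) of x = a*y^3 + b*y^2 + c*y + d, with x, y in [0, 1].
    v_top/v_bottom: normalized vertical extent of the lane; conf: lane score."""
    if conf < conf_thr:
        return None                               # suppress low-confidence lanes
    a, b, c, d = coeffs
    ys = np.linspace(v_top, v_bottom, n_pts)      # sample the vertical range
    xs = a * ys**3 + b * ys**2 + c * ys + d
    return np.stack([xs * img_w, ys * img_h], axis=1)   # (n_pts, 2) pixel points
```

Because the whole lane is four coefficients plus two bounds, no per-pixel decoding or clustering is needed, which is where the 115 FPS figure comes from.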
• 32. Lane Detection in Low-light Conditions Using an Efficient Data Enhancement: Light Conditions Style Transfer • https://github.com/Chenzhaowei13/Light-Condition-Style-Transfer • Although multi-task learning and contextual-information-based methods have been proposed for lane detection, they either require additional manual annotations or introduce extra inference overhead. • This paper proposes a style-transfer-based data enhancement method that uses Generative Adversarial Networks (GANs) to generate images in low-light conditions, increasing the environmental adaptability of the lane detector. • The solution consists of three parts: the proposed SIM-CycleGAN, light conditions style transfer, and the lane detection network. • It requires neither additional manual annotations nor extra inference overhead. • The method is validated on the lane detection benchmark CULane using ERFNet. • Empirically, the lane detection model trained with this method demonstrates adaptability in low-light conditions and robustness in complex scenarios.
• 33. Lane Detection in Low-light Conditions Using an Efficient Data Enhancement: Light Conditions Style Transfer The main framework of the method. The proposed SIM-CycleGAN is shown on the left. The generator GA transfers images from suitable light conditions to low-light conditions, while the generator GB transfers in the opposite direction. The discriminators DA and DB feed a single scalar value (real or fake) back to the generators. The middle section shows light conditions style transfer from suitable to low-light conditions by the trained SIM-CycleGAN. The lane detection model, whose baseline is ERFNet, is shown on the right; a lane-existence branch is added for better performance.
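Below is a hedged sketch of the CycleGAN-style objective implied by this framework: adversarial terms in both translation directions plus a cycle-consistency term. SIM-CycleGAN's specific modifications are omitted, and the LSGAN loss form and the weight lam are assumptions, not values from the paper.

```python
# Generator-side CycleGAN-style losses (sketch; models passed in as callables).
import torch
import torch.nn.functional as F

def cycle_gan_losses(G_A, G_B, D_A, D_B, real_suitable, real_lowlight, lam=10.0):
    fake_lowlight = G_A(real_suitable)    # suitable light -> low light
    fake_suitable = G_B(real_lowlight)    # low light -> suitable light

    # LSGAN adversarial terms: each generator tries to make the corresponding
    # discriminator output "real" (1) for its fakes.
    pred_low, pred_suit = D_A(fake_lowlight), D_B(fake_suitable)
    adv = F.mse_loss(pred_low, torch.ones_like(pred_low)) \
        + F.mse_loss(pred_suit, torch.ones_like(pred_suit))

    # Cycle consistency: translating there and back should reproduce the input.
    cyc = F.l1_loss(G_B(fake_lowlight), real_suitable) \
        + F.l1_loss(G_A(fake_suitable), real_lowlight)
    return adv + lam * cyc
```

The key point for lane detection is that the translated low-light images reuse the original lane annotations, so no new labels are needed.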
• 34. Lane Detection in Low-light Conditions Using an Efficient Data Enhancement: Light Conditions Style Transfer Generator architecture, composed of convolution layers, residual blocks and deconvolution layers. The convolution layers record the scale changes in the encoding process and map them to the corresponding operations in the decoding process.
• 35. Lane Detection in Low-light Conditions Using an Efficient Data Enhancement: Light Conditions Style Transfer The lane detection model architecture. The decoder outputs probability maps of the different lane markings, and the second branch predicts the existence of lanes.
• 36. Lane Detection in Low-light Conditions Using an Efficient Data Enhancement: Light Conditions Style Transfer The probability maps from this method and other methods. The brightness of a pixel indicates the probability of that pixel belonging to a lane. As can be clearly seen, in low-light conditions the probability maps generated by this method are more pronounced and more accurate.
• 37. End-to-end Lane Detection through Differentiable Least-Squares Fitting • A method to train a lane detector in an e2e manner, directly regressing the lane parameters. • The architecture consists of two components: a deep network that predicts a segmentation-like weight map for each lane line, and a differentiable least-squares fitting module that returns for each map the parameters of the best-fitting curve in the weighted least-squares sense. • These parameters can subsequently be supervised with a loss function of choice. • The method relies on the fact that it is possible to backpropagate through a least-squares fitting procedure. • This leads to an end-to-end method where the features are optimized for the true task of interest: the network implicitly learns to generate features that prevent instabilities during the model fitting step, as opposed to two-step pipelines that need to handle outliers with heuristics. • Additionally, the system is not just a black box but offers a degree of interpretability, because the intermediately generated segmentation-like weight maps can be inspected and visualized. • Code: http://github.com/wvangansbeke/LaneDetection_End2End.
  • 38. End-to-end Lane Detection through Differentiable Least-Squares Fitting • Lane detection is typically tackled with a two-step pipeline in which a segmentation mask of the lane markings is predicted first, and a lane line model (like a parabola or spline) is fitted to the post-processed mask next. • The problem with such a two-step approach is that the parameters of the network are not optimized for the true task of interest (estimating the lane curvature parameters) but for a proxy task (segmenting the lane markings), resulting in sub-optimal performance. Overview of the architecture
• 39. End-to-end Lane Detection through Differentiable Least-Squares Fitting Least-squares fitting and weighted least-squares fitting, i.e. the closed-form solutions β = (XᵀX)⁻¹Xᵀy and β = (XᵀWX)⁻¹XᵀWy, where X is the design matrix built from the candidate coordinates and W is the diagonal matrix of predicted per-pixel weights.
  • 40. End-to-end Lane Detection through Differentiable Least-Squares Fitting
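A minimal differentiable weighted least-squares sketch in PyTorch, following the closed form above: since torch.linalg.solve is differentiable, gradients flow from the fitted curve parameters back into the predicted weight map. The parabola model is one of the curve choices mentioned in the text; the coordinate convention is an assumption.

```python
# Differentiable weighted least-squares fit of a parabola (sketch).
import torch

def weighted_lsq_fit(xs, ys, w):
    """xs, ys, w: (N,) tensors; w is the flattened predicted weight map.
    Returns beta = (a, b, c) of the best fit y = a*x^2 + b*x + c
    in the weighted least-squares sense: beta = (X^T W X)^{-1} X^T W y."""
    X = torch.stack([xs**2, xs, torch.ones_like(xs)], dim=1)   # (N, 3) design matrix
    W = w.unsqueeze(1)                                         # (N, 1), diag(W) implicitly
    A = X.t() @ (W * X)                                        # X^T W X  -> (3, 3)
    b = X.t() @ (w * ys).unsqueeze(1)                          # X^T W y  -> (3, 1)
    return torch.linalg.solve(A, b).squeeze(1)                 # differentiable solve
```

Supervising beta directly (rather than the weight map) is exactly what lets the network learn weights that keep the fit stable.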
• 41. Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks • https://github.com/qinnzou/Robust-Lane-Detection • Most methods focus on detecting the lane from a single image, and often perform unsatisfactorily in extremely bad situations such as heavy shadow, severe mark degradation, serious vehicle occlusion, and so on. • In fact, lanes are continuous line structures on the road. • Consequently, a lane that cannot be accurately detected in the current frame may potentially be inferred by incorporating information from previous frames. • To this end, lane detection is investigated using multiple frames of a continuous driving scene, and a hybrid deep architecture is proposed combining a convolutional neural network (CNN) and a recurrent neural network (RNN). • Specifically, information of each frame is abstracted by a CNN block, and the CNN features of multiple continuous frames, holding the property of a time series, are then fed into the RNN block for feature learning and lane prediction. • Extensive experiments on two large-scale datasets demonstrate that the proposed method outperforms the competing methods in lane detection.
  • 42. Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks Architecture
• 43. Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks Encoder network in (a) UNet-ConvLSTM and (b) SegNet-ConvLSTM. Skip connections exist between convolutional layers in the encoder and their matching layers in the decoder.
• 44. Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks Visual comparison of the lane-detection results. Row 1: ground truth. Row 2: SegNet. Row 3: U-Net. Row 4: SegNet-ConvLSTM. Row 5: U-Net-ConvLSTM. Row 6: original image.
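To illustrate the CNN-plus-RNN fusion, a compact ConvLSTM cell sketch is given below: per-frame encoder features are fed through the cell in temporal order, and the final hidden state goes to the decoder. The single-cell setup, channel sizes, and feature resolution are placeholders rather than the exact UNet-ConvLSTM configuration.

```python
# Minimal ConvLSTM cell fusing per-frame CNN features over time (sketch).
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # One conv produces all four gates from the [input, hidden] concatenation.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g          # cell state carries lane evidence across frames
        h = o * c.tanh()
        return h, c

# Fuse encoder features of T consecutive frames; the last hidden state feeds
# the decoder that predicts the lane map of the current frame.
cell = ConvLSTMCell(in_ch=128, hid_ch=128)
feats = [torch.randn(1, 128, 16, 32) for _ in range(5)]   # per-frame CNN features
h = torch.zeros(1, 128, 16, 32)
c = torch.zeros_like(h)
for f_t in feats:
    h, c = cell(f_t, (h, c))
```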
• 45. LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments • HD maps play an important role in modern traffic scenes. • Coverage of HD maps grows slowly because of cost limitations. • To model HD maps, a CNN with a prediction layer and a zoom module, called LineNet, is designed for state-of-the-art lane detection on an unordered crowdsourced image dataset. • TTLane is a dataset for efficient lane detection in urban road modeling applications. • Combining LineNet and TTLane yields a pipeline to model HD maps with crowdsourced data. • The maps can be constructed precisely even with inaccurate crowdsourced data. Annotation of (a) dashed lanes, (b) double lanes, (c) occlusion segments, (d) road boundaries.
• 46. LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments • A pre-trained ResNet model with dilated convolution is used as the feature extractor. • The dilated convolution strategy helps to increase the receptive field, which is essential when detecting dashed lanes. • The Line Prediction (LP) layer is designed for accurate lane positioning and classification.
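A small illustration of the dilated-convolution choice: dilation widens the receptive field at no extra parameter cost, which helps bridge the gaps of dashed markings. The channel sizes below are placeholders, not LineNet's configuration.

```python
# Same parameter count, wider context: a 3x3 kernel with dilation d covers
# an effective (2d+1)x(2d+1) window.
import torch.nn as nn

conv_plain   = nn.Conv2d(64, 64, kernel_size=3, padding=1)              # 3x3 field
conv_dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)  # 5x5 field
# Both preserve spatial resolution; the dilated one sees a wider context,
# which is what helps connect the segments of a dashed lane marking.
```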
  • 47. LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments Different branches’ outputs of the LP layer, with two samples.
• 48. LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments • The Zoom Module is the second feature of LineNet. • With it, LineNet can alter the FoV to an arbitrary size without changing the network structure. • It splits the data flow through the CNN into two streams: (i) a thumbnail CNN; and (ii) a high-resolution cropped CNN. This figure illustrates the zooming process. Three columns represent three different zoom levels (more zoom levels can be added if necessary).
  • 49. LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments
• 50. LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments • To achieve nice and smooth lines, points are clustered together and fitted into lines. • The clustering algorithm DBSCAN is used with a hierarchical distance (HDis). • The line positions from the LP layer are collected and combined with a zoom level. • The combination is denoted as a tuple a = (x, y, z), where (x, y) is the image coordinate from the line position outputs, and z is the stage's zoom ratio used to predict the line position. Line points are gradually clustered together from near to far.
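A hedged sketch of this clustering step: DBSCAN over the (x, y, z) tuples with a custom distance. The exact HDis definition is not reproduced here, so the zoom term is simply folded into a weighted Euclidean distance for illustration; eps, min_samples, and the zoom weight are assumptions.

```python
# Clustering LP-layer line points with DBSCAN under a zoom-aware distance.
import numpy as np
from sklearn.cluster import DBSCAN

def hdis(a, b, zoom_weight=5.0):
    """a, b: arrays (x, y, z) of image position and zoom ratio."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    dz = zoom_weight * (a[2] - b[2])       # penalize zoom-level mismatch
    return np.sqrt(dx * dx + dy * dy + dz * dz)

points = np.array([[100, 200, 1],
                   [102, 204, 1],
                   [400,  80, 2]], dtype=float)       # (x, y, zoom) tuples
labels = DBSCAN(eps=10.0, min_samples=2, metric=hdis).fit_predict(points)
# Points labeled -1 are noise; every other label is one lane-line cluster,
# which is then fitted into a smooth line.
```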
  • 51. LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments road modeling
• 54. Efficient Road Lane Marking Detection with Deep Learning • A Lane Marking Detector (LMD) using a deep CNN to extract robust lane marking features. • To improve performance at lower complexity, dilated convolution is adopted. • A shallower and thinner structure is designed to decrease the computational cost. • A post-processing algorithm constructs 3rd-order polynomial models to fit the curved lanes. Flowchart of the proposed LMD system.
  • 55. Efficient Road Lane Marking Detection with Deep Learning
  • 56. Efficient Road Lane Marking Detection with Deep Learning
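The 3rd-order polynomial post-processing mentioned above can be sketched with NumPy as below. Fitting x as a function of y is assumed because lanes are near-vertical in image space; the minimum-pixel threshold is illustrative.

```python
# Cubic polynomial fit of one lane marking mask (sketch).
import numpy as np

def fit_lane_polynomial(mask, min_pixels=50):
    """mask: (H, W) binary map of one lane marking. Returns np.poly1d or None."""
    ys, xs = np.nonzero(mask)
    if len(xs) < min_pixels:               # too few pixels for a stable fit
        return None
    coeffs = np.polyfit(ys, xs, deg=3)     # cubic x = f(y)
    return np.poly1d(coeffs)

# Usage: sample the fitted curve at evenly spaced rows to draw the lane, e.g.
#   f = fit_lane_polynomial(mask)
#   xs = f(np.arange(0, mask.shape[0], 10))
```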
• 57. End to End Video Segmentation for Driving: Lane Detection For Autonomous Car • Statistics show that unintended lane departure is a leading cause of motor vehicle collisions worldwide, making lane detection one of the most promising and challenging tasks for self-driving. • People are combining deep learning with computer vision to solve self-driving problems. • A Global Convolution Networks (GCN) model is used to address both the classification and localization issues of semantic segmentation of lanes. • Color-based segmentation is presented and the usability of the model is evaluated. • Residual-based boundary refinement and Adam optimization are also used to achieve state-of-the-art performance. • Since ordinary cars cannot afford on-board GPUs, the training session for a particular road can be shared by several cars. • A real-time video transfer system obtains video from the car, trains the model on an edge server (equipped with GPUs), and sends the trained model back to the car.
• 58. End to End Video Segmentation for Driving: Lane Detection For Autonomous Car An overview of the whole pipeline.
• 59. End to End Video Segmentation for Driving: Lane Detection For Autonomous Car
• 60. 3D-LaneNet: E2E 3D multiple lane detection • This network directly predicts the 3D layout of lanes in a road scene from a single image. • It is a first attempt to address this task with on-board sensing instead of relying on pre-mapped environments. • 3D-LaneNet applies two new concepts: intra-network inverse-perspective mapping (IPM) and an anchor-based lane representation. • The intra-network IPM projection facilitates a dual-representation information flow in both the regular image view and the top view. • An anchor-per-column output representation enables an e2e approach, replacing common heuristics such as clustering and outlier rejection. • It outputs, for each longitudinal road slice, the confidence that a lane passes through the slice and its 3D curve in camera coordinates. • Each output is associated with an anchor, in analogy to single-shot, anchor-based object detection methods such as SSD and YOLO. • It explicitly handles complex situations such as lane merges and splits.
  • 61. 3D-LaneNet: E2E 3D multiple lane detection (a) Schematic illustration of the end-to-end approach and lane detection result example on top-view. (b) Projection of the result on the original image.
• 62. 3D-LaneNet: E2E 3D multiple lane detection Camera position and road projection plane. Known intrinsic camera parameters (e.g. focal length, center of projection) are assumed. It is also assumed that the camera is installed at zero degrees roll relative to the local ground plane. Lane centerlines are marked in blue and delimiters in yellow dotted curves. The task is defined as detecting the set of lane centerlines and/or lane delimiters given the image.
• 63. 3D-LaneNet: E2E 3D multiple lane detection The dual context module. A main building block in the architecture is the projective transformation layer. This layer is a specific realization, with slight variations, of the spatial transformer module. It performs a differentiable sampling of an input feature map UI, corresponding spatially to the image plane, to an output feature map UT corresponding spatially to a virtual top view of the scene. The differentiable sampling is achieved through a grid transforming the image to the top view. The dual context module uses the projective transformation layer to create highly descriptive feature maps. Information flows from the multi-channel feature maps UI and VT, corresponding to the image view and top view respectively.
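A hedged sketch of such a projective transformation layer using grid_sample: a fixed homography maps top-view grid coordinates into the image-view feature map, and the sampling stays differentiable end to end. The homography argument is a placeholder that would in practice be derived from the camera parameters; the normalized-coordinate convention is an assumption.

```python
# Differentiable IPM-style feature warping via a homography (sketch).
import torch
import torch.nn.functional as F

def project_to_top_view(feat_img, H_top2img, out_h, out_w):
    """feat_img: (B, C, Hi, Wi) image-view features.
    H_top2img: (3, 3) homography mapping normalized top-view coords ([-1, 1])
    to normalized image-view coords ([-1, 1])."""
    B = feat_img.size(0)
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, out_h),
                            torch.linspace(-1, 1, out_w), indexing='ij')
    ones = torch.ones_like(xs)
    grid = torch.stack([xs, ys, ones], dim=-1).reshape(-1, 3)  # homogeneous coords
    mapped = grid @ H_top2img.t()                              # apply homography
    mapped = mapped[:, :2] / mapped[:, 2:].clamp(min=1e-6)     # perspective divide
    mapped = mapped.reshape(1, out_h, out_w, 2).expand(B, -1, -1, -1)
    # grid_sample performs the differentiable lookup (x, y order, zeros outside).
    return F.grid_sample(feat_img, mapped, align_corners=False)
```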
• 64. 3D-LaneNet: E2E 3D multiple lane detection 3D-LaneNet network architecture (VGG16 backbone).
• 65. 3D-LaneNet: E2E 3D multiple lane detection Output representation. Note that the number of anchors (N) equals the output layer width. Per anchor, the network outputs 3 types (t) of lane descriptors (confidence and geometry); the first two (c1, c2) represent lane centerlines and the third (d) a lane delimiter. Assigning 2 possible centerlines per anchor gives the network support for merges and splits, which often result in the centerlines of two lanes coinciding at Yref and separating at different road positions. The topology of lane delimiters is generally more complicated than that of centerlines, and the representation cannot capture all situations. The anchors are defined by equally spaced vertical (longitudinal) lines in x-positions.
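A minimal sketch of decoding this anchor representation at test time: per anchor, a confidence plus lateral offsets (x) and heights (z) at K fixed longitudinal slices. The field names and shapes are assumptions consistent with the description above, not the exact 3D-LaneNet tensor layout.

```python
# Decoding anchor-per-column lane outputs into 3D polylines (sketch).
import numpy as np

def decode_anchors(conf, dx, z, anchor_x, y_positions, conf_thr=0.5):
    """conf: (N,) per-anchor confidences; dx, z: (N, K) offsets and heights
    relative to each anchor line; anchor_x: (N,) lateral anchor positions;
    y_positions: (K,) fixed longitudinal slices (camera coordinates)."""
    lanes = []
    for i in range(len(conf)):
        if conf[i] < conf_thr:
            continue                               # anchor carries no lane
        xs = anchor_x[i] + dx[i]                   # lateral position along the lane
        pts = np.stack([xs, y_positions, z[i]], axis=1)   # (K, 3) 3D points
        lanes.append(pts)
    return lanes
```

Because each lane is read off one anchor column, no clustering or outlier-rejection heuristics are needed, mirroring SSD/YOLO-style decoding.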
  • 66. 3D-LaneNet: E2E 3D multiple lane detection Random synthetic data generation. (a) Surface (b) Road topology and curvature (c) Road on surface (d) Rendered scenes.
• 67. 3D-LaneNet: E2E 3D multiple lane detection Examples of 3D lane centerline estimation results (with confidence > 0.5) on test images from the synthetic-3D-lanes dataset. Ground truth (blue) and method result (red) shown in each image alongside a 3D visualization.
• 68. End-to-End Lane Marker Detection via Row-wise Classification • Conventional approaches to the lane marker detection problem perform a pixel-level dense prediction task followed by sophisticated post-processing, which is inevitable since lane markers are typically represented by a collection of line segments without thickness. • This paper proposes a method performing direct lane marker vertex prediction in an end-to-end manner, i.e., without any post-processing step as required by the pixel-level dense prediction task. • Specifically, the lane marker detection problem is translated into a row-wise classification task, which takes advantage of the innate shape of lane markers but, surprisingly, has not been explored well. • In order to compactly extract sufficient information about lane markers, which spread from left to right in an image, a novel layer is devised to successively compress horizontal components, enabling an end-to-end lane marker detection system where the final lane marker positions are simply obtained via argmax operations at test time. • Experimental results are demonstrated on two popular lane marker detection benchmarks, i.e., TuSimple and CULane.
  • 69. End-to-End Lane Marker Detection via Row-wise Classification The E2E-LMD framework for lane marker detection
• 70. End-to-End Lane Marker Detection via Row-wise Classification The E2E-LMD architecture for lane marker detection. General encoder-decoder architectures are extended by adding successive horizontal reduction modules for end-to-end lane marker detection. Numbers under each block denote spatial resolution and channels. (a) Arrows with HRM denote a horizontal reduction module as in (b). Arrows with Conv are 1x1 output convolutions. Dashed arrows denote global average pooling with a fully connected layer. (b) The HRM is utilized to compress the horizontal representation; r denotes the pooling ratio for the width, and the conv kernel size k is set to 3 except in the last HRM layer, where it is set to 1.
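A hedged sketch of one horizontal reduction module: width-only pooling by ratio r followed by a convolution. The internal details of the paper's HRM (e.g., residual connections) are not reproduced; this only illustrates the successive width compression.

```python
# One horizontal reduction module: compress W by ratio r, keep H (sketch).
import torch.nn as nn

class HRM(nn.Module):
    def __init__(self, ch, r=2, k=3):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=(1, r), stride=(1, r))  # width only
        self.conv = nn.Sequential(
            nn.Conv2d(ch, ch, k, padding=k // 2),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):       # (B, C, H, W) -> (B, C, H, W // r)
        return self.conv(self.pool(x))

# Stacking HRMs compresses W toward 1, so at test time a row-wise argmax over
# the horizontal-bin axis directly yields the lane marker position per row.
```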
  • 71. End-to-End Lane Marker Detection via Row-wise Classification
  • 72. End-to-End Lane Marker Detection via Row-wise Classification