Driving Behavior for ADAS
and Autonomous Driving VII
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents
• INFER: INtermediate representations for FuturE pRediction
• Deep Imitative Models for Flexible Inference, Planning, and Control
• Multi-Agent Tensor Fusion for Contextual Trajectory Prediction
• AGen: Adaptable Generative Prediction Networks for Autonomous Driving
• Conditional Generative Neural System for Probabilistic Trajectory Prediction
• Coordination and Trajectory Prediction for Vehicle Interactions via Bayesian
Generative Modeling
• Interaction-aware Multi-agent Tracking and Probabilistic Behavior Prediction
via Adversarial Learning
DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
• This is a Deep Stochastic IOC (Inverse Optimal Control) RNN encoder-decoder framework,
DESIRE, for the task of future prediction of interacting agents in dynamic scenes.
• DESIRE predicts future locations of objects in multiple scenes by 1) accounting for the multi-modal
nature of prediction (i.e., given the same context, the future may vary), 2) foreseeing future
outcomes and making a strategic prediction, and 3) reasoning not only from the past
motion history, but also from the scene context and the interactions among the agents.
• DESIRE achieves this in a computationally efficient manner within a single end-to-end (E2E) trainable NN model.
• The model first obtains a diverse set of hypothetical future prediction samples employing a
conditional variational auto-encoder (CVAE), which are then ranked and refined by the following RNN
scoring-regression module.
• Samples are scored by accounting for accumulated future rewards, which enables better
long-term strategic decisions similar to IOC frameworks.
• An RNN scene context fusion module jointly captures past motion histories, the semantic
scene context and interactions among multiple agents.
• A feedback mechanism iterates over ranking and refinement to boost prediction accuracy.
CVPR 2017
DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
(a) A driving scenario: The white van may steer to the left or right while trying to avoid a collision with other
dynamic agents. DESIRE produces accurate future predictions (shown as blue paths) by tackling the
multi-modality of future prediction while accounting for a rich set of both static and dynamic scene contexts. (b)
DESIRE generates a diverse set of hypothetical prediction samples, and then ranks and refines them
through a deep IOC network.
DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
• Sample Generation Module
• Future prediction can be inherently ambiguous and has uncertainties as multiple plausible
scenarios can be explained under the same past situation (e.g., a vehicle heading toward an
intersection can make different turns);
• Thus, learning a deterministic function f that directly maps past trajectories to future trajectories
will under-represent potential prediction space and easily over-fit to training data.
• Moreover, a naively trained network with a simple loss will produce predictions that average out
all possible outcomes.
• This sample generation module produces a set of diverse hypotheses critical to capturing the
multimodality of the prediction task, through an effective combination of a CVAE and an RNN
encoder-decoder.
• RNNs are implemented using gated recurrent units (GRU) to learn long-term dependencies, yet
they can be easily replaced with other popular RNNs like long short-term memory units (LSTM).
• The CVAE module generates a diverse set of future trajectories based on a past trajectory.
• Two loss terms: a reconstruction loss and a KLD loss (a minimal sketch follows below).
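As a rough illustration of this two-term objective (a minimal PyTorch-style sketch, not the authors' exact implementation; the latent size, beta weight and encoder/decoder modules are assumed to exist elsewhere):

```python
import torch
import torch.nn.functional as F

def cvae_loss(recon_traj, gt_traj, mu, logvar, beta=1.0):
    """Two-term CVAE objective: trajectory reconstruction + KL divergence.

    recon_traj, gt_traj: (batch, T, 2) future trajectories
    mu, logvar:          (batch, latent_dim) posterior parameters
    """
    # Reconstruction loss: how well the sampled future matches ground truth.
    recon = F.mse_loss(recon_traj, gt_traj, reduction='mean')
    # KL divergence between the approximate posterior q(z | x, y) and the prior N(0, I).
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kld
```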
DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
The overview of DESIRE. First, DESIRE generates multiple plausible prediction samples Yˆ via a CVAE-based
RNN encoder-decoder (Sample Generation Module). Then the following module assigns a reward to the
prediction samples at each time-step sequentially, as in IOC frameworks, and learns a displacement vector ∆Yˆ to
regress the prediction hypotheses (Ranking and Refinement Module). The regressed prediction samples are
refined by iterative feedback. The final prediction is the sample with the maximum accumulated future
reward. Note that the flow via aquamarine-colored paths is only available during the training phase.
DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
• Ranking and Refinement Module
• Predicting a distant future can be far more challenging than predicting one close by.
• To tackle this, adopt the concept of the decision-making process in reinforcement learning (RL), where
an agent is trained to choose actions that maximize long-term rewards to achieve its goal.
• Instead of designing the reward function manually, however, IOC learns the unknown reward function from data.
• It designs an RNN model that assigns rewards to each prediction hypothesis and measures their
goodness based on the accumulated long-term rewards.
• Thereafter, also directly refine prediction hypotheses by learning displacements to the actual
prediction through another FC layer.
• Lastly, the module receives iterative feedback from regressed predictions and keeps adjusting so
that it produces precise predictions at the end.
• There are two loss terms in training the IOC ranking and refinement module:
• a cross-entropy loss;
• a regression loss.
• The total loss combines both terms.
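The slide omits the formula. Assuming the standard weighted-sum form (an assumption about notation, not a quotation of the paper), the ranking-and-refinement objective can be written as

$$ \mathcal{L} = \mathcal{L}_{CE} + \lambda \, \mathcal{L}_{reg}, $$

where $\mathcal{L}_{CE}$ compares the score distribution over samples against a target distribution derived from each sample's distance to the ground truth, $\mathcal{L}_{reg}$ penalizes the refined samples' deviation from the ground-truth future, and $\lambda$ is a balancing weight.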
DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
• Scene Context Fusion
• It is important that the RNN must contain the
information about 1) individual past motion
context, 2) semantic scene context and 3) the
interaction between multiple agents, in order to
provide proper hidden representations that can
score and refine a prediction;
• It implements a spatial grid-based pooling layer
similar to the social pooling (SP) layer in Social LSTM.
• Instead of max pooling over rectangular grids, it
adopts log-polar grids with average pooling.
• Combined with CNN features, the SCF module
provides the RNN decoder with both static and
dynamic scene information.
• It learns consistency between semantics of
agents and scenes for reliable prediction.
Details of Scene Context Fusion (SCF) unit in
RNN Decoder2. Note that the input to the
GRU cell at each time-step integrates multiple
cues (i.e., the dynamics of agents, scene
context and interaction between agents).
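As a rough, self-contained sketch of log-polar average pooling of neighbor hidden states around the predicted agent (the bin counts, radial range, and NumPy formulation are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def log_polar_pool(ego_xy, neighbor_xy, neighbor_h,
                   n_radial=6, n_angular=8, r_min=0.5, r_max=32.0):
    """Average-pool neighbor hidden states into log-polar bins around the ego agent.

    ego_xy:      (2,) position of the agent being predicted
    neighbor_xy: (N, 2) positions of the other agents
    neighbor_h:  (N, D) their RNN hidden states
    Returns a (n_radial * n_angular, D) pooled feature (zeros for empty bins).
    """
    D = neighbor_h.shape[1]
    pooled = np.zeros((n_radial * n_angular, D))
    counts = np.zeros(n_radial * n_angular)

    rel = neighbor_xy - ego_xy                      # relative offsets
    r = np.linalg.norm(rel, axis=1)                 # radial distance
    theta = np.arctan2(rel[:, 1], rel[:, 0])        # angle in [-pi, pi]

    # Logarithmically spaced radial bin edges, uniform angular bins.
    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_radial + 1)
    for i in range(len(r)):
        if r[i] < r_min or r[i] >= r_max:
            continue                                # outside the pooling window
        ri = np.searchsorted(r_edges, r[i], side='right') - 1
        ai = int((theta[i] + np.pi) / (2 * np.pi) * n_angular) % n_angular
        b = ri * n_angular + ai
        pooled[b] += neighbor_h[i]
        counts[b] += 1

    nonzero = counts > 0
    pooled[nonzero] /= counts[nonzero, None]        # average pooling per bin
    return pooled
```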
DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
KITTI results (left 3 rows): Rows 1&2 in (b) show the highly reactive nature of RNN ED-SI (i.e., the prediction turns only
after it gets near a non-drivable area). On the contrary, DESIRE shows its long-term prediction capability by considering potential future
rewards. DESIRE-SI also produces more convincing predictions in the presence of other vehicles. Stanford Drone Data
results (right 3 rows): The row 1 shows the multi-modal nature of the prediction problem. While the cyclist is making a
right turn, it is also possible that he turns around the roundabout (denoted with an arrow). DESIRE-SI predicts such an equally
possible future as the top prediction, while covering the ground-truth future within the top 10 predictions. Rows 2&3 also
show that DESIRE-SI provides superior predictions by reasoning about both static and dynamic scene contexts.
INFER: INtermediate representations for
FuturE pRediction
• 2019.3
• In urban driving scenarios, forecasting future trajectories of surrounding vehicles is of
paramount importance.
• While several approaches for the problem have been proposed, the best-performing ones
tend to require extremely detailed input representations (e.g. image sequences).
• But, such methods do not generalize to datasets they have not been trained on.
• This work proposes intermediate representations that are particularly well-suited for future prediction.
• As opposed to using texture (color) information, it relies on semantics and trains an autoregressive (AR) model
to accurately predict future trajectories of traffic participants (vehicles).
• Using semantics provides a significant boost over techniques that operate over raw pixel
intensities/disparities.
• Uncharacteristically of state-of-the-art approaches, these representations and models generalize to
completely different datasets, collected across several cities, and also across countries where
people drive on opposite sides of the road (left-hand vs. right-hand driving).
• Code and data: https://rebrand.ly/INFER-results.
INFER: INtermediate representations for
FuturE pRediction
• The design philosophy is based on the following three desired characteristics that
knowledge representation systems must possess:
• 1) Representational adequacy: the ability to adequately represent task-relevant information.
• 2) Inferential adequacy: the ability to infer traits that cannot be inferred from the original unprocessed data.
• 3) Generalizability: the ability to generalize to other data distributions (for the same task).
• The model takes as input an intermediate representation of the scene semantics
(intermediate, because it is neither too primitive, e.g. raw pixel intensities, nor too abstract
e.g. velocities, steering angles).
• Using these intermediate representations, predict the plausible future locations of the
Vehicles of Interest (VoI).
• The proposed representation does not rely heavily on the camera viewing angle, as camera
mounting parameters (height, viewing angle, etc.) vary across datasets, and this approach
hopes to be robust to such variations.
INFER: INtermediate representations for
FuturE pRediction
First generate intermediate representations by fusing monocular images with depth information (from either stereo
or Lidar), obtaining semantic and instance segmentation from the monocular image, followed by an orthographic
projection to a bird's-eye view. The generated intermediate representations are fed through the network, which finally
predicts the target vehicle's trajectory registered in the sensor coordinate frame.
INFER: INtermediate representations for
FuturE pRediction
• It formulates trajectory prediction as a per-cell regression over an occupancy grid.
• It uses the intermediate representations to simplify the objective and help the network
generalize better.
• It trains an autoregressive model that outputs the VoI’s position on an occupancy grid,
conditioned on the previous intermediate representations.
• It uses a simple Encoder-Decoder model connected by a convolutional LSTM to learn
temporal dynamics.
• It adds skip connections between corresponding encoder and decoder branches.
• The proposed trajectory prediction scheme takes as input a sequence of intermediate
representations and produces a single channel output occupancy grid.
• The training objective comprises two terms: a reconstruction loss term and a safety loss term.
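A minimal sketch of such an Encoder-ConvLSTM-Decoder with a skip connection, producing a single-channel occupancy grid (channel counts, layer sizes, and the ConvLSTM cell below are illustrative assumptions, not the INFER-Skip architecture itself):

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell (all four gates from one convolution)."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class OccupancyPredictor(nn.Module):
    """Encoder -> ConvLSTM -> decoder with a skip connection; outputs one occupancy channel."""
    def __init__(self, in_ch=5, base=32):
        super().__init__()
        self.base = base
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(base, 2 * base, 3, stride=2, padding=1), nn.ReLU())
        self.rnn = ConvLSTMCell(2 * base, 2 * base)
        self.dec2 = nn.Sequential(
            nn.ConvTranspose2d(2 * base, base, 4, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.Conv2d(2 * base, 1, 3, padding=1)  # skip-concat doubles the channels

    def forward(self, frames):
        """frames: (B, T, in_ch, H, W) sequence of intermediate representations (H, W even)."""
        B, T, _, H, W = frames.shape
        h = frames.new_zeros(B, 2 * self.base, H // 2, W // 2)
        c = torch.zeros_like(h)
        for t in range(T):                       # condition on the observed sequence
            e1 = self.enc1(frames[:, t])
            e2 = self.enc2(e1)
            h, c = self.rnn(e2, (h, c))
        d2 = self.dec2(h)                        # upsample back to the input resolution
        return self.dec1(torch.cat([d2, e1], dim=1))   # (B, 1, H, W) occupancy logits
```

In an autoregressive rollout, the predicted grid at each step would be written back into the agent channel of the next input frame; that bookkeeping is omitted in this sketch.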
INFER: INtermediate representations for
FuturE pRediction
The qualitative results from the validation fold of KITTI showcase the efficacy of INFER-Skip in using the intermediate
representation to predict complex trajectories. For example, in the left most plot, the network is able to accurately
predict the unseen second curve in the trajectory (predicted and ground truth trajectories are shown in red and blue
color, respectively). The green and red 3D bounding boxes indicate start of preconditioning and start of prediction of the
vehicle of interest (VoI), respectively. It is worth noting that the predicted trajectories are well within the lane (dark gray)
and road region (cyan), while avoiding collisions with the obstacles (magenta).
Deep Imitative Models for Flexible Inference,
Planning, and Control
• Imitation learning produces behavioral policies with limited flexibility to accommodate new
goals at test-time.
• In contrast, model-based reinforcement learning (MBRL) can plan to arbitrary goals using a
predictive dynamics model learned from data.
• It proposes “imitative models” to combine the benefits of imitation learning and MBRL.
• Imitative models are probabilistic predictive models able to plan interpretable expert-like
trajectories to achieve arbitrary goals.
• Inference with them resembles trajectory optimization in model-based reinforcement
learning, and learning them resembles imitation learning.
• This method substantially outperforms six direct imitation learning approaches (five of them
prior work) and an MBRL approach in a dynamic simulated autonomous driving task, and
can be learned efficiently from a fixed set of expert demonstrations without additional online
data collection.
2019.6
Deep Imitative Models for Flexible Inference,
Planning, and Control
To apply the algorithm to navigation in CARLA. Left: Image depicting the current scene, in which the light recently
turned from green to red. Left-Middle: Plot showing LIDAR observations of the agent, the goals it received from a
route planner, and the plan produced by the method. The model smoothly chooses between goals based on its
prior of expert behavior. Here, the stationary agent chooses to accelerate to follow the vehicle ahead. Right-Middle:
Image depicting an intersection scene. Right: LIDAR observations, goals, cost map of simulated potholes, and a
variety of plans the method produces, colored by the planner’s preference. Although the imitative model never
observed pothole-avoidance behavior, it is able to plan a reasonable, on-road path around them with a test-time
cost map. Its preferred plan enters the intersection and steers around a pothole.
Deep Imitative Models for Flexible Inference,
Planning, and Control
• Reinforcement learning (RL) algorithms automatically learn desirable behaviors from raw
sensory inputs with minimal engineering; However, RL generally requires online learning: the
agent must collect more data with its latest strategy, use it to update a model, and repeat.
• Deploying a partially-trained policy on a real-world autonomous system, can be dangerous.
• Learning behavior should happen offline from expert demonstrations.
• How to incorporate such demo into an autonomous car, to perform a variety of tasks?
• One is imitation learning (IL), learning policies that stay near the expert’s distribution.
• Another is model-based methods, which can use the data to fit a dynamics model, and in
principle can be used with planning algorithms to achieve any user-specified goal.
• However, model-based (MB) and model-free RL algorithms are vulnerable to distributional
drift: when acting according to the learned model or policy, the agent visits states different
from those seen in training, and in those states it is unlikely to determine an effective course of action.
• This is problematic when the data intentionally excludes adverse events such as crashes.
• Therefore, MBRL algorithms usually require online collection and training.
Deep Imitative Models for Flexible Inference,
Planning, and Control
Imitative planning to goals: multi-goal
waypoint planning enables fine-grained
control of the plans.
Costs can be assigned to “potholes” only seen at test-time;
expert demonstrations with potholes were never observed.
The planner prefers routes around the potholes.
Deep Imitative Models for Flexible Inference,
Planning, and Control
• IL algorithms use expert demonstration data and, despite similar drift shortcomings, can
sometimes learn effective policies without online data collection. However, standard IL offers
little task flexibility since it only predicts low-level behavior.
• While several works augmented IL with goal conditioning, these goals must be specified in
advance during training, and are typically simple (e.g., turning left or right).
• The goal is to devise an algorithm that combines the advantages of IL and MBRL by offering
the flexibility to achieve new user-specified goals and the ability to learn from offline data.
• By learning a deep conditional probabilistic forecasting model from expert data, it captures
the distribution of expert behaviors without using manually designed reward functions.
• To plan to a goal, this method infers the most probable expert state trajectory under a
posterior distribution induced by the model and a task-specifying goal distribution.
• By incorporating a model-based representation, it can easily plan to previously unseen user-
specified goals while behaving similar to the expert, and can be flexibly repurposed to
perform a variety of test-time tasks without any additional training.
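In simplified notation (adapted for this summary, not quoted from the paper), planning with an imitative model amounts to maximizing the expert-like prior together with a goal likelihood over candidate trajectories s, given observations φ:

$$ s^{*} = \arg\max_{s} \; \log q(s \mid \phi) + \log p(\mathcal{G} \mid s), $$

where q is the learned probabilistic forecasting model and p(𝒢 | s) is the test-time goal distribution (e.g., waypoints from a route planner, or a pothole cost map).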
Deep Imitative Models for Flexible Inference,
Planning, and Control
Illustration of the method applied to autonomous driving. This method trains an imitative model
from a dataset of expert examples. After training, the model is repurposed as an imitative planner.
At test-time, a route planner provides waypoints to the imitative planner, which computes expert-
like paths to each goal. The best plan is chosen according to the planning objective and provided
to a low-level PID-controller in order to produce steering and throttle actions.
Deep Imitative Models for Flexible Inference,
Planning, and Control
Tolerating bad waypoints. The planner prefers waypoints in the distribution of expert
behavior (on the road at a reasonable distance). Columns 1, 2: Planning with ½ decoy
waypoints. Columns 3,4: Planning with all waypoints on the wrong side of the road.
Deep Imitative Models for Flexible Inference,
Planning, and Control
Test-time plans prefer steering around potholes.
Table: Robustness to waypoint noise and test-time pothole adaptation.
The method is robust to waypoints on the wrong side of the road, and fairly
robust to decoy waypoints. The method is flexible enough to safely produce
behavior not demonstrated (pothole avoidance) by incorporating a test-
time cost. Ten episodes are collected in each Town.
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
• Accurate prediction of others’ trajectories is essential for autonomous driving.
• Trajectory prediction is challenging because it requires reasoning about agents’ past
movements, social interactions among varying numbers and kinds of agents, constraints
from the scene context, and the stochasticity of human behavior.
• This approach models these interactions and constraints jointly within a Multi-Agent
Tensor Fusion (MATF) network.
• Specifically, the model encodes multiple agents’ past trajectories and the scene context
into a Multi-Agent Tensor, then applies convolutional fusion to capture multiagent
interactions while retaining the spatial structure of agents and the scene context.
• The model decodes recurrently to multiple agents’ future trajectories, using adversarial
loss to learn stochastic predictions.
• Experiments on both highway driving and pedestrian crowd datasets show that the model
achieves state-of-the-art prediction accuracy.
2019.7
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
• There are two parallel encoding streams in the MATF architecture.
• One encodes the past trajectories of each individual agent xi independently using single agent
LSTM encoders, and another encodes the static scene context image c with a CNN.
• Each LSTM encoder shares the same set of parameters, so the architecture is invariant to the
number of agents in the scene.
• The outputs of the LSTM encoders are 1-D agent state vectors {x′1 , x′2 , .., x′n } without
temporal structure.
• The output of the scene context encoder CNN is a scaled feature map c′ retaining the spatial
structure of the bird’s-eye view static scene context image.
• Next, the two encoding streams are concatenated spatially into a Multi-Agent Tensor.
• Agent encodings {x′1, x′2, .., x′n} are placed into one bird’s-eye view spatial tensor, which is
initialized to 0 and is of the same shape (width and height) as the encoded scene image c′.
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
• The dimension axis of the encodings fits into the channel axis of the tensor.
• The agent encodings are placed into the spatial tensor with respect to their positions at the
last time step of their past trajectories.
• This tensor is then concatenated with the encoded scene image in the channel dimension to
get a combined tensor. If multiple agents are placed into the same cell in the tensor due to
discretization, element-wise max pooling is performed.
• The Multi-Agent Tensor is fed into fully convolutional layers, which learn to represent
interactions among multiple agents and between agents and the scene context, while
retaining spatial locality, to produce a fused Multi-Agent Tensor.
• Specifically, these layers operate at multiple spatial resolution scale levels by adopting
U-Net-like architectures to model interaction at different spatial scales.
• The output feature map of this fused model c′′ has exactly the same shape as c′ in width and
height to retain the spatial structure of the encoding.
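A small sketch of how the Multi-Agent Tensor could be assembled (the coordinate-to-cell mapping, resolution, and channel ordering are assumptions for illustration; PyTorch is used for concreteness):

```python
import torch

def build_multi_agent_tensor(agent_vecs, agent_xy, scene_feat, meters_per_cell=1.0):
    """Scatter per-agent encodings into a bird's-eye spatial tensor and concat with scene features.

    agent_vecs: (N, D) LSTM-encoded agent state vectors
    agent_xy:   (N, 2) agent positions at the last observed time step, in meters
    scene_feat: (C, H, W) CNN-encoded scene context feature map
    Returns a (D + C, H, W) Multi-Agent Tensor.
    """
    C, H, W = scene_feat.shape
    D = agent_vecs.shape[1]
    agent_channels = scene_feat.new_zeros(D, H, W)

    # Map metric coordinates to grid cells (origin at the map corner is an assumption).
    cols = (agent_xy[:, 0] / meters_per_cell).long().clamp(0, W - 1)
    rows = (agent_xy[:, 1] / meters_per_cell).long().clamp(0, H - 1)

    for vec, r, c in zip(agent_vecs, rows, cols):
        # Element-wise max pooling when several agents share one cell.
        agent_channels[:, r, c] = torch.maximum(agent_channels[:, r, c], vec)

    return torch.cat([agent_channels, scene_feat], dim=0)
```

The fused tensor would then pass through the fully convolutional (U-Net-like) fusion layers, and per-agent features would be sliced back out at the same cells for residual decoding.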
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
The Multi-Agent Tensor encoding is a spatial
feature map of the scene context and multiple
agents from an overhead perspective, including
agent channels (above) and context channels
(below). Agents’ feature vectors (red) output
from single-agent LSTM encoders are placed
spatially w.r.t. agents’ coordinates to form the
agent channels. The agent channels are aligned
spatially with the context channels (a context
feature map) output from scene context
encoding layers to retain the spatial structure.
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
• To decode each agent’s predicted trajectory, agent-specific representations with fused
interaction features for each agent {x1′′, x2′′, ..., xn′′} are sliced out according to their
coordinates from the fused Multi-Agent Tensor output c′′.
• These agent-specific representations are then added as a residual to the original encoded
agent vectors to form final agent encoding vectors {x1′ + x1′′ , x2′ + x2′′ , ..., xn′ + xn′′ }, which
encode all the information from the past trajectories of the agents themselves, the static
scene context, and the interaction features among multiple agents.
• In this way, this approach allows each agent to get a different social and contextual
embedding focused on itself.
• Importantly, the model gets these embeddings for multiple agents using shared feature
extractors instead of operating n times for n agents.
• Finally, for each agent in the scene, its final vector xi′ + xi′′ is decoded to future trajectory
prediction yiˆ by LSTM decoders.
• Similar to the encoders for each agent, parameters are shared to guarantee that the network
can generalize well when the number of agents in the scene varies.
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
Illustration of the Multi-Agent Tensor Fusion (MATF) architecture.
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
Qualitative results from Massachusetts driving dataset. Past trajectories are shown in different colors for each
vehicle, followed by 100 sampled future trajectories. Ground truth future trajectories are shown in black, and lane
centers are shown in gray. (a) A complex scenario involving five vehicles; MATF accurately predicts the trajectory and
velocity profile for all. (b) MATF correctly predicts that the red vehicle will complete a lane change. (c) MATF
captures the uncertainty over whether the red vehicle will take the highway exit. (d) As soon as the purple vehicle
passes a highway exit, MATF predicts it will not take that exit. (e) Here, MATF fails to predict the precise ground truth
trajectory; however, the red vehicle is predicted to initiate a lane change maneuver in a very small number of
sampled trajectories.
AGen: Adaptable Generative Prediction
Networks for Autonomous Driving
• In highly interactive driving scenarios, accurate prediction of other road participants is critical
for safe and efficient navigation of autonomous cars.
• Prediction is challenging due to the difficulty in modeling various driving behaviors, or in
learning such a model.
• The model should be interactive and reflect individual differences.
• Imitation learning methods, such as parameter sharing generative adversarial imitation
learning (PS-GAIL), are able to learn interactive models.
• However, the learned models average out individual differences.
• When used to predict trajectories of individual vehicles, these models are biased.
• An adaptable generative prediction framework (AGen) performs online adaptation of the
offline-learned models to recover individual differences for better prediction.
• In particular, it combines the recursive least squares parameter adaptation algorithm (RLS-
PAA) with the offline learned model from PS-GAIL.
• RLS-PAA has analytical solutions and is able to adapt the model for every single vehicle
efficiently online.
IVS 2019
AGen: Adaptable Generative Prediction
Networks for Autonomous Driving
Offline model learning extracts features for
average driving behavior. Online model adaptation
can perturb the average model to fit the behavior
of a specific driver at a specific time. In particular,
take the offline pretrained policy network of PS-
GAIL as the feature extractor for averaged driving
behavior, while adapting individual vehicle
behavior using RLS-PAA online.
Heterogeneity among drivers needs to be explicitly
accounted for to improve prediction accuracy in real-world
scenarios. As mentioned earlier, it is
intractable to fit a policy network for every
individual vehicle. To make heterogeneous
prediction scalable, combine offline model
learning with online model adaptation.
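As an illustration of the online-adaptation idea, a generic recursive-least-squares update for a per-vehicle linear output layer on top of frozen offline features is sketched below (feature/target dimensions, forgetting factor, and the exact quantities being adapted are placeholders, not AGen's implementation):

```python
import numpy as np

class RLSAdapter:
    """Recursive-least-squares adaptation of a linear output layer on top of
    frozen (offline-learned) features, in the spirit of RLS-PAA."""

    def __init__(self, feat_dim, out_dim, forgetting=0.98, p0=10.0):
        self.theta = np.zeros((feat_dim, out_dim))   # adaptable output weights
        self.P = np.eye(feat_dim) * p0               # inverse-covariance estimate
        self.lam = forgetting                        # forgetting factor in (0, 1]

    def predict(self, feat):
        return feat @ self.theta

    def update(self, feat, target):
        """One RLS step: feat (feat_dim,), target (out_dim,) observed next state."""
        err = target - self.predict(feat)            # prediction error for this vehicle
        Pf = self.P @ feat
        gain = Pf / (self.lam + feat @ Pf)           # Kalman-like gain vector
        self.theta += np.outer(gain, err)            # rank-one parameter correction
        self.P = (self.P - np.outer(gain, Pf)) / self.lam
        return err
```

Because the update is a closed-form rank-one correction, it can be run online for every tracked vehicle at each time step.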
AGen: Adaptable Generative Prediction
Networks for Autonomous Driving
(a) Offline training using PS-GAIL. The critic computes
the difference between the expert trajectory and the roll-out
trajectory from the policy network. PS-GAIL
iteratively updates the policy to minimize the difference
and the critic to maximize the difference.
(b) Online adaptation using RLS-PAA. The critic computes the 2-
norm difference between the expert trajectory and the roll-out
trajectory from the policy network. RLS-PAA updates the policy
network to minimize the difference, using either 1-step or 2-step adaptation.
AGen: Adaptable Generative Prediction
Networks for Autonomous Driving
Predicted 2 s trajectories for 22 agents after 3 s of adaptation.
Average position RMSE over time
in the 22-agent scenario
Conditional Generative Neural System for
Probabilistic Trajectory Prediction
• Effective understanding of the environment and accurate trajectory prediction of
surrounding dynamic obstacles are critical for intelligent systems such as autonomous
vehicles and wheeled mobile robots navigating in complex scenarios to achieve safe and
high-quality decision making, motion planning and control.
• Due to the uncertain nature of the future, it is desired to make inference from a probability
perspective instead of deterministic prediction.
• They propose a conditional generative neural system (CGNS) for probabilistic trajectory
prediction to approximate the data distribution, with which realistic, feasible and diverse
future trajectory hypotheses can be sampled.
• The system combines the strengths of conditional latent space learning and variational
divergence minimization, and leverages both static context and interaction information with
soft attention mechanisms.
• Also propose a regularization method for incorporating soft constraints into deep neural
networks with differentiable barrier functions, which can regulate and push the generated
samples into the feasible regions.
2019.7
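To illustrate the idea of a soft constraint via a differentiable barrier-style penalty (the softplus form, margin, and signed-distance interface below are assumptions, not the CGNS formulation):

```python
import torch
import torch.nn.functional as F

def barrier_regularizer(pred_xy, signed_dist_fn, margin=0.5, sharpness=10.0):
    """Differentiable soft-constraint penalty that pushes generated trajectory
    points toward the feasible (e.g., drivable) region.

    pred_xy:        (batch, T, 2) sampled future trajectory points
    signed_dist_fn: callable returning the signed distance to the feasible-region
                    boundary (positive inside, negative outside); must be differentiable
    """
    d = signed_dist_fn(pred_xy)                       # (batch, T)
    # Softplus barrier: ~0 when d >> margin, grows smoothly as points approach
    # or cross the boundary, so gradients can push samples back inside.
    return F.softplus(sharpness * (margin - d)).mean() / sharpness
```

Such a term would be added to the generator's training objective with a weighting factor, regulating the sampled trajectories toward feasible regions while remaining fully differentiable.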
Conditional Generative Neural System for
Probabilistic Trajectory Prediction
Typical urban traffic scenarios with large uncertainty and interactions
among multiple entities. The shaded areas represent the reachable
sets of possible trajectories. (a) Unsignalized roundabout with four-way
yield signs; (b) Unsignalized intersection with four-way stop signs.
Conditional Generative Neural System for
Probabilistic Trajectory Prediction
• Requirements to generate diverse, realistic future trajectories:
• 1) Context-aware: The system should be able to forecast trajectories which are inside the
traversable regions and collision-free with static obstacles in the environment. For instance,
when the vehicles navigate in a roundabout they need to advance along the curves and
avoid collisions with road boundaries.
• 2) Interaction-aware: The system needs to generate reasonable trajectories compliant with
traffic or social rules, which take into account interactions and reactions among multiple
entities. For instance, when the vehicles approach an unsignalized intersection, they need to
anticipate others’ possible intentions and motions as well as the influences of their own
behaviors on surrounding entities.
• 3) Feasibility-aware: The system should anticipate naturalistic and physically-feasible
trajectories which are compliant with vehicle kinematic or dynamic constraints, although
these constraints can be ignored for pedestrians due to the large flexibility of their motions.
• 4) Probabilistic prediction: Since the future is full of uncertainty, the system should be able to
learn an approximated distribution of future trajectories close to data distribution and
generate diverse samples which represent various possible behavior patterns.
Conditional Generative Neural System for
Probabilistic Trajectory Prediction
Overview of the proposed conditional generative neural system (CGNS), which consists of four key contributions:
(a) a deep feature extractor with soft attention mechanism, which extracts multi-level features from scene
context image sequences and trajectories; (b) an encoder to learn conditional latent space representations; (c) a
generator (decoder) to sample future trajectory hypotheses; (d) a discriminator to distinguish predicted
trajectories from ground truth.
Conditional Generative Neural System for
Probabilistic Trajectory Prediction
Fig. 3. The visualization of the context image masks and trajectory block attention masks. Particularly, in the
trajectory masks, there are four rows representing 4 historical time steps and 6 columns representing 6
vehicles in the scene. The 1st column corresponds to the predicted vehicle and the others correspond to
surrounding ones. Brighter colors indicate larger attention weights. The predicted vehicles are indicated with
red bounding boxes. In all the cases, the image masks have a large weight around the predicted vehicle and the
area of its heading direction. In the 1st three cases, only the historical trajectories of the predicted vehicle are
assigned large attention weights, which implies that the other vehicles have little effect in these situations.
However, in the last 3 cases, more attention is paid to other vehicles since there exist strong interactions which
increase the inter-dependency.
Coordination and Trajectory Prediction for Vehicle
Interactions via Bayesian Generative Modeling
• Coordination recognition and subtle pattern prediction of future trajectories play a significant
role when modeling interactive behaviors of multiple agents.
• Due to the essential property of uncertainty in the future evolution, deterministic predictors are
not sufficiently safe and robust.
• In order to tackle the task of probabilistic prediction for multiple, interactive entities, propose a
coordination and trajectory prediction system (CTPS), which has a hierarchical structure
including a macro-level coordination recognition module and a micro-level subtle pattern
prediction module which solves a probabilistic generation task.
• Two types of representation of the coordination variable: categorized and real-valued.
• Bayesian deep learning is incorporated into generative models to generate diversified prediction hypotheses.
• The proposed system is tested on multiple driving datasets in various traffic scenarios, which
achieves better performance than baseline approaches in terms of a set of evaluation metrics.
• Using the categorized coordination can better capture multi-modality and generate more
diversified samples than the real-valued coordination, while the latter can generate prediction
hypotheses with smaller errors with a sacrifice of sample diversity.
• NNs with weight uncertainty are able to generate samples with larger variance and diversity.
2019.5
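As a sketch of a weight-uncertainty layer in the Bayes-by-Backprop style (initialization values and the omission of the KL regularization term are simplifications; this is not the paper's exact network):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    """Linear layer with weight uncertainty: each forward pass samples the weights,
    so repeated generator calls yield diverse prediction hypotheses."""

    def __init__(self, in_f, out_f):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_f, in_f))
        self.w_rho = nn.Parameter(torch.full((out_f, in_f), -4.0))  # softplus(-4) ~ small std
        self.b_mu = nn.Parameter(torch.zeros(out_f))
        self.b_rho = nn.Parameter(torch.full((out_f,), -4.0))
        nn.init.xavier_uniform_(self.w_mu)

    def forward(self, x):
        w_std = F.softplus(self.w_rho)
        b_std = F.softplus(self.b_rho)
        w = self.w_mu + w_std * torch.randn_like(w_std)   # reparameterized weight sample
        b = self.b_mu + b_std * torch.randn_like(b_std)
        return F.linear(x, w, b)
```

Replacing ordinary linear layers in the generator and discriminator with such layers makes every forward pass a weight sample, which is what yields the larger variance and diversity mentioned above.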
Coordination and Trajectory Prediction for Vehicle
Interactions via Bayesian Generative Modeling
Typical highway and urban driving scenarios
where two or more entities coordinate and
interact with each other. The shaded areas
represent possible future motions which
consider multi-modality. (a) Ramp merging and
lane change behaviors on highway scenarios;
(b) Unsignalized roundabout with yield signs;
(c) Unsignalized intersection with stop signs.
Although the contexts are different, they can
be treated as generalized merging scenarios.
Coordination and Trajectory Prediction for Vehicle
Interactions via Bayesian Generative Modeling
• The multi-modal conditional distribution of future trajectories for interactive agents can be
factorized via a coordination variable, which can be either categorized or real-valued.
• This factorization naturally divides the system into a coordination recognition module
(macro-level) and a subtle pattern prediction module (micro-level).
• The coordination c can not only be categorized to represent meaningful semantics, but also
be real-valued vectors to encode the underlying representations.
• If c is categorized, the micro-level module takes c in as an indicator through one-hot
encoding; if c is a real-valued variable, the micro-level module takes c in as an additional
input feature.
• The macro-level module is based on a variational recurrent neural network (VRNN)
followed by a probabilistic classifier.
• And the micro-level module is based on a Coordination-Bayesian Conditional Generative
Adversarial Network (C-BCGAN).
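A minimal illustration of the two ways the coordination variable c might be injected into the micro-level module (the function name, class count, and tensor shapes are hypothetical):

```python
import torch
import torch.nn.functional as F

def condition_on_coordination(history_feat, c, num_classes=3):
    """Append the coordination variable to the micro-level generator's input.

    history_feat: (batch, D) encoded historical information
    c: (batch,) integer labels for categorized coordination, or
       (batch, K) real-valued coordination vectors
    """
    if not torch.is_floating_point(c):
        c_feat = F.one_hot(c.long(), num_classes).float()  # one-hot indicator
    else:
        c_feat = c                                          # used as extra input features
    return torch.cat([history_feat, c_feat], dim=-1)
```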
Coordination and Trajectory Prediction for Vehicle
Interactions via Bayesian Generative Modeling
Overview of CTPS: (a) Coordination recognition module: The coordination variable can be discrete categories or continuous
real-valued vectors. The discrete distribution of categorized coordination is obtained by a probabilistic classifier based on latent
features extracted by VRNN. The continuous distribution of real-valued coordination is obtained by maximizing mutual info based
on a VAE-style model. Choose either formulation according to the objective and emphasis in particular tasks; (b) Subtle pattern
prediction module: The model is based on the proposed C-BCGAN in which the generator takes as input the historical info,
coordination indicator as well as noise from the normal distribution. Weight uncertainties are incorporated in both the generator and
discriminator network.
Coordination and Trajectory Prediction for Vehicle
Interactions via Bayesian Generative Modeling
The visualization of prediction results in the highway scenario. (a) Generation with learned coordination; (b)
Generation with real-valued coordination. Note that only the longitudinal motions are predicted for surrounding
vehicles, but both longitudinal and lateral motions for the center vehicle. That is why the predicted
trajectories of surrounding vehicles do not have lateral deviation.
Interaction-aware Multi-agent Tracking and Probabilistic
Behavior Prediction via Adversarial Learning
• In order to enable high-quality decision making and motion planning of intelligent systems such as
robotics and autonomous vehicles, accurate probabilistic predictions for surrounding interactive
objects are a crucial prerequisite.
• Although many research studies have been devoted to making predictions on a single entity, it remains
an open challenge to forecast future behaviors for multiple interactive agents simultaneously.
• In this work, take advantage of the Generative Adversarial Network (GAN) due to its capability of
distribution learning and propose a generic multi-agent probabilistic prediction and tracking
framework which takes the interactions among multiple entities into account, in which all the entities
are treated as a whole.
• However, since GANs are very hard to train, an empirical study is conducted to present the relationship
between training performance and hyperparameter values with a numerical case study.
• The results imply that the proposed model can capture the mean, variance and multi-modality
of the ground-truth distribution.
• Moreover, apply the proposed approach to a real-world task of vehicle behavior prediction to
demonstrate its effectiveness and accuracy.
• The proposed model trained by adversarial learning can achieve a better prediction performance
than other SoA models trained by traditional supervised learning which maximizes the data likelihood.
• The well-trained model can also be utilized as an implicit proposal distribution for
particle-filter-based Bayesian state estimation.
2019.4
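A generic adversarial training step for a joint multi-agent trajectory predictor is sketched below; the generator/discriminator interfaces, noise dimension, and loss form are placeholders rather than the paper's architecture:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def adversarial_step(gen, disc, opt_g, opt_d, history, future_gt, noise_dim=16):
    """One adversarial training step for a multi-agent trajectory predictor.

    history:   (batch, N_agents, T_obs, 2) jointly observed trajectories
    future_gt: (batch, N_agents, T_pred, 2) joint ground-truth futures
    The generator predicts all agents jointly, treating them as a whole."""
    B = history.shape[0]
    z = torch.randn(B, noise_dim)

    # --- Discriminator: real joint futures vs. generated ones ---
    fake = gen(history, z).detach()
    d_loss = bce(disc(history, future_gt), torch.ones(B, 1)) + \
             bce(disc(history, fake), torch.zeros(B, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # --- Generator: fool the discriminator ---
    fake = gen(history, z)
    g_loss = bce(disc(history, fake), torch.ones(B, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```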
Interaction-aware Multi-agent Tracking and Probabilistic
Behavior Prediction via Adversarial Learning
The general diagram of the proposed model, which consists of a generator network and a discriminator network.
Interaction-aware Multi-agent Tracking and Probabilistic
Behavior Prediction via Adversarial Learning
• The proposed approach is applied to a trajectory prediction task of interactive on-road
vehicles as an illustrative example, although it can be utilized to solve many other tasks such
as interactive pedestrian trajectory prediction and human-robot interaction.
A typical highway scenario is investigated where the gray car is the ego vehicle which aims
at forecasting future motions of its surrounding vehicles (red, green and yellow ones). The
observations of environment can be obtained by on-board sensors. The approach can also
be adopted in overhead traffic surveillance systems with camera-based monitors.
Interaction-aware Multi-agent Tracking and Probabilistic
Behavior Prediction via Adversarial Learning
Visualization of cases. (a) lane change left; (b) lane change right. The red dashed lines are ground-truth trajectories.
Cruise AI under the Hood
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous Driving
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learning
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planning
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planning
 
Lidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainLidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rain
 
Autonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksAutonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucks
 
3-d interpretation from single 2-d image V
3-d interpretation from single 2-d image V3-d interpretation from single 2-d image V
3-d interpretation from single 2-d image V
 

Kürzlich hochgeladen

Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptMadan Karki
 
Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxVelmuruganTECE
 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESNarmatha D
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...Amil Baba Dawood bangali
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 

Kürzlich hochgeladen (20)

Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
 
Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptx
 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIES
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 

Driving Behavior for ADAS and Autonomous Driving VII

  • 5. DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents • Sample Generation Module • Future prediction can be inherently ambiguous and has uncertainties as multiple plausible scenarios can be explained under the same past situation (e.g., a vehicle heading toward an intersection can make different turns); • Thus, learning a deterministic function f that directly maps past trajectories to future trajectories will under-represent potential prediction space and easily over-fit to training data. • Moreover, a naively trained network with a simple loss will produce predictions that average out all possible outcomes. • This sample generation module produces a set of diverse hypotheses critical to capturing the multimodality of the pre- diction task, through a effective combination of CVAE and RNN encoder-decoder. • RNNs are implemented using gated recurrent units (GRU) to learn long-term dependencies, yet they can be easily replaced with other popular RNNs like long short-term memory units (LSTM). • The CVAE module generates diverse set of future trajectories based on a past trajectory. • Two loss terms: reconstruction loss and KLD loss.
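The slide above names two CVAE training terms (a reconstruction loss and a KLD loss). Below is a minimal sketch in PyTorch, not the authors' code, of how such a CVAE sampling head could be trained; the module sizes and trajectory dimensions are illustrative assumptions.

import torch
import torch.nn as nn

class CVAESampler(nn.Module):
    """Toy CVAE head: encode (past, future) into a latent z, decode (past, z) into a future."""
    def __init__(self, past_dim=48, future_dim=40, latent_dim=16, hidden=64):
        super().__init__()
        self.recognition = nn.Sequential(nn.Linear(past_dim + future_dim, hidden), nn.ReLU(),
                                         nn.Linear(hidden, 2 * latent_dim))   # -> (mu, log_var)
        self.decoder = nn.Sequential(nn.Linear(past_dim + latent_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, future_dim))

    def forward(self, past, future):
        mu, log_var = self.recognition(torch.cat([past, future], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()        # reparameterization trick
        recon = self.decoder(torch.cat([past, z], dim=-1))
        recon_loss = ((recon - future) ** 2).mean()                   # reconstruction term
        kld = (-0.5 * (1 + log_var - mu.pow(2) - log_var.exp())).sum(-1).mean()  # KLD term
        return recon_loss + kld

# At test time, z is simply drawn from N(0, I) and decoded together with the past encoding
# to produce a diverse set of trajectory samples.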
• 6. DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents The overview of DESIRE. First, DESIRE generates multiple plausible prediction samples Ŷ via a CVAE-based RNN encoder-decoder (Sample Generation Module). Then the following module assigns a reward to the prediction samples at each time step sequentially, as in IOC frameworks, and learns a displacement vector ∆Ŷ to regress the prediction hypotheses (Ranking and Refinement Module). The regressed prediction samples are refined by iterative feedback. The final prediction is the sample with the maximum accumulated future reward. Note that the flow via aquamarine-colored paths is only available during the training phase.
• 7. DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents • Ranking and Refinement Module • Predicting a distant future can be far more challenging than predicting one close by. • To tackle this, it adopts the concept of the decision-making process in reinforcement learning (RL), where an agent is trained to choose actions that maximize long-term rewards to achieve its goal. • Instead of designing the reward manually, however, IOC learns the unknown reward function. • It designs an RNN model that assigns rewards to each prediction hypothesis and measures their goodness based on the accumulated long-term rewards. • Thereafter, it also directly refines prediction hypotheses by learning displacements to the actual prediction through another FC layer. • Lastly, the module receives iterative feedback from regressed predictions and keeps adjusting so that it produces precise predictions at the end. • There are two loss terms in training the IOC ranking and refinement module: • Cross-entropy loss; • Regression loss. • The total loss combines these terms.
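A hedged sketch of how the two ranking-and-refinement terms named above (a cross-entropy loss over sample scores and a regression loss on refined samples) could be combined; the soft target built from distances to the ground truth and the smooth-L1 choice are assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def ranking_refinement_loss(scores, refined, gt_future, temperature=1.0):
    """scores:    (K,)      accumulated reward assigned to each of K prediction samples
       refined:   (K, T, 2) regressed (refined) trajectory samples
       gt_future: (T, 2)    ground-truth future trajectory"""
    # Soft target distribution over samples: closer to the ground truth -> higher probability.
    dist = ((refined - gt_future.unsqueeze(0)) ** 2).sum(dim=(1, 2))
    target = F.softmax(-dist / temperature, dim=0)
    # Cross-entropy between the predicted score distribution and the distance-based target.
    ce_loss = -(target * F.log_softmax(scores, dim=0)).sum()
    # Regression loss pulling every refined sample toward the ground truth.
    reg_loss = F.smooth_l1_loss(refined, gt_future.unsqueeze(0).expand_as(refined))
    return ce_loss + reg_loss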
• 8. DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents • Scene Context Fusion • It is important that the RNN contain information about 1) the individual past motion context, 2) the semantic scene context and 3) the interaction between multiple agents, in order to provide proper hidden representations that can score and refine a prediction; • It implements a spatial-grid-based pooling layer similar to the SP layer in Social LSTM. • Instead of using a max pooling operation with rectangular grids, it adopts log-polar grids with average pooling. • Combined with CNN features, the SCF module provides the RNN decoder with both static and dynamic scene information. • It learns consistency between semantics of agents and scenes for reliable prediction. Details of the Scene Context Fusion (SCF) unit in RNN Decoder 2. Note that the input to the GRU cell at each time step integrates multiple cues (i.e., the dynamics of agents, scene context and interaction between agents).
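A small sketch of the log-polar average-pooling idea described above (not the authors' implementation); the bin counts, maximum radius and log spacing are illustrative assumptions.

import numpy as np

def log_polar_pool(center, neighbors, feats, r_bins=4, a_bins=8, r_max=32.0):
    """Average neighbor features into (radius, angle) bins around the agent of interest.
    center:    (2,)   position of the agent being predicted
    neighbors: (N, 2) positions of other agents
    feats:     (N, D) their hidden-state features
    returns    (r_bins, a_bins, D) pooled grid"""
    D = feats.shape[1]
    grid = np.zeros((r_bins, a_bins, D))
    counts = np.zeros((r_bins, a_bins, 1))
    offsets = neighbors - center
    radii = np.linalg.norm(offsets, axis=1)
    angles = np.arctan2(offsets[:, 1], offsets[:, 0])                       # [-pi, pi)
    for r, a, f in zip(radii, angles, feats):
        if r <= 0.0 or r > r_max:
            continue
        ri = min(int(np.log1p(r) / np.log1p(r_max) * r_bins), r_bins - 1)   # log-spaced radius bin
        ai = int((a + np.pi) / (2 * np.pi) * a_bins) % a_bins               # uniform angle bin
        grid[ri, ai] += f
        counts[ri, ai] += 1
    return grid / np.maximum(counts, 1)                                     # average pooling per cell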
• 9. DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents KITTI results (left 3 rows): Rows 1 & 2 in (b) show the highly reactive nature of RNN ED-SI (i.e., the prediction turns after it gets near a non-drivable area). On the contrary, DESIRE shows its long-term prediction capability by considering potential future rewards. DESIRE-SI also produces more convincing predictions in the presence of other vehicles. Stanford Drone Data results (right 3 rows): Row 1 shows the multi-modal nature of the prediction problem. While the cyclist is making a right turn, it is also possible that he turns around the roundabout (denoted with an arrow). DESIRE-SI predicts such an equally possible future as the top prediction, while covering the ground-truth future within the top 10 predictions. Rows 2 & 3 also show that DESIRE-SI provides superior predictions by reasoning about both static and dynamic scene contexts.
• 10. INFER: INtermediate representations for FuturE pRediction • 2019.3 • In urban driving scenarios, forecasting future trajectories of surrounding vehicles is of paramount importance. • While several approaches for the problem have been proposed, the best-performing ones tend to require extremely detailed input representations (e.g. image sequences). • However, such methods do not generalize to datasets they have not been trained on. • Proposed here are intermediate representations that are particularly well-suited for future prediction. • As opposed to using texture (color) information, it relies on semantics and trains an autoregressive (AR) model to accurately predict future trajectories of traffic participants (vehicles). • Using semantics provides a significant boost over techniques that operate over raw pixel intensities/disparities. • Unlike state-of-the-art approaches, these representations and models generalize to completely different datasets, collected across several cities, and also across countries where people drive on opposite sides of the road (left-handed vs right-handed driving). • Code and data: https://rebrand.ly/INFER-results.
• 11. INFER: INtermediate representations for FuturE pRediction • The design philosophy is based on the following three desired characteristics that knowledge representation systems must possess: • 1) Representational adequacy: to adequately represent task-relevant information. • 2) Inferential adequacy: to infer traits that cannot be inferred from the original unprocessed data. • 3) Generalizability: to generalize to other data distributions (for the same task). • The model takes as input an intermediate representation of the scene semantics (intermediate, because it is neither too primitive, e.g. raw pixel intensities, nor too abstract, e.g. velocities, steering angles). • Using these intermediate representations, the model predicts the plausible future locations of the Vehicles of Interest (VoI). • The proposed representation does not rely heavily on the camera viewing angle, as camera mounting parameters (height, viewing angle, etc.) vary across datasets, and this approach hopes to be robust to such variations.
  • 12. INFER: INtermediate representations for FuturE pRediction First generate intermediate representations by fusing monocular images with depth information (from either stereo or Lidar), obtaining semantic and instance segmentation from monocular image, followed by an orthographic projection to bird’s-eye view. The generated intermediate representations are fed through the network, and finally it results in prediction of the target vehicle’s trajectory registered in the sensor coordinate frame.
• 13. INFER: INtermediate representations for FuturE pRediction • It formulates trajectory prediction as a per-cell regression over an occupancy grid. • It uses the intermediate representations to simplify the objective and help the network generalize better. • It trains an autoregressive model that outputs the VoI's position on an occupancy grid, conditioned on the previous intermediate representations. • It uses a simple Encoder-Decoder model connected by a convolutional LSTM to learn temporal dynamics. • It adds skip connections between corresponding encoder and decoder branches. • The proposed trajectory prediction scheme takes as input a sequence of intermediate representations and produces a single-channel output occupancy grid. • The training objective comprises two terms: a reconstruction loss term and a safety loss term.
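The slide lists a reconstruction term and a safety term without giving their exact form, so the following is only a plausible sketch: a per-cell loss toward the true occupancy plus a penalty on probability mass placed on obstacle cells. All tensor shapes and the weighting are assumptions.

import torch
import torch.nn.functional as F

def occupancy_prediction_loss(pred_logits, target_grid, obstacle_mask, safety_weight=0.1):
    """pred_logits:   (B, 1, H, W) raw network output for the vehicle of interest
       target_grid:   (B, 1, H, W) ground-truth occupancy of the vehicle of interest
       obstacle_mask: (B, 1, H, W) 1 where the cell is an obstacle / non-drivable region"""
    # Reconstruction term: per-cell loss toward the true occupancy.
    recon = F.binary_cross_entropy_with_logits(pred_logits, target_grid)
    # Safety term: penalize probability mass the prediction places on obstacle cells.
    prob = torch.sigmoid(pred_logits)
    safety = (prob * obstacle_mask).sum(dim=(1, 2, 3)).mean()
    return recon + safety_weight * safety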
  • 14. INFER: INtermediate representations for FuturE pRediction The qualitative results from the validation fold of KITTI showcase the efficacy of INFER-Skip in using the intermediate representation to predict complex trajectories. For example, in the left most plot, the network is able to accurately predict the unseen second curve in the trajectory (predicted and ground truth trajectories are shown in red and blue color, respectively). The green and red 3D bounding boxes indicate start of preconditioning and start of prediction of the vehicle of interest (VoI), respectively. It is worth noting that the predicted trajectories are well within the lane (dark gray) and road region (cyan), while avoiding collisions with the obstacles (magenta).
  • 15. Deep Imitative Models for Flexible Inference, Planning, and Control • Imitation learning produces behavioral policies with limited flexibility to accommodate new goals at test-time. • In contrast, model-based reinforcement learning (MBRL) can plan to arbitrary goals using a predictive dynamics model learned from data. • It proposes “imitative models” to combine the benefits of imitation learning and MBRL. • Imitative models are probabilistic predictive models able to plan interpretable expert-like trajectories to achieve arbitrary goals. • Inference with them resembles trajectory optimization in model-based reinforcement learning, and learning them resembles imitation learning. • This method substantially outperforms six direct imitation learning approaches (five of them prior work) and an MBRL approach in a dynamic simulated autonomous driving task, and can be learned efficiently from a fixed set of expert demonstrations without additional online data collection. 2019.6
• 16. Deep Imitative Models for Flexible Inference, Planning, and Control Applying the algorithm to navigation in CARLA. Left: Image depicting the current scene, in which the light recently turned from green to red. Left-Middle: Plot showing LIDAR observations of the agent, the goals it received from a route planner, and the plan produced by the method. The model smoothly chooses between goals based on its prior over expert behavior. Here, the stationary agent chooses to accelerate to follow the vehicle ahead. Right-Middle: Image depicting an intersection scene. Right: LIDAR observations, goals, cost map of simulated potholes, and a variety of plans the method produces, colored by the planner's preference. Although the imitative model never observed pothole-avoidance behavior, it is able to plan a reasonable, on-road path around the potholes with a test-time cost map. Its preferred plan enters the intersection and steers around a pothole.
• 17. Deep Imitative Models for Flexible Inference, Planning, and Control • Reinforcement learning (RL) algorithms automatically learn desirable behaviors from raw sensory inputs with minimal engineering; however, RL generally requires online learning: the agent must collect more data with its latest strategy, use it to update a model, and repeat. • Deploying a partially trained policy on a real-world autonomous system can be dangerous. • Learning behavior should happen offline from expert demonstrations. • How can such demonstrations be incorporated into an autonomous car to perform a variety of tasks? • One option is imitation learning (IL), learning policies that stay near the expert's distribution. • Another is model-based methods, which can use the data to fit a dynamics model and, in principle, can be used with planning algorithms to achieve any user-specified goal. • However, model-based (MB) and model-free RL algorithms are vulnerable to distributional drift: when acting according to the learned model or policy, the agent visits states different from those seen in training, where it is unlikely to determine an effective course of action. • This is problematic when the data intentionally excludes adverse events such as crashes. • Therefore, MBRL algorithms usually require online collection and training.
  • 18. Deep Imitative Models for Flexible Inference, Planning, and Control Imitative planning to goals: multi-goal waypoint planning enables fine-grained control of the plans. Costs can be assigned to “potholes” only seen at test-time; expert demonstrations with potholes were never observed. The planner prefers routes around the potholes.
  • 19. Deep Imitative Models for Flexible Inference, Planning, and Control • IL algorithms use expert demonstration data and, despite similar drift shortcomings, can sometimes learn effective policies without online data collection. However, standard IL offers little task flexibility since it only predicts low-level behavior. • While several works augmented IL with goal conditioning, these goals must be specified in advance during training, and are typically simple (e.g., turning left or right). • The goal is to devise an algorithm that combines the advantages of IL and MBRL by offering the flexibility to achieve new user-specified goals and the ability to learn from offline data. • By learning a deep conditional probabilistic forecasting model from expert data, it captures the distribution of expert behaviors without using manually designed reward functions. • To plan to a goal, this method infers the most probable expert state trajectory under a posterior distribution induced by the model and a task-specifying goal distribution. • By incorporating a model-based representation, it can easily plan to previously unseen user- specified goals while behaving similar to the expert, and can be flexibly repurposed to perform a variety of test-time tasks without any additional training.
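A gradient-based sketch of the imitative planning idea described above: choose the trajectory that maximizes expert-likeness plus goal likelihood. The callables log_q and log_goal are placeholders for the learned imitative density and a user-specified goal distribution; the paper's actual inference procedure may differ.

import torch

def imitative_plan(log_q, log_goal, horizon=20, dim=2, steps=200, lr=0.1):
    """Find a trajectory s that maximizes log q(s | context) + log p(goal | s)."""
    s = torch.zeros(horizon, dim, requires_grad=True)   # initial (zero) plan
    opt = torch.optim.Adam([s], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        objective = log_q(s) + log_goal(s)              # expert-likeness + goal attainment
        (-objective).backward()                         # ascend by minimizing the negative
        opt.step()
    return s.detach()

# Example goal likelihood (assumed): a Gaussian around a waypoint at the final state.
# log_goal = lambda s: -((s[-1] - torch.tensor([10.0, 2.0])) ** 2).sum() / (2 * 0.5 ** 2)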
  • 20. Deep Imitative Models for Flexible Inference, Planning, and Control Illustration of the method applied to autonomous driving. This method trains an imitative model from a dataset of expert examples. After training, the model is repurposed as an imitative planner. At test-time, a route planner provides waypoints to the imitative planner, which computes expert- like paths to each goal. The best plan is chosen according to the planning objective and provided to a low-level PID-controller in order to produce steering and throttle actions.
  • 21. Deep Imitative Models for Flexible Inference, Planning, and Control Tolerating bad waypoints. The planner prefers waypoints in the distribution of expert behavior (on the road at a reasonable distance). Columns 1, 2: Planning with ½ decoy waypoints. Columns 3,4: Planning with all waypoints on the wrong side of the road.
  • 22. Deep Imitative Models for Flexible Inference, Planning, and Control Test-time plans prefer steering around potholes. Table: Robustness to waypoint noise and test-time pothole adaptation. The method is robust to waypoints on the wrong side of the road, and fairly robust to decoy waypoints. The method is flexible enough to safely produce behavior not demonstrated (pothole avoidance) by incorporating a test- time cost. Ten episodes are collected in each Town.
• 23. Multi-Agent Tensor Fusion for Contextual Trajectory Prediction • Accurate prediction of others' trajectories is essential for autonomous driving. • Trajectory prediction is challenging because it requires reasoning about agents' past movements, social interactions among varying numbers and kinds of agents, constraints from the scene context, and the stochasticity of human behavior. • This approach models these interactions and constraints jointly within a Multi-Agent Tensor Fusion (MATF) network. • Specifically, the model encodes multiple agents' past trajectories and the scene context into a Multi-Agent Tensor, then applies convolutional fusion to capture multi-agent interactions while retaining the spatial structure of agents and the scene context. • The model decodes recurrently to multiple agents' future trajectories, using adversarial loss to learn stochastic predictions. • Experiments on both highway driving and pedestrian crowd datasets show that the model achieves state-of-the-art prediction accuracy. 2019.7
  • 24. Multi-Agent Tensor Fusion for Contextual Trajectory Prediction • There are two parallel encoding streams in the MATF architecture. • One encodes the past trajectories of each individual agent xi independently using single agent LSTM encoders, and another encodes the static scene context image c with a CNN. • Each LSTM encoder shares the same set of parameters, so the architecture is invariant to the number of agents in the scene. • The outputs of the LSTM encoders are 1-D agent state vectors {x′1 , x′2 , .., x′n } without temporal structure. • The output of the scene context encoder CNN is a scaled feature map c′ retaining the spatial structure of the bird’s-eye view static scene context image. • Next, the two encoding streams are concatenated spatially into a Multi-Agent Tensor. • Agent encodings {x′1, x′2, .., x′n} are placed into one bird’s-eye view spatial tensor, which is initialized to 0 and is of the same shape (width and height) as the encoded scene image c′.
• 25. Multi-Agent Tensor Fusion for Contextual Trajectory Prediction • The dimension axis of the encodings fits into the channel axis of the tensor. • The agent encodings are placed into the spatial tensor with respect to their positions at the last time step of their past trajectories. • This tensor is then concatenated with the encoded scene image in the channel dimension to get a combined tensor. If multiple agents are placed into the same cell in the tensor due to discretization, element-wise max pooling is performed. • The Multi-Agent Tensor is fed into fully convolutional layers, which learn to represent interactions among multiple agents and between agents and the scene context, while retaining spatial locality, to produce a fused Multi-Agent Tensor. • Specifically, these layers operate at multiple spatial resolution levels by adopting U-Net-like architectures to model interaction at different spatial scales. • The output feature map of this fused model c′′ has exactly the same shape as c′ in width and height to retain the spatial structure of the encoding.
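A compact sketch of the Multi-Agent Tensor construction just described: agent encodings are scattered into a spatial map at their last observed positions, with element-wise max pooling on collisions, then concatenated with the scene feature map. The shapes, metric extent and discretization are illustrative assumptions.

import torch

def build_multi_agent_tensor(agent_vecs, agent_xy, scene_map, extent):
    """agent_vecs: (N, D)    per-agent LSTM encodings
       agent_xy:   (N, 2)    agent positions at the last observed time step (metres)
       scene_map:  (C, H, W) encoded scene context feature map c'
       extent:     (x_min, y_min, x_max, y_max) metric extent covered by the map
       returns     (D + C, H, W) concatenated Multi-Agent Tensor"""
    N, D = agent_vecs.shape
    C, H, W = scene_map.shape
    x_min, y_min, x_max, y_max = extent
    agent_channels = torch.zeros(D, H, W)
    for vec, (x, y) in zip(agent_vecs, agent_xy):
        col = int((x - x_min) / (x_max - x_min) * (W - 1))
        row = int((y - y_min) / (y_max - y_min) * (H - 1))
        col, row = max(0, min(W - 1, col)), max(0, min(H - 1, row))
        # Element-wise max pooling when several agents fall into the same cell.
        agent_channels[:, row, col] = torch.maximum(agent_channels[:, row, col], vec)
    return torch.cat([agent_channels, scene_map], dim=0)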
• 26. Multi-Agent Tensor Fusion for Contextual Trajectory Prediction The Multi-Agent Tensor encoding is a spatial feature map of the scene context and multiple agents from an overhead perspective, including agent channels (above) and context channels (below). Agents' feature vectors (red) output from single-agent LSTM encoders are placed spatially w.r.t. agents' coordinates to form the agent channels. The agent channels are aligned spatially with the context channels (a context feature map) output from scene context encoding layers to retain the spatial structure.
• 27. Multi-Agent Tensor Fusion for Contextual Trajectory Prediction • To decode each agent's predicted trajectory, agent-specific representations with fused interaction features for each agent {x1′′ , x2′′ , .., xn′′ } are sliced out according to their coordinates from the fused Multi-Agent Tensor output c′′. • These agent-specific representations are then added as a residual to the original encoded agent vectors to form final agent encoding vectors {x1′ + x1′′ , x2′ + x2′′ , ..., xn′ + xn′′ }, which encode all the information from the past trajectories of the agents themselves, the static scene context, and the interaction features among multiple agents. • In this way, this approach allows each agent to get a different social and contextual embedding focused on itself. • Importantly, the model gets these embeddings for multiple agents using shared feature extractors instead of operating n times for n agents. • Finally, for each agent in the scene, its final vector xi′ + xi′′ is decoded to the future trajectory prediction ŷi by LSTM decoders. • Similar to the encoders for each agent, parameters are shared to guarantee that the network can generalize well when the number of agents in the scene varies.
  • 28. Multi-Agent Tensor Fusion for Contextual Trajectory Prediction Illustration of the Multi-Agent Tensor Fusion (MATF) architecture.
  • 29. Multi-Agent Tensor Fusion for Contextual Trajectory Prediction Qualitative results from Massachusetts driving dataset. Past trajectories are shown in different colors for each vehicle, followed by 100 sampled future trajectories. Ground truth future trajectories are shown in black, and lane centers are shown in gray. (a) A complex scenario involving five vehicles; MATF accurately predicts the trajectory and velocity profile for all. (b) MATF correctly predicts that the red vehicle will complete a lane change. (c) MATF captures the uncertainty over whether the red vehicle will take the highway exit. (d) As soon as the purple vehicle passes a highway exit, MATF predicts it will not take that exit. (e) Here, MATF fails to predict the precise ground truth trajectory; however, the red vehicle is predicted to initiate a lane change maneuver in a very small number of sampled trajectories.
• 30. AGen: Adaptable Generative Prediction Networks for Autonomous Driving • In highly interactive driving scenarios, accurate prediction of other road participants is critical for safe and efficient navigation of autonomous cars. • Prediction is challenging due to the difficulty in modeling various driving behavior, or learning such a model. • The model should be interactive and reflect individual differences. • Imitation learning methods, such as parameter sharing generative adversarial imitation learning (PS-GAIL), are able to learn interactive models. • However, the learned models average out individual differences. • When used to predict trajectories of individual vehicles, these models are biased. • An adaptable generative prediction framework (AGen) performs online adaptation of the offline-learned models to recover individual differences for better prediction. • In particular, it combines the recursive least squares parameter adaptation algorithm (RLS-PAA) with the offline-learned model from PS-GAIL. • RLS-PAA has analytical solutions and is able to adapt the model for every single vehicle efficiently online. IVS 2019
• 31. AGen: Adaptable Generative Prediction Networks for Autonomous Driving Offline model learning extracts features for average driving behavior. Online model adaptation can perturb the average model to fit the behavior of a specific driver at a specific time. In particular, the offline pretrained policy network of PS-GAIL is taken as the feature extractor for averaged driving behavior, while individual vehicle behavior is adapted online using RLS-PAA. Heterogeneity among drivers needs to be explicitly accounted for to improve prediction accuracy in real-world scenarios. As mentioned earlier, it is intractable to fit a policy network for every individual vehicle. To make heterogeneous prediction scalable, offline model learning is combined with online model adaptation.
• 32. AGen: Adaptable Generative Prediction Networks for Autonomous Driving (a) Offline training using PS-GAIL. The critic computes the difference between the expert trajectory and the roll-out trajectory from the policy network. PS-GAIL iteratively updates the policy to minimize this difference and the critic to maximize it. (b) Online adaptation using RLS-PAA. The critic computes the 2-norm difference between the expert trajectory and the roll-out trajectory from the policy network. RLS-PAA updates the policy network to minimize this difference. Either 1-step or 2-step adaptation can be used.
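A minimal recursive-least-squares sketch of the online adaptation step (a simplification for illustration, not the exact RLS-PAA update used with PS-GAIL): features from the frozen offline network are assumed, and only a linear output map is adapted per vehicle.

import numpy as np

class RLSAdapter:
    """Adapt a linear output map W on top of frozen policy-network features phi(x)."""
    def __init__(self, feat_dim, out_dim, forgetting=0.99):
        self.W = np.zeros((out_dim, feat_dim))
        self.P = np.eye(feat_dim) * 1e3        # (scaled) inverse covariance of the features
        self.lam = forgetting

    def predict(self, phi):
        return self.W @ phi                    # adapted per-vehicle prediction

    def update(self, phi, y):
        """phi: (feat_dim,) features from the frozen network; y: (out_dim,) observed action."""
        phi = phi.reshape(-1, 1)
        gain = self.P @ phi / (self.lam + (phi.T @ self.P @ phi).item())   # RLS gain
        err = y - (self.W @ phi).ravel()                                   # prediction error
        self.W += np.outer(err, gain.ravel())                              # rank-one parameter update
        self.P = (self.P - gain @ phi.T @ self.P) / self.lam               # covariance update with forgetting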
• 33. AGen: Adaptable Generative Prediction Networks for Autonomous Driving Predicted 2 s trajectories for 22 agents after 3 s adaptations. Average position RMSE over time in the 22-agent scenario.
• 34. Conditional Generative Neural System for Probabilistic Trajectory Prediction • Effective understanding of the environment and accurate trajectory prediction of surrounding dynamic obstacles are critical for intelligent systems such as autonomous vehicles and wheeled mobile robots navigating in complex scenarios to achieve safe and high-quality decision making, motion planning and control. • Due to the uncertain nature of the future, it is desirable to make inferences from a probabilistic perspective instead of deterministic predictions. • This work proposes a conditional generative neural system (CGNS) for probabilistic trajectory prediction to approximate the data distribution, with which realistic, feasible and diverse future trajectory hypotheses can be sampled. • The system combines the strengths of conditional latent space learning and variational divergence minimization, and leverages both static context and interaction information with soft attention mechanisms. • It also proposes a regularization method for incorporating soft constraints into deep neural networks with differentiable barrier functions, which can regulate and push the generated samples into the feasible regions. 2019.7
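The slide mentions differentiable barrier functions for soft constraints without giving their form; the following is only a rough sketch of the idea, where a smooth penalty on an assumed signed-distance function pushes generated samples back toward the feasible region.

import torch
import torch.nn.functional as F

def soft_barrier_penalty(samples, signed_distance, margin=0.5):
    """samples: (K, T, 2) generated trajectory hypotheses.
    signed_distance is an assumed differentiable function (e.g. bilinear interpolation
    into a distance-transform map of the drivable area) returning (K, T) values that
    are positive outside the feasible region."""
    d = signed_distance(samples)
    # Smooth barrier: near zero when well inside, growing as samples leave the feasible region,
    # so gradients push generated points back inside during training.
    return F.softplus(d - margin).mean()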
  • 35. Conditional Generative Neural System for Probabilistic Trajectory Prediction Typical urban traffic scenarios with large uncertainty and interactions among multiple entities. The shaded areas represent the reachable sets of possible trajectories. (a) Unsignalized roundabout with four- way yield signs; (b) Unsignalized intersection with four-way stop signs.
• 36. Conditional Generative Neural System for Probabilistic Trajectory Prediction • Requirements to generate diverse, realistic future trajectories: • 1) Context-aware: The system should be able to forecast trajectories which are inside the traversable regions and collision-free with static obstacles in the environment. For instance, when vehicles navigate a roundabout, they need to advance along the curves and avoid collisions with road boundaries. • 2) Interaction-aware: The system needs to generate reasonable trajectories compliant with traffic or social rules, taking into account interactions and reactions among multiple entities. For instance, when vehicles approach an unsignalized intersection, they need to anticipate others' possible intentions and motions as well as the influences of their own behaviors on surrounding entities. • 3) Feasibility-aware: The system should anticipate naturalistic and physically feasible trajectories which are compliant with vehicle kinematics or dynamics constraints, although these constraints can be ignored for pedestrians due to the large flexibility of their motions. • 4) Probabilistic prediction: Since the future is full of uncertainty, the system should be able to learn an approximated distribution of future trajectories close to the data distribution and generate diverse samples which represent various possible behavior patterns.
• 37. Conditional Generative Neural System for Probabilistic Trajectory Prediction Overview of the proposed conditional generative neural system (CGNS), which consists of four key contributions: (a) a deep feature extractor with a soft attention mechanism, which extracts multi-level features from scene context image sequences and trajectories; (b) an encoder to learn conditional latent space representations; (c) a generator (decoder) to sample future trajectory hypotheses; (d) a discriminator to distinguish predicted trajectories from ground truth.
• 38. Conditional Generative Neural System for Probabilistic Trajectory Prediction Fig. 3. The visualization of the context image masks and trajectory block attention masks. In particular, in the trajectory masks there are four rows representing 4 historical time steps and 6 columns representing 6 vehicles in the scene. The 1st column corresponds to the predicted vehicle and the others correspond to surrounding ones. Brighter colors indicate larger attention weights. The predicted vehicles are indicated with red bounding boxes. In all the cases, the image masks have a large weight around the predicted vehicle and the area of its heading direction. In the first three cases, only the historical trajectories of the predicted vehicle are assigned large attention weights, which implies that the other vehicles have little effect in these situations. However, in the last three cases, more attention is paid to other vehicles since there exist strong interactions which increase the inter-dependency.
• 39. Coordination and Trajectory Prediction for Vehicle Interactions via Bayesian Generative Modeling • Coordination recognition and subtle pattern prediction of future trajectories play a significant role when modeling interactive behaviors of multiple agents. • Due to the essential property of uncertainty in the future evolution, deterministic predictors are not sufficiently safe and robust. • In order to tackle the task of probabilistic prediction for multiple, interactive entities, a coordination and trajectory prediction system (CTPS) is proposed, which has a hierarchical structure including a macro-level coordination recognition module and a micro-level subtle pattern prediction module which solves a probabilistic generation task. • Two types of representation of the coordination variable are considered: categorized and real-valued. • Bayesian deep learning is incorporated into generative models to generate diversified prediction hypotheses. • The proposed system is tested on multiple driving datasets in various traffic scenarios and achieves better performance than baseline approaches in terms of a set of evaluation metrics. • Using the categorized coordination better captures multi-modality and generates more diversified samples than the real-valued coordination, while the latter can generate prediction hypotheses with smaller errors at a sacrifice of sample diversity. • NNs with weight uncertainty are able to generate samples with larger variance and diversity. 2019.5
  • 40. Coordination and Trajectory Prediction for Vehicle Interactions via Bayesian Generative Modeling Typical highway and urban driving scenarios where two or more entities coordinate and interact with each other. The shaded areas represent possible future motions which consider multi-modality. (a) Ramp merging and lane change behaviors on highway scenarios; (b) Unsignalized roundabout with yield signs; (c) Unsignalized intersection with stop signs. Although the contexts are different, they can be treated as generalized merging scenarios.
• 41. Coordination and Trajectory Prediction for Vehicle Interactions via Bayesian Generative Modeling • The multi-modal conditional distribution of future trajectories for interactive agents can be factorized into a coordination recognition term and a coordination-conditioned trajectory generation term. • This factorization naturally divides the system into a coordination recognition module (macro-level) and a subtle pattern prediction module (micro-level). • The coordination c can not only be categorized to represent meaningful semantics, but can also be a real-valued vector encoding the underlying representations. • If c is categorized, the micro-level module takes c in as an indicator through one-hot encoding; if c is a real-valued variable, the micro-level module takes c in as an additional input feature. • The macro-level module is based on a variational recurrent neural network (VRNN) followed by a probabilistic classifier. • The micro-level module is based on a Coordination-Bayesian Conditional Generative Adversarial Network (C-BCGAN).
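A small sketch of how the micro-level generator could consume the coordination variable c in the two cases described above (one-hot for categorized c, direct concatenation for real-valued c); the interface and dimensions are assumptions for illustration.

import torch
import torch.nn.functional as F

def build_generator_input(history_feat, coordination, num_classes=3):
    """history_feat: (B, D) encoded historical information
       coordination: (B,) long tensor of class ids, or (B, C) float tensor of real-valued codes"""
    if coordination.dtype == torch.long:                       # categorized coordination
        c = F.one_hot(coordination, num_classes).float()       # indicator via one-hot encoding
    else:                                                      # real-valued coordination
        c = coordination.float()                               # used directly as extra features
    z = torch.randn(history_feat.size(0), 16)                  # noise input of the generator
    return torch.cat([history_feat, c, z], dim=-1)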
• 42. Coordination and Trajectory Prediction for Vehicle Interactions via Bayesian Generative Modeling Overview of CTPS: (a) Coordination recognition module: The coordination variable can be discrete categories or continuous real-valued vectors. The discrete distribution of categorized coordination is obtained by a probabilistic classifier based on latent features extracted by the VRNN. The continuous distribution of real-valued coordination is obtained by maximizing mutual information based on a VAE-style model. Either formulation can be chosen according to the objective and emphasis of a particular task; (b) Subtle pattern prediction module: The model is based on the proposed C-BCGAN, in which the generator takes as input the historical information, the coordination indicator, as well as noise from the normal distribution. Weight uncertainties are incorporated in both the generator and discriminator networks.
• 43. Coordination and Trajectory Prediction for Vehicle Interactions via Bayesian Generative Modeling The visualization of prediction results in the highway scenario. (a) Generation with learned coordination; (b) Generation with real-valued coordination. Note that only the longitudinal motions are predicted for surrounding vehicles, but both longitudinal and lateral motions are predicted for the center vehicle. That is why the predicted trajectories of surrounding vehicles do not have lateral deviation.
• 44. Interaction-aware Multi-agent Tracking and Probabilistic Behavior Prediction via Adversarial Learning • In order to enable high-quality decision making and motion planning of intelligent systems such as robotics and autonomous vehicles, accurate probabilistic predictions for surrounding interactive objects is a crucial prerequisite. • Although many research studies have been devoted to making predictions on a single entity, it remains an open challenge to forecast future behaviors for multiple interactive agents simultaneously. • In this work, the Generative Adversarial Network (GAN) is leveraged for its capability of distribution learning, and a generic multi-agent probabilistic prediction and tracking framework is proposed which takes the interactions among multiple entities into account and treats all the entities as a whole. • However, since GANs are hard to train, an empirical study is conducted on the relationship between training performance and hyperparameter values with a numerical case study. • The results imply that the proposed model can capture the mean, variance and multi-modality of the ground-truth distribution. • Moreover, the proposed approach is applied to a real-world task of vehicle behavior prediction to demonstrate its effectiveness and accuracy. • The proposed model trained by adversarial learning can achieve a better prediction performance than other state-of-the-art (SoA) models trained by traditional supervised learning which maximizes the data likelihood. • The well-trained model can also be utilized as an implicit proposal distribution for particle-filter-based Bayesian state estimation. 2019.4
  • 45. Interaction-aware Multi-agent Tracking and Probabilistic Behavior Prediction via Adversarial Learning The general diagram of the proposed model, which consists of a generator network and a discriminator network.
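A hedged sketch of one adversarial training step for the generator/discriminator pair shown in the diagram; the gen and disc interfaces, the noise size and the binary cross-entropy objective are assumptions standing in for the paper's exact formulation.

import torch
import torch.nn.functional as F

def adversarial_step(gen, disc, opt_g, opt_d, history, future):
    """gen(history, z) is assumed to return predicted futures; disc(history, traj) returns logits."""
    z = torch.randn(history.size(0), 16)
    fake = gen(history, z)

    # Discriminator step: real trajectories -> 1, generated trajectories -> 0.
    real_logits = disc(history, future)
    fake_logits = disc(history, fake.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
              F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: fool the discriminator into labelling generated trajectories as real.
    g_logits = disc(history, fake)
    g_loss = F.binary_cross_entropy_with_logits(g_logits, torch.ones_like(g_logits))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()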
• 46. Interaction-aware Multi-agent Tracking and Probabilistic Behavior Prediction via Adversarial Learning • The proposed approach is applied to a trajectory prediction task for interactive on-road vehicles as an illustrative example, although it can be utilized for many other tasks such as interactive pedestrian trajectory prediction and human-robot interaction. A typical highway scenario is investigated where the gray car is the ego vehicle, which aims at forecasting future motions of its surrounding vehicles (red, green and yellow ones). The observations of the environment can be obtained by on-board sensors. The approach can also be adopted in overhead traffic surveillance systems with camera-based monitors.
• 47. Interaction-aware Multi-agent Tracking and Probabilistic Behavior Prediction via Adversarial Learning Visualization of cases. (a) lane change left; (b) lane change right. The red dashed lines are ground-truth trajectories.