4. ● Kaggle: Lyft Motion Prediction for Autonomous Vehicles
● l5kit data home page: Data - Lyft
Competition/Dataset page
5. ● Focus on “Motion Prediction” part
○ Given a bird's-eye-view image (no natural images)
○ Predict 3 possible trajectories with confidences.
Competition introduction
Competition scope. Image from https://self-driving.lyft.com/level5/data/
6. ● It focused on the “Perception” part
○ https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles
○ Detect cars as 3D objects
Last year competition: Lyft 3D Object Detection
Image from https://self-driving.lyft.com/level5/data/ Image from https://www.kaggle.com/tarunpaparaju/lyft-competition-understanding-the-data
7. ● Information in the bird's-eye view
○ Labels of agents (e.g. car, bicycle, and pedestrian...)
○ Traffic light status
○ Road information (e.g. pedestrian crossings and directions)
○ Location and timestamp...
Competition introduction
This information can be gathered into a single image using the l5kit library
8. ● Total dataset size: 1118 hours, 26344 km
● Road length: 6.8 miles
● Train (89 GB), validation (11 GB), and test (3 GB) datasets:
○ Big data: approx. 200M / 190K / 71K agents whose motion must be predicted.
Lyft level5 Data description
Image from https://arxiv.org/pdf/2006.14480.pdf
“One Thousand and One Hours: Self-driving Motion Prediction Dataset”
10. ● Route on Google Maps
● Not a long distance, around the Lyft office (in fact, a CNN can “memorize” the place from the image)
EDA using google earth
Annotations: 1. Station, 2. Intersection (paper figure), 3. Signals
11. ● Many straight roads
● Some complicated intersections...
EDA using google earth
12. ● More and more EDA: train/valid/test statistics are almost the same!
No extrapolation found in this dataset…
○ Agent type distribution: CAR 91%, CYCLIST 2%, PEDESTRIAN 7%
○ Date: from October 2019 to March 2020
○ Time: daytime, from 7am to 7pm
○ Place: all roads are included in train/valid/test
● Less effort was needed on “how to handle & train the data”
→ Pure programming skill & ML techniques were what mattered.
More EDA, No extrapolation found in this dataset...
Time/Date distributions: https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/discussion/189516
14. ● Structured numpy arrays + zarr are used to save the data on disk.
● structured array: https://numpy.org/doc/stable/user/basics.rec.html
Raw Data format
● zarr: https://zarr.readthedocs.io/en/stable/
○ It can store structured arrays on disk
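As a rough sketch of what a structured-array record looks like (field names and shapes here are illustrative, not l5kit's exact AGENT_DTYPE):

```python
import numpy as np

# Simplified sketch of an agent record as a numpy structured array
# (illustrative fields, not l5kit's exact AGENT_DTYPE).
AGENT_DTYPE = np.dtype([
    ("centroid", np.float64, (2,)),  # x, y in world coordinates
    ("extent", np.float32, (3,)),    # length, width, height
    ("yaw", np.float32),             # heading angle [rad]
    ("track_id", np.uint64),
])

agents = np.zeros(3, dtype=AGENT_DTYPE)
agents["yaw"] = [0.0, 0.5, -0.5]
agents["track_id"] = [1, 2, 3]

# zarr can persist such structured arrays on disk, e.g.:
#   z = zarr.open("agents.zarr", mode="w", shape=agents.shape, dtype=AGENT_DTYPE)
#   z[:] = agents
print(float(agents["yaw"][1]))
```

Fields are then accessed by name (`agents["yaw"]`), which is how l5kit exposes frames and agents.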
15. ● l5kit is provided as baseline: https://github.com/lyft/l5kit
○ The (complicated) data preprocessing part is already implemented
○ Rasterizer
■ Semantic → protocol buffers are used inside MapAPI to draw the semantic map
■ Satellite → draws the satellite image.
● Most Kaggle competitions: 0 → 1
This competition: 1 → 10
L5kit library
[Pipeline figure: raw data (zarr: world coordinates over time, extent (size), yaw) → Rasterizer (base implementation provided by Lyft) → CNN → predicted future coordinates (3 trajectories). Typical approach already supported by l5kit.]
18. ● 1. Use train_full.zarr
● 2. l5kit==1.1.0
● 3. Set min_history=0, min_future=10 in AgentDataset
● 4. Cosine annealing, decaying the LR to 0 over 1 epoch of training
→ That’s enough to win a prize! (Private LB: 10.274)
● 5. Ensemble with GMMs (Gaussian Mixture Models)
→ Further boosted the score by 0.8 (Private LB: 9.475)
Short Summary
20. ● How to predict probabilistic behavior?
● Suggested baseline kernel “Lyft: Training with multi-mode confidence”
○ A single model outputs 3 trajectories together with their confidences
○ Train directly with the competition evaluation metric as the loss
○ The 1st place solution also originated from our approach (link)
Approach/Solution:
21. Approach/Metric:
• In this competition, the model outputs 3 hypotheses (trajectories).
– ground truth: x_1, ..., x_T (2D coordinates at future timesteps)
– hypotheses: x̂^k_1, ..., x̂^k_T with confidences c^k (k = 1, 2, 3)
• Assume the ground truth positions to be modeled by a mixture of Normal distributions.
• The LB score is calculated by the following metric, and we directly used it as the loss function of the CNN:
loss = -log Σ_k c^k exp( -(1/2) Σ_t ||x_t - x̂^k_t||² )
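A numpy sketch of this metric as a loss (illustrative re-implementation assuming unit-variance Gaussians, not l5kit's exact code):

```python
import numpy as np

def neg_multi_log_likelihood(gt, preds, confidences):
    """Negative log-likelihood of the ground truth under a mixture of
    unit-variance Gaussians centred on each hypothesis (illustrative
    version of the competition metric, not l5kit's exact implementation).
    gt: (T, 2), preds: (K, T, 2), confidences: (K,), summing to 1."""
    err = np.sum((gt[None] - preds) ** 2, axis=(1, 2))  # (K,) squared errors
    log_terms = np.log(confidences) - 0.5 * err          # log c_k - err_k / 2
    m = log_terms.max()                                  # log-sum-exp trick
    return -(m + np.log(np.exp(log_terms - m).sum()))
```

The log-sum-exp trick matters in practice: with long horizons, the exponents are large and negative, and a naive `np.log(np.sum(...))` underflows.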
22. ● To utilize all possible data → let’s use train_full.zarr without downsampling
○ But it is big!
○ 89 GB
○ 191,177,863 records with the default setting
→ Need distributed training!
※ It was important to use all the data to get a good score in the competition.
Use train_full.zarr dataset
23. ● torch.distributed is used
○ 8 V100 GPUs * 5 days for 1 epoch
● In practice, we needed to modify AgentDataset to cache index arrays on disk
○ AgentDataset is copied into each DataLoader worker when num_workers is set.
■ 8 processes * 4 num_workers = 32 copies are created
■ The in-memory usage of AgentDataset is huge! It cannot fit in RAM.
● The cumulative_sizes attribute was the bottleneck.
○ Cache track_id, scene_index, state_index into zarr to
reduce in-memory usage.
Distributed training
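The caching idea can be sketched as a build-once, memory-map-afterwards helper (minimal sketch using .npy files; the actual solution cached into zarr, and the function name is hypothetical):

```python
import os
import numpy as np

def cached_array(path, build):
    """Build an index array once, persist it, and memory-map it on later
    calls, so DataLoader worker copies share disk-backed pages instead of
    each holding a full copy in RAM.
    (Sketch of the idea; the actual solution cached into zarr.)
    path: must end in .npy; build: zero-arg function returning the array."""
    if not os.path.exists(path):
        np.save(path, build())          # expensive: done only once
    return np.load(path, mmap_mode="r")  # cheap: memory-mapped, read-only
```

Each of track_id, scene_index, and state_index would be cached this way instead of being recomputed and held in memory by every worker.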
24. ● Pointed out in “We did it all wrong” discussion:
○ The target_positions values need to be rotated in the same way as the image,
as specified by the agent’s “yaw”
Use l5kit==1.1.0
[Figure: target_positions in l5kit==1.0.6 vs l5kit==1.1.0]
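The fix amounts to rotating the target displacements into the agent frame; a minimal sketch (the sign convention here is an assumption for illustration; l5kit>=1.1.0 applies the correct transform internally):

```python
import numpy as np

def rotate_into_agent_frame(points, yaw):
    """Rotate (N, 2) world-frame displacements by -yaw so targets are
    expressed in the agent's frame, consistent with the rotated image.
    (Illustrative helper; l5kit>=1.1.0 handles this itself.)"""
    c, s = np.cos(-yaw), np.sin(-yaw)
    rot = np.array([[c, -s], [s, c]])   # 2D rotation matrix for angle -yaw
    return points @ rot.T
```

With yaw = 0 the targets are unchanged; for a nonzero yaw, the trajectory is rotated so that the image and the regression targets agree.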
25. ● Use the chopped dataset: only the 100th frame from each scene is used.
○ This is how the test data is made.
○ But it discards all ground truth data;
instead, set agent_mask in AgentDataset to make the validation data.
● Check the validation/test dataset carefully
○ We noticed that it contains at least 10 future frames & 0 history frames.
→ Next page
Validation strategy
26. ● Set min_history=0, min_future=10 in AgentDataset
○ MOST IMPORTANT!
○ Public LB Score jumps to 13.059 here.
Align training dataset to validation/test dataset
27. ● Tried several models
● Worked Well:
○ Resnet18
○ Resnet50
○ SEResNeXt50
○ ecaresnet18
● Not working well: bigger, deeper models tended to have worse performance...
○ ResNet101
○ ResNet152
CNN Models
28. ● Training hyperparameters
○ Batch size 12 * 8 processes
○ Adam optimizer
○ Cosine annealing over 1 epoch (better than exponential decay)
Training with cosine annealing
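The schedule can be sketched as a closed-form function (equivalent to PyTorch's CosineAnnealingLR with eta_min=0 and T_max set to the steps in one epoch; the numbers are illustrative):

```python
import math

def cosine_annealing_lr(step, total_steps, base_lr):
    """Decay the learning rate from base_lr to 0 over one epoch,
    following half a cosine wave (illustrative sketch of the schedule)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))
```

The LR starts at base_lr, decays slowly at first, then drops steeply toward 0 at the end of the single training epoch.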
29. ● Used the albumentations library, tried several augmentations.
○ Tried Cutout, Blur, Downscale
○ Other augmentations used for natural images, e.g. flip, were not appropriate this time
● Only Cutout was adopted for the final model.
Augmentation: 1. Image based augmentation
[Figure: original image vs Cutout / Blur / Downscale]
30. ● Modified BoxRasterizer to add augmentation
○ 1. Random Agent drop
○ 2. Agent extent size scaling
● We could not find a clear improvement in our experiments.
The final model does not use this augmentation...
Augmentation: 2. Rasterizer level augmentation
[Figure: several agents are dropped; the host car size is different]
31. ● How to ensemble models?
○ In this competition, we train the model to predict three trajectories (x1, x2, x3) and
three confidences (c1, c2, c3).
○ Simple ensemble methods such as averaging do not work.
● Consider the outputs as Gaussian mixture models
○ The outputs can be considered as confidence-weighted GMMs with
n_components=3
○ You can take the average of GMMs, and the average of N GMMs takes the form
of a GMM with n_components=3N
Ensemble by GMM and EM algorithm
32. ● You can get the ensembled outputs from the averaged GMM by
following the steps below.
○ Sample enough points (e.g. 1000N) from the averaged distribution.
○ Run the EM algorithm with n_components=3 on the sampled points
(we used sklearn.mixture.GaussianMixture).
○ Take the output of the EM algorithm as the ensembled prediction.
Ensemble by GMM and EM algorithm
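The steps above can be sketched as follows (a hedged sketch, not the exact solution code: the dimension is simplified, and unit-variance components are assumed to match the competition metric; in the real setting each trajectory is flattened into one vector):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def ensemble_gmm(means, weights, n_components=3, n_samples=3000, seed=0):
    """Average several models' mixture outputs by sampling from the pooled
    GMM, then re-fitting a 3-component GMM with EM (sketch of the procedure).
    means: (M, D) component means from all models; weights: (M,) confidences."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalize the averaged GMM
    comp = rng.choice(len(means), size=n_samples, p=weights)
    # unit-variance components assumed, matching the competition metric
    samples = means[comp] + rng.normal(size=(n_samples, means.shape[1]))
    gm = GaussianMixture(n_components=n_components, random_state=seed)
    gm.fit(samples)                              # EM on the sampled points
    return gm.means_, gm.weights_
```

Pooling two models' 3-component outputs gives a 6-component mixture; the EM re-fit compresses it back to the 3 trajectories + confidences the competition requires.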
37. ● CNN models: a smaller model was enough
○ ResNet18 was enough to get 4th place
○ Tried bigger ResNet101, ResNet152, etc., but performance was worse
● Only 1 epoch of training was enough!
○ Because the data is very big & almost duplicated across consecutive frames
○ Important to use cosine annealing for the learning rate schedule
● The Rasterizer (drawing images) is the bottleneck
○ CPU-intensive task; GPU utilization is not 100%.
Findings
38. ● https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/discussion/201493
● Optimized the Rasterizer implementation
→ 8 GPUs * 2 days for 1 epoch
● Hyperparameters with “heavy” training
○ Semantic + satellite images
○ Bigger images: 448 * 224 (changed from 224 * 224)
○ num_history: 30 (changed from 10)
○ min_future: 5 (changed from 10)
○ Modified the agent filter threshold
○ batch_size: 64
etc...
● Pre-train on small images for 4 epochs → fine-tune on big images for 1 epoch
○ It was very effective
[1st place solution] : L5kit Speedup
39. ● The 10th place solution used a GNN-based method called VectorNet
○ Faster training & inference
■ They did not use rasterized images at all
■ 11 GPU-hours for 1 epoch (our CNN needs about 960 GPU-hours)
○ Comparable performance to CNN-based methods
Other interesting approaches: VectorNet
[Figure: VectorNet architecture (Gao+, CVPR 2020) vs the CNN approach]
41. ● How different are the 3 trajectories generated by the CNN models?
● Case 1: different directions
○ The CNN can predict the different possible ways/directions that agents may move in the
future.
The diversity of the 3 trajectories
42. ● How different are the 3 trajectories generated by the CNN models?
● Case 2: different speed or start time
○ Even when the direction is straight, the CNN can predict the different possible
speeds/accelerations with which agents move in the future.
The diversity of the 3 trajectories
44. ● raster_size (Image size)
○ Tried 224x224 & 128x128.
○ Default 224x224 was better
● pixel_size
○ Tried 0.5, 0.25, 0.15.
○ Default 0.5 was better.
● num_history-specific models
○ Short-history model:
■ Tried to train a 0-history model
→ the performance was not better than the original model
○ Long-history model:
■ Tried 10, 14, 20
■ Default 10 was better in our experiments
(but the 1st place solution used num_history=30)
Hyperparameter changes
45. ● Added a velocity arrow to the BoxRasterizer
Custom Rasterizer: 1. VelocityBoxRasterizer
46. ● Original SemanticRasterizer: the semantic map is drawn as an RGB image
Custom Rasterizer: 2. ChannelSemanticRasterizer
● ChannelSemanticRasterizer:
○ Separate channels for road, lane, green/yellow/red signals & crosswalks
Somehow, the training performance was worse than the original SemanticRasterizer...
47. ● We thought that the red signal duration is important for predicting when a stopped
agent starts moving in the future.
● This semantic rasterizer changes its pixel values depending on how long the signal has
continued in the history.
Custom Rasterizer: 3. TLSemanticRasterizer
48. ● Draw each agent type in a different color/channel
○ CAR = Blue
○ CYCLIST = Yellow
○ PEDESTRIAN = Red
○ UNKNOWN = Gray
● Unknown-type agents are also drawn
Custom Rasterizer: 4. AgentTypeBoxRasterizer
49. ● Predict all agents' future coordinates at once, from 1 image.
● Using semantic segmentation models (segmentation-models-pytorch)
● Stopped this investigation because agents sometimes exist very far from the host car.
Multi-agent prediction model
https://self-driving.lyft.com/level5/data/
50. ● What kind of data causes seriously big errors?
● When the “yaw” annotation is wrong, the predicted & actual directions become different!
● Does fixing the data’s yaw field improve the total score?
○ YES! for the validation dataset (see below).
○ NO!! for the test dataset; the yaw annotation seems wrong only for stopped cars.
● In a real application, I guess this is a very important problem to consider...
Yaw correction
[Figure: example predictions with Loss=43988, Loss=30962, Loss=10818]
51. ● Kaggle page: Lyft Motion Prediction for Autonomous Vehicles
● Data home page: https://self-driving.lyft.com/level5/data/
● Solution Discussion: Lyft Motion Prediction for Autonomous Vehicles
● Solution Code: https://github.com/pfnet-research/kaggle-lyft-motion-prediction-4th-place-solution
References