KaggleDays Tokyo Workshop
Practical tips for handling
noisy data and annotation
Ryuichi Kanoh (RK)
December 11, 2019 https://www.kaggle.com/ryuichi0704
Overview
- This is a KaggleDays workshop on noise handling sponsored by DeNA.
- In addition to explaining the techniques, I will touch on:
- Experimental results
- Implementations (https://github.com/ryuichi0704/workshop_noise_handling)
- Interactive communication is welcome.
- Both in English and Japanese.
2
Agenda
● Introduction
● Setup of the experiment
● Techniques for learning with a noisy dataset
3
Agenda
● Introduction
● Setup of the experiment
● Techniques for learning with a noisy dataset
4
Big data and machine learning
- A large, high-quality dataset drives the success of ML.
- However, it is very hard to prepare such a dataset.
- So, you probably want to use crowdsourcing, web crawling, and so on.
https://medium.com/syncedreview/sensetime-trains-imagenet-alexnet-in-record-1-5-minutes-e944ab049b2c
5
Possible noise from web crawling
Google Search
- The keywords may not be relevant to the image content.
6
Possible noise from crowdsourcing
Annotation errors may occur with
- limited communication between requesters and workers.
- limited working time.
https://www.flickr.com/photos/kjempekjekt/3470254809
7
Noise example in Kaggle competitions
8
- Task
- 340-class classification of timestamped stroke vectors
- Difficulty
- The dataset is collected from a browser game.
- Drawing quality differs depending on the drawer.
https://quickdraw.withgoogle.com/
Dataset example (class: monkey)
9
There are many noisy datasets in Kaggle
10
Classes are fine-grained. It is difficult even for a
human to annotate consistently.
75% of the dataset is annotated from metadata.
(not by a human)
Annotation granularity is not stable.
(e.g. [face] vs. [face, nose, eye, mouth, ...])
There are many noisy datasets in Kaggle
11
Labels vary depending on the annotator.
Annotation was crowd-sourced.
There are external datasets with noisy
annotations.
Each video was automatically annotated by the
YouTube annotation system.
Agenda
● Introduction
● Setup of the experiment
● Techniques for learning with a noisy dataset
12
Setup
- Use the QuickDraw (Link) dataset.
- 340 class image classification.
- Evaluation metric is top-1 accuracy.
- Timestamped vectors are converted to 1-channel images with 32x32 resolution.
- The dataset is randomly subsampled from the original dataset.
- Train: 81600 samples, Test: 20400 samples (random split)
- 300 images per class in total.
- The test accuracy at the epoch with the best validation accuracy is reported.
- Base setting:
13
- model: ResNet18
- base-lr: 0.1
- batch-size: 128
- epochs: 50
- train:valid: 9:1 (random split)
- objective: cross entropy
- optimizer: SGD with Nesterov momentum, weight decay 1e-4
- scheduler: MultiStepLR (x0.1 at epochs 40 and 45)
*Other details are in the GitHub repository.
Setup
14
- Experiments were done with AI Platform Training.
[Diagram: the local machine pushes a training container to Container Registry on Google Cloud; AI Platform Training pulls it, runs the jobs, and sends results and notifications to a Google Sheet.]
Base results
15
- Test accuracy distribution over 50 random seeds.
- The average test accuracy is 0.563.
- The random-seed effect is around 0.004 (maximum: ~0.010).
Model output analysis
- Check hard samples and easy samples.
16
[Figure: the per-sample error is the cross-entropy between the model's prediction on a validation sample (e.g. [0.01, 0.90, 0.03, ...]) and its label.]
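The easy/hard ranking on the next two slides is based on this per-sample error. As a minimal sketch of how such a ranking can be computed (tensor shapes and names are illustrative, not taken from the workshop repository):

```python
import torch
import torch.nn.functional as F

def per_sample_ce(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy per validation sample; large values indicate hard samples."""
    return F.cross_entropy(logits, labels, reduction="none")

# Dummy example: 8 validation samples, 340 classes.
logits = torch.randn(8, 340)
labels = torch.randint(0, 340, (8,))
errors = per_sample_ce(logits, labels)
easy_to_hard = torch.argsort(errors)  # indices sorted from easiest to hardest
```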
17
- Model output analysis
Easy samples (based on cross-entropy)
label / pred
18
- Model output analysis
Hard samples (based on cross-entropy)
label / pred
19
- Model output analysis
Why difficult?
- There are several reasons a sample can be difficult for the model.
- The image itself is noisy or wrong
- 1, 2, 8, 9
- There are similar, confusing classes
- 3, 4, 7
- So, the model is definitely struggling.
- Are there any techniques we can use to improve model training?
[3x3 grid of the hard samples above, numbered 1-9]
● Introduction
● Setup of the experiment
● Techniques for learning with a noisy dataset
1. Mixup
2. Large batch size
3. Distillation
20
Agenda
[1] Mixup
- Construct a virtual training sample as x̃ = λx_i + (1 − λ)x_j, ỹ = λy_i + (1 − λ)y_j.
- λ is randomly sampled from a symmetric beta distribution, Beta(α, α).
21
https://arxiv.org/abs/1710.09412
http://moustaphacisse.com/
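A minimal PyTorch sketch of this construction, mixing each batch with a shuffled copy of itself (the usual implementation trick; function names are illustrative, not the workshop repository's exact code):

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Return a mixed batch plus both target sets and the mixing coefficient."""
    lam = np.random.beta(alpha, alpha)  # symmetric Beta(alpha, alpha)
    index = torch.randperm(x.size(0))   # pair each sample with a random partner
    mixed_x = lam * x + (1.0 - lam) * x[index]
    return mixed_x, y, y[index], lam

def mixup_loss(logits, y_a, y_b, lam):
    """Apply the same convex combination to the loss instead of the one-hot labels."""
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)
```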
- Mixup
Beta distribution
22
http://wazalabo.com/mixup_1.html
- Large alpha: strong smoothing (the original paper uses 0.2 for ImageNet)
- Even if there is a blue (noisy) sample here, its effect is suppressed by the
surrounding red samples.
- Effectiveness against label noise is also mentioned in the original paper.
23
https://www.inference.vc/mixup-data-dependent-data-augmentation/
- Mixup
Why mixup for noisy dataset?
- You may ask what happens if we mix intermediate feature vectors instead of inputs.
- This is called “Manifold mixup”. https://arxiv.org/abs/1806.05236
- Feature vectors at a randomly selected layer are mixed using the same mixup procedure.
24
- Mixup
Variants of mixup
https://arxiv.org/abs/1512.03385
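A toy sketch of the layer-selection idea (a small MLP for brevity, not the ResNet18 used in the experiments; k = 0 reduces to ordinary input mixup):

```python
import random
import torch
import torch.nn as nn

class ManifoldMixupMLP(nn.Module):
    """Small MLP that can mix hidden states at a randomly chosen depth."""

    def __init__(self, in_dim=32 * 32, hidden=256, n_classes=340):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU()),
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()),
        ])
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x, lam=None, index=None):
        h = x.flatten(1)
        # Pick the depth at which to mix; k = 0 is ordinary input mixup.
        k = random.randint(0, len(self.blocks) - 1) if lam is not None else -1
        for i, block in enumerate(self.blocks):
            if i == k:
                h = lam * h + (1.0 - lam) * h[index]
            h = block(h)
        return self.head(h)
```

Targets are combined exactly as in input mixup, so `mixup_loss` from the previous sketch can be reused.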
25
- Mixup
Experimental results
- Mixup performance is better than the base performance (0.563).
- Important aspects:
- Performance changes drastically with alpha (the beta-distribution parameter).
- Manifold mixup is also a viable alternative.
- It can be used not only for images but also for categorical tabular data and so on.
- Data mixing based on beta distribution
- https://github.com/ryuichi0704/workshop_noise_handling/blob/master/project/work/model.py#L39-L51
- Select mixing layer (for manifold mixup)
- https://github.com/ryuichi0704/workshop_noise_handling/blob/master/project/work/model.py#L28-L37
- Loss calculation
- https://github.com/ryuichi0704/workshop_noise_handling/blob/master/project/work/runner/mixup_runner.py#L17-L20
26
- Mixup
Implementations
- Stopping strong augmentation (like mixup or auto-augment) in the final phase of training is helpful.
(https://arxiv.org/abs/1909.09148)
- A performance improvement is observed with our QuickDraw dataset.
27
- Mixup
Tips on training with mixup
QuickDraw dataset, with mixup (alpha=1.6)
0.6074 → 0.6165
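A sketch of this schedule, reusing `mixup_batch` / `mixup_loss` from the earlier sketch and assuming `model`, `optimizer`, and `train_loader` are already defined; the 5-epoch cutoff is an illustrative choice, not necessarily the setting behind the number above:

```python
num_epochs, clean_final_epochs = 50, 5
for epoch in range(num_epochs):
    use_mixup = epoch < num_epochs - clean_final_epochs
    for x, y in train_loader:
        if use_mixup:
            mixed_x, y_a, y_b, lam = mixup_batch(x, y, alpha=1.6)
            loss = mixup_loss(model(mixed_x), y_a, y_b, lam)
        else:  # final phase: train on clean, unmixed samples
            loss = F.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```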
- iMet Collection 2019 - FGVC6, 6th place
- For image data
- Freesound Audio Tagging 2019, 1st place
- For audio data
- The 2nd YouTube-8M Video Understanding Challenge, 2nd place
- For video feature vector
28
- Mixup
Examples in competitions
[2] Large batch size
29
- With severe label noise, a large batch size is helpful for training.
- Within a large batch, the gradients from random noisy labels cancel out, so a
larger batch size is effective.
https://arxiv.org/abs/1705.10694
30
- Large batch size
Other aspect: sharp and flat minimum
- If the gradient noise is too small, the model is likely to converge to a sharp minimum.
- At a sharp minimum, the model is less likely to generalize well.
- To balance this, it is said that the learning rate should be tuned together with the batch size.
https://arxiv.org/abs/1609.04836
- Practically, considering the batch size alone is not enough.
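One widely used heuristic for this joint tuning is the linear scaling rule (Goyal et al., https://arxiv.org/abs/1706.02677): grow the learning rate in proportion to the batch size. A sketch using the base setting of these slides (the slides only say the two should be tuned together, not that this exact rule was applied):

```python
base_lr, base_batch_size = 0.1, 128          # base setting from the Setup slide
batch_size = 1024                            # hypothetical large batch
lr = base_lr * batch_size / base_batch_size  # -> 0.8
```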
31
- Large batch size
Experimental results
*Trained with the same number of iterations (the number of epochs is set equal to the batch size, which keeps the total iteration count constant).
- A clear proportional relationship is observed.
- A not-so-large batch size looks optimal for this dataset.
32
- Large batch size
Experimental results
- Note that there are other relationships.
- Learning rate vs. weight decay
- Although the purposes of these hyperparameters differ, they are strongly related.
- It is important to tune parameters while considering their interactions.
- Usually, it is hard to use a large batch size because of GPU memory.
- Approach 1:
- Gradient accumulation
- Approach 2:
- Mixed precision training
- https://github.com/NVIDIA/apex
33
- Large batch size
Tips for setting large batch size
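A sketch of the standard gradient-accumulation pattern from Approach 1 above (assuming `model`, `optimizer`, and `train_loader` exist; this is the common idiom, not necessarily the repository's exact code):

```python
import torch.nn.functional as F

accum_steps = 8  # effective batch size = loader batch size * accum_steps
optimizer.zero_grad()
for step, (x, y) in enumerate(train_loader):
    loss = F.cross_entropy(model(x), y)
    (loss / accum_steps).backward()  # scale so gradients average over the large batch
    if (step + 1) % accum_steps == 0:
        optimizer.step()             # update once per accumulated "large batch"
        optimizer.zero_grad()
```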
- Gradient accumulation
- https://github.com/ryuichi0704/workshop_noise_handling/blob/master/project/work/runner/base_runner.py#L102-L104
- Hyper-parameters are set by arguments
- https://github.com/ryuichi0704/workshop_noise_handling/blob/master/project/work/common.py#L15-L34
34
- Large batch size
Implementations
- Quick, Draw! Doodle Recognition Challenge, 5th place
- Batch size up to 10K.
- iMet Collection 2019 - FGVC6, 1st place
- Batch size 1000~1500 (Accumulation 10~20 times)
35
- Large batch size
Examples in competitions
[3] Distillation
- Train a student network with predictions from a pre-trained teacher.
- This eases the student model’s training. (the student can see which samples are difficult)
36
https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764
37
- Distillation
Procedure [1/3]
For making teacher predictions, the teacher model is often trained with cross validation.
[Diagram: the train data (data + labels) is split into 5 folds; a teacher is trained on each fold split, and the out-of-fold (OOF) predictions are collected as teacher predictions for the whole train data.]
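A sketch of this OOF scheme with scikit-learn's KFold; `train_teacher` is a hypothetical helper that fits a teacher model on a fold's train split and exposes `predict_proba`:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(1000, 32 * 32)         # dummy features
y = np.random.randint(0, 340, size=1000)  # dummy labels
oof = np.zeros((len(X), 340))             # out-of-fold teacher predictions

for train_idx, valid_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    teacher = train_teacher(X[train_idx], y[train_idx])   # hypothetical helper
    oof[valid_idx] = teacher.predict_proba(X[valid_idx])  # soft labels for unseen samples

# Every training sample now has a teacher prediction from a model that never saw it.
```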
38
- Distillation
Procedure [2/3]
[Diagram: the student model is then trained on the train data, using the OOF teacher predictions as soft labels alongside the original hard labels.]
39
- Distillation
Procedure [3/3]
[Diagram: finally, the trained student model predicts on the test data to produce the prediction results.]
There are some strategies for training the student.
- Use (a * soft-label loss) + (b * hard-label loss) as the student’s loss function.
- Use max(0.7 * soft label, hard label) as the new label. (e.g. for the F2 metric)
- Softmax with temperature is sometimes used for the teacher prediction.
- In the original paper, the KL divergence between the student and the teacher was also used.
- https://arxiv.org/abs/1503.02531
40
- Distillation
How to use teacher prediction
Example:
soft label (teacher prediction) = [0.20, 0.10, 0.70, 0.70]
hard label = [0.00, 0.00, 0.00, 1.00]
new target = max(0.7 * soft label, hard label) = [0.14, 0.07, 0.49, 1.00]
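Sketches of both strategies above (one common formulation, not the only one; `soft_weight=2.0` matches the best setting reported on the results slide later, and `teacher_probs` are assumed to be already temperature-softened when the temperature is not 1):

```python
import torch
import torch.nn.functional as F

def distillation_loss(logits, teacher_probs, hard_labels, soft_weight=2.0, temperature=1.0):
    """(soft_weight * soft-label loss) + hard-label loss."""
    log_p = F.log_softmax(logits / temperature, dim=1)
    soft_loss = F.kl_div(log_p, teacher_probs, reduction="batchmean") * temperature ** 2
    hard_loss = F.cross_entropy(logits, hard_labels)
    return soft_weight * soft_loss + hard_loss

def f2_style_target(soft_label, hard_label):
    """max(0.7 * soft label, hard label), as in the multi-label example above."""
    return torch.maximum(0.7 * soft_label, hard_label)
```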
- Distillation can smooth out the extremity of noisy hard labels.
- If a sample is complex and hard to annotate, the teacher’s predicted label for it
may not have high confidence.
- When a dog is annotated as a cat by mistake, the teacher’s prediction for that
sample may still be close to dog if the rest of the data is reliable.
41
- Distillation
Why distillation for noisy dataset?
label (noisy): dog = 0, cat = 1
teacher prediction: dog = 0.9, cat = 0.1
42
- Distillation
Experimental results
- Distillation performance is better than the base performance (0.563).
- Performance improves even when the student has the same architecture as the teacher.
- The weight of the soft loss affects performance.
- A weight of 2 (soft-loss effect double that of the hard loss) is the best.
- Is this because of the noisy dataset?
- Calculate hard and soft loss
- https://github.com/ryuichi0704/workshop_noise_handling/blob/master/project/work/runner/distillation_runner.py#L64-L77
43
- Distillation
Implementations
- iMet Collection 2019 - FGVC6, 9th place
- Max(0.7*soft-label, hard-label) as new targets
- The property of the competition metric (F2) is taken into account.
- The 2nd YouTube-8M Video Understanding Challenge, 2nd place
- Multi-stage distillation
44
- Distillation
Examples in competitions
Summary
45
● Introduction
● Setup of the experiment
● Techniques for learning with a noisy dataset
- Learning with selected samples
- Drop large-error samples from training
- Curriculum learning
- Learning with noise transition
- Forward correction (modify the objective)
46
(Curriculum learning)
(Noise transition matrix)
There are many other techniques
https://arxiv.org/abs/1808.01097
EOF
47
