SlideShare a Scribd company logo
1 of 26
Jaemin Jeong Seminar 2
 𝑈2-𝑁𝑒𝑡, for salient object detection.
 Advantage
 It is able to capture more contextual information
from different scales thanks to the mixture of
receptive fields of different sizes in our proposed
ReSidual U-blocks(RSU).
 It increases the depth of the whole architecture
without significantly increasing the computational
cost because of the pooling operations used in
these RSU blocks.
 Propose
 𝑈2
-𝑁𝑒𝑡 : 176.3 MB, 30 FPS on GTX 1080Ti GPU
 𝑈2-𝑁𝑒𝑡† : 4.7 MB, 40 FPS
Abstract
Jaemin Jeong Seminar 3
Saliency Object Detection
SOD(Saliency Object Detection) : 중요한 물체를 검출하는 방법
FD(Fixation Prediction) : 사람의 시선이 어디에 가장 오래 머물지 예측하
는 방법
Salient Object Detection (SOD) aims at segmenting the most visually attractive objects in an image
Jaemin Jeong Seminar 4
 There is a common pattern in the design of most SOD networks, that is, they focus on making
good use of deep features extracted by existing backbones, such as Alexnet, VGG, ResNet,
ResNeXt, DenseNet, etc.
 However, these backbones are all originally designed for image classification.
 They extract features that are representative of semantic meaning rather than local details and
global contrast information, which are essential to saliency detection.
 And they need to be pretrained on ImageNet data which data-inefficient especially if the target
data follows a different distribution than ImageNet.
This leads to our first question: can we design a new network for SOD, that allows training from
scratch and achieves comparable or better performance than those based on existing pre-trained
backbones?
Introduction
Jaemin Jeong Seminar 5
 First, they are often overly complicated. It is partially due to the additional feature aggregation
modules that are added to the existing backbones to extract multilevel saliency features from
these backbones.
 Secondly, the existing backbones usually achieve deeper architecture by sacrificing high resolution
of feature maps.
This leads to our first question: can we go deeper while maintaining high resolution feature maps, at
a low memory and computation cost?
Issues on the network architectures for SOD
Jaemin Jeong Seminar 6
Multi-level deep feature integration
= methods mainly focus on developing better multi-level feature
aggregation strategies.
Multi-scale feature extraction
= target at designing new modules for extracting both local and global
information from features obtained by backbone networks.
Related Works
Jaemin Jeong Seminar 7
RSU too much
computation
and memory
resources
Low receptive field proposed
Jaemin Jeong Seminar 8
RSU
To decrease the computation costs, PoolNet adapt the parallel configuration from pyramid pooling modules
(PPM), which uses small kernel filters on the downsampled feature maps other than the dilated convolutions on
the original size feature maps. But fusion of different scale features by direct upsampling and concatenation (or
addition) may lead to degradation of high resolution features.
Conv layer
Jaemin Jeong Seminar 9
RSU
Jaemin Jeong Seminar 10
U x 𝑛-Net : Repeat Unet n times. (stacked hourglass network, DocUNet, CU-Net)
Architecture
𝑈𝑛 -Net : Our exponential notation refers to nested U-structure rather than cascaded stacking.
Jaemin Jeong Seminar 11
𝑈2
-𝑁𝑒𝑡
Relatively low resolution.
RSU-
4F
𝐿 =
𝑚=1
𝑀
𝑤𝑠𝑖𝑑𝑒
𝑚
ℓ𝑠𝑖𝑑𝑒
(𝑚)
+ 𝑤𝑓𝑢𝑠𝑒ℓ𝑓𝑢𝑠𝑒
ℓ𝑠𝑖𝑑𝑒
(𝑚)
: loss of the side output saliency
map
ℓ𝑓𝑢𝑠𝑒 : loss of the final output saliency
map
ℓ = −
(𝑟,𝑐)
(𝐻,𝑊)
[𝑃𝐺 𝑟,𝑐 log 𝑃𝑆(𝑟,𝑐) + 1 − 𝑃𝐺 𝑟,𝑐 log(1 − 𝑃𝑆(𝑟,𝑐))]
Binary Cross Entropy
𝑃𝐺(𝑟,𝑐) = Ground Truth
𝑃𝑆(𝑟,𝑐) = Predicted
𝑤
𝑚
, 𝑤 are all set to 1.
Jaemin Jeong Seminar 12
Training
DUTS-TR contains 10553 images in total. Currently, it is the largest and most frequently used training dataset for
salient object detection. We augment this dataset by horizontal flipping to obtain 21106 training images offline
Evaluation
DUT-OMRON includes 5168 images, most of which contain one or two structurally complex foreground objects.
DUTS-TE, which contains 5019 images.
HKU-IS contains 4447 images with multiple foreground objects.
ECSSD contains 1000 structurally complex images and many of them contain large foreground objects.
PASCAL-S contains 850 images with complex foreground objects and cluttered background.
SOD only contains 300 images. But it is very challenging. Because it was originally designed for image
segmentation and many images are low contrast or contain complex foreground objects overlapping with the
image boundary.
Dataset
Jaemin Jeong Seminar 13
 Predict : 0 ~ 1
 GT : 0 or 1
PR curve : set of precision-recall paris, threshold 0 ~ 1 of predicted pixels.
Evaluation Metrics
Jaemin Jeong Seminar 14
ROC-curve
https://angeloyeo.github.io/2020/08/05/ROC.html
Jaemin Jeong Seminar 15
𝐹𝛽 =
1 + 𝛽 2
× 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 × 𝑅𝑒𝑐𝑎𝑙𝑙
𝛽2 × 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 + 𝑅𝑒𝑐𝑎𝑙𝑙
𝛽2
: 0.3 (importance of the precision value)
report the maximum 𝐹𝛽(𝑚𝑎𝑥𝐹𝛽) for each dataset similar to previous works....
weighted F-measure
𝐹𝛽
𝑤
= 1 + 𝛽2
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑤
⋅ 𝑅𝑒𝑐𝑎𝑙𝑙𝑤
𝛽2𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑤 + 𝑅𝑒𝑐𝑎𝑙𝑙𝑤
overcoming the possible unfair comparison
F-measure
Jaemin Jeong Seminar 16
 Mean Absolute Error
𝑀𝐴𝐸 =
1
𝐻 × 𝑊 𝑟=1
𝐻
𝑐=1
𝑊
𝑃 𝑟, 𝑐 − 𝐺(𝑟, 𝑐)
MAE
Jaemin Jeong Seminar 17
 structure similarity of the predicted non-binary saliency map and the ground truth.
𝑆 = 1 − 𝛼 𝑆𝑟 + 𝛼 𝑆𝑜
 𝑆𝑟 = weighted sum of region-aware
 𝑆𝑜 = weighted sum of object-aware
 𝛼 = 0.5
S-measure(𝑆𝑚)
Jaemin Jeong Seminar 18
 𝑟𝑒𝑙𝑎𝑥𝐹𝛽
𝑏
is utilized to quantitatively evaluate boundaries’ quality of the predicted saliency maps
𝑟𝑒𝑙𝑎𝑥 𝐹𝛽 =
1 + 𝛽 2
× 𝑟𝑒𝑙𝑎𝑥𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑏
× 𝑟𝑒𝑙𝑎𝑥𝑅𝑒𝑐𝑎𝑙𝑙𝑏
𝛽2 × 𝑟𝑒𝑙𝑎𝑥𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑏 + 𝑟𝑒𝑙𝑎𝑥𝑅𝑒𝑐𝑎𝑙𝑙𝑏
1. (예측값, 실측값)에서 0.5 임계값으로 각각의 𝑃𝑏𝑤가 얻어진다.
2. 𝑋𝑂𝑅(𝑃𝑏𝑤, 𝑃𝑒𝑟𝑑)를 계산한다. (𝑃𝑒𝑟𝑑는 𝑃𝑏𝑤의 eroded binary mask.)
The definition of relaxed boundary precision (𝑟𝑒𝑙𝑎𝑥𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑏
) is the fraction of predicted boundary
pixels within a range of 𝜌 pixels from ground truth boundary pixels.
The relaxed boundary recall (𝑟𝑒𝑙𝑎𝑥𝑅𝑒𝑐𝑎𝑙𝑙𝑏
) is defined as the fraction of ground truth boundary pixels
that are within 𝜌 pixels of predicted boundary pixels. (𝜌 : 3)
Relax boundary F-measure
Jaemin Jeong Seminar 19
 first resized to 320×320 -> randomly flipped vertically -> cropped to 288×288.
 Adam(initial learning rate lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight decay=0)
 After 600k iterations (with a batch size of 12), the training loss converges and the whole training
process takes about 120 hours.
 Both training and testing are conducted on an eight-core, 16 threads PC with an AMD Ryzen
1800x 3.5 GHz CPU (32GB RAM) and a GTX 1080ti GPU (11GB memory).
Implementation Detail
Jaemin Jeong Seminar 20
Ablation on Blocks
Naive
Pyramid
Pooling
Module
Jaemin Jeong Seminar 21
Results of ablation study on different blocks.
encoder
-> backbone
Architecture
Block
Jaemin Jeong Seminar 22
Quantitative Comparison(PR-curve)
Jaemin Jeong Seminar 23
Table - Comparison 1
Jaemin Jeong Seminar 24
Table - Comparison 2
Qualitative
comparison
small obj
Large obj
touch borders
clean backgrond
Large, thin
clean backgrond
detail
clutter background
complicated foreground
multiple targets
Jaemin Jeong Seminar 26
 Proposed a novel deep network 𝑈2
-𝑁𝑒𝑡, for Saliency Object Detection.
 Two-level nested U-structure.
 newly designed RSU blocks.
 Provide a full size 𝑈2
-Net (176.3 MB, 30 FPS) and a smaller size version 𝑈2
-Net† (4.7 MB, 40
FPS) in this paper.
 Although our models achieve competitive results against other state-of-the-art methods, faster and
smaller models are needed for computation and memory limited devices, such as mobile phones,
robots, etc.
Conclusion

More Related Content

What's hot

Image segmentation 2
Image segmentation 2 Image segmentation 2
Image segmentation 2
Rumah Belajar
 
Vulnerable Out of the Box: An Evaluation of Android Carrier Devices
Vulnerable Out of the Box: An Evaluation of Android Carrier DevicesVulnerable Out of the Box: An Evaluation of Android Carrier Devices
Vulnerable Out of the Box: An Evaluation of Android Carrier Devices
Priyanka Aash
 

What's hot (20)

Sharpening spatial filters
Sharpening spatial filtersSharpening spatial filters
Sharpening spatial filters
 
Gray level transformation
Gray level transformationGray level transformation
Gray level transformation
 
Image segmentation
Image segmentation Image segmentation
Image segmentation
 
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
 
VLSI- An Automotive Application Perspective
VLSI- An Automotive Application PerspectiveVLSI- An Automotive Application Perspective
VLSI- An Automotive Application Perspective
 
Nx cmm creating_probes_v._8.5
Nx cmm creating_probes_v._8.5Nx cmm creating_probes_v._8.5
Nx cmm creating_probes_v._8.5
 
Image segmentation 2
Image segmentation 2 Image segmentation 2
Image segmentation 2
 
Image recognition
Image recognitionImage recognition
Image recognition
 
Find nuclei in images with U-net
Find nuclei in images with U-netFind nuclei in images with U-net
Find nuclei in images with U-net
 
Mathematical tools in dip
Mathematical tools in dipMathematical tools in dip
Mathematical tools in dip
 
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and Classification
 
Ip elements of image processing
Ip elements of image processingIp elements of image processing
Ip elements of image processing
 
08 frequency domain filtering DIP
08 frequency domain filtering DIP08 frequency domain filtering DIP
08 frequency domain filtering DIP
 
FORGERY (COPY-MOVE) DETECTION IN DIGITAL IMAGES USING BLOCK METHOD
FORGERY (COPY-MOVE) DETECTION IN DIGITAL IMAGES USING BLOCK METHODFORGERY (COPY-MOVE) DETECTION IN DIGITAL IMAGES USING BLOCK METHOD
FORGERY (COPY-MOVE) DETECTION IN DIGITAL IMAGES USING BLOCK METHOD
 
Image Representation & Descriptors
Image Representation & DescriptorsImage Representation & Descriptors
Image Representation & Descriptors
 
Vulnerable Out of the Box: An Evaluation of Android Carrier Devices
Vulnerable Out of the Box: An Evaluation of Android Carrier DevicesVulnerable Out of the Box: An Evaluation of Android Carrier Devices
Vulnerable Out of the Box: An Evaluation of Android Carrier Devices
 
Shannon and 5 good criteria of a good cipher
Shannon and 5 good criteria of a good cipher Shannon and 5 good criteria of a good cipher
Shannon and 5 good criteria of a good cipher
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
 
Computer Vision with Deep Learning
Computer Vision with Deep LearningComputer Vision with Deep Learning
Computer Vision with Deep Learning
 

Similar to 2021 05-04-u2-net

UNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptxUNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptx
NoorUlHaq47
 
Background Estimation Using Principal Component Analysis Based on Limited Mem...
Background Estimation Using Principal Component Analysis Based on Limited Mem...Background Estimation Using Principal Component Analysis Based on Limited Mem...
Background Estimation Using Principal Component Analysis Based on Limited Mem...
IJECEIAES
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
Bomm Kim
 
Poster Segmentation Chain
Poster Segmentation ChainPoster Segmentation Chain
Poster Segmentation Chain
RMwebsite
 

Similar to 2021 05-04-u2-net (20)

Image De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural NetworkImage De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural Network
 
IMAGE DE-NOISING USING DEEP NEURAL NETWORK
IMAGE DE-NOISING USING DEEP NEURAL NETWORKIMAGE DE-NOISING USING DEEP NEURAL NETWORK
IMAGE DE-NOISING USING DEEP NEURAL NETWORK
 
Image De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural NetworkImage De-Noising Using Deep Neural Network
Image De-Noising Using Deep Neural Network
 
UNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptxUNetEliyaLaialy (2).pptx
UNetEliyaLaialy (2).pptx
 
Machine Vision on Embedded Hardware
Machine Vision on Embedded HardwareMachine Vision on Embedded Hardware
Machine Vision on Embedded Hardware
 
2022-01-17-Rethinking_Bisenet.pptx
2022-01-17-Rethinking_Bisenet.pptx2022-01-17-Rethinking_Bisenet.pptx
2022-01-17-Rethinking_Bisenet.pptx
 
Background Estimation Using Principal Component Analysis Based on Limited Mem...
Background Estimation Using Principal Component Analysis Based on Limited Mem...Background Estimation Using Principal Component Analysis Based on Limited Mem...
Background Estimation Using Principal Component Analysis Based on Limited Mem...
 
Restricting the Flow: Information Bottlenecks for Attribution
Restricting the Flow: Information Bottlenecks for AttributionRestricting the Flow: Information Bottlenecks for Attribution
Restricting the Flow: Information Bottlenecks for Attribution
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
 
Poster Segmentation Chain
Poster Segmentation ChainPoster Segmentation Chain
Poster Segmentation Chain
 
A Smart Camera Processing Pipeline for Image Applications Utilizing Marching ...
A Smart Camera Processing Pipeline for Image Applications Utilizing Marching ...A Smart Camera Processing Pipeline for Image Applications Utilizing Marching ...
A Smart Camera Processing Pipeline for Image Applications Utilizing Marching ...
 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolution
 
Deep learning for image super resolution
Deep learning for image super resolutionDeep learning for image super resolution
Deep learning for image super resolution
 
Spine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationSpine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localization
 
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdfTutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3
 
Efficient de cvpr_2020_paper
Efficient de cvpr_2020_paperEfficient de cvpr_2020_paper
Efficient de cvpr_2020_paper
 
Keras on tensorflow in R & Python
Keras on tensorflow in R & PythonKeras on tensorflow in R & Python
Keras on tensorflow in R & Python
 
L14.pdf
L14.pdfL14.pdf
L14.pdf
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 

More from JAEMINJEONG5

More from JAEMINJEONG5 (20)

Jaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptxJaemin_230701_Simple_Copy_paste.pptx
Jaemin_230701_Simple_Copy_paste.pptx
 
Swin transformer
Swin transformerSwin transformer
Swin transformer
 
2021 06-02-tabnet
2021 06-02-tabnet2021 06-02-tabnet
2021 06-02-tabnet
 
2021 04-04-google nmt
2021 04-04-google nmt2021 04-04-google nmt
2021 04-04-google nmt
 
2021 04-03-sean
2021 04-03-sean2021 04-03-sean
2021 04-03-sean
 
2021 04-01-dalle
2021 04-01-dalle2021 04-01-dalle
2021 04-01-dalle
 
2021 03-02-spade
2021 03-02-spade2021 03-02-spade
2021 03-02-spade
 
2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrases2021 03-02-distributed representations-of_words_and_phrases
2021 03-02-distributed representations-of_words_and_phrases
 
2021 03-02-transformer interpretability
2021 03-02-transformer interpretability2021 03-02-transformer interpretability
2021 03-02-transformer interpretability
 
2021 03-01-on the relationship between self-attention and convolutional layers
2021 03-01-on the relationship between self-attention and convolutional layers2021 03-01-on the relationship between self-attention and convolutional layers
2021 03-01-on the relationship between self-attention and convolutional layers
 
2021 01-04-learning filter-basis
2021 01-04-learning filter-basis2021 01-04-learning filter-basis
2021 01-04-learning filter-basis
 
2021 01-02-linformer
2021 01-02-linformer2021 01-02-linformer
2021 01-02-linformer
 
2020 12-04-shake shake
2020 12-04-shake shake2020 12-04-shake shake
2020 12-04-shake shake
 
2020 12-03-vit
2020 12-03-vit2020 12-03-vit
2020 12-03-vit
 
2020 12-2-detr
2020 12-2-detr2020 12-2-detr
2020 12-2-detr
 
2020 11 4_bag_of_tricks
2020 11 4_bag_of_tricks2020 11 4_bag_of_tricks
2020 11 4_bag_of_tricks
 
2020 11 2_automated sleep stage scoring of the sleep heart
2020 11 2_automated sleep stage scoring of the sleep heart2020 11 2_automated sleep stage scoring of the sleep heart
2020 11 2_automated sleep stage scoring of the sleep heart
 
2020 11 1_sleep_net
2020 11 1_sleep_net2020 11 1_sleep_net
2020 11 1_sleep_net
 
2020 12-1-adam w
2020 12-1-adam w2020 12-1-adam w
2020 12-1-adam w
 
2020 11 3_face_detection
2020 11 3_face_detection2020 11 3_face_detection
2020 11 3_face_detection
 

Recently uploaded

"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
AldoGarca30
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
MayuraD1
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 

Recently uploaded (20)

Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 

2021 05-04-u2-net

  • 1.
  • 2. Jaemin Jeong Seminar 2  𝑈2-𝑁𝑒𝑡, for salient object detection.  Advantage  It is able to capture more contextual information from different scales thanks to the mixture of receptive fields of different sizes in our proposed ReSidual U-blocks(RSU).  It increases the depth of the whole architecture without significantly increasing the computational cost because of the pooling operations used in these RSU blocks.  Propose  𝑈2 -𝑁𝑒𝑡 : 176.3 MB, 30 FPS on GTX 1080Ti GPU  𝑈2-𝑁𝑒𝑡† : 4.7 MB, 40 FPS Abstract
  • 3. Jaemin Jeong Seminar 3 Saliency Object Detection SOD(Saliency Object Detection) : 중요한 물체를 검출하는 방법 FD(Fixation Prediction) : 사람의 시선이 어디에 가장 오래 머물지 예측하 는 방법 Salient Object Detection (SOD) aims at segmenting the most visually attractive objects in an image
  • 4. Jaemin Jeong Seminar 4  There is a common pattern in the design of most SOD networks, that is, they focus on making good use of deep features extracted by existing backbones, such as Alexnet, VGG, ResNet, ResNeXt, DenseNet, etc.  However, these backbones are all originally designed for image classification.  They extract features that are representative of semantic meaning rather than local details and global contrast information, which are essential to saliency detection.  And they need to be pretrained on ImageNet data which data-inefficient especially if the target data follows a different distribution than ImageNet. This leads to our first question: can we design a new network for SOD, that allows training from scratch and achieves comparable or better performance than those based on existing pre-trained backbones? Introduction
  • 5. Jaemin Jeong Seminar 5  First, they are often overly complicated. It is partially due to the additional feature aggregation modules that are added to the existing backbones to extract multilevel saliency features from these backbones.  Secondly, the existing backbones usually achieve deeper architecture by sacrificing high resolution of feature maps. This leads to our first question: can we go deeper while maintaining high resolution feature maps, at a low memory and computation cost? Issues on the network architectures for SOD
  • 6. Jaemin Jeong Seminar 6 Multi-level deep feature integration = methods mainly focus on developing better multi-level feature aggregation strategies. Multi-scale feature extraction = target at designing new modules for extracting both local and global information from features obtained by backbone networks. Related Works
  • 7. Jaemin Jeong Seminar 7 RSU too much computation and memory resources Low receptive field proposed
  • 8. Jaemin Jeong Seminar 8 RSU To decrease the computation costs, PoolNet adapt the parallel configuration from pyramid pooling modules (PPM), which uses small kernel filters on the downsampled feature maps other than the dilated convolutions on the original size feature maps. But fusion of different scale features by direct upsampling and concatenation (or addition) may lead to degradation of high resolution features. Conv layer
  • 10. Jaemin Jeong Seminar 10 U x 𝑛-Net : Repeat Unet n times. (stacked hourglass network, DocUNet, CU-Net) Architecture 𝑈𝑛 -Net : Our exponential notation refers to nested U-structure rather than cascaded stacking.
  • 11. Jaemin Jeong Seminar 11 𝑈2 -𝑁𝑒𝑡 Relatively low resolution. RSU- 4F 𝐿 = 𝑚=1 𝑀 𝑤𝑠𝑖𝑑𝑒 𝑚 ℓ𝑠𝑖𝑑𝑒 (𝑚) + 𝑤𝑓𝑢𝑠𝑒ℓ𝑓𝑢𝑠𝑒 ℓ𝑠𝑖𝑑𝑒 (𝑚) : loss of the side output saliency map ℓ𝑓𝑢𝑠𝑒 : loss of the final output saliency map ℓ = − (𝑟,𝑐) (𝐻,𝑊) [𝑃𝐺 𝑟,𝑐 log 𝑃𝑆(𝑟,𝑐) + 1 − 𝑃𝐺 𝑟,𝑐 log(1 − 𝑃𝑆(𝑟,𝑐))] Binary Cross Entropy 𝑃𝐺(𝑟,𝑐) = Ground Truth 𝑃𝑆(𝑟,𝑐) = Predicted 𝑤 𝑚 , 𝑤 are all set to 1.
  • 12. Jaemin Jeong Seminar 12 Training DUTS-TR contains 10553 images in total. Currently, it is the largest and most frequently used training dataset for salient object detection. We augment this dataset by horizontal flipping to obtain 21106 training images offline Evaluation DUT-OMRON includes 5168 images, most of which contain one or two structurally complex foreground objects. DUTS-TE, which contains 5019 images. HKU-IS contains 4447 images with multiple foreground objects. ECSSD contains 1000 structurally complex images and many of them contain large foreground objects. PASCAL-S contains 850 images with complex foreground objects and cluttered background. SOD only contains 300 images. But it is very challenging. Because it was originally designed for image segmentation and many images are low contrast or contain complex foreground objects overlapping with the image boundary. Dataset
  • 13. Jaemin Jeong Seminar 13  Predict : 0 ~ 1  GT : 0 or 1 PR curve : set of precision-recall paris, threshold 0 ~ 1 of predicted pixels. Evaluation Metrics
  • 14. Jaemin Jeong Seminar 14 ROC-curve https://angeloyeo.github.io/2020/08/05/ROC.html
  • 15. Jaemin Jeong Seminar 15 𝐹𝛽 = 1 + 𝛽 2 × 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 × 𝑅𝑒𝑐𝑎𝑙𝑙 𝛽2 × 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 + 𝑅𝑒𝑐𝑎𝑙𝑙 𝛽2 : 0.3 (importance of the precision value) report the maximum 𝐹𝛽(𝑚𝑎𝑥𝐹𝛽) for each dataset similar to previous works.... weighted F-measure 𝐹𝛽 𝑤 = 1 + 𝛽2 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑤 ⋅ 𝑅𝑒𝑐𝑎𝑙𝑙𝑤 𝛽2𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑤 + 𝑅𝑒𝑐𝑎𝑙𝑙𝑤 overcoming the possible unfair comparison F-measure
  • 16. Jaemin Jeong Seminar 16  Mean Absolute Error 𝑀𝐴𝐸 = 1 𝐻 × 𝑊 𝑟=1 𝐻 𝑐=1 𝑊 𝑃 𝑟, 𝑐 − 𝐺(𝑟, 𝑐) MAE
  • 17. Jaemin Jeong Seminar 17  structure similarity of the predicted non-binary saliency map and the ground truth. 𝑆 = 1 − 𝛼 𝑆𝑟 + 𝛼 𝑆𝑜  𝑆𝑟 = weighted sum of region-aware  𝑆𝑜 = weighted sum of object-aware  𝛼 = 0.5 S-measure(𝑆𝑚)
  • 18. Jaemin Jeong Seminar 18  𝑟𝑒𝑙𝑎𝑥𝐹𝛽 𝑏 is utilized to quantitatively evaluate boundaries’ quality of the predicted saliency maps 𝑟𝑒𝑙𝑎𝑥 𝐹𝛽 = 1 + 𝛽 2 × 𝑟𝑒𝑙𝑎𝑥𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑏 × 𝑟𝑒𝑙𝑎𝑥𝑅𝑒𝑐𝑎𝑙𝑙𝑏 𝛽2 × 𝑟𝑒𝑙𝑎𝑥𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑏 + 𝑟𝑒𝑙𝑎𝑥𝑅𝑒𝑐𝑎𝑙𝑙𝑏 1. (예측값, 실측값)에서 0.5 임계값으로 각각의 𝑃𝑏𝑤가 얻어진다. 2. 𝑋𝑂𝑅(𝑃𝑏𝑤, 𝑃𝑒𝑟𝑑)를 계산한다. (𝑃𝑒𝑟𝑑는 𝑃𝑏𝑤의 eroded binary mask.) The definition of relaxed boundary precision (𝑟𝑒𝑙𝑎𝑥𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑏 ) is the fraction of predicted boundary pixels within a range of 𝜌 pixels from ground truth boundary pixels. The relaxed boundary recall (𝑟𝑒𝑙𝑎𝑥𝑅𝑒𝑐𝑎𝑙𝑙𝑏 ) is defined as the fraction of ground truth boundary pixels that are within 𝜌 pixels of predicted boundary pixels. (𝜌 : 3) Relax boundary F-measure
  • 19. Jaemin Jeong Seminar 19  first resized to 320×320 -> randomly flipped vertically -> cropped to 288×288.  Adam(initial learning rate lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight decay=0)  After 600k iterations (with a batch size of 12), the training loss converges and the whole training process takes about 120 hours.  Both training and testing are conducted on an eight-core, 16 threads PC with an AMD Ryzen 1800x 3.5 GHz CPU (32GB RAM) and a GTX 1080ti GPU (11GB memory). Implementation Detail
  • 20. Jaemin Jeong Seminar 20 Ablation on Blocks Naive Pyramid Pooling Module
  • 21. Jaemin Jeong Seminar 21 Results of ablation study on different blocks. encoder -> backbone Architecture Block
  • 22. Jaemin Jeong Seminar 22 Quantitative Comparison(PR-curve)
  • 23. Jaemin Jeong Seminar 23 Table - Comparison 1
  • 24. Jaemin Jeong Seminar 24 Table - Comparison 2
  • 25. Qualitative comparison small obj Large obj touch borders clean backgrond Large, thin clean backgrond detail clutter background complicated foreground multiple targets
  • 26. Jaemin Jeong Seminar 26  Proposed a novel deep network 𝑈2 -𝑁𝑒𝑡, for Saliency Object Detection.  Two-level nested U-structure.  newly designed RSU blocks.  Provide a full size 𝑈2 -Net (176.3 MB, 30 FPS) and a smaller size version 𝑈2 -Net† (4.7 MB, 40 FPS) in this paper.  Although our models achieve competitive results against other state-of-the-art methods, faster and smaller models are needed for computation and memory limited devices, such as mobile phones, robots, etc. Conclusion

Editor's Notes

  1. https://angeloyeo.github.io/2020/08/05/ROC.html