20 Mar 2023

ResNet, short for "Residual Network," is a deep neural network architecture introduced by Microsoft researchers in 2015. It is designed to address the vanishing gradient problem, which can occur when training networks with many layers. The main innovation in ResNet is the residual connection, also known as a skip connection. These connections let information from earlier layers bypass some intermediate layers and feed directly into later ones. As a result, the gradient signal from the output can propagate back through the network during training, which helps prevent vanishing gradients. ResNet has proven very effective at image recognition and other computer vision tasks, and it achieved state-of-the-art performance on benchmark datasets such as ImageNet. Since its introduction, many variations and improvements to the original architecture have been proposed, including ResNeXt, Wide ResNet, and Residual Attention Network (RANet).

YanhuaSi

Resnet.pptx (YanhuaSi)


- Learning with Purpose. DEEP RESIDUAL NETWORKS. Kaiming He et al., “Deep Residual Learning for Image Recognition”; Kaiming He et al., “Identity Mappings in Deep Residual Networks”; Andreas Veit et al., “Residual Networks Behave Like Ensembles of Relatively Shallow Networks”
- ResNet @ ILSVRC & COCO 2015 Competitions: 1st place in all five main tracks • ImageNet Classification: “ultra-deep” 152-layer nets • ImageNet Detection: 16% better than 2nd • ImageNet Localization: 27% better than 2nd • COCO Detection: 11% better than 2nd • COCO Segmentation: 12% better than 2nd
- Evolution of Deep Networks: ImageNet Classification Challenge error rates by year. The competition results show that the winning solutions have become deeper and deeper: from 8 layers in 2012 to 200+ layers in 2016.
- What Does Depth Mean? Forward (data flow): deep representation ability
- What Does Depth Mean? Backward (gradient flow): is learning better networks as easy as stacking more layers?
- Gradient Vanishing • Backpropagation multiplies the per-layer gradient factors together, and this multiplying property causes the phenomenon • This can be addressed by: – Normalized initialization – Batch normalization – An appropriate activation function: Sigmoid(x) → ReLU(x)
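The multiplying property can be illustrated with a toy calculation. The sketch below is only an assumption-laden illustration, not anything from the slides: it fixes every weight at 1 and evaluates every layer at the same pre-activation, so the backpropagated gradient is simply the local derivative multiplied `depth` times. The helper names (`chained_gradient`, `sigmoid_grad`, `relu_grad`) are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    # Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    # Derivative of ReLU: 1 on the active side, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def chained_gradient(local_grad, pre_activation: float, depth: int) -> float:
    """Multiply the same local derivative `depth` times (all weights assumed 1)."""
    g = 1.0
    for _ in range(depth):
        g *= local_grad(pre_activation)
    return g


if __name__ == "__main__":
    for depth in (5, 20, 50):
        print(depth,
              chained_gradient(sigmoid_grad, 0.0, depth),
              chained_gradient(relu_grad, 1.0, depth))
```

Even at the sigmoid's most favourable point the product shrinks geometrically with depth, while an active ReLU passes the gradient through unchanged, which is one reason for the Sigmoid → ReLU switch listed above.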
- Simply Stacking Layers? • Plain networks on CIFAR-10 • Plain nets: stacking 3×3 conv layers • The 56-layer net has higher training error and test error than the 20-layer net
- Performance Saturation/Degradation • Overly deep plain nets have higher training error • A general phenomenon, observed on many datasets
- A shallower model (18 layers) vs. a deeper counterpart (34 layers) • The deeper model has a richer solution space • A deeper model should not have higher training error • A solution by construction: original layers copied from a trained shallower model, extra layers set as identity • This gives at least the same training error • Optimization difficulties: solvers cannot find the solution when going deeper…
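The solution-by-construction argument can be sketched in a few lines. This is a minimal illustration with made-up stand-in layers (`scale_and_clip` is hypothetical and not a trained layer): appending identity layers to a shallower model yields a deeper model that computes exactly the same function, so its training error cannot be higher.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run(layers: List[Layer], x: Vector) -> Vector:
    """Apply the layers in order."""
    for layer in layers:
        x = layer(x)
    return x


def scale_and_clip(x: Vector) -> Vector:
    # Hypothetical stand-in for a "trained" layer.
    return [max(0.0, 2.0 * v - 1.0) for v in x]


def identity(x: Vector) -> Vector:
    return list(x)


shallow: List[Layer] = [scale_and_clip, scale_and_clip]
deeper: List[Layer] = shallow + [identity] * 3  # extra layers set as identity

if __name__ == "__main__":
    x = [0.5, 1.0, 2.0]
    print(run(shallow, x), run(deeper, x))
```

The constructed deeper model exists in the solution space, so the observed degradation points to an optimization difficulty rather than a lack of representational capacity.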
- Network Design • Keep it simple • Based on the VGG philosophy – All 3×3 conv (almost) – Spatial size /2 ⇒ # filters ×2 – Simple design; just deep!
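The "spatial size /2 ⇒ # filters ×2" rule can be checked with simple arithmetic: the ResNet paper motivates it as preserving the time complexity per layer. The sketch below counts multiply-accumulates for a 3×3 convolution with equal input and output channels; the starting size (56×56, 64 channels) and the helper name `conv3x3_macs` are assumptions chosen for illustration.

```python
def conv3x3_macs(height: int, width: int, in_channels: int, out_channels: int) -> int:
    """Multiply-accumulate count of one 3x3 convolution (stride 1, same padding)."""
    return height * width * in_channels * out_channels * 3 * 3


def stage_costs(start_size: int = 56, start_channels: int = 64, stages: int = 4):
    """Halve the spatial size and double the filters at each stage."""
    costs = []
    for i in range(stages):
        size = start_size >> i          # spatial size / 2 per stage
        channels = start_channels << i  # filters * 2 per stage
        costs.append(conv3x3_macs(size, size, channels, channels))
    return costs


if __name__ == "__main__":
    print(stage_costs())
```

Halving height and width divides the cost by 4, and doubling both channel counts multiplies it by 4, so every stage has the same per-layer cost.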
- ResNet Can Be Deeper
- Residual Learning Block • Define H(x) = F(x) + x; the stacked weight layers try to approximate F(x) instead of H(x) • If the optimal function is closer to an identity mapping, it should be easier for the solver to find the perturbations with reference to an identity mapping than to learn the function as a new one • The identity shortcut introduces neither extra parameters nor computational complexity • Element-wise addition is performed on all feature maps
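A residual block can be sketched with plain Python lists. This is a minimal sketch, not the authors' implementation: it assumes F(x) is two fully connected layers with a ReLU in between (the slides use convolutions), omits batch normalization and the activation after the addition, and uses hypothetical helper names.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product, one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def plain_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """The stacked layers alone must learn the full mapping H(x)."""
    return linear(w2, relu(linear(w1, x)))


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """H(x) = F(x) + x: the stacked layers only learn the residual F(x)."""
    fx = plain_block(x, w1, w2)
    return [f + xi for f, xi in zip(fx, x)]  # element-wise addition, no new parameters


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    x = [1.5, -2.0]
    print(residual_block(x, zeros, zeros), plain_block(x, zeros, zeros))
```

With all weights at zero the residual block is already the identity mapping, whereas the plain block outputs zeros. This is the sense in which an identity-like optimum is easier to reach from the residual form.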
- The Insight of Identity Mapping • We turn the ReLU activation function after the addition into an identity mapping • If f is also an identity mapping: x_{l+1} ≡ y_l
- Smooth Forward Propagation • Any x_l is directly forward-propagated to any x_L, plus a residual • Any x_l is an additive outcome • In contrast to the multiplicative form of a plain network (ignoring BN and ReLU)
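The equations on this slide did not survive extraction. The following restates the standard form from the identity-mappings paper cited on the title slide, assuming identity shortcut and identity after-addition activation; the weight symbol $W_i$ follows that paper rather than the slide text.

```latex
% One residual unit with identity shortcut and identity after-addition activation
x_{l+1} = x_l + F(x_l, W_l)

% Applying this recursively from unit l up to unit L (additive form)
x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)

% Plain network, ignoring BN and ReLU (multiplicative form)
x_L = \prod_{i=0}^{L-1} W_i \, x_0
```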
- Smooth Backward Propagation • The gradient flow is also in the form of addition • The gradient of any layer is unlikely to vanish • In contrast to the multiplicative form of a plain network
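The gradient expression itself is missing from the extracted slide. As a sketch, differentiating the additive forward form $x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i)$ with the chain rule gives the result below, where $E$ denotes the loss (a symbol assumed here for illustration).

```latex
\frac{\partial E}{\partial x_l}
  = \frac{\partial E}{\partial x_L} \, \frac{\partial x_L}{\partial x_l}
  = \frac{\partial E}{\partial x_L}
    \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i) \right)
```

The constant term 1 carries $\partial E / \partial x_L$ directly back to any shallower unit, so the gradient vanishes only if the second term is exactly $-1$, which is unlikely to hold across all samples in a mini-batch.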
- What if Shortcut Mapping h(x) ≠ Identity?
- If Scaling the Shortcut • With the shortcut scaled as h(x_l) = λ_l x_l, the signal is multiplied by the product of the λ_i • For an extremely deep network (L is large): if λ_i > 1 for all i, this factor can be exponentially large; if λ_i < 1 for all i, this factor can be exponentially small and vanish
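A quick numerical check shows how fast the shortcut factor grows or decays. This sketch assumes the scaled shortcut has the form h(x_l) = λ_l · x_l, as in the identity-mappings paper, so the factor in front of x_l is the product of the λ_i; the depth of 100 and the values 0.9 and 1.1 are illustrative choices, and `shortcut_factor` is a hypothetical helper.

```python
import math
from typing import Sequence


def shortcut_factor(lambdas: Sequence[float]) -> float:
    """Product of the per-unit shortcut scales: the factor applied to x_l."""
    return math.prod(lambdas)


if __name__ == "__main__":
    depth = 100
    for lam in (0.9, 1.0, 1.1):
        print(lam, shortcut_factor([lam] * depth))
```

Only λ_i = 1 (the identity shortcut) keeps the factor at 1 regardless of depth; small deviations in either direction compound geometrically over the units.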
- If Gating the Shortcut • Gating should increase the representation ability (the number of parameters increases) • It is the optimization, rather than the representation, that dominates the results
- Results of Using Different Types of Shortcut: the identity shortcut is the best
- Training curves on CIFAR-10 of various shortcuts: solid lines denote test error (y-axis on the right), and dashed lines denote training loss (y-axis on the left)
- On the Usage of Activation Functions: proposed design (figure)
- Results of Experiments on Activation
- ReLU vs. ReLU+BN • BN could block propagation • Keep the shortest path as smooth as possible
- ReLU vs. Identity • ReLU could block propagation when the network is deep • Pre-activation eases the difficulty in optimization
- ImageNet Results
- Conclusion From He • Keep the shortest path as smooth (clean) as possible, by making h(x) and f(x) identity mappings • Forward and backward signals flow directly through this path • The features of any layer are an additive outcome • A 1000-layer ResNet can be easily trained and has better accuracy
- A novel interpretation of residual networks: further expansion of the residual network • Following the previous analysis, we replace x_l with y_l and F with f_l, which gives y_{l+1} = y_l + f_l(y_l) • We further expand this expression by unrolling the recursion in terms of the basic input y_0
- Example of unrolling • We take L = 3 and l = 0 as an example • The number of data-flow paths from input to output grows exponentially • We infer that a residual network with n modules has 2^n paths
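The unrolled expression was an image on the slide and is not in the extracted text. As a sketch, assuming the indexing $y_{l+1} = y_l + f_l(y_l)$ (obtained by replacing $x_l$ with $y_l$ and $F$ with $f_l$; the slide's own indexing may differ by one), the case L = 3, l = 0 expands as follows.

```latex
y_1 = y_0 + f_0(y_0)

y_2 = y_1 + f_1(y_1)
    = y_0 + f_0(y_0) + f_1\bigl(y_0 + f_0(y_0)\bigr)

y_3 = y_2 + f_2(y_2)
    = y_0 + f_0(y_0) + f_1\bigl(y_0 + f_0(y_0)\bigr)
      + f_2\Bigl(y_0 + f_0(y_0) + f_1\bigl(y_0 + f_0(y_0)\bigr)\Bigr)
```

Each module either contributes its $f_i$ term or is bypassed through the identity, so three modules yield $2^3 = 8$ distinct input-to-output paths.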
- Different from a traditional neural network • In a traditional NN, each layer depends only on the previous layer • In ResNet, data flows along many paths from input to output; each path is a unique configuration of which residual modules to enter and which to skip
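The "unique configuration" view maps naturally onto bit patterns: one enter-or-skip choice per module. The sketch below enumerates those configurations for a small network; the function names and the `f0`, `f1`, ... labels are hypothetical, chosen only for illustration.

```python
from itertools import product
from typing import List, Tuple

Path = Tuple[bool, ...]  # True = enter the residual module, False = skip it


def enumerate_paths(num_modules: int) -> List[Path]:
    """All enter/skip configurations for a stack of residual modules."""
    return list(product((False, True), repeat=num_modules))


def describe(path: Path) -> str:
    steps = [f"f{i}" if entered else "skip" for i, entered in enumerate(path)]
    return " -> ".join(["input", *steps, "output"])


if __name__ == "__main__":
    for path in enumerate_paths(3):
        print(describe(path))
```

For n modules the enumeration has 2^n entries, in contrast to a plain feed-forward network, which has exactly one path.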
- Deleting an individual module in ResNet • Deleting a layer in a residual network at test time (a) is equivalent to zeroing half of the paths • In ordinary feed-forward networks (b) such as VGG or AlexNet, deleting an individual layer alters the only viable path from input to output
- Deleting many modules in ResNet • One key characteristic of ensembles is that their performance varies smoothly with the number of members • When k residual modules are removed, the effective number of paths is reduced from 2^n to 2^(n-k) • Error increases smoothly when several modules are randomly deleted from a residual network
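The path count after deletion can be verified by brute force. In this sketch a path is a tuple of enter/skip choices, and deleting a module removes every path that enters it; `surviving_paths` is a hypothetical helper and the module counts are illustrative.

```python
from itertools import product
from typing import List, Set, Tuple

Path = Tuple[bool, ...]  # True = enter the residual module, False = skip it


def surviving_paths(num_modules: int, deleted: Set[int]) -> List[Path]:
    """Paths that remain valid when the modules in `deleted` are removed."""
    all_paths = product((False, True), repeat=num_modules)
    return [p for p in all_paths if not any(p[i] for i in deleted)]


if __name__ == "__main__":
    n = 5
    for deleted in (set(), {2}, {0, 3}, {0, 1, 4}):
        print(sorted(deleted), len(surviving_paths(n, deleted)))
```

Each deleted module halves the number of remaining paths, which matches the 2^n to 2^(n-k) reduction and the claim that deleting a single module zeroes half of the paths.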
- Reordering modules in ResNet • Error also increases smoothly when a residual network is re-ordered by shuffling building blocks • The degree of reordering is measured by the Kendall tau correlation coefficient
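The Kendall tau coefficient compares two orderings by counting pairs that keep or flip their relative order. The sketch below is a plain pairwise implementation for a permutation without ties, measured against the original order 0, 1, ..., n-1; it is an illustration of the coefficient, not the evaluation code used in the experiments.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(order: Sequence[int]) -> float:
    """Kendall tau between the original order 0..n-1 and `order` (no ties)."""
    n = len(order)
    if n < 2:
        raise ValueError("need at least two modules to compare orderings")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # i < j in the original order; check whether the shuffled order agrees.
        if order[i] < order[j]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    print(kendall_tau([0, 1, 2, 3]))  # unchanged order
    print(kendall_tau([1, 0, 2, 3]))  # one adjacent swap
    print(kendall_tau([3, 2, 1, 0]))  # fully reversed
```

A value of 1 means the block order is unchanged and -1 means it is fully reversed, so lower values correspond to heavier reordering.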
- Conclusion • First, the unraveled view reveals that residual networks can be viewed as a collection of many paths, instead of a single ultra-deep network • Second, lesion studies show that, although these paths are trained jointly, they do not strongly depend on each other
- Thank you
