Neural Architecture Search:
The Next Half Generation of Machine Learning
Speaker: Lingxi Xie (č°ĸ凌æ›Ļ)
Noah’s Ark Lab, Huawei Inc. (华ä¸ēč¯ēäēšæ–ščˆŸåŽžéĒŒåŽ¤)
Slides available at my homepage (TALKS)
Take-Home Messages
ī‚  Neural architecture search (NAS) is the future
ī‚  Deep learning makes feature learning automatic
ī‚  NAS makes deep learning automatic
ī‚  The future is approaching faster than we used to think!
ī‚  2017: NAS appears
ī‚  2018: NAS becomes approachable
ī‚  2019 and 2020: NAS will be mature and a standard technique
Outline
ī‚  Introduction
ī‚  Framework
ī‚  Representative Work
ī‚  Our New Progress
ī‚  Future Directions
Outline
ī‚  Introduction
ī‚  Framework
ī‚  Representative Work
ī‚  Our New Progress
ī‚  Future Directions
Introduction: Neural Architecture Search
ī‚  Neural Architecture Search (NAS)
ī‚  Instead of manually designing the neural network architecture (e.g., AlexNet, VGGNet,
GoogLeNet, ResNet, DenseNet, etc.), exploring the possibility of discovering unexplored
architectures with automatic algorithms
ī‚  Why is NAS important?
ī‚  A step from manual model design to automatic model design (analogy: deep learning
vs. conventional approaches)
ī‚  Able to develop data-specific models
[Krizhevsky, 2012] A. Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks, NIPS,
2012.
[Simonyan, 2015] K. Simonyan et al., Very Deep Convolutional Networks for Large-scale Image Recognition, ICLR,
2015.
[Szegedy, 2015] C. Szegedy et al., Going Deeper with Convolutions, CVPR, 2015.
[He, 2016] K. He et al., Deep Residual Learning for Image Recognition, CVPR, 2016.
[Huang, 2017] G. Huang et al., Densely Connected Convolutional Networks, CVPR, 2017.
Introduction: Examples and Comparison
ī‚  Model comparison: ResNet, GeNet, NASNet and ENASNet
[He, 2016] K. He et al., Deep Residual Learning for Image Recognition, CVPR, 2016.
[Xie, 2017] L. Xie et al., Genetic CNN, ICCV, 2017.
[Zoph, 2018] B. Zoph et al., Learning Transferable Architectures for Scalable Image Recognition, CVPR, 2018.
[Pham, 2018] H. Pham et al., Efficient Neural Architecture Search via Parameter Sharing, ICML, 2018.
Outline
ī‚  Introduction
ī‚  Framework
ī‚  Representative Work
ī‚  Our New Progress
ī‚  Related Applications
ī‚  Future Directions
Framework: Trial and Update
ī‚  Almost all NAS algorithms are based on the “trial and update” framework (a minimal sketch follows this slide)
ī‚  Starting with a set of initial architectures (e.g., manually defined) as individuals
ī‚  Assuming that better architectures can be obtained by slight modification
ī‚  Applying different operations on the existing architectures
ī‚  Preserving the high-quality individuals and updating the individual pool
ī‚  Iterating till the end
ī‚  Three fundamental requirements
ī‚  The building blocks: defining the search space (dimensionality, complexity, etc.)
ī‚  The representation: defining the transition between individuals
ī‚  The evaluation method: determining if a generated individual is of high quality
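The loop above can be made concrete with a minimal sketch in Python. Everything here is illustrative: the tuple-of-operation-names encoding, the mutation rule, and the toy evaluate() are stand-ins for a real NAS pipeline, not any particular paper's method.

```python
import random

OPS = ["conv3x3", "conv5x5", "maxpool3x3", "identity"]

def random_architecture(length=6):
    return tuple(random.choice(OPS) for _ in range(length))

def mutate(arch):
    # Slight modification: replace the operation at one random position
    i = random.randrange(len(arch))
    child = list(arch)
    child[i] = random.choice(OPS)
    return tuple(child)

def evaluate(arch):
    # Placeholder for "train the candidate and report validation accuracy"
    return sum(op != "identity" for op in arch) + random.random()

population = [random_architecture() for _ in range(10)]      # initial individuals
for generation in range(20):
    children = [mutate(random.choice(population)) for _ in range(10)]
    population = sorted(population + children, key=evaluate, reverse=True)[:10]
print(population[0])                                          # best architecture found
```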
Framework: Building Blocks
ī‚  Building blocks are like basic genes for these individuals (see the sketch after the references below)
ī‚  Some examples here
ī‚  Genetic CNN: only 3 × 3 convolution is allowed to be searched (followed by default BN
and ReLU operations), 3 × 3 pooling is fixed
ī‚  NASNet: 13 operations
ī‚  PNASNet: 8 operations, removing those
never-used ones from NASNet
ī‚  ENASNet: 6 operations
ī‚  DARTS: 8 operations
[Xie, 2017] L. Xie et al., Genetic CNN, ICCV, 2017.
[Zoph, 2018] B. Zoph et al., Learning Transferable Architectures for Scalable Image Recognition, CVPR, 2018.
[Liu, 2018] C. Liu et al., Progressive Neural Architecture Search, ECCV, 2018.
[Pham, 2018] H. Pham et al., Efficient Neural Architecture Search via Parameter Sharing, ICML, 2018.
[Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019.
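For concreteness, a rough sketch of how such a candidate-operation set could be declared, assuming PyTorch. The list and names below are illustrative; the actual operation sets of the papers above also include separable and dilated convolutions and differ in detail.

```python
import torch.nn as nn

def candidate_ops(c):
    # Map from operation name to a constructor with channel count c
    return {
        "skip_connect": lambda: nn.Identity(),
        "max_pool_3x3": lambda: nn.MaxPool2d(3, stride=1, padding=1),
        "avg_pool_3x3": lambda: nn.AvgPool2d(3, stride=1, padding=1),
        "conv_3x3": lambda: nn.Sequential(
            nn.ReLU(), nn.Conv2d(c, c, 3, padding=1, bias=False), nn.BatchNorm2d(c)),
        "conv_5x5": lambda: nn.Sequential(
            nn.ReLU(), nn.Conv2d(c, c, 5, padding=2, bias=False), nn.BatchNorm2d(c)),
    }
```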
Framework: Search
ī‚  Finding new individuals that have the potential to work better
ī‚  Heuristic search in a large space
ī‚  Two commonly applied methods: the genetic algorithm and reinforcement learning (genetic operators sketched after the references)
ī‚  Both are heuristic algorithms suited to scenarios with a large search space and a
limited ability to explore every single element in the space
ī‚  A fundamental assumption: both heuristics can preserve good genes and, based on
them, discover possible improvements
ī‚  Also, it is possible to integrate architecture search into network optimization
ī‚  These algorithms are often much faster
[Real, 2017] E. Real et al., Large-Scale Evolution of Image Classifiers, ICML, 2017.
[Xie, 2017] L. Xie et al., Genetic CNN, ICCV, 2017.
[Zoph, 2018] B. Zoph et al., Learning Transferable Architectures for Scalable Image Recognition, CVPR, 2018.
[Liu, 2018] C. Liu et al., Progressive Neural Architecture Search, ECCV, 2018.
[Pham, 2018] H. Pham et al., Efficient Neural Architecture Search via Parameter Sharing, ICML, 2018.
[Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019.
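The genetic operators mentioned above can be sketched for a fixed-length binary encoding; the flip probability and tournament size below are arbitrary illustrative choices, not values from the cited papers.

```python
import random

def mutate(bits, p=0.05):
    # Flip each bit of the encoding independently with a small probability
    return "".join(b if random.random() > p else str(1 - int(b)) for b in bits)

def crossover(a, b):
    # Single-point crossover between two parent encodings of equal length
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def select(population, fitness, k=3):
    # Tournament selection: the fittest of k randomly chosen individuals survives
    return max(random.sample(population, k), key=fitness)
```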
Framework: Evaluation
ī‚  Evaluation aims at determining which individuals are good and should be preserved
ī‚  Conventionally, this was often done by training each network from scratch (a schematic is sketched after the references)
ī‚  This is extremely time-consuming, so researchers often run NAS on a small dataset
like CIFAR and then transfer the found architecture to larger datasets like ImageNet
ī‚  Even so, the search process is really slow: Genetic-CNN requires 17 GPU-days
for a single search, and NAS-RL requires more than 20,000 GPU-days
ī‚  Efficient methods were proposed later
ī‚  Ideas include parameter sharing (without the need of re-training everything for each
new individual) and using a differentiable architecture (joint optimization)
ī‚  Now, an efficient search process on CIFAR can be reduced to a few GPU-hours, though
training the searched architecture on ImageNet is still time-consuming
[Xie, 2017] L. Xie et al., Genetic CNN, ICCV, 2017.
[Zoph, 2017] B. Zoph et al., Neural Architecture Search with Reinforcement Learning, ICLR, 2017.
[Pham, 2018] H. Pham et al., Efficient Neural Architecture Search via Parameter Sharing, ICML, 2018.
[Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019.
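A schematic of the conventional evaluation step, assuming PyTorch: each candidate is trained briefly on a small proxy dataset and ranked by validation accuracy. build_model and get_cifar_loaders are hypothetical helpers standing in for the model constructor and data pipeline, and the hyper-parameters are illustrative.

```python
import torch

def proxy_evaluate(arch, epochs=5, device="cuda"):
    model = build_model(arch).to(device)            # hypothetical: builds a network from the encoding
    train_loader, val_loader = get_cifar_loaders()  # hypothetical: small proxy dataset (e.g., CIFAR10)
    opt = torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9, weight_decay=3e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):                         # a few epochs only, not full convergence
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x.to(device)), y.to(device)).backward()
            opt.step()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            correct += (model(x.to(device)).argmax(1) == y.to(device)).sum().item()
            total += y.numel()
    return correct / total                          # the fitness used to rank individuals
```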
Outline
ī‚  Introduction
ī‚  Framework
ī‚  Representative Work
ī‚  Our New Progress
ī‚  Future Directions
Representative Work on NAS
ī‚  Evolution-based approaches
ī‚  Reinforcement-learning-based approaches
ī‚  Towards one-shot approaches
ī‚  Applications
Genetic CNN
ī‚  Only considering the connections between basic building blocks
ī‚  Encoding each network into a fixed-length binary string (the encoding length is sketched after the reference)
ī‚  Standard operators:
mutation, crossover,
and selection
ī‚  Limited by computation
ī‚  Relatively low accuracy
[Xie, 2017] L. Xie et al., Genetic CNN, ICCV, 2017.
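The length of the fixed-length binary string is easy to verify: one bit per ordered node pair inside each stage. A tiny sketch (the helper name is illustrative):

```python
def code_length(stage_sizes):
    # One bit per ordered node pair (i, j), i < j, inside each stage
    return sum(k * (k - 1) // 2 for k in stage_sizes)

print(code_length([3, 4, 5]))  # 3 + 6 + 10 = 19 bits, matching the CIFAR10 setting on the next slide
```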
Genetic CNN
ī‚  CIFAR10 experiments
ī‚  3 stages, (K1, K2, K3) = (3, 4, 5), encoding length L = 19
ī‚  N = 20 individuals, 50 rounds
[Xie, 2017] L. Xie et al., Genetic CNN, ICCV, 2017.
Gen # Max % Min % Avg % Med % St-D %
0 75.96 71.81 74.39 74.53 0.91
1 75.96 73.93 75.01 75.17 0.57
2 75.96 73.95 75.32 75.48 0.57
5 76.24 72.60 75.32 75.65 0.89
10 76.72 73.92 75.68 75.80 0.88
20 76.83 74.91 76.45 76.79 0.61
50 77.06 75.84 76.58 76.81 0.55
Figure: the impact of initialization is ignorable after a sufficient number of rounds
Figure: parent(s) with higher recognition accuracy are more likely to generate child(ren) with higher quality
Genetic CNN
ī‚  Generalizing the best learned structures to other tasks
ī‚  The small datasets with deeper networks
Figure: generalizing the best learned structures; the binary codes (e.g., 1-01, 0-01-100, 1-01-100,
0-11-101-0001) can represent chain-shaped networks (AlexNet, VGGNet), multiple-path networks
(GoogLeNet), and highway networks (Deep ResNet)
Network SVHN CIFAR10 CIFAR100
GeNet #1, after Gen. #0 2.25 8.18 31.46
GeNet #1, after Gen. #5 2.15 7.67 30.17
GeNet #1, after Gen. #20 2.05 7.36 29.63
GeNet #1, after Gen. #50 1.99 7.19 29.03
GeNet #2, after Gen. #50 1.97 7.10 29.05
Network ILSVRC2012 top-1 / top-5 error Depth
19-layer VGGNet 28.7 9.9 19
GeNet #1, after Gen. #50 28.12 9.95 22
GeNet #2, after Gen. #50 27.87 9.74 22
Large-Scale Evolution of Image Classifiers
ī‚  Modifying the individuals with a pre-defined set of
operations (mutations)
ī‚  Larger networks work better
ī‚  Much larger computational overhead is used: 250
computers for hundreds of hours
ī‚  Take-home message: NAS requires careful design
and large computational costs
[Real, 2017] E. Real et al., Large-Scale Evolution of Image Classifiers, ICML, 2017.
Large-Scale Evolution of Image Classifiers
ī‚  The search progress
[Real, 2017] E. Real et al., Large-Scale Evolution of Image Classifiers, ICML, 2017.
Representative Work on NAS
ī‚  Evolution-based approaches
ī‚  Reinforcement-learning-based approaches
ī‚  Towards one-shot approaches
ī‚  Applications
NAS with Reinforcement Learning
ī‚  Using reinforcement learning (RL)
to search over the large space
ī‚  The entire structure is generated by
an RL algorithm or an agent
ī‚  The validation accuracy serves as
feedback to train the agent’s policy
ī‚  Computational overhead is high
ī‚  800 GPUs for 28 days (CIFAR)
ī‚  No ImageNet experiments
ī‚  Superior accuracy to manually-designed network architectures
[Zoph, 2017] B. Zoph et al., Neural Architecture Search with Reinforcement Learning, ICLR, 2017.
NAS Network
ī‚  Unlike the previous work, which searched for everything, this work only searched for
a limited number of basic building blocks
ī‚  The remaining part is mostly the same
ī‚  Computational overhead is still high
ī‚  500 GPUs for 4 days (CIFAR)
ī‚  Good ImageNet performance
[Zoph, 2018] B. Zoph et al., Learning Transferable Architectures for Scalable Image Recognition, CVPR, 2018.
Progressive NAS
ī‚  Instead of searching over the entire network (containing a few blocks), this work
added one block each time (progressive search)
ī‚  The best combinations are recorded for the next-stage search
ī‚  The efficiency of search is higher
ī‚  The remaining part is mostly the same
ī‚  Computational overhead is still high
ī‚  100 GPUs for 1.5 days (CIFAR)
ī‚  Better ImageNet performance
[Liu, 2018] C. Liu et al., Progressive Neural Architecture Search, ECCV, 2018.
Regularized Evolution
ī‚  Regularized evolution: assigning “aged”
individuals a higher probability of being
eliminated (sketched after the reference)
ī‚  Evolution works equally well or better than
RL algorithms
ī‚  Take-home message: evolutionary algorithms
play an important role, especially when the
computational budget is limited; also, the
conventional evolutionary algorithms need to
be modified so as to fit the NAS task
[Real, 2019] E. Real et al., Regularized Evolution for Image Classifier Architecture Search, AAAI, 2019.
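A minimal sketch of the aging idea, not the paper's exact procedure: the population is a queue; each cycle, a tournament winner is mutated, the child is appended, and the oldest individual is removed regardless of its fitness.

```python
import collections
import random

def aging_evolution(random_arch, mutate, evaluate, pop_size=50, cycles=500, sample_size=10):
    population = collections.deque()                # acts as a queue ordered by age
    history = []
    for _ in range(pop_size):
        arch = random_arch()
        population.append((arch, evaluate(arch)))
    for _ in range(cycles):
        candidates = random.sample(list(population), sample_size)
        parent, _ = max(candidates, key=lambda item: item[1])
        child = mutate(parent)
        population.append((child, evaluate(child)))
        history.append(population[-1])
        population.popleft()                        # remove the oldest individual, however fit
    return max(history, key=lambda item: item[1])
```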
Representative Work on NAS
ī‚  Evolution-based approaches
ī‚  Reinforcement-learning-based approaches
ī‚  Towards one-shot approaches
ī‚  Applications
Efficient NAS by Network Transformation
ī‚  Instead of training a new individual from scratch, this work reused the weights of
a prior network (expected to be similar to the current network), so that the
current training is more efficient
ī‚  Net2Net is used for initialization
ī‚  Operations: wider and deeper
ī‚  Much more efficient
ī‚  5 GPUs for 2 days (CIFAR)
ī‚  No ImageNet experiments
[Chen, 2015] T. Chen et al., Net2Net: Accelerating Learning via Knowledge Transfer, ICLR, 2015.
[Cai, 2018] H. Cai et al., Efficient Architecture Search by Network Transformation, AAAI, 2018.
Efficient NAS via Parameter Sharing
ī‚  Instead of modifying the network initialization, this work
goes one step further by sharing parameters among
all generated networks (sketched after the reference)
ī‚  Each training stage is much shorter
ī‚  Much more efficient
ī‚  1 GPU for 0.45 days (CIFAR)
ī‚  No ImageNet experiments
[Pham, 2018] H. Pham et al., Efficient Neural Architecture Search via Parameter Sharing, ICML, 2018.
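A rough sketch of parameter sharing, assuming PyTorch: every candidate operation on an edge is instantiated once in the super-network, so each sampled child indexes into the shared modules instead of training new weights from scratch. The constructor dictionary is assumed to look like the earlier candidate-operation sketch.

```python
import torch.nn as nn

class SharedEdge(nn.Module):
    def __init__(self, op_constructors):
        super().__init__()
        # Each candidate operation is built once; every sampled child reuses these weights
        self.ops = nn.ModuleDict({name: make() for name, make in op_constructors.items()})

    def forward(self, x, chosen_op):
        # A sampled child architecture only decides which shared module to apply
        return self.ops[chosen_op](x)
```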
Differentiable Architecture Search
ī‚  With a fixed number of intermediate blocks, the operator applied to each state is
unknown at the beginning
ī‚  During the training process, each operator is formulated as a mixture model
(sketched after the reference)
ī‚  The learning goal is the mixture coefficients (differentiable)
ī‚  At the end of training, the most likely operator is kept, and the
entire network is trained again
ī‚  Much more efficient
ī‚  1 GPU for 4 days (CIFAR)
ī‚  Reasonable ImageNet results
(in the mobile setting)
[Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019.
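A minimal sketch of the differentiable relaxation, assuming PyTorch: each edge holds a learnable vector alpha, its output is the softmax(alpha)-weighted sum of all candidate operations, and the operation with the largest weight is kept in the end. The bi-level optimization of alpha on validation data is omitted, and the constructors are assumed as in the earlier sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, op_constructors):
        super().__init__()
        self.ops = nn.ModuleList(make() for make in op_constructors.values())
        # Architecture parameters: one learnable weight per candidate operation
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)               # mixture coefficients
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def discretize(self):
        return int(self.alpha.argmax())                      # keep the most likely operator at the end
```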
Differentiable Architecture Search
ī‚  The best cell changes over time
[Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019.
Proxyless NAS
ī‚  The first NAS work that is directly optimized on ImageNet (ILSVRC2012)
ī‚  Learning weight parameters and binarized architectures simultaneously
ī‚  Close to Differentiable NAS
ī‚  Efficient
ī‚  1 GPU for 8 days
ī‚  Reasonable performance (mobile)
[Cai, 2019] H. Cai et al., ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, ICLR, 2019.
Probabilistic NAS
ī‚  A new way to train a super-network
ī‚  Sampling sub-networks from a distribution
ī‚  Also able to perform proxyless architecture search
ī‚  Efficiency brought by flexible control of search time on each sub-network
ī‚  1 GPU for 0.2 days
ī‚  Accuracy is a little bit weak on ImageNet
[Casale, 2019] F.P. Casale et al., Probabilistic Neural Architecture Search, arXiv preprint: 1902.05116, 2019.
Single-Path One-Shot NAS
ī‚  Main idea: balancing the sampling probability of each path in one-shot search
ī‚  With the benefit of decoupling operations on each edge
ī‚  Bridging the gap between search and evaluation
ī‚  Modified search space
ī‚  Blocks based on ShuffleNet-v2
ī‚  Evolution-based search algorithm
ī‚  Channel number search
ī‚  Latency and FLOPs constraints
ī‚  Improved accuracy for single-shot NAS
[Guo, 2019] Z. Guo et al., Single Path One-Shot Neural Architecture Search with Uniform Sampling, arXiv preprint:
1904.00420, 2019.
Architecture Search, Anneal and Prune
ī‚  Another effort to deal with the decoupling
issue of DARTS
ī‚  Decreasing the temperature term in computing
the probability added to each edge
ī‚  Pruning edges with low weights
ī‚  Gradually turning the architecture to one-path
ī‚  Efficiency brought by pruning
ī‚  1 GPU for 0.2 days
ī‚  Accuracy is still a little bit weak on ImageNet
[Noy, 2019] A. Noy et al., ASAP: Architecture Search, Anneal and Prune, arXiv preprint: 1904.04123, 2019.
Randomly Wired Neural Networks
ī‚  A more diverse set of connectivity patterns
ī‚  Connecting NAS and randomly wired neural networks
ī‚  An important insight: when the search
space is large enough, randomly wired
networks are almost as effective as
carefully searched architectures
ī‚  This does not imply that NAS is useless,
but rather reveals that the current NAS methods
are not effective enough
[Xie, 2019] S. Xie et al., Exploring Randomly Wired Neural Networks for Image Recognition, arXiv preprint:
1904.01569, 2019.
Representative Work on NAS
ī‚  Evolution-based approaches
ī‚  Reinforcement-learning-based approaches
ī‚  Towards one-shot approaches
ī‚  Applications
Auto-DeepLab
ī‚  A hierarchical architecture search space
ī‚  With both network-level and cell-level structures being investigated
ī‚  Differentiable search method (in order to accelerate)
ī‚  Similar performance to Deeplab-v3 (without pre-training)
[Liu, 2019] C. Liu et al., Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation,
CVPR, 2019.
NAS-FPN
ī‚  Searching for the feature pyramid network
ī‚  Reinforcement-learning-based search
ī‚  Good performance on MS-COCO
ī‚  Improving mobile detection accuracy by 2% AP
compared to SSDLite on MobileNetV2
ī‚  Achieving 48.3% AP, surpassing the state of the art
[Ghiasi, 2019] G. Ghiasi et al., NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection,
CVPR, 2019.
Auto-ReID
ī‚  A search space with part-aware module
ī‚  Using both ranking and classification loss
ī‚  Differentiable search
ī‚  State-of-the-art performance on ReID
[Quan, 2019] R. Quan et al., Auto-ReID: Searching for a Part-aware ConvNet for Person Re-Identification, arXiv
preprint: 1903.09776, 2019.
GraphNAS
ī‚  A search space containing components of
GNN layers
ī‚  RL-based search algorithm
ī‚  A modified parameter sharing scheme
ī‚  Surpassing manually designed GNN architectures
[Gao, 2019] Y. Gao et al., GraphNAS: Graph Neural Architecture Search with Reinforcement Learning, arXiv
preprint: 1904.09981, 2019.
V-NAS
ī‚  Medical image segmentation
ī‚  Volumetric convolution required
ī‚  Searching volumetric convolution
ī‚  2D conv, 3D conv and P3D conv
ī‚  Differentiable search algorithm
ī‚  Outperforming the state of the art
[Zhu, 2019] Z. Zhu et al., V-NAS: Neural Architecture Search for Volumetric Medical Image Segmentation, arXiv
preprint: 1906.02817, 2019.
AutoAugment
ī‚  Learning hyper-parameters
ī‚  Search Space: Shear-X/Y, Translate-X/Y,
Rotate, AutoContrast, Invert, etc.
ī‚  Reinforcement-learning-based search
ī‚  Impressive performance on a few standard
image classification benchmarks
ī‚  Transferring to other tasks, e.g., NAS-FPN
[Cubuk, 2019] E. Cubuk et al., AutoAugment: Learning Augmentation Strategies from Data, CVPR, 2019.
More Work for Your Reference
ī‚  https://github.com/markdtw/awesome-architecture-search
Outline
ī‚  Introduction
ī‚  Framework
ī‚  Representative Work
ī‚  Our New Progress
ī‚  Future Directions
P-DARTS: Overview
ī‚  We start with the drawbacks of DARTS
ī‚  There is a depth gap between search and evaluation
ī‚  The search process is not stable: multiple runs, different results
ī‚  The search process is not likely to transfer: only able to work on CIFAR10
ī‚  We proposed a new approach named Progressive DARTS
ī‚  A multi-stage search process that gradually increases the search depth
ī‚  Two useful techniques: search space approximation and search space regularization
ī‚  We obtained nice results
ī‚  SOTA accuracy by the searched networks on CIFAR10/CIFAR100 and ImageNet
ī‚  Search cost as small as 0.3 GPU-days (one single GPU, 7 hours)
[Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019.
[Chen, 2019] X. Chen et al., Progressive Differentiable Architecture Search: Bridging the Depth Gap between
Search and Evaluation, submitted, 2019.
P-DARTS: Motivation
ī‚  The depth gap and why it is important
Figure: DARTS searches with 8 cells but evaluates with 20 cells (CIFAR10 test error 2.83%);
P-DARTS searches with 5, 11, and then 17 cells before the 20-cell evaluation (CIFAR10 test error 2.55%)
P-DARTS: Search Space Approximation
ī‚  The progressive way of increasing search depth
P-DARTS: Search Space Regularization
ī‚  Problem: the strange behavior of skip-connect
ī‚  Searching on a deep network leads to many skip-connect operations (poor results)
ī‚  Reasons?
ī‚  On the one hand, skip-connect often leads to the fastest gradient descent
ī‚  On the other hand, skip-connect has no parameters and thus leads to bad results
ī‚  Solution: regularization
ī‚  Adding a Dropout after each skip-connect, decaying the rate during search (see the sketch after the results table)
ī‚  Preserving a fixed number of skip-connects after the entire search
ī‚  Results
Dropout on skip-connect Test Error (2 skip-connects) Test Error (3 skip-connects) Test Error (4 skip-connects)
with Dropout 2.93% 3.28% 3.51%
without Dropout 2.69% 2.84% 2.97%
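An illustrative sketch of this regularization, assuming PyTorch: the skip-connect candidate is wrapped in a Dropout whose rate decays during the search. This simplified version drops elements rather than entire paths, and the linear decay schedule is a hypothetical choice.

```python
import torch.nn as nn

class DroppedSkip(nn.Module):
    def __init__(self, initial_rate=0.3):
        super().__init__()
        self.initial_rate = initial_rate
        self.drop = nn.Dropout(p=initial_rate)

    def forward(self, x):
        # The identity (skip) connection, partially zeroed while the rate is non-zero
        return self.drop(x)

    def set_epoch(self, epoch, total_epochs):
        # Decay the dropout rate toward zero as the search proceeds
        self.drop.p = self.initial_rate * (1.0 - epoch / total_epochs)
```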
P-DARTS: Performance on CIFAR10/100
ī‚  CIFAR10 and CIFAR100 (a useful enhancement: Cutout)
[DeVries, 2017] T. DeVries et al., Improved Regularization of Convolutional Neural Networks with Cutout, arXiv
1708.04552, 2017.
P-DARTS: Performance on ImageNet
ī‚  ImageNet (ILSVRC2012) under the Mobile Setting
P-DARTS: Searched Cells
ī‚  Searched architectures (verification of depth gap!)
P-DARTS: Summary
ī‚  The depth gap needs to be solved
ī‚  Different properties of networks with different depths
ī‚  Depth is still the key issue in deep learning
ī‚  Our approach
ī‚  State-of-the-art results on both CIFAR10/100 and ImageNet
ī‚  Search cost as small as 0.3 GPU-days
ī‚  Future directions
ī‚  Directly searching on ImageNet
ī‚  There are many unsolved issues on NAS!
PC-DARTS: A More Powerful Approach
ī‚  We still build our approach upon DARTS
ī‚  We proposed a new approach named Partially-Connected DARTS
ī‚  An alternative approach to deal with the over-fitting issue of DARTS
ī‚  Using partial channel connection as regularization (a code sketch follows the illustration slide)
ī‚  This method is even more stable and can be searched directly on ImageNet
ī‚  We obtained nice results
ī‚  SOTA accuracy by the searched networks on ImageNet
ī‚  Search cost as small as 0.06 GPU-days (one single GPU, 1.5 hours) on CIFAR10/100, or 4
GPU-days (8 GPUs, 11.5 hours) on ImageNet
[Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019.
[Xu, 2019] Y. Xu et al., PC-DARTS: Partial Channel Connections for Memory-Efficient Differentiable Architecture
Search, submitted, 2019.
PC-DARTS: Illustration
ī‚  Partial channel connection and edge normalization
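A minimal sketch of the partial channel connection, assuming PyTorch: only a 1/K fraction of the channels passes through the mixed operation while the rest bypasses it, and the two parts are concatenated again. Channel shuffling and edge normalization, which the method also uses, are omitted; mixed_op is assumed to behave like the earlier MixedOp sketch.

```python
import torch

def partial_channel_forward(x, mixed_op, k=4):
    c = x.size(1)
    split = c // k
    active, bypass = x[:, :split], x[:, split:]   # only 1/k of the channels enter the mixed operation
    out = mixed_op(active)
    return torch.cat([out, bypass], dim=1)        # the remaining channels bypass it unchanged
```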
PC-DARTS: Performance on ImageNet
ī‚  ImageNet (ILSVRC2012) under the Mobile Setting
PC-DARTS: Summary
ī‚  Regularization is still a big issue
ī‚  Partial channel connection in order to prevent over-fitting
ī‚  Edge normalization in order to make partial channel connection work more stably
ī‚  Our approach
ī‚  State-of-the-art results on ImageNet
ī‚  Search cost as small as 0.06 GPU-days
ī‚  Future directions
ī‚  Searching on a larger number of classes
ī‚  There are many unsolved issues on NAS!
Outline
ī‚  Introduction
ī‚  Framework
ī‚  Representative Work
ī‚  Our New Progress
ī‚  Future Directions
Conclusions
ī‚  NAS is a promising and important trend for machine learning in the future
ī‚  NAS is to fixed architectures as deep learning is to conventional handcrafted features
ī‚  Two important factors of NAS to be determined
ī‚  Basic building blocks: fixed or learnable
ī‚  The way of exploring the search space: genetic algorithm, reinforcement learning, or
joint optimization
ī‚  The importance of computational power is reduced, but still significant
Related Applications
ī‚  The searched architectures were verified to be effective for transfer learning tasks
ī‚  NASNet outperformed ResNet101 in object detection by 4%
ī‚  Take-home message: stronger architectures are often transferable
ī‚  The ability of NAS in other vision tasks
ī‚  Preliminary success in semantic segmentation, object detection, etc.
[Zoph, 2018] B. Zoph et al., Learning Transferable Architectures for Scalable Image Recognition, CVPR, 2018.
[Chen, 2018] L. Chen et al., Searching for Efficient Multi-Scale Architectures for Dense Image Prediction, NIPS,
2018.
[Liu, 2019] C. Liu et al., Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation,
CVPR, 2019.
[Ghiasi, 2019] G. Ghiasi et al., NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection,
CVPR, 2019.
Future Directions
ī‚  Currently, the search space is constrained by the limited types of building blocks
ī‚  It is not guaranteed that the current building blocks are optimal
ī‚  It remains to explore the possibility of searching into the building blocks
ī‚  Currently, the searched architectures are not hardware-friendly
ī‚  This leads to dramatically slow network training
ī‚  Currently, the searched architectures are task-specific
ī‚  This may not be a problem, but an ideal vision system should be generalized
ī‚  Currently, the searching process is not yet stable
ī‚  We desire a framework as generalized as regular deep networks
Thanks
ī‚  Questions, please?
ī‚  Contact me for collaboration and internship ☺
Weitere ähnliche Inhalte

Was ist angesagt?

Survey of Attention mechanism
Survey of Attention mechanismSurvey of Attention mechanism
Survey of Attention mechanismSwatiNarkhede1
 
Deep Learning - Overview of my work II
Deep Learning - Overview of my work IIDeep Learning - Overview of my work II
Deep Learning - Overview of my work IIMohamed Loey
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Muhammad Haroon
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNNAshray Bhandare
 
Artificial nueral network slideshare
Artificial nueral network slideshareArtificial nueral network slideshare
Artificial nueral network slideshareRed Innovators
 
Optimization as a model for few shot learning
Optimization as a model for few shot learningOptimization as a model for few shot learning
Optimization as a model for few shot learningKaty Lee
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]Dongmin Choi
 
Meta-Learning Presentation
Meta-Learning PresentationMeta-Learning Presentation
Meta-Learning PresentationAkshayaNagarajan10
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You NeedIllia Polosukhin
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkKnoldus Inc.
 
Architecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IArchitecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IWanjin Yu
 
An Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchAn Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchBill Liu
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for ClassificationPrakash Pimpale
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer PerceptronsESCOM
 
Simple Introduction to AutoEncoder
Simple Introduction to AutoEncoderSimple Introduction to AutoEncoder
Simple Introduction to AutoEncoderJun Lang
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 

Was ist angesagt? (20)

Survey of Attention mechanism
Survey of Attention mechanismSurvey of Attention mechanism
Survey of Attention mechanism
 
Deep Learning - Overview of my work II
Deep Learning - Overview of my work IIDeep Learning - Overview of my work II
Deep Learning - Overview of my work II
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNN
 
Artificial nueral network slideshare
Artificial nueral network slideshareArtificial nueral network slideshare
Artificial nueral network slideshare
 
Optimization as a model for few shot learning
Optimization as a model for few shot learningOptimization as a model for few shot learning
Optimization as a model for few shot learning
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]
 
AlexNet
AlexNetAlexNet
AlexNet
 
Meta-Learning Presentation
Meta-Learning PresentationMeta-Learning Presentation
Meta-Learning Presentation
 
Self-organizing map
Self-organizing mapSelf-organizing map
Self-organizing map
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
Associative memory network
Associative memory networkAssociative memory network
Associative memory network
 
Architecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks IArchitecture Design for Deep Neural Networks I
Architecture Design for Deep Neural Networks I
 
An Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchAn Introduction to Neural Architecture Search
An Introduction to Neural Architecture Search
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer Perceptrons
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural Networks
 
Simple Introduction to AutoEncoder
Simple Introduction to AutoEncoderSimple Introduction to AutoEncoder
Simple Introduction to AutoEncoder
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 

Ähnlich wie Architecture Design for Deep Neural Networks III

Multilabel Image Retreval Using Hashing
Multilabel Image Retreval Using HashingMultilabel Image Retreval Using Hashing
Multilabel Image Retreval Using HashingSurbhi Bhosale
 
Deep learning presentation
Deep learning presentationDeep learning presentation
Deep learning presentationTunde Ajose-Ismail
 
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Model-Agnostic Meta-Learning for Fast Adaptation of Deep NetworksModel-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Model-Agnostic Meta-Learning for Fast Adaptation of Deep NetworksYoonho Lee
 
Emr a scalable graph based ranking model for content-based image retrieval
Emr a scalable graph based ranking model for content-based image retrievalEmr a scalable graph based ranking model for content-based image retrieval
Emr a scalable graph based ranking model for content-based image retrievalPvrtechnologies Nellore
 
deeplearningpresentation-180625071236.pptx
deeplearningpresentation-180625071236.pptxdeeplearningpresentation-180625071236.pptx
deeplearningpresentation-180625071236.pptxJeetDesai14
 
Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...researchinventy
 
Research Inventy: International Journal of Engineering and Science
Research Inventy: International Journal of Engineering and ScienceResearch Inventy: International Journal of Engineering and Science
Research Inventy: International Journal of Engineering and Scienceresearchinventy
 
How well do self-supervised models transfer.pptx
How well do self-supervised models transfer.pptxHow well do self-supervised models transfer.pptx
How well do self-supervised models transfer.pptxssuserbafbd0
 
Multivariate feature descriptor based cbir model to query large image databases
Multivariate feature descriptor based cbir model to query large image databasesMultivariate feature descriptor based cbir model to query large image databases
Multivariate feature descriptor based cbir model to query large image databasesIJARIIT
 
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION cscpconf
 
SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.bhavinecindus
 
EMR: A Scalable Graph-based Ranking Model for Content-based Image Retrieval
EMR: A Scalable Graph-based Ranking Model for Content-based Image RetrievalEMR: A Scalable Graph-based Ranking Model for Content-based Image Retrieval
EMR: A Scalable Graph-based Ranking Model for Content-based Image Retrieval1crore projects
 
Visual concept learning
Visual concept learningVisual concept learning
Visual concept learningVaibhav Singh
 
Research Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep LearningResearch Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep LearningSergey Karayev
 
A Graph-based Web Image Annotation for Large Scale Image Retrieval
A Graph-based Web Image Annotation for Large Scale Image RetrievalA Graph-based Web Image Annotation for Large Scale Image Retrieval
A Graph-based Web Image Annotation for Large Scale Image RetrievalIRJET Journal
 
Issues in AI product development and practices in audio applications
Issues in AI product development and practices in audio applicationsIssues in AI product development and practices in audio applications
Issues in AI product development and practices in audio applicationsTaesu Kim
 
Automating Software Development Using Artificial Intelligence (AI)
Automating Software Development Using Artificial Intelligence (AI)Automating Software Development Using Artificial Intelligence (AI)
Automating Software Development Using Artificial Intelligence (AI)Jeremy Bradbury
 
Automatic Learning Image Objects via Incremental Model
Automatic Learning Image Objects via Incremental ModelAutomatic Learning Image Objects via Incremental Model
Automatic Learning Image Objects via Incremental ModelIOSR Journals
 
Vision and Multimedia Reading Group: DeCAF: a Deep Convolutional Activation F...
Vision and Multimedia Reading Group: DeCAF: a Deep Convolutional Activation F...Vision and Multimedia Reading Group: DeCAF: a Deep Convolutional Activation F...
Vision and Multimedia Reading Group: DeCAF: a Deep Convolutional Activation F...Simone Ercoli
 
Perspective on HPC-enabled AI
Perspective on HPC-enabled AIPerspective on HPC-enabled AI
Perspective on HPC-enabled AIinside-BigData.com
 

Ähnlich wie Architecture Design for Deep Neural Networks III (20)

Multilabel Image Retreval Using Hashing
Multilabel Image Retreval Using HashingMultilabel Image Retreval Using Hashing
Multilabel Image Retreval Using Hashing
 
Deep learning presentation
Deep learning presentationDeep learning presentation
Deep learning presentation
 
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Model-Agnostic Meta-Learning for Fast Adaptation of Deep NetworksModel-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
 
Emr a scalable graph based ranking model for content-based image retrieval
Emr a scalable graph based ranking model for content-based image retrievalEmr a scalable graph based ranking model for content-based image retrieval
Emr a scalable graph based ranking model for content-based image retrieval
 
deeplearningpresentation-180625071236.pptx
deeplearningpresentation-180625071236.pptxdeeplearningpresentation-180625071236.pptx
deeplearningpresentation-180625071236.pptx
 
Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...
 
Research Inventy: International Journal of Engineering and Science
Research Inventy: International Journal of Engineering and ScienceResearch Inventy: International Journal of Engineering and Science
Research Inventy: International Journal of Engineering and Science
 
How well do self-supervised models transfer.pptx
How well do self-supervised models transfer.pptxHow well do self-supervised models transfer.pptx
How well do self-supervised models transfer.pptx
 
Multivariate feature descriptor based cbir model to query large image databases
Multivariate feature descriptor based cbir model to query large image databasesMultivariate feature descriptor based cbir model to query large image databases
Multivariate feature descriptor based cbir model to query large image databases
 
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
 
SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.
 
EMR: A Scalable Graph-based Ranking Model for Content-based Image Retrieval
EMR: A Scalable Graph-based Ranking Model for Content-based Image RetrievalEMR: A Scalable Graph-based Ranking Model for Content-based Image Retrieval
EMR: A Scalable Graph-based Ranking Model for Content-based Image Retrieval
 
Visual concept learning
Visual concept learningVisual concept learning
Visual concept learning
 
Research Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep LearningResearch Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep Learning
 
A Graph-based Web Image Annotation for Large Scale Image Retrieval
A Graph-based Web Image Annotation for Large Scale Image RetrievalA Graph-based Web Image Annotation for Large Scale Image Retrieval
A Graph-based Web Image Annotation for Large Scale Image Retrieval
 
Issues in AI product development and practices in audio applications
Issues in AI product development and practices in audio applicationsIssues in AI product development and practices in audio applications
Issues in AI product development and practices in audio applications
 
Automating Software Development Using Artificial Intelligence (AI)
Automating Software Development Using Artificial Intelligence (AI)Automating Software Development Using Artificial Intelligence (AI)
Automating Software Development Using Artificial Intelligence (AI)
 
Automatic Learning Image Objects via Incremental Model
Automatic Learning Image Objects via Incremental ModelAutomatic Learning Image Objects via Incremental Model
Automatic Learning Image Objects via Incremental Model
 
Vision and Multimedia Reading Group: DeCAF: a Deep Convolutional Activation F...
Vision and Multimedia Reading Group: DeCAF: a Deep Convolutional Activation F...Vision and Multimedia Reading Group: DeCAF: a Deep Convolutional Activation F...
Vision and Multimedia Reading Group: DeCAF: a Deep Convolutional Activation F...
 
Perspective on HPC-enabled AI
Perspective on HPC-enabled AIPerspective on HPC-enabled AI
Perspective on HPC-enabled AI
 

Mehr von Wanjin Yu

Intelligent Multimedia Recommendation
Intelligent Multimedia RecommendationIntelligent Multimedia Recommendation
Intelligent Multimedia RecommendationWanjin Yu
 
Architecture Design for Deep Neural Networks II
Architecture Design for Deep Neural Networks IIArchitecture Design for Deep Neural Networks II
Architecture Design for Deep Neural Networks IIWanjin Yu
 
Causally regularized machine learning
Causally regularized machine learningCausally regularized machine learning
Causally regularized machine learningWanjin Yu
 
Computer vision for transportation
Computer vision for transportationComputer vision for transportation
Computer vision for transportationWanjin Yu
 
Object Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIObject Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIWanjin Yu
 
Object Detection Beyond Mask R-CNN and RetinaNet II
Object Detection Beyond Mask R-CNN and RetinaNet IIObject Detection Beyond Mask R-CNN and RetinaNet II
Object Detection Beyond Mask R-CNN and RetinaNet IIWanjin Yu
 
Object Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet IObject Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet IWanjin Yu
 
Visual Search and Question Answering II
Visual Search and Question Answering IIVisual Search and Question Answering II
Visual Search and Question Answering IIWanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Wanjin Yu
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Wanjin Yu
 
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Wanjin Yu
 

Mehr von Wanjin Yu (15)

Intelligent Multimedia Recommendation
Intelligent Multimedia RecommendationIntelligent Multimedia Recommendation
Intelligent Multimedia Recommendation
 
Architecture Design for Deep Neural Networks II
Architecture Design for Deep Neural Networks IIArchitecture Design for Deep Neural Networks II
Architecture Design for Deep Neural Networks II
 
Causally regularized machine learning
Causally regularized machine learningCausally regularized machine learning
Causally regularized machine learning
 
Computer vision for transportation
Computer vision for transportationComputer vision for transportation
Computer vision for transportation
 
Object Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIObject Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet III
 
Object Detection Beyond Mask R-CNN and RetinaNet II
Object Detection Beyond Mask R-CNN and RetinaNet IIObject Detection Beyond Mask R-CNN and RetinaNet II
Object Detection Beyond Mask R-CNN and RetinaNet II
 
Object Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet IObject Detection Beyond Mask R-CNN and RetinaNet I
Object Detection Beyond Mask R-CNN and RetinaNet I
 
Visual Search and Question Answering II
Visual Search and Question Answering IIVisual Search and Question Answering II
Visual Search and Question Answering II
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
 
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
 

KÃŧrzlich hochgeladen

Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxeditsforyah
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
办į†å¤šäŧĻ多大å­Ļæ¯•ä¸šč¯æˆįģŠå•|č´­äš°åŠ æ‹ŋ大UTSGæ–‡å‡­č¯äšĻ
办į†å¤šäŧĻ多大å­Ļæ¯•ä¸šč¯æˆįģŠå•|č´­äš°åŠ æ‹ŋ大UTSGæ–‡å‡­č¯äšĻ办į†å¤šäŧĻ多大å­Ļæ¯•ä¸šč¯æˆįģŠå•|č´­äš°åŠ æ‹ŋ大UTSGæ–‡å‡­č¯äšĻ
办į†å¤šäŧĻ多大å­Ļæ¯•ä¸šč¯æˆįģŠå•|č´­äš°åŠ æ‹ŋ大UTSGæ–‡å‡­č¯äšĻzdzoqco
 
『æžŗæ´˛æ–‡å‡­ã€äš°æ‹‰į­šäŧ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛LTU文凭å­ĻäŊč¯äšĻ
『æžŗæ´˛æ–‡å‡­ã€äš°æ‹‰į­šäŧ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛LTU文凭å­ĻäŊč¯äšĻ『æžŗæ´˛æ–‡å‡­ã€äš°æ‹‰į­šäŧ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛LTU文凭å­ĻäŊč¯äšĻ
『æžŗæ´˛æ–‡å‡­ã€äš°æ‹‰į­šäŧ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛LTU文凭å­ĻäŊč¯äšĻrnrncn29
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Sonam Pathan
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
『æžŗæ´˛æ–‡å‡­ã€äš°čŠšå§†åŖĢåē“克大å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛JCU文凭å­ĻäŊč¯äšĻ
『æžŗæ´˛æ–‡å‡­ã€äš°čŠšå§†åŖĢåē“克大å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛JCU文凭å­ĻäŊč¯äšĻ『æžŗæ´˛æ–‡å‡­ã€äš°čŠšå§†åŖĢåē“克大å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛JCU文凭å­ĻäŊč¯äšĻ
『æžŗæ´˛æ–‡å‡­ã€äš°čŠšå§†åŖĢåē“克大å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛JCU文凭å­ĻäŊč¯äšĻrnrncn29
 
Potsdam FHå­ĻäŊč¯,æŗĸ茨åĻåē”į”¨æŠ€æœ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ1:1åˆļäŊœ
Potsdam FHå­ĻäŊč¯,æŗĸ茨åĻåē”į”¨æŠ€æœ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ1:1åˆļäŊœPotsdam FHå­ĻäŊč¯,æŗĸ茨åĻåē”į”¨æŠ€æœ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ1:1åˆļäŊœ
Potsdam FHå­ĻäŊč¯,æŗĸ茨åĻåē”į”¨æŠ€æœ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ1:1åˆļäŊœys8omjxb
 
办į†(UofRæ¯•ä¸šč¯äšĻ)įŊ—切斯į‰šå¤§å­Ļæ¯•ä¸šč¯æˆįģŠå•åŽŸį‰ˆä¸€æ¯”一
办į†(UofRæ¯•ä¸šč¯äšĻ)įŊ—切斯į‰šå¤§å­Ļæ¯•ä¸šč¯æˆįģŠå•åŽŸį‰ˆä¸€æ¯”一办į†(UofRæ¯•ä¸šč¯äšĻ)įŊ—切斯į‰šå¤§å­Ļæ¯•ä¸šč¯æˆįģŠå•åŽŸį‰ˆä¸€æ¯”一
办į†(UofRæ¯•ä¸šč¯äšĻ)įŊ—切斯į‰šå¤§å­Ļæ¯•ä¸šč¯æˆįģŠå•åŽŸį‰ˆä¸€æ¯”一z xss
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationLinaWolf1
 
NSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationNSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationMarko4394
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 

KÃŧrzlich hochgeladen (17)

young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptx
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
办į†å¤šäŧĻ多大å­Ļæ¯•ä¸šč¯æˆįģŠå•|č´­äš°åŠ æ‹ŋ大UTSGæ–‡å‡­č¯äšĻ
办į†å¤šäŧĻ多大å­Ļæ¯•ä¸šč¯æˆįģŠå•|č´­äš°åŠ æ‹ŋ大UTSGæ–‡å‡­č¯äšĻ办į†å¤šäŧĻ多大å­Ļæ¯•ä¸šč¯æˆįģŠå•|č´­äš°åŠ æ‹ŋ大UTSGæ–‡å‡­č¯äšĻ
办į†å¤šäŧĻ多大å­Ļæ¯•ä¸šč¯æˆįģŠå•|č´­äš°åŠ æ‹ŋ大UTSGæ–‡å‡­č¯äšĻ
 
『æžŗæ´˛æ–‡å‡­ã€äš°æ‹‰į­šäŧ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛LTU文凭å­ĻäŊč¯äšĻ
『æžŗæ´˛æ–‡å‡­ã€äš°æ‹‰į­šäŧ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛LTU文凭å­ĻäŊč¯äšĻ『æžŗæ´˛æ–‡å‡­ã€äš°æ‹‰į­šäŧ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛LTU文凭å­ĻäŊč¯äšĻ
『æžŗæ´˛æ–‡å‡­ã€äš°æ‹‰į­šäŧ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛LTU文凭å­ĻäŊč¯äšĻ
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
『æžŗæ´˛æ–‡å‡­ã€äš°čŠšå§†åŖĢåē“克大å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛JCU文凭å­ĻäŊč¯äšĻ
『æžŗæ´˛æ–‡å‡­ã€äš°čŠšå§†åŖĢåē“克大å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛JCU文凭å­ĻäŊč¯äšĻ『æžŗæ´˛æ–‡å‡­ã€äš°čŠšå§†åŖĢåē“克大å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛JCU文凭å­ĻäŊč¯äšĻ
『æžŗæ´˛æ–‡å‡­ã€äš°čŠšå§†åŖĢåē“克大å­Ļæ¯•ä¸šč¯äšĻ成įģŠå•åŠžį†æžŗæ´˛JCU文凭å­ĻäŊč¯äšĻ
 
Potsdam FHå­ĻäŊč¯,æŗĸ茨åĻåē”į”¨æŠ€æœ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ1:1åˆļäŊœ
Potsdam FHå­ĻäŊč¯,æŗĸ茨åĻåē”į”¨æŠ€æœ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ1:1åˆļäŊœPotsdam FHå­ĻäŊč¯,æŗĸ茨åĻåē”į”¨æŠ€æœ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ1:1åˆļäŊœ
Potsdam FHå­ĻäŊč¯,æŗĸ茨åĻåē”į”¨æŠ€æœ¯å¤§å­Ļæ¯•ä¸šč¯äšĻ1:1åˆļäŊœ
 
办į†(UofRæ¯•ä¸šč¯äšĻ)įŊ—切斯į‰šå¤§å­Ļæ¯•ä¸šč¯æˆįģŠå•åŽŸį‰ˆä¸€æ¯”一
办į†(UofRæ¯•ä¸šč¯äšĻ)įŊ—切斯į‰šå¤§å­Ļæ¯•ä¸šč¯æˆįģŠå•åŽŸį‰ˆä¸€æ¯”一办į†(UofRæ¯•ä¸šč¯äšĻ)įŊ—切斯į‰šå¤§å­Ļæ¯•ä¸šč¯æˆįģŠå•åŽŸį‰ˆä¸€æ¯”一
办į†(UofRæ¯•ä¸šč¯äšĻ)įŊ—切斯į‰šå¤§å­Ļæ¯•ä¸šč¯æˆįģŠå•åŽŸį‰ˆä¸€æ¯”一
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 Documentation
 
NSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationNSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentation
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 

Architecture Design for Deep Neural Networks III

  • 1. Neural Architecture Search: The Next Half Generation of Machine Learning Speaker: Lingxi Xie (č°ĸ凌æ›Ļ) Noah’s Ark Lab, Huawei Inc. (华ä¸ēč¯ēäēšæ–ščˆŸåŽžéĒŒåŽ¤) Slides available at my homepage (TALKS)
  • 2. Take-Home Messages ī‚  Neural architecture search (NAS) is the future ī‚  Deep learning makes feature learning automatic ī‚  NAS makes deep learning automatic ī‚  The future is approaching faster than we used to think! ī‚  2017: NAS appears ī‚  2018: NAS becomes approachable ī‚  2019 and 2020: NAS will be mature and a standard technique
  • 3. Outline ī‚  Introduction ī‚  Framework ī‚  RepresentativeWork ī‚  Our New Progress ī‚  Future Directions
  • 4. Outline ī‚  Introduction ī‚  Framework ī‚  RepresentativeWork ī‚  Our New Progress ī‚  Future Directions
  • 5. Introduction: Neural Architecture Search ī‚  NeuralArchitecture Search (NAS) ī‚  Instead of manually designing neural network architecture (e.g., AlexNet,VGGNet, GoogLeNet, ResNet, DenseNet, etc.), exploring the possibility of discovering unexplored architecture with automatic algorithms ī‚  Why is NAS important? ī‚  A step from manual model design to automatic model design (analogy: deep learning vs. conventional approaches) ī‚  Able to develop data-specific models [Krizhevsky, 2012] A. Krizhevsky et al., ImageNetClassification with Deep Convolutional Neural Networks, NIPS, 2012. [Simonyan, 2015] K. Simonyan et al.,Very Deep Convolutional Networks for Large-scale Image Recognition, ICLR, 2015. [Szegedy, 2015] C. Szegedy et al., Going Deeper withConvolutions, CVPR, 2015. [He, 2016] K. He et al., Deep Residual Learning for Image Recognition,CVPR, 2016. [Huang, 2017] G. Huang et al., Densely Connected Convolutional Networks, CVPR, 2017.
  • 6. Introduction: Examples and Comparison ī‚  Model comparison: ResNet,GeNet, NASNet and ENASNet [He, 2016] K. He et al., Deep Residual Learning for Image Recognition, CVPR, 2016. [Xie, 2017] L. Xie et al., Genetic CNN, ICCV, 2017. [Zoph, 2018] B. Zoph et al., LearningTransferable Architectures for Scalable Image Recognition, CVPR, 2018. [Pham, 2018] H. Pham et al., Efficient Neural Architecture Search via Parameter Sharing, ICML, 2018.
  • 7. Outline ī‚  Introduction ī‚  Framework ī‚  RepresentativeWork ī‚  Our New Progress ī‚  RelatedApplications ī‚  Future Directions
  • 8. Framework:Trial and Update ī‚  Almost all NAS algorithms are based on the “trial and update” framework ī‚  Starting with a set of initial architectures (e.g., manually defined) as individuals ī‚  Assuming that better architectures can be obtained by slight modification ī‚  Applying different operations on the existing architectures ī‚  Preserving the high-quality individuals and updating the individual pool ī‚  Iterating till the end ī‚  Three fundamental requirements ī‚  The building blocks: defining the search space (dimensionality, complexity, etc.) ī‚  The representation: defining the transition between individuals ī‚  The evaluation method: determining if a generated individual is of high quality
  • 9. Framework: Building Blocks ī‚  Building blocks are like basic genes for these individuals ī‚  Some examples here ī‚  Genetic CNN: only 3 × 3 convolution is allowed to be searched (followed by default BN and ReLU operations), 3 × 3 pooling is fixed ī‚  NASNet: 13 operations shown below ī‚  PNASNet: 8 operations, removing those never-used ones from NASNet ī‚  ENASNet: 6 operations ī‚  DARTS: 8 operations [Xie, 2017] L. Xie et al., Genetic CNN, ICCV, 2017. [Zoph, 2018] B. Zoph et al., LearningTransferable Architectures for Scalable Image Recognition, CVPR, 2018. [Liu, 2018] C. Liu et al., Progressive NeuralArchitecture Search, ECCV, 2018. [Pham, 2018] H. Pham et al., Efficient Neural Architecture Search via Parameter Sharing, ICML, 2018. [Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019.
  • 10. Framework: Search ī‚  Finding new individuals that have the potential to work better ī‚  Heuristic search in a large space ī‚  Two main methods: genetic algorithms and reinforcement learning ī‚  Both are heuristic algorithms suited to scenarios with a large search space and a limited ability to explore every single element in the space ī‚  A fundamental assumption: both heuristics can preserve good genes and discover improvements based on them (see the sketch below) ī‚  It is also possible to integrate architecture search into network optimization ī‚  These algorithms are often much faster [Real, 2017] E. Real et al., Large-Scale Evolution of Image Classifiers, ICML, 2017. [Xie, 2017] L. Xie et al., Genetic CNN, ICCV, 2017. [Zoph, 2018] B. Zoph et al., Learning Transferable Architectures for Scalable Image Recognition, CVPR, 2018. [Liu, 2018] C. Liu et al., Progressive Neural Architecture Search, ECCV, 2018. [Pham, 2018] H. Pham et al., Efficient Neural Architecture Search via Parameter Sharing, ICML, 2018. [Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019.
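To make the trial-and-update loop concrete, below is a minimal Python sketch of a genetic-style search. The bit-string encoding, the mutation rate, and the `evaluate` stub are illustrative assumptions rather than the recipe of any particular paper; a real `evaluate` would train the decoded network and return its validation accuracy.

```python
import random

# A toy "trial and update" loop with a genetic-style search.
# Encoding, mutation, and evaluation are placeholders.

def random_architecture(length=19):
    return [random.randint(0, 1) for _ in range(length)]

def mutate(arch, prob=0.05):
    return [1 - bit if random.random() < prob else bit for bit in arch]

def evaluate(arch):
    return random.random()  # stand-in for (proxy) training + validation accuracy

def search(pop_size=20, rounds=50):
    population = [(a, evaluate(a)) for a in (random_architecture() for _ in range(pop_size))]
    for _ in range(rounds):
        parent, _ = max(random.sample(population, 3), key=lambda p: p[1])  # tournament selection
        child = mutate(parent)
        population.append((child, evaluate(child)))
        population.remove(min(population, key=lambda p: p[1]))             # drop the weakest
    return max(population, key=lambda p: p[1])

best_arch, best_score = search()
```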
  • 11. Framework: Evaluation ī‚  Evaluation aims at determining which individuals are good and should be preserved ī‚  Conventionally, this was done by training each network from scratch ī‚  This is extremely time-consuming, so researchers often run NAS on a small dataset like CIFAR and then transfer the found architecture to larger datasets like ImageNet ī‚  Even so, the search process is very slow: Genetic CNN requires 17 GPU-days for a single search, and NAS-RL requires more than 20,000 GPU-days ī‚  Efficient methods were proposed later ī‚  Ideas include parameter sharing (no need to re-train everything for each new individual) and differentiable architecture search (joint optimization) ī‚  Now, an efficient search on CIFAR can be finished in a few GPU-hours, though training the searched architecture on ImageNet is still time-consuming [Xie, 2017] L. Xie et al., Genetic CNN, ICCV, 2017. [Zoph, 2017] B. Zoph et al., Neural Architecture Search with Reinforcement Learning, ICLR, 2017. [Pham, 2018] H. Pham et al., Efficient Neural Architecture Search via Parameter Sharing, ICML, 2018. [Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019.
  • 12. Outline ī‚  Introduction ī‚  Framework ī‚  Representative Work ī‚  Our New Progress ī‚  Future Directions
  • 13. Representative Work on NAS ī‚  Evolution-based approaches ī‚  Reinforcement-learning-based approaches ī‚  Towards one-shot approaches ī‚  Applications
  • 14. Genetic CNN ī‚  Only considering the connections between basic building blocks ī‚  Encoding each network into a fixed-length binary string (sketched below) ī‚  Standard operators: mutation, crossover, and selection ī‚  Limited by computational resources ī‚  Relatively low accuracy [Xie, 2017] L. Xie et al., Genetic CNN, ICCV, 2017.
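A rough sketch of the fixed-length binary encoding and the standard genetic operators is given below. The exact bit ordering and operator settings are assumptions made for illustration, not the paper's exact implementation; the idea is that within a stage of K nodes there is one bit per ordered node pair, so stages of (3, 4, 5) nodes give 3 + 6 + 10 = 19 bits.

```python
import random

# Genetic-CNN-style encoding: within a stage of K nodes, bit (i, j) with i < j
# says whether node j takes node i's output, i.e., K*(K-1)/2 bits per stage.

STAGE_NODES = (3, 4, 5)

def code_length(stage_nodes=STAGE_NODES):
    return sum(k * (k - 1) // 2 for k in stage_nodes)

def random_code(stage_nodes=STAGE_NODES):
    return [random.randint(0, 1) for _ in range(code_length(stage_nodes))]

def decode_stage(bits, k):
    """Map one stage's bits to {node j: list of earlier nodes feeding into it}."""
    it = iter(bits)
    return {j: [i for i in range(j) if next(it)] for j in range(1, k)}

def crossover(code_a, code_b):
    """Single-point crossover, one of the standard genetic operators."""
    point = random.randrange(1, len(code_a))
    return code_a[:point] + code_b[point:]

def mutate(code, prob=0.05):
    """Bit-flip mutation."""
    return [1 - b if random.random() < prob else b for b in code]
```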
  • 15. Genetic CNN ī‚  CIFAR10 experiments ī‚  3 stages with (K1, K2, K3) = (3, 4, 5) nodes, giving an encoding length of L = 19 bits ī‚  N = 20 individuals, 50 generations [Xie, 2017] L. Xie et al., Genetic CNN, ICCV, 2017.
Generation | Max % | Min % | Avg % | Med % | St-D %
0  | 75.96 | 71.81 | 74.39 | 74.53 | 0.91
1  | 75.96 | 73.93 | 75.01 | 75.17 | 0.57
2  | 75.96 | 73.95 | 75.32 | 75.48 | 0.57
5  | 76.24 | 72.60 | 75.32 | 75.65 | 0.89
10 | 76.72 | 73.92 | 75.68 | 75.80 | 0.88
20 | 76.83 | 74.91 | 76.45 | 76.79 | 0.61
50 | 77.06 | 75.84 | 76.58 | 76.81 | 0.55
Figure: the impact of initialization is negligible after a sufficient number of generations
Figure: parents with higher recognition accuracy are more likely to generate children of higher quality
  • 16. Genetic CNN ī‚  Generalizing the best learned structures to other tasks ī‚  Error rates (%) on the small datasets, using deeper networks, and on ILSVRC2012
Figure: the learned codes cover chain-shaped networks (AlexNet, VGGNet), multiple-path networks (GoogLeNet), and highway networks (deep ResNet)
Network | SVHN | CIFAR10 | CIFAR100
GeNet #1, after Gen. #0  | 2.25 | 8.18 | 31.46
GeNet #1, after Gen. #5  | 2.15 | 7.67 | 30.17
GeNet #1, after Gen. #20 | 2.05 | 7.36 | 29.63
GeNet #1, after Gen. #50 | 1.99 | 7.19 | 29.03
GeNet #2, after Gen. #50 | 1.97 | 7.10 | 29.05
Network (ILSVRC2012) | Top-1 % | Top-5 % | Depth
19-layer VGGNet          | 28.7  | 9.9  | 19
GeNet #1, after Gen. #50 | 28.12 | 9.95 | 22
GeNet #2, after Gen. #50 | 27.87 | 9.74 | 22
  • 17. Large-Scale Evolution of Image Classifiers ī‚  Modifying the individuals with a pre-defined set of operations (listed on the slide) ī‚  Larger networks work better ī‚  A much larger computational budget is used: 250 computers for hundreds of hours ī‚  Take-home message: NAS requires careful design and large computational costs [Real, 2017] E. Real et al., Large-Scale Evolution of Image Classifiers, ICML, 2017.
  • 18. Large-Scale Evolution of Image Classifiers ī‚  The search progress [Real, 2017] E. Real et al., Large-Scale Evolution of Image Classifiers, ICML, 2017.
  • 19. Representative Work on NAS ī‚  Evolution-based approaches ī‚  Reinforcement-learning-based approaches ī‚  Towards one-shot approaches ī‚  Applications
  • 20. NAS with Reinforcement Learning ī‚  Using reinforcement learning (RL) to search over the large space ī‚  The entire structure is generated by an RL agent (an RNN controller) ī‚  The validation accuracy serves as the reward to train the agent's policy (a minimal sketch follows) ī‚  Computational overhead is high ī‚  800 GPUs for 28 days (CIFAR) ī‚  No ImageNet experiments ī‚  Superior accuracy to manually-designed network architectures [Zoph, 2017] B. Zoph et al., Neural Architecture Search with Reinforcement Learning, ICLR, 2017.
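The sketch below shows the core REINFORCE idea in PyTorch: an LSTM controller samples one architecture decision per step and is rewarded with the child network's validation accuracy. The controller size, the "start" token, and the `child_accuracy` stub are illustrative assumptions, not the exact NAS-RL model.

```python
import torch
import torch.nn as nn

# Minimal REINFORCE-style controller: an LSTM emits one token per architecture
# decision; the child network's validation accuracy (a stub here) is the reward.

class Controller(nn.Module):
    def __init__(self, num_choices=6, steps=10, hidden=64):
        super().__init__()
        self.steps, self.hidden = steps, hidden
        self.embed = nn.Embedding(num_choices, hidden)
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.fc = nn.Linear(hidden, num_choices)

    def sample(self):
        h = torch.zeros(1, self.hidden)
        c = torch.zeros(1, self.hidden)
        token = torch.zeros(1, dtype=torch.long)       # assumed "start" token
        log_probs, actions = [], []
        for _ in range(self.steps):
            h, c = self.lstm(self.embed(token), (h, c))
            dist = torch.distributions.Categorical(logits=self.fc(h))
            token = dist.sample()
            actions.append(token.item())
            log_probs.append(dist.log_prob(token))
        return actions, torch.stack(log_probs).sum()

def child_accuracy(actions):
    return torch.rand(()).item()   # stub: build, train, and validate the child network here

controller = Controller()
optimizer = torch.optim.Adam(controller.parameters(), lr=3e-4)
baseline = 0.0
for _ in range(100):
    actions, log_prob = controller.sample()
    reward = child_accuracy(actions)
    baseline = 0.9 * baseline + 0.1 * reward           # moving-average baseline
    loss = -(reward - baseline) * log_prob             # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```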
  • 21. NAS Network (NASNet) ī‚  Instead of searching for the entire network as in previous work, this work searches only for a limited number of basic building blocks (cells) ī‚  The remaining part is mostly the same ī‚  Computational overhead is still high ī‚  500 GPUs for 4 days (CIFAR) ī‚  Good ImageNet performance [Zoph, 2018] B. Zoph et al., Learning Transferable Architectures for Scalable Image Recognition, CVPR, 2018.
  • 22. Progressive NAS ī‚  Instead of searching over the entire structure (containing a few blocks) at once, this work adds one block at a time (progressive search) ī‚  The best combinations are recorded for the next-stage search ī‚  The search is more efficient ī‚  The remaining part is mostly the same ī‚  Computational overhead is still high ī‚  100 GPUs for 1.5 days (CIFAR) ī‚  Better ImageNet performance [Liu, 2018] C. Liu et al., Progressive Neural Architecture Search, ECCV, 2018.
  • 23. Regularized Evolution ī‚  Regularized ("aging") evolution: eliminating the oldest individuals rather than the worst ones ī‚  Evolution works equally well as or better than RL algorithms ī‚  Take-home message: evolutionary algorithms play an important role, especially when the computational budget is limited; also, conventional evolutionary algorithms need to be modified to fit the NAS task (a minimal sketch follows) [Real, 2019] E. Real et al., Regularized Evolution for Image Classifier Architecture Search, AAAI, 2019.
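A compact sketch of aging evolution is shown below: the population is kept in a FIFO queue, so appending a new child always evicts the oldest individual regardless of its fitness. The `random_arch`, `mutate`, and `evaluate` callables are placeholders supplied by the caller.

```python
import collections
import random

# Aging ("regularized") evolution: tournament selection for parents,
# FIFO removal (oldest out) instead of removing the worst individual.

def regularized_evolution(random_arch, mutate, evaluate,
                          population_size=50, sample_size=10, cycles=1000):
    population = collections.deque(maxlen=population_size)  # oldest is dropped automatically
    history = []
    while len(population) < population_size:
        arch = random_arch()
        population.append((arch, evaluate(arch)))
    for _ in range(cycles):
        candidates = random.sample(list(population), sample_size)
        parent = max(candidates, key=lambda x: x[1])         # tournament selection
        child = mutate(parent[0])
        population.append((child, evaluate(child)))          # appending evicts the oldest
        history.append(population[-1])
    return max(history, key=lambda x: x[1])

# usage with toy stubs:
best = regularized_evolution(
    random_arch=lambda: [random.randint(0, 1) for _ in range(19)],
    mutate=lambda a: [1 - b if random.random() < 0.05 else b for b in a],
    evaluate=lambda a: random.random(),
)
```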
  • 24. Representative Work on NAS ī‚  Evolution-based approaches ī‚  Reinforcement-learning-based approaches ī‚  Towards one-shot approaches ī‚  Applications
  • 25. Efficient NAS by Network Transformation ī‚  Instead of training each new individual from scratch, this work reuses the weights of a prior network (expected to be similar to the current one), so that training is more efficient ī‚  Net2Net is used for initialization ī‚  Operations: making the network wider and deeper ī‚  Much more efficient ī‚  5 GPUs for 2 days (CIFAR) ī‚  No ImageNet experiments [Chen, 2016] T. Chen et al., Net2Net: Accelerating Learning via Knowledge Transfer, ICLR, 2016. [Cai, 2018] H. Cai et al., Efficient Architecture Search by Network Transformation, AAAI, 2018.
  • 26. Efficient NAS via Parameter Sharing ī‚  Instead of only transferring network initialization, this work goes one step further by sharing parameters among all generated networks (a minimal sketch follows) ī‚  Each training stage is much shorter ī‚  Much more efficient ī‚  1 GPU for 0.45 days (CIFAR) ī‚  No ImageNet experiments [Pham, 2018] H. Pham et al., Efficient Neural Architecture Search via Parameter Sharing, ICML, 2018.
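The sketch below illustrates the parameter-sharing idea: every candidate operation on every edge owns a single set of weights inside one super-network, and any sampled sub-network reuses them instead of being trained from scratch. The operation set is assumed for illustration, and the real ENAS cell is a DAG rather than the simple chain used here.

```python
import torch
import torch.nn as nn

# One shared super-network: each (edge, operation) pair has exactly one set of
# weights, reused by every sampled sub-network.

OPS = {
    "conv3x3": lambda ch: nn.Conv2d(ch, ch, 3, padding=1),
    "conv5x5": lambda ch: nn.Conv2d(ch, ch, 5, padding=2),
    "maxpool": lambda ch: nn.MaxPool2d(3, stride=1, padding=1),
}

class SharedCell(nn.Module):
    def __init__(self, channels=16, num_edges=4):
        super().__init__()
        # one module per (edge, op): these weights are shared across sub-networks
        self.edges = nn.ModuleList(
            nn.ModuleDict({name: make(channels) for name, make in OPS.items()})
            for _ in range(num_edges)
        )

    def forward(self, x, arch):
        # `arch`: one op name per edge, chosen by the controller
        # (a chain of edges here; the real cell is a DAG)
        for edge, op_name in zip(self.edges, arch):
            x = edge[op_name](x)
        return x

cell = SharedCell()
sub_net_a = ["conv3x3", "maxpool", "conv5x5", "conv3x3"]
sub_net_b = ["conv5x5", "conv3x3", "conv3x3", "maxpool"]
x = torch.randn(2, 16, 32, 32)
out_a, out_b = cell(x, sub_net_a), cell(x, sub_net_b)  # both reuse the same parameters
```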
  • 27. Differentiable Architecture Search ī‚  With a fixed number of intermediate blocks, the operator applied to each edge is unknown at the beginning ī‚  During the search, the operator is formulated as a mixture (weighted sum) of all candidates ī‚  The learning goal is the mixture coefficients, which are differentiable (see the sketch below) ī‚  At the end of the search, the most likely operator on each edge is kept, and the resulting network is trained again from scratch ī‚  Much more efficient ī‚  1 GPU for 4 days (CIFAR) ī‚  Reasonable ImageNet results (in the mobile setting) [Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019.
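A minimal PyTorch sketch of the continuous relaxation is shown below: each edge computes a softmax-weighted sum of candidate operations, and the mixture coefficients (alpha) are ordinary learnable parameters. The candidate set here is a reduced, assumed one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# DARTS-style relaxation: an edge's output is a softmax-weighted mixture of all
# candidate operations; the mixture weights (alpha) are learned by gradient
# descent, and the argmax operation is kept when the search ends.

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Identity(),                       # skip-connect
        ])
        # architecture parameters: one weight per candidate operation
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def derive(self):
        """Index of the strongest operation, kept after the search."""
        return int(self.alpha.argmax())

op = MixedOp(channels=16)
y = op(torch.randn(2, 16, 32, 32))
```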
  • 28. Differentiable Architecture Search ī‚  The best cell changes over time [Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019.
  • 29. Proxyless NAS ī‚  The first NAS work that is directly optimized on ImageNet (ILSVRC2012) ī‚  Learning weight parameters and binarized architecture parameters simultaneously ī‚  Similar in spirit to differentiable NAS ī‚  Efficient ī‚  1 GPU for 8 days ī‚  Reasonable performance (mobile setting) [Cai, 2019] H. Cai et al., ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, ICLR, 2019.
  • 30. Probabilistic NAS ī‚  A new way to train a super-network ī‚  Sampling sub-networks from a distribution ī‚  Also able to perform proxyless architecture search ī‚  Efficiency brought by flexible control of the search time spent on each sub-network ī‚  1 GPU for 0.2 days ī‚  Accuracy is slightly weak on ImageNet [Casale, 2019] F. P. Casale et al., Probabilistic Neural Architecture Search, arXiv preprint: 1902.05116, 2019.
  • 31. Single-Path One-Shot NAS ī‚  Main idea: balancing the sampling probability of each path in one-shot search (a uniform-sampling sketch follows) ī‚  With the benefit of decoupling the operations on each edge ī‚  Bridging the gap between search and evaluation ī‚  Modified search space ī‚  Blocks based on ShuffleNet-v2 ī‚  Evolution-based search algorithm ī‚  Channel number search ī‚  Latency and FLOPs constraints ī‚  Improved accuracy for one-shot NAS [Guo, 2019] Z. Guo et al., Single Path One-Shot Neural Architecture Search with Uniform Sampling, arXiv preprint: 1904.00420, 2019.
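The following sketch illustrates uniform single-path sampling during super-network training: each step samples one choice per block and back-propagates only through that path. The blocks and the toy regression loss are assumptions for illustration; the paper uses ShuffleNet-v2 blocks and image classification.

```python
import random
import torch
import torch.nn as nn

# Single-path one-shot training: sample one path uniformly per step, so only the
# weights on that path receive gradients; architecture search happens afterwards.

class ChoiceBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])

    def forward(self, x, choice):
        return self.candidates[choice](x)

class SuperNet(nn.Module):
    def __init__(self, channels=16, blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(ChoiceBlock(channels) for _ in range(blocks))

    def forward(self, x, path):
        for block, choice in zip(self.blocks, path):
            x = block(x, choice)
        return x

    def sample_path(self):
        # uniform sampling keeps every candidate trained roughly equally often
        return [random.randrange(len(b.candidates)) for b in self.blocks]

net = SuperNet()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
x, target = torch.randn(2, 16, 8, 8), torch.randn(2, 16, 8, 8)
for _ in range(10):
    path = net.sample_path()
    loss = ((net(x, path) - target) ** 2).mean()   # toy loss for illustration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```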
  • 32. Architecture Search, Anneal and Prune ī‚  Another effort to deal with the decoupling issue of DARTS (gradually turning the relaxed architecture into a single path) ī‚  Decreasing a temperature term when computing the probability assigned to each edge ī‚  Pruning edges with low weights ī‚  Efficiency brought by pruning ī‚  1 GPU for 0.2 days ī‚  Accuracy is still slightly weak on ImageNet [Noy, 2019] A. Noy et al., ASAP: Architecture Search, Anneal and Prune, arXiv preprint: 1904.04123, 2019.
  • 33. Randomly Wired Neural Networks ī‚  A more diverse set of connectivity patterns ī‚  Connecting NAS and randomly wired neural networks ī‚  An important insight: when the search space is large enough, randomly wired networks are almost as effective as carefully searched architectures ī‚  This does not imply that NAS is useless; rather, it reveals that current NAS methods are not yet effective enough [Xie, 2019] S. Xie et al., Exploring Randomly Wired Neural Networks for Image Recognition, arXiv preprint: 1904.01569, 2019.
  • 34. Representative Work on NAS ī‚  Evolution-based approaches ī‚  Reinforcement-learning-based approaches ī‚  Towards one-shot approaches ī‚  Applications
  • 35. Auto-Deeplab ī‚  A hierarchical architecture search space ī‚  With both network-level and cell-level structures being investigated ī‚  Differentiable search method (in order to accelerate) ī‚  Similar performance to Deeplab-v3 (without pre-training) [Liu, 2019] C. Liu et al., Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation, CVPR, 2019.
  • 36. NAS-FPN ī‚  Searching for the feature pyramid network ī‚  Reinforcement-learning-based search ī‚  Good performance on MS-COCO ī‚  Improving mobile detection accuracy by 2% AP compared to SSDLite on MobileNetV2 ī‚  Achieving 48.3% AP, surpassing the state of the art [Ghiasi, 2019] G. Ghiasi et al., NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection, CVPR, 2019.
  • 37. Auto-ReID ī‚  A search space with a part-aware module ī‚  Using both ranking and classification losses ī‚  Differentiable search ī‚  State-of-the-art performance on ReID [Quan, 2019] R. Quan et al., Auto-ReID: Searching for a Part-aware ConvNet for Person Re-Identification, arXiv preprint: 1903.09776, 2019.
  • 38. GraphNAS ī‚  A search space containing components of GNN layers ī‚  RL-based search algorithm ī‚  A modified parameter-sharing scheme ī‚  Surpassing manually designed GNN architectures [Gao, 2019] Y. Gao et al., GraphNAS: Graph Neural Architecture Search with Reinforcement Learning, arXiv preprint: 1904.09981, 2019.
  • 39. V-NAS ī‚  Medical image segmentation ī‚  Volumetric convolution required ī‚  Searching for volumetric convolutions ī‚  2D conv, 3D conv, and P3D conv ī‚  Differentiable search algorithm ī‚  Outperforming the state of the art [Zhu, 2019] Z. Zhu et al., V-NAS: Neural Architecture Search for Volumetric Medical Image Segmentation, arXiv preprint: 1906.02817, 2019.
  • 40. AutoAugment ī‚  Learning hyper-parameters ī‚  Search space: Shear-X/Y, Translate-X/Y, Rotate, AutoContrast, Invert, etc. ī‚  Reinforcement-learning-based search ī‚  Impressive performance on a few standard image classification benchmarks ī‚  Transferring to other tasks, e.g., NAS-FPN [Cubuk, 2019] E. Cubuk et al., AutoAugment: Learning Augmentation Strategies from Data, CVPR, 2019.
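As an illustration of how a learned policy of this kind is applied at training time, the sketch below samples one sub-policy per image and applies each operation with its probability and magnitude. The sub-policies listed are made up for illustration and are not the learned AutoAugment policies.

```python
import random
from PIL import Image, ImageOps

# Applying an AutoAugment-style policy: each sub-policy is a list of
# (operation, probability, magnitude) triples; one sub-policy is sampled per image.

OPS = {
    "Rotate":       lambda img, m: img.rotate(m * 3),               # magnitude -> degrees
    "AutoContrast": lambda img, m: ImageOps.autocontrast(img),
    "Invert":       lambda img, m: ImageOps.invert(img),
    "Solarize":     lambda img, m: ImageOps.solarize(img, 256 - m * 25),
}

EXAMPLE_POLICY = [  # hypothetical sub-policies, for illustration only
    [("Rotate", 0.7, 2), ("AutoContrast", 0.9, 0)],
    [("Solarize", 0.5, 4), ("Invert", 0.1, 0)],
]

def apply_policy(img, policy=EXAMPLE_POLICY):
    sub_policy = random.choice(policy)          # pick one sub-policy per image
    for name, prob, magnitude in sub_policy:
        if random.random() < prob:
            img = OPS[name](img, magnitude)
    return img

# augmented = apply_policy(Image.open("example.jpg").convert("RGB"))
```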
  • 41. More Work for Your Reference ī‚  https://github.com/markdtw/awesome-architecture-search
  • 42. Outline ī‚  Introduction ī‚  Framework ī‚  Representative Work ī‚  Our New Progress ī‚  Future Directions
  • 43. P-DARTS: Overview ī‚  We start with the drawbacks of DARTS ī‚  There is a depth gap between search and evaluation ī‚  The search process is not stable: multiple runs give different results ī‚  The search is not likely to transfer: it only works well on CIFAR10 ī‚  We propose a new approach named Progressive DARTS ī‚  A multi-stage search process which gradually increases the search depth ī‚  Two useful techniques: search space approximation and search space regularization ī‚  We obtain nice results ī‚  SOTA accuracy of the searched networks on CIFAR10/CIFAR100 and ImageNet ī‚  Search cost as small as 0.3 GPU-days (one single GPU, 7 hours) [Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019. [Chen, 2019] X. Chen et al., Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation, submitted, 2019.
  • 44. P-DARTS: Motivation ī‚  The depth gap and why it is important
Figure: DARTS searches with 8 cells and evaluates with 20 cells (CIFAR10 test error 2.83%); P-DARTS searches with 5, 11 and 17 cells before evaluating with 20 cells (CIFAR10 test error 2.55%)
  • 45. P-DARTS: Search Space Approximation ī‚  The progressive way of increasing search depth
  • 46. P-DARTS: Search Space Regularization ī‚  Problem: the strange behavior of skip-connect ī‚  Searching on a deep network leads to many skip-connect operations (and poor results) ī‚  Reasons? ī‚  On the one hand, skip-connect often leads to the fastest gradient descent ī‚  On the other hand, skip-connect has no parameters and so leads to weak results ī‚  Solution: regularization ī‚  Adding a Dropout after each skip-connect and decaying its rate during the search (a minimal sketch follows) ī‚  Preserving a fixed number of skip-connects after the entire search ī‚  Results
Dropout on skip-connect | Test error, 2 SC | Test error, 3 SC | Test error, 4 SC
with Dropout    | 2.93% | 3.28% | 3.51%
without Dropout | 2.69% | 2.84% | 2.97%
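A minimal sketch of this regularization is given below: a dropout is attached to the skip-connect path and its rate is decayed as the search proceeds, so the parameter-free shortcut is handicapped early on but not forbidden later. The initial rate and the linear schedule are assumptions, not necessarily the schedule used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Dropout on the skip-connect path, with a rate that decays during the search.

class DropSkip(nn.Module):
    def __init__(self, initial_rate=0.3):
        super().__init__()
        self.rate = initial_rate

    def set_rate(self, rate):
        self.rate = rate

    def forward(self, x):
        # identity path, randomly zeroed with probability `self.rate` during search
        return F.dropout(x, p=self.rate, training=self.training)

skip = DropSkip(initial_rate=0.3)
total_epochs = 25
for epoch in range(total_epochs):
    # linearly decay the dropout rate over the search epochs (schedule is an assumption)
    skip.set_rate(0.3 * (1 - epoch / total_epochs))
    # ... one epoch of architecture search, with `skip` used on every skip-connect edge ...
```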
  • 47. P-DARTS: Performance on CIFAR10/100 ī‚  CIFAR10 and CIFAR100 (a useful enhancement: Cutout) [DeVries, 2017] T. DeVries et al., Improved Regularization of Convolutional Neural Networks with Cutout, arXiv 1708.04552, 2017.
  • 48. P-DARTS: Performance on ImageNet ī‚  ImageNet (ILSVRC2012) under the Mobile Setting
  • 49. P-DARTS: Searched Cells ī‚  Searched architectures (verification of depth gap!)
  • 50. P-DARTS: Summary ī‚  The depth gap needs to be solved ī‚  Different properties of networks with different depths ī‚  Depth is still the key issue in deep learning ī‚  Our approach ī‚  State-of-the-art results on both CIFAR10/100 and ImageNet ī‚  Search cost as small as 0.3 GPU-days ī‚  Future directions ī‚  Directly searching on ImageNet ī‚  There are many unsolved issues on NAS!
  • 51. PC-DARTS: A More Powerful Approach ī‚  We still build our approach upon DARTS ī‚  We propose a new approach named Partially-Connected DARTS ī‚  An alternative way to deal with the over-fitting issue of DARTS ī‚  Using partial channel connections as regularization ī‚  This method is even more stable and can search directly on ImageNet ī‚  We obtain nice results ī‚  SOTA accuracy of the searched networks on ImageNet ī‚  Search cost as small as 0.06 GPU-days (one single GPU, 1.5 hours) on CIFAR10/100, or 4 GPU-days (8 GPUs, 11.5 hours) on ImageNet [Liu, 2019] H. Liu et al., DARTS: Differentiable Architecture Search, ICLR, 2019. [Xu, 2019] Y. Xu et al., PC-DARTS: Partial Channel Connections for Memory-Efficient Differentiable Architecture Search, submitted, 2019.
  • 52. PC-DARTS: Illustration ī‚  Partial channel connection and edge normalization
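The sketch below gives a simplified view of the partial channel connection: only 1/K of the input channels pass through the mixed operation while the rest bypass it, and a learned edge-level weight (beta) rescales the edge. In the actual method the beta weights are normalized across the edges entering a node and channel shuffling is applied; both are omitted here for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Partial channel connection: route 1/K of the channels through the mixed
# operation, concatenate with the untouched channels, then apply an edge weight.

class PartialMixedOp(nn.Module):
    def __init__(self, channels, k=4):
        super().__init__()
        self.k = k
        active = channels // k          # assumes channels is divisible by k
        self.ops = nn.ModuleList([
            nn.Conv2d(active, active, 3, padding=1),
            nn.Conv2d(active, active, 5, padding=2),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))  # operation weights
        self.beta = nn.Parameter(torch.zeros(1))                      # edge weight (simplified)

    def forward(self, x):
        active = x.size(1) // self.k
        x_active, x_bypass = x[:, :active], x[:, active:]
        weights = F.softmax(self.alpha, dim=0)
        mixed = sum(w * op(x_active) for w, op in zip(weights, self.ops))
        out = torch.cat([mixed, x_bypass], dim=1)
        return torch.sigmoid(self.beta) * out   # edge-level scaling (the paper normalizes
                                                # beta over all edges entering a node)

op = PartialMixedOp(channels=16, k=4)
y = op(torch.randn(2, 16, 32, 32))   # output keeps 16 channels
```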
  • 53. PC-DARTS: Performance on ImageNet ī‚  ImageNet (ILSVRC2012) under the Mobile Setting
  • 54. PC-DARTS: Summary ī‚  Regularization is still a big issue ī‚  Partial channel connection to prevent over-fitting ī‚  Edge normalization to make the partial channel connection work more stably ī‚  Our approach ī‚  State-of-the-art results on ImageNet ī‚  Search cost as small as 0.06 GPU-days ī‚  Future directions ī‚  Searching on a larger number of classes ī‚  There are many unsolved issues in NAS!
  • 55. Outline ī‚  Introduction ī‚  Framework ī‚  Representative Work ī‚  Our New Progress ī‚  Future Directions
  • 56. Conclusions ī‚  NAS is a promising and important trend for machine learning in the future ī‚  NAS is to fixed architectures as deep learning is to conventional handcrafted features ī‚  Two important factors of NAS to be determined ī‚  Basic building blocks: fixed or learnable ī‚  The way of exploring the search space: genetic algorithms, reinforcement learning, or joint optimization ī‚  The importance of computational power has been reduced, but is still significant
  • 57. Related Applications ī‚  The searched architectures have been verified to be effective for transfer learning tasks ī‚  NASNet outperformed ResNet-101 in object detection by 4% mAP ī‚  Take-home message: stronger architectures are often transferable ī‚  The ability of NAS in other vision tasks ī‚  Preliminary success in semantic segmentation, object detection, etc. [Zoph, 2018] B. Zoph et al., Learning Transferable Architectures for Scalable Image Recognition, CVPR, 2018. [Chen, 2018] L. Chen et al., Searching for Efficient Multi-Scale Architectures for Dense Image Prediction, NIPS, 2018. [Liu, 2019] C. Liu et al., Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation, CVPR, 2019. [Ghiasi, 2019] G. Ghiasi et al., NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection, CVPR, 2019.
  • 58. Future Directions ī‚  Currently, the search space is constrained by the limited types of building blocks ī‚  It is not guaranteed that the current building blocks are optimal ī‚  It remains to explore the possibility of searching inside the building blocks ī‚  Currently, the searched architectures are not friendly to hardware ī‚  This leads to dramatically slower network training ī‚  Currently, the searched architectures are task-specific ī‚  This may not be a problem, but an ideal vision system should generalize ī‚  Currently, the search process is not yet stable ī‚  We desire a framework as generalizable as regular deep networks
  • 59. Thanks ī‚  Questions, please? ī‚  Contact me for collaboration and internships ☺