An Introduction to
Neural Architecture Search
Colin White, RealityEngines.AI
Deep learning
● Explosion of interest since 2012
● Very powerful machine learning technique
● Huge variety of neural networks for different tasks
● Key ingredients took years to develop
● Algorithms are becoming increasingly specialized and complicated
○ E.g., accuracy on ImageNet has steadily improved over 10 years
Neural architecture search
● What if an algorithm could do this for us?
● Neural architecture search (NAS) is a hot area of research
● Given a dataset, define a search space of architectures, then use a search strategy to find the best architecture for your dataset
Outline
● Introduction to NAS
● Background on deep learning
● Automated machine learning
● Optimization techniques
● NAS Framework
○ Search space
○ Search strategy
○ Evaluation method
● Conclusions
Background - deep learning
Source: https://www.nextplatform.com/2017/03/21/can-fpgas-beat-gpus-accelerating-next-generation-deep-learning/
● Studied since the 1940s - simulate the human brain
● “Neural networks are the second-best way to do almost anything” - JS Denker, 2000s
● Breakthrough: 2012 ImageNet competition [Krizhevsky, Sutskever, and Hinton]
Automated Machine Learning
● Automated machine learning
○ Data cleaning, model selection, HPO, NAS, ...
● Hyperparameter optimization (HPO)
○ Learning rate, dropout rate, batch size, ...
● Neural architecture search
○ Finding the best neural architecture
Source: https://determined.ai/blog/neural-architecture-search/
Optimization
● Zeroth-order optimization (used for HPO, NAS): uses only function values
○ Bayesian optimization
● First-order optimization (used for neural nets): uses gradients
○ Gradient descent / stochastic gradient descent
● Second-order optimization: uses second derivatives (the Hessian)
○ Newton’s method
Source: https://ieeexplore.ieee.org/document/8422997
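To make the three orders concrete, here is a minimal sketch on a toy one-dimensional objective. The objective f(x) = (x - 3)^2 and all function names are assumptions chosen for illustration, not something taken from the slides.

```python
import random

def f(x):
    return (x - 3.0) ** 2          # toy objective (assumed), minimum at x = 3

def grad(x):
    return 2.0 * (x - 3.0)         # first derivative of f

def hess(x):
    return 2.0                     # second derivative of f (constant for a quadratic)

def zeroth_order(n_trials=200, seed=0):
    """Random search: only evaluates f, never its derivatives."""
    rng = random.Random(seed)
    return min((rng.uniform(-10, 10) for _ in range(n_trials)), key=f)

def first_order(x=0.0, lr=0.1, steps=100):
    """Gradient descent: each step follows the negative gradient."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def second_order(x=0.0, steps=1):
    """Newton's method: each step rescales the gradient by the curvature."""
    for _ in range(steps):
        x -= grad(x) / hess(x)
    return x
```

On a quadratic, Newton's method lands on the minimum in a single step, gradient descent approaches it geometrically, and random search only gets close by chance. Zeroth-order methods are still the usual choice for HPO and NAS because derivatives of validation accuracy with respect to an architecture are generally not available.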
Zeroth-order optimization
● Grid search
● Random search
● Bayesian Optimization
○ Use the results of the previous guesses to make the next guess
Source: https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/
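Grid search and random search can be sketched in a few lines. The scoring function below is a hypothetical stand-in for "train a model and measure validation accuracy", and the hyperparameter ranges are assumptions.

```python
import itertools
import random

def validation_score(lr, dropout):
    """Hypothetical stand-in for training a model and measuring accuracy."""
    return 1.0 - 1000.0 * (lr - 0.01) ** 2 - (dropout - 0.3) ** 2

def grid_search(lrs, dropouts):
    """Try every combination on a fixed grid and keep the best one."""
    return max(itertools.product(lrs, dropouts),
               key=lambda cfg: validation_score(*cfg))

def random_search(n_trials, seed=0):
    """Sample configurations independently and keep the best one."""
    rng = random.Random(seed)
    candidates = [(10 ** rng.uniform(-4, -1),   # learning rate, log-uniform
                   rng.uniform(0.0, 0.6))       # dropout rate, uniform
                  for _ in range(n_trials)]
    return max(candidates, key=lambda cfg: validation_score(*cfg))
```

Neither method uses earlier results to decide what to try next, which is the gap Bayesian optimization is meant to fill.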
Neural Architecture Search
Evaluation method
● Full training
● Partial training
● Training with shared weights
Search space
● Cell-based search space
● Macro vs micro search
Search strategy
● Reinforcement learning
● Continuous methods
● Bayesian optimization
● Evolutionary algorithm
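The three components can be read as three pluggable functions. The sketch below is one way to wire them together, assuming a toy operation set, a made-up proxy score in place of real training, and random search as the simplest possible search strategy; every name in it is hypothetical.

```python
import random

OPS = ["conv", "pool", "relu", "dense"]      # assumed toy operation set

def sample_architecture(rng, depth=5):
    """Search space: fixed-depth chains of operations."""
    return tuple(rng.choice(OPS) for _ in range(depth))

def evaluate(arch):
    """Evaluation method: hypothetical proxy score instead of real training."""
    return sum(op == "conv" for op in arch) / len(arch)

def random_search_nas(n_trials=50, seed=0):
    """Search strategy: random search over the search space."""
    rng = random.Random(seed)
    best = max((sample_architecture(rng) for _ in range(n_trials)), key=evaluate)
    return best, evaluate(best)
```

Swapping in reinforcement learning, Bayesian optimization, or an evolutionary algorithm changes only the last function; swapping full training for partial training changes only `evaluate`.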
Search space
● Macro vs micro search
● Progressive search
● Cell-based search space
[Zoph, Le ‘16]
Bayesian optimization
● Popular method [Golovin et al. ‘17], [Jin et al. ‘18], [Kandasamy et al. ‘18]
● Great method to optimize an expensive function
● Fix a dataset (e.g. CIFAR-10, MNIST)
● Define a search space A (e.g., 20 layers of {conv, pool, ReLU, dense})
● Define an objective function f:A→[0,1]
○ f(a) = validation accuracy of a after training
● Define a distance function $d(a_1, a_2)$ between architectures
○ Quick to evaluate. If $d(a_1, a_2)$ is small, $|f(a_1) - f(a_2)|$ is small
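As a rough illustration of the setup, the search space and a distance function might be encoded as follows. The Hamming distance used here is an assumption picked for simplicity; the cited papers define their own, more elaborate distances. The objective f is left out because it requires actually training each network.

```python
OPS = ("conv", "pool", "relu", "dense")

def is_valid(arch, depth=20):
    """Membership test for the search space A: 20 layers drawn from OPS."""
    return len(arch) == depth and all(op in OPS for op in arch)

def distance(a1, a2):
    """Hamming distance (assumed): number of layer positions that differ."""
    return sum(x != y for x, y in zip(a1, a2))

# Two architectures that differ only in their last layer.
a1 = ("conv",) * 20
a2 = ("conv",) * 19 + ("pool",)
```

Here `distance(a1, a2)` is 1, and it costs a single pass over the layers, which matches the "quick to evaluate" requirement.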
Bayesian optimization
Goal: find a∈A which maximizes f(a)
● Choose several random architectures a and evaluate f(a)
● In each iteration i:
○ Use $f(a_1), \ldots, f(a_{i-1})$ to choose new $a_i$
○ Evaluate $f(a_i)$
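The loop above can be sketched as follows. To keep the example short, the surrogate is a distance-weighted average of past results rather than a Gaussian process, and the next architecture is simply the candidate with the highest predicted score, so this is a simplified model-based search rather than full Bayesian optimization. The objective, the distance, and all names are assumptions.

```python
import random

OPS = ("conv", "pool", "relu", "dense")

def f(arch):
    """Hypothetical objective standing in for validation accuracy."""
    return sum(op == "conv" for op in arch) / len(arch)

def distance(a1, a2):
    """Hamming distance between two equal-length architectures (assumed)."""
    return sum(x != y for x, y in zip(a1, a2))

def predict(arch, history):
    """Stand-in surrogate: distance-weighted average of observed scores."""
    weights = [1.0 / (1.0 + distance(arch, a)) for a, _ in history]
    return sum(w * y for w, (_, y) in zip(weights, history)) / sum(weights)

def model_based_search(n_init=5, n_iters=20, n_candidates=50, depth=8, seed=0):
    rng = random.Random(seed)

    def sample():
        return tuple(rng.choice(OPS) for _ in range(depth))

    # Choose several random architectures and evaluate f on each.
    history = [(a, f(a)) for a in (sample() for _ in range(n_init))]
    for _ in range(n_iters):
        seen = {a for a, _ in history}
        candidates = [c for c in (sample() for _ in range(n_candidates))
                      if c not in seen]
        if not candidates:
            break
        # Use f(a_1), ..., f(a_{i-1}) to choose the new a_i.
        a_i = max(candidates, key=lambda c: predict(c, history))
        # Evaluate f(a_i) and record the result.
        history.append((a_i, f(a_i)))
    return max(history, key=lambda pair: pair[1])
```

Because the selection step is purely greedy, this sketch never explores uncertain regions on purpose; a Gaussian process with an acquisition function adds that exploration.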
Gaussian process
● Assume the objective f is smooth over A
● The deviations look like Gaussian noise
● Update as we get more information
Source: http://keyonvafa.com/gp-tutorial/,
https://katbailey.github.io/post/gaussian-processes-for-dummies/
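A minimal Gaussian process posterior can be written in plain Python. This sketch assumes real-valued inputs instead of architectures, a zero-mean prior, and an RBF kernel with unit length scale; the small linear solver is included only to avoid external dependencies.

```python
import math

def rbf(x1, x2, length_scale=1.0):
    """RBF kernel: nearby inputs get covariance close to 1."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and variance at x_new given observations (xs, ys)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, max(var, 0.0)
```

At an observed point the posterior mean reproduces the observation and the variance is close to zero; far from all observations the mean falls back to the prior (zero) and the variance returns to the prior variance of 1. That is the sense in which the model is updated as more information arrives.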
Acquisition function
● In each iteration, find the architecture with the largest expected improvement
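For a surrogate that predicts a Gaussian with mean μ and standard deviation σ at a candidate, expected improvement over the best score so far has a closed form: EI = (μ - best)·Φ(z) + σ·φ(z), with z = (μ - best)/σ. A sketch for the maximization case, with hypothetical function and parameter names:

```python
import math

def expected_improvement(mean, std, best_so_far):
    """Expected improvement over best_so_far, for a maximization problem."""
    if std <= 0.0:
        # No uncertainty: improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    return (mean - best_so_far) * cdf + std * pdf

def pick_next(candidates, best_so_far):
    """candidates: list of (name, mean, std); return the one with the largest EI."""
    return max(candidates,
               key=lambda c: expected_improvement(c[1], c[2], best_so_far))
```

Expected improvement rewards both a high predicted mean and high uncertainty, so a candidate whose mean only matches the current best can still be chosen if the surrogate is unsure about it.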
DARTS: Differentiable Architecture Search
● Relax NAS to a continuous problem
● Use gradient descent (just like normal parameters)
[Liu et al. ‘18]
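The continuous relaxation can be illustrated with a single "mixed operation": instead of picking one operation, take a softmax-weighted sum of all of them and learn the weights by gradient descent. The sketch below uses three scalar toy operations and a squared-error loss, both assumptions; it optimizes only the architecture weights, whereas DARTS alternates between architecture weights and network weights.

```python
import math

# Toy candidate operations on a scalar input (assumed for illustration).
OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}

def softmax(alphas):
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum of all candidate ops."""
    probs = softmax(alphas)
    return sum(p * op(x) for p, op in zip(probs, OPS.values()))

def fit_alphas(x=1.0, target=2.0, lr=0.5, steps=200):
    """Gradient descent on the architecture weights for loss = (y - target)^2."""
    alphas = [0.0] * len(OPS)
    for _ in range(steps):
        probs = softmax(alphas)
        outs = [op(x) for op in OPS.values()]
        y = sum(p * o for p, o in zip(probs, outs))
        # d(loss)/d(alpha_k) = 2 (y - target) * p_k * (o_k - y)
        grads = [2.0 * (y - target) * p * (o - y) for p, o in zip(probs, outs)]
        alphas = [a - lr * g for a, g in zip(alphas, grads)]
    return alphas

def discretize(alphas):
    """After the search, keep only the operation with the largest weight."""
    names = list(OPS)
    return names[max(range(len(alphas)), key=lambda k: alphas[k])]
```

With input 1.0 and target 2.0, the weight on `double` grows at every step, so `discretize(fit_alphas())` selects it.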
DARTS: Differentiable Architecture Search
● Upside: “one-shot” (one shared-weight network is trained instead of one network per candidate)
● Downside: may only work in the “micro” search setting
[Liu et al. ‘18]
Reinforcement Learning
● Controller recurrent neural network
○ Chooses a new architecture in each round
○ Architecture is trained and evaluated
○ Controller receives feedback
[Zoph, Le ‘16]
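The choose / evaluate / feedback cycle can be sketched with a policy-gradient (REINFORCE) update. To stay short, this example replaces the recurrent controller with an independent softmax per layer and replaces training with a made-up reward, so it illustrates the feedback loop rather than the method of Zoph and Le; all names are hypothetical.

```python
import math
import random

OPS = ("conv", "pool", "relu", "dense")

def reward(arch):
    """Hypothetical reward standing in for validation accuracy."""
    return sum(op == "conv" for op in arch) / len(arch)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_controller(depth=4, rounds=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * len(OPS) for _ in range(depth)]   # one softmax per layer
    baseline = 0.0                                      # running average reward
    for _ in range(rounds):
        # Controller chooses a new architecture.
        probs = [softmax(row) for row in logits]
        choices = [rng.choices(range(len(OPS)), weights=p)[0] for p in probs]
        # Architecture is "trained and evaluated" (here: the toy reward).
        r = reward(tuple(OPS[c] for c in choices))
        # Controller receives feedback via a REINFORCE update.
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        for layer, c in enumerate(choices):
            for k in range(len(OPS)):
                # Gradient of log pi(c) w.r.t. logit k is 1[k == c] - p_k.
                logits[layer][k] += lr * advantage * ((k == c) - probs[layer][k])
    # Return the most likely architecture under the learned policy.
    return tuple(OPS[max(range(len(OPS)), key=lambda k: row[k])] for row in logits)
```

Each round costs one full evaluation of a sampled architecture, which is why the real method is so expensive when the evaluation is actual network training.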
Reinforcement Learning
● Upside: much more powerful than BayesOpt, gradient descent
● Downsides: requires training a whole extra network (the controller) just to design networks; RL could be overkill
[Zoph, Le ‘16]
Evaluation Strategy
● Full training
○ Simple
○ Accurate
● Partial training
○ Less computation
○ Less accurate
● Shared weights
○ Least computation
○ Least accurate
Source: https://www.automl.org/blog-2nd-automl-challenge/
Is NAS ready for widespread adoption?
● Hard to reproduce results [Li, Talwalkar ‘19]
● Hard to compare different papers
● Search spaces have been getting smaller
● Random search is a strong baseline [Li, Talwalkar ‘19], [Sciuto et al. ‘19]
● Recent papers are giving fair comparisons
● NAS cannot yet consistently beat human engineers
● Auto-Keras tutorial: https://www.pyimagesearch.com/2019/01/07/auto-keras-and-automl-a-getting-started-guide/
● DARTS repo: https://github.com/quark0/darts
Conclusion
● NAS: find the best neural architecture for a given dataset
● Search space, search strategy, evaluation method
● Search strategies: RL, BayesOpt, continuous optimization
● Not yet at the point of widespread adoption in industry
Thanks! Questions?
