Neural Networks II
Chen Gao
Virginia Tech Spring 2019
ECE-5424G / CS-5824
Neural Networks
• Origins: Algorithms that try to mimic the brain.
What is this?
A single neuron in the brain
[Diagram: a biological neuron, with its input and output labeled]
Slide credit: Andrew Ng
An artificial neuron: Logistic unit
$x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}$ (the "input"; $x_0$ is the "bias unit"), $\qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix}$ (the "weights", also called "parameters")

• Sigmoid (logistic) activation function

$h_\theta(x) = \dfrac{1}{1 + e^{-\theta^\top x}}$ (the "output")
Slide credit: Andrew Ng
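To make the logistic unit concrete, here is a minimal NumPy sketch (the names `sigmoid` and `logistic_unit` are ours, not from the slides): it prepends the bias unit $x_0 = 1$ and computes $h_\theta(x)$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, theta):
    """Single artificial neuron: h_theta(x) = sigmoid(theta^T x).

    x     : (3,) raw inputs x1..x3 (the bias unit x0 = 1 is prepended here)
    theta : (4,) parameters theta0..theta3 (theta0 multiplies the bias unit)
    """
    x = np.concatenate(([1.0], x))   # add the bias unit x0 = 1
    return sigmoid(theta @ x)

x = np.array([0.5, -1.2, 3.0])
theta = np.array([0.1, 0.8, -0.4, 0.2])
print(logistic_unit(x, theta))       # a value in (0, 1)
```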
Visualization of weights, bias, activation function
• The bias b only changes the position of the hyperplane
• The output range is determined by g(·)
Slide credit: Hugo Larochelle
Activation - sigmoid
• Squashes the neuron's pre-activation between 0 and 1
• Always positive
• Bounded
• Strictly increasing

$g(x) = \dfrac{1}{1 + e^{-x}}$
Slide credit: Hugo Larochelle
Activation - hyperbolic tangent (tanh)
• Squashes the neuron's pre-activation between -1 and 1
• Can be positive or negative
• Bounded
• Strictly increasing

$g(x) = \tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
Slide credit: Hugo Larochelle
Activation - rectified linear (ReLU)
• Bounded below by 0 (always non-negative)
• Not bounded above
• Tends to give neurons with sparse activities
Slide credit: Hugo Larochelle
$g(x) = \mathrm{relu}(x) = \max(0, x)$
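The three activations above are one-liners in NumPy. A minimal sketch (function names are ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes into (0, 1)

def tanh(x):
    return np.tanh(x)                  # squashes into (-1, 1)

def relu(x):
    return np.maximum(0.0, x)          # 0 for x < 0, identity for x >= 0

x = np.linspace(-3, 3, 7)
print(sigmoid(x))   # always positive, bounded
print(tanh(x))      # positive or negative, bounded
print(relu(x))      # non-negative, unbounded above, exactly 0 for x < 0
```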
Activation - softmax
• For multi-class classification:
• we need multiple outputs (1 output per class)
• we would like to estimate the conditional probability 𝑝 𝑦 = 𝑐 | 𝑥
• We use the softmax activation function at the output:
$g(x) = \mathrm{softmax}(x) = \left[ \dfrac{e^{x_1}}{\sum_{c} e^{x_c}}, \;\dots,\; \dfrac{e^{x_C}}{\sum_{c} e^{x_c}} \right]^\top$
Slide credit: Hugo Larochelle
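A minimal NumPy sketch of the softmax; subtracting $\max(x)$ before exponentiating is a standard numerical-stability trick that we add here, not something on the slide:

```python
import numpy as np

def softmax(x):
    """softmax(x)_c = e^{x_c} / sum_{c'} e^{x_{c'}}; entries sum to 1."""
    x = x - np.max(x)     # stability trick: does not change the result
    e = np.exp(x)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # one pre-activation per class
p = softmax(scores)
print(p, p.sum())                     # estimates p(y = c | x); sums to 1
```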
Universal approximation theorem
"A single hidden layer neural network with a linear output unit can approximate any continuous function arbitrarily well, given enough hidden units." (Hornik, 1991)
Slide credit: Hugo Larochelle
Neural network – Multilayer
[Network diagram: inputs $x_0, x_1, x_2, x_3$ form Layer 1; hidden units $a_0^{(2)}, a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$ form Layer 2 (hidden); a single unit producing the "output" $h_\Theta(x)$ forms Layer 3.]
Slide credit: Andrew Ng
Neural network
[Network diagram: the same 3-layer network as above.]

$a_i^{(j)}$ = "activation" of unit $i$ in layer $j$
$\Theta^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j + 1$

$a_1^{(2)} = g\big(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3\big)$
$a_2^{(2)} = g\big(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3\big)$
$a_3^{(2)} = g\big(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3\big)$
$h_\Theta(x) = g\big(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)}\big)$

With $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j + 1$, what is the size of $\Theta^{(j)}$? It is $s_{j+1} \times (s_j + 1)$.
Slide credit: Andrew Ng
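A quick NumPy sketch (names ours) of one layer-to-layer mapping, showing why $\Theta^{(j)}$ has shape $s_{j+1} \times (s_j + 1)$: the extra column multiplies the bias unit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(a_prev, Theta):
    """Map layer j to layer j+1: a^{(j+1)} = g(Theta^{(j)} [1; a^{(j)}])."""
    a_prev = np.concatenate(([1.0], a_prev))   # prepend the bias unit
    return sigmoid(Theta @ a_prev)

s_j, s_j1 = 3, 3                               # 3 inputs, 3 hidden units
Theta1 = np.random.randn(s_j1, s_j + 1)        # shape s_{j+1} x (s_j + 1)
x = np.array([1.0, -2.0, 0.5])
a2 = layer_forward(x, Theta1)                  # activations of layer 2
print(Theta1.shape, a2.shape)                  # (3, 4) (3,)
```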
Neural network
[Network diagram: the same 3-layer network.]

$x = \begin{bmatrix} x_0 & x_1 & x_2 & x_3 \end{bmatrix}^\top$, $\qquad z^{(2)} = \begin{bmatrix} z_1^{(2)} & z_2^{(2)} & z_3^{(2)} \end{bmatrix}^\top$

$a_1^{(2)} = g\big(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3\big) = g(z_1^{(2)})$
$a_2^{(2)} = g\big(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3\big) = g(z_2^{(2)})$
$a_3^{(2)} = g\big(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3\big) = g(z_3^{(2)})$
$h_\Theta(x) = g\big(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)}\big) = g(z^{(3)})$

$z$ is the "pre-activation".
Slide credit: Andrew Ng
Why do we need g(·)? (Without a non-linear g, the stacked layers would collapse into a single linear function of x.)
Neural network
[Network diagram: the same 3-layer network.]

In vector form, with $z$ the "pre-activation":

$z^{(2)} = \Theta^{(1)} x = \Theta^{(1)} a^{(1)}$
$a^{(2)} = g(z^{(2)})$, then add $a_0^{(2)} = 1$
$z^{(3)} = \Theta^{(2)} a^{(2)}$
$h_\Theta(x) = a^{(3)} = g(z^{(3)})$
Slide credit: Andrew Ng
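The vectorized equations translate directly into code. A minimal sketch of the forward pass for this 3-layer network (the function name and random weights are ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Vectorized forward propagation for the 3-layer network above."""
    a1 = np.concatenate(([1.0], x))            # a^{(1)} = x with bias unit
    z2 = Theta1 @ a1                           # z^{(2)} = Theta^{(1)} a^{(1)}
    a2 = np.concatenate(([1.0], sigmoid(z2)))  # a^{(2)} = g(z^{(2)}), add a0 = 1
    z3 = Theta2 @ a2                           # z^{(3)} = Theta^{(2)} a^{(2)}
    return sigmoid(z3)                         # h_Theta(x) = a^{(3)} = g(z^{(3)})

Theta1 = np.random.randn(3, 4)                 # layer 1 -> 2
Theta2 = np.random.randn(1, 4)                 # layer 2 -> 3
print(forward(np.array([1.0, 0.0, -1.0]), Theta1, Theta2))
```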
Flow graph - Forward propagation
[Flow graph: $x \to z^{(2)} \to a^{(2)} \to z^{(3)} \to a^{(3)} = h_\Theta(x)$, with weights $W^{(1)}, b^{(1)}$ feeding $z^{(2)}$ and $W^{(2)}, b^{(2)}$ feeding $z^{(3)}$.]

$z^{(2)} = \Theta^{(1)} x = \Theta^{(1)} a^{(1)}$
$a^{(2)} = g(z^{(2)})$, then add $a_0^{(2)} = 1$
$z^{(3)} = \Theta^{(2)} a^{(2)}$
$h_\Theta(x) = a^{(3)} = g(z^{(3)})$
How do we evaluate our prediction?
Cost function
Logistic regression:
$J(\theta) = -\dfrac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \Big] + \dfrac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$

Neural network ($K$ output units):
$J(\Theta) = -\dfrac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \Big[ y_k^{(i)} \log\big(h_\Theta(x^{(i)})\big)_k + (1 - y_k^{(i)}) \log\big(1 - (h_\Theta(x^{(i)}))_k\big) \Big] + \dfrac{\lambda}{2m} \sum_{l} \sum_{i} \sum_{j} \big(\Theta_{ji}^{(l)}\big)^2$
Slide credit: Andrew Ng
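A sketch of the neural-network cost in NumPy, assuming the standard cross-entropy-plus-L2 form above (the `eps` guard against log(0) is our addition):

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    """Cross-entropy cost with L2 regularization.

    H      : (m, K) network outputs h_Theta(x^{(i)})_k
    Y      : (m, K) one-hot labels
    Thetas : list of weight matrices (bias columns excluded from the penalty)
    """
    m = Y.shape[0]
    eps = 1e-12                                          # avoid log(0)
    data_term = -np.sum(Y * np.log(H + eps)
                        + (1 - Y) * np.log(1 - H + eps)) / m
    reg_term = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return data_term + reg_term

H = np.array([[0.9, 0.1], [0.2, 0.8]])
Y = np.array([[1.0, 0.0], [0.0, 1.0]])
print(nn_cost(H, Y, [np.random.randn(2, 4)], lam=0.1))
```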
Gradient computation
Need to compute: $J(\Theta)$ and the partial derivatives $\dfrac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$
Slide credit: Andrew Ng
Gradient computation
Given one training example $(x, y)$, forward propagation computes:

$a^{(1)} = x$
$z^{(2)} = \Theta^{(1)} a^{(1)}$
$a^{(2)} = g(z^{(2)})$ (add $a_0^{(2)}$)
$z^{(3)} = \Theta^{(2)} a^{(2)}$
$a^{(3)} = g(z^{(3)})$ (add $a_0^{(3)}$)
$z^{(4)} = \Theta^{(3)} a^{(3)}$
$a^{(4)} = g(z^{(4)}) = h_\Theta(x)$
Slide credit: Andrew Ng
Gradient computation: Backpropagation
Intuition: $\delta_j^{(l)}$ = "error" of node $j$ in layer $l$.

For each output unit (layer $L = 4$):
$\delta^{(4)} = a^{(4)} - y$

Recall the forward pass: $z^{(3)} = \Theta^{(2)} a^{(2)}$, $a^{(3)} = g(z^{(3)})$, $z^{(4)} = \Theta^{(3)} a^{(3)}$, $a^{(4)} = g(z^{(4)})$.

Applying the chain rule through $z^{(4)} \to a^{(3)} \to z^{(3)}$ gives the error for the hidden layer:
$\delta^{(3)} = \big(\Theta^{(3)}\big)^\top \delta^{(4)} \;.\!*\; g'(z^{(3)})$
where the factor $(\Theta^{(3)})^\top \delta^{(4)}$ comes from $\partial z^{(4)} / \partial a^{(3)}$, the element-wise factor $g'(z^{(3)})$ from $\partial a^{(3)} / \partial z^{(3)}$, and $.\!*$ denotes element-wise multiplication.

Slide credit: Andrew Ng
Backpropagation algorithm
Training set $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$

Set $\Delta^{(l)} = 0$ (for all $l$)
For $i = 1$ to $m$:
  Set $a^{(1)} = x^{(i)}$
  Perform forward propagation to compute $a^{(l)}$ for $l = 2, \dots, L$
  Use $y^{(i)}$ to compute $\delta^{(L)} = a^{(L)} - y^{(i)}$
  Compute $\delta^{(L-1)}, \delta^{(L-2)}, \dots, \delta^{(2)}$
  $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} \big(a^{(l)}\big)^\top$

(The accumulated $\Delta^{(l)} / m$ gives the gradient used to update $\Theta^{(l)}$.)
Slide credit: Andrew Ng
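Putting forward propagation and the $\delta$ recurrence together, here is a runnable sketch of the accumulator loop above for a 3-layer sigmoid network, followed by one gradient step (regularization omitted; all names are ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(X, Y, Theta1, Theta2, lr=0.1):
    """One backpropagation pass for a 3-layer sigmoid network.

    X : (m, n) inputs, Y : (m, K) targets.
    Returns updated copies of Theta1 (s2 x (n+1)) and Theta2 (K x (s2+1)).
    """
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)           # gradient accumulators, start at 0
    Delta2 = np.zeros_like(Theta2)
    for i in range(m):
        # Forward propagation
        a1 = np.concatenate(([1.0], X[i]))
        z2 = Theta1 @ a1
        a2 = np.concatenate(([1.0], sigmoid(z2)))
        z3 = Theta2 @ a2
        a3 = sigmoid(z3)                     # h_Theta(x^{(i)})
        # Backward propagation
        d3 = a3 - Y[i]                       # delta^{(L)} = a^{(L)} - y^{(i)}
        d2 = (Theta2[:, 1:].T @ d3) * sigmoid(z2) * (1 - sigmoid(z2))
        Delta2 += np.outer(d3, a2)           # Delta^{(l)} += delta^{(l+1)} a^{(l)T}
        Delta1 += np.outer(d2, a1)
    # Gradient-descent step on the averaged gradients
    return Theta1 - lr * Delta1 / m, Theta2 - lr * Delta2 / m

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Y = np.array([[1.0], [0.0]])
T1, T2 = np.random.randn(3, 3) * 0.1, np.random.randn(1, 4) * 0.1
T1, T2 = backprop(X, Y, T1, T2)
```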
Activation - sigmoid
$g(x) = \dfrac{1}{1 + e^{-x}}$

• Partial derivative: $g'(x) = g(x)\big(1 - g(x)\big)$

Slide credit: Hugo Larochelle
Activation - hyperbolic tangent (tanh)
$g(x) = \tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

• Partial derivative: $g'(x) = 1 - g(x)^2$

Slide credit: Hugo Larochelle
Activation - rectified linear (ReLU)

$g(x) = \mathrm{relu}(x) = \max(0, x)$

• Partial derivative: $g'(x) = \mathbf{1}_{x > 0}$ (1 if $x > 0$, else 0)

Slide credit: Hugo Larochelle
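The three derivatives in NumPy, matching the formulas above (function names are ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    g = sigmoid(x)
    return g * (1 - g)             # g'(x) = g(x)(1 - g(x))

def d_tanh(x):
    return 1 - np.tanh(x) ** 2     # g'(x) = 1 - g(x)^2

def d_relu(x):
    return (x > 0).astype(float)   # g'(x) = 1 if x > 0 else 0

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(d_sigmoid(x), d_tanh(x), d_relu(x))
```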
Initialization
• For biases
  • Initialize all to 0
• For weights
  • Can't initialize all weights to the same value
    • we can show that all hidden units in a layer would then always behave the same
    • need to break symmetry
  • Recipe: sample from U[-b, b]
    • the idea is to sample around 0 but break symmetry
Slide credit: Hugo Larochelle
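A sketch of this recipe (names ours). We use $b = \sqrt{6 / (n_{\text{in}} + n_{\text{out}})}$, the common "Glorot" choice for U[-b, b]; the slide does not pin down $b$, so treat this as one reasonable assumption:

```python
import numpy as np

def init_layer(n_in, n_out, seed=0):
    """Zero biases; weights sampled from U[-b, b] to break symmetry."""
    rng = np.random.default_rng(seed)
    b = np.sqrt(6.0 / (n_in + n_out))      # 'Glorot' choice of b (assumption)
    W = rng.uniform(-b, b, size=(n_out, n_in))
    bias = np.zeros(n_out)                 # all biases initialized to 0
    return W, bias

W, bias = init_layer(3, 4)
print(W.shape, bias)                       # (4, 3) [0. 0. 0. 0.]
```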
Putting it together
Pick a network architecture
Slide credit: Hugo Larochelle
• No. of input units: dimension of the features
• No. of output units: number of classes
• Reasonable default: 1 hidden layer; if >1 hidden layer, use the same no. of hidden units in every layer (usually the more the better)
• Choose the remaining hyperparameters by grid search
Putting it together
Early stopping
Slide credit: Hugo Larochelle
• Use validation-set performance to select the best configuration
• To select the number of epochs, stop training when the validation-set error starts to increase
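A minimal early-stopping loop (entirely our sketch; `train_step` and `val_error` are hypothetical callables standing in for one training epoch and a validation pass):

```python
import numpy as np

def train_with_early_stopping(train_step, val_error, max_epochs=100, patience=5):
    """Stop when validation error has not improved for `patience` epochs."""
    best_err, best_epoch, bad_epochs = np.inf, 0, 0
    for epoch in range(max_epochs):
        train_step()
        err = val_error()
        if err < best_err:
            best_err, best_epoch, bad_epochs = err, epoch, 0  # new best config
        else:
            bad_epochs += 1               # validation error went up
            if bad_epochs >= patience:
                break
    return best_epoch, best_err

# Toy run: validation error improves, then rises -> stop at epoch 2
errs = iter([0.5, 0.4, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40])
print(train_with_early_stopping(lambda: None, lambda: next(errs),
                                max_epochs=8, patience=3))   # (2, 0.35)
```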
Other tricks of the trade
Slide credit: Hugo Larochelle
• Normalize your (real-valued) data
• Decay the learning rate
  • as we get closer to the optimum, it makes sense to take smaller update steps
• Use mini-batches
  • a mini-batch can give a more accurate estimate of the risk gradient than a single example
• Use momentum
  • an exponential average of previous gradients
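A toy sketch combining two of these tricks, momentum (an exponential average of previous gradients) and learning-rate decay, on a simple quadratic objective; the schedule and constants are arbitrary choices of ours:

```python
import numpy as np

def sgd_step(theta, grad, velocity, lr, momentum=0.9):
    """One momentum update: velocity is an exponential average of gradients."""
    velocity = momentum * velocity + grad
    return theta - lr * velocity, velocity

# Toy objective f(theta) = ||theta||^2 / 2, whose gradient is theta itself
rng = np.random.default_rng(0)
theta = np.array([5.0, -3.0])
velocity = np.zeros_like(theta)
lr0 = 0.5
for t in range(100):
    lr = lr0 / (1 + 0.05 * t)                      # decay the learning rate
    grad = theta + 0.01 * rng.standard_normal(2)   # noisy "mini-batch" gradient
    theta, velocity = sgd_step(theta, grad, velocity, lr)
print(theta)   # close to the optimum at (0, 0)
```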
Dropout
Slide credit: Hugo Larochelle
• Idea: "cripple" the neural network by removing hidden units
  • each hidden unit is set to 0 with probability 0.5
  • hidden units cannot co-adapt to other units
  • hidden units must be more generally useful
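A sketch of the training-time dropout mask (names ours). Scaling by $1 - p$ at test time is the classic convention; "inverted dropout" instead scales during training:

```python
import numpy as np

def dropout(h, p=0.5, train=True):
    """Randomly set hidden units to 0 with probability p during training."""
    if train:
        mask = np.random.rand(*h.shape) >= p   # keep with probability 1 - p
        return h * mask
    return h * (1 - p)   # test time: none dropped, scale to match expectation

h = np.ones(10)                    # a layer of hidden activations
print(dropout(h))                  # roughly half the units zeroed out
print(dropout(h, train=False))     # all 0.5: scaled, none dropped
```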
Editor's Notes
1. Two ears, two eyes, color, coat of hair → puppy. Edges, simple patterns → ear/eye. Low-level features → high-level features.
2. If activated → activation.
3. Non-linear.