• Train – Forward propagation
1. Assume the dropout ratio of layer 2 is 0.3.
2. Assume the layer 2 activation vector is [aL2N1, aL2N2] = [0.23, 0.47].
3. Bernoulli mask vector – A vector whose entries are independent Bernoulli draws: each entry is False (drop) with probability equal to the dropout ratio and True (keep) otherwise.
a) Assume the layer 2 mask vector is [mL2N1, mL2N2] = [False, True].
4. Updated activation vector = activation vector ⊙ mask vector (an element-wise product, not a dot product).
a) The element-wise product sets every activation whose mask entry is False to 0.
b) Updated activation vector = [0, 0.47], which means node 1 is dropped from layer 2.
c) Inference – Over many iterations, node 1 of layer 2 is dropped about 30% of the time.
d) In this simplified description, the dropped nodes stay dropped for all samples in a batch and a new mask is drawn only at the start of the next batch. Framework layers such as PyTorch's nn.Dropout instead draw the mask independently for every element of the input, so different samples in the same batch usually lose different nodes.
5. Because node 1 of layer 2 is dropped about 30% of the time, the z values in layer 3 shrink, on average, by the same proportion. To compensate, we scale them back up as follows:
a) z3 ~ w3 · a2, so z in layer 3 depends on the activations in layer 2.
b) We divide the surviving values of a2 by the keep probability (1 – dropout ratio = 1 – 0.3 = 0.7) when computing z3.
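The forward-propagation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not any framework's implementation: the helper names `sample_mask` and `inverted_dropout_forward` are hypothetical, the mask is fixed to the slide's values for the worked case, and the averaging loop is only there to show that dividing by 0.7 keeps each activation roughly unchanged on average.

```python
import random

def sample_mask(n, drop_ratio, rng):
    """Draw a Bernoulli mask: each entry is False (dropped) with probability drop_ratio."""
    return [rng.random() >= drop_ratio for _ in range(n)]

def inverted_dropout_forward(activations, mask, drop_ratio):
    """Zero the dropped activations and divide the survivors by the keep probability."""
    keep_prob = 1.0 - drop_ratio
    return [a / keep_prob if keep else 0.0 for a, keep in zip(activations, mask)]

a2 = [0.23, 0.47]
mask = [False, True]                      # the slide's example mask: node 1 dropped
a2_dropped = inverted_dropout_forward(a2, mask, drop_ratio=0.3)
print(a2_dropped)                         # node 1 is 0.0, node 2 is 0.47 / 0.7

# Averaged over many random masks, each rescaled activation stays close to its original value.
rng = random.Random(0)                    # fixed seed, chosen arbitrarily for repeatability
trials = 100_000
mean_a2 = [0.0, 0.0]
for _ in range(trials):
    out = inverted_dropout_forward(a2, sample_mask(2, 0.3, rng), 0.3)
    mean_a2 = [m + o / trials for m, o in zip(mean_a2, out)]
print(mean_a2)
```

Because the averages stay near [0.23, 0.47], layer 3 sees inputs of about the same size during training as it will at test time.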
• Train – Backward propagation – The nodes that were disconnected in forward propagation remain disconnected in backward propagation, so no gradient flows through them.
• Test – No change – All nodes remain connected. No nodes are dropped and no scaling is applied.
• Keras and PyTorch implement Inverse Dropout (often called inverted dropout).
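The training and test behaviour listed above can be put side by side in a short sketch. It assumes a single dropout layer with an explicit `training` flag; the function names and the upstream gradient of ones are hypothetical and are used only to make the masking visible.

```python
def dropout_forward(activations, mask, drop_ratio, training):
    """Inverted dropout: mask and rescale while training, pass through unchanged at test time."""
    if not training:
        return list(activations)
    keep_prob = 1.0 - drop_ratio
    return [a / keep_prob if keep else 0.0 for a, keep in zip(activations, mask)]

def dropout_backward(upstream_grad, mask, drop_ratio):
    """Gradient w.r.t. the layer input: same mask and same 1/keep_prob factor as the forward pass."""
    keep_prob = 1.0 - drop_ratio
    return [g / keep_prob if keep else 0.0 for g, keep in zip(upstream_grad, mask)]

a2 = [0.23, 0.47]
mask = [False, True]                                 # reused in both passes
train_out = dropout_forward(a2, mask, 0.3, training=True)
test_out = dropout_forward(a2, mask, 0.3, training=False)
grad_a2 = dropout_backward([1.0, 1.0], mask, 0.3)    # hypothetical upstream gradient

print(train_out, test_out, grad_a2)
```

The dropped node receives a zero gradient, so the weights feeding it are not updated for that iteration, while the test-time call returns the activations untouched.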
Inverse Dropout – Example 1
Inverse Dropout
1. The dropout ratio of layer 2 is 0.1 (10%).
2. Layer 2 has 50 nodes.
3. Layer 2 activation vector [aL2N1, aL2N2, …, aL2N50] = [0.23, 0.47, …, 0.6]
4. Layer 2 mask vector [mL2N1, mL2N2, …, mL2N50] = [True, False, …, True]
5. Updated activation vector = [0.23, 0, …, 0.6]
6. Because node 2 of layer 2 is dropped about 10% of the time, the z values in layer 3 shrink, on average, by the same proportion. To compensate, we scale them back up as follows:
1. z3 ~ w3 · a2
2. We divide the surviving values of a2 by the keep probability (1 – dropout ratio = 1 – 0.1 = 0.9) when computing z3.
3. Numeric example:
a) Dropout ratio = 0.1
b) Nodes dropped, on average = 0.1 * 50 = 5
c) Suppose aL2N2 is one of the dropped nodes.
d) When computing z3, we divide the values of the surviving a's from layer 2 by 0.9.
• For example, aL2N50 becomes 0.6/0.9 ≈ 0.667, which is about 111% of 0.6, so the value of aL2N50 is scaled up by about 11%.
• Inference – Dropping 10% of the nodes in layer 2 is compensated by a proportionate increase in the surviving activations when computing the z values of layer 3.
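The arithmetic in this example can be checked with a few lines of Python. The sketch assumes independent Bernoulli masks; the seed and the number of simulated masks are arbitrary choices, included only to show that 5 dropped nodes is an average rather than a fixed count.

```python
import random

drop_ratio = 0.1
n_nodes = 50
keep_prob = 1.0 - drop_ratio

expected_dropped = drop_ratio * n_nodes     # 5 nodes on average
scaled = 0.6 / keep_prob                    # aL2N50 after rescaling
relative_increase = scaled / 0.6 - 1.0      # 1/0.9 - 1, about 11%

# The realised number of dropped nodes varies from mask to mask.
rng = random.Random(42)                     # arbitrary seed for repeatability
counts = [sum(rng.random() < drop_ratio for _ in range(n_nodes)) for _ in range(10_000)]
mean_dropped = sum(counts) / len(counts)

print(expected_dropped, round(scaled, 3), round(relative_increase, 3), mean_dropped)
```

Individual masks may drop more or fewer than 5 nodes, which is why the compensation uses the fixed factor 1/0.9 rather than the realised count.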
Inverse Dropout – Example 2
1. Train – The nodes are disconnected just as in Inverse Dropout. The difference is that standard Dropout does not compensate for the lost nodes during training.
2. The compensation happens in the test phase instead.
3. Test – All nodes are connected, and the weights applied to the outputs of the layer whose nodes were dropped during training are scaled down.
4. In the slide's example, nodes 2 and 4 in layer 2 are disconnected in the training phase. During the test phase, the weights w1 through w8 are all multiplied by (1 – dropout ratio).
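A small sketch of standard (non-inverted) dropout shows why the test-time weights are multiplied by the keep probability (1 – dropout ratio). All numbers here are hypothetical: four layer-2 activations, four weights into a single layer-3 node, and a dropout ratio of 0.5, chosen so that every mask is equally likely and the training average can be computed exactly by enumeration.

```python
def standard_dropout_train(activations, mask):
    """Standard dropout while training: zero the dropped nodes, no rescaling."""
    return [a if keep else 0.0 for a, keep in zip(activations, mask)]

def scale_weights_for_test(weights, drop_ratio):
    """Test time: shrink the weights by the keep probability (1 - drop ratio)."""
    keep_prob = 1.0 - drop_ratio
    return [w * keep_prob for w in weights]

def pre_activation(weights, activations):
    """z = w . a for a single layer-3 node (bias omitted)."""
    return sum(w * a for w, a in zip(weights, activations))

drop_ratio = 0.5                       # assumed ratio for this illustration
a2 = [0.2, 0.4, 0.6, 0.8]              # hypothetical layer-2 activations
w3 = [0.5, -0.3, 0.8, 0.1]             # hypothetical weights into one layer-3 node

# Enumerate all 16 masks for 4 nodes; at a ratio of 0.5 each mask is equally likely.
masks = [[bool(i >> k & 1) for k in range(4)] for i in range(16)]
z_train_mean = sum(pre_activation(w3, standard_dropout_train(a2, m)) for m in masks) / len(masks)

# Test: all nodes active, weights scaled down by the keep probability.
z_test = pre_activation(scale_weights_for_test(w3, drop_ratio), a2)

print(z_train_mean, z_test)
```

The two printed values agree up to floating-point rounding: scaling the weights down at test time reproduces the average pre-activation the next layer saw during training, which is the same goal Inverse Dropout reaches by scaling activations up during training.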
Dropout

Drop Out in Deep Learning
