- TensorFlow is a popular deep learning library that provides both C++ and Python APIs to make working with deep learning models easier. It supports both CPU and GPU computing and has a faster compilation time than other libraries like Keras and Torch.
- Tensors are multidimensional arrays that represent inputs, outputs, and parameters of deep learning models in TensorFlow. They are the fundamental data structure that flows through graphs in TensorFlow.
- The main programming elements in TensorFlow include constants, variables, placeholders, and sessions. Constants are parameters whose values do not change, variables allow adding trainable parameters, placeholders feed data from outside the graph, and sessions run the graph to evaluate nodes.
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview Questions | Simplilearn
1. Deep Learning Interview Questions And Answers
2. What is Deep Learning?
• Learns from large volumes of structured and unstructured data and uses complex algorithms to train a neural network
• Performs complex operations to extract hidden patterns and features
(Figure: a large volume of data on cats and dogs is used to train the model, features are extracted, and the image is classified)
3. What is a Neural Network?
• Human-brain-inspired systems which replicate the way humans learn
• Consists of 3 layers of network: input, hidden, and output
• Each layer contains neurons, called nodes, that perform various operations
• Used in Deep Learning algorithms like CNN, RNN, GAN, etc.
(Figure: input, hidden, and output layers made up of nodes)
4. What is a Multilayer Perceptron (MLP)?
• It has the same structure as a single-layer perceptron, with one or more hidden layers
• Except for the input layer, each node in the other layers uses a nonlinear activation function
• An MLP uses a supervised learning method called backpropagation to train the model
• A single-layer perceptron can classify only linearly separable classes with binary output (0, 1), but an MLP can classify non-linear classes
(Figure: a Multilayer Perceptron with an input layer, a hidden layer, and an output layer)
5. What is Data Normalization and why do we need it?
• A preprocessing step used to standardize the data
• Restructures the data and improves its integrity
• Used to reduce and eliminate data redundancy
• Rescales values to fit into a particular range, which helps achieve better convergence
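The rescaling in the last bullet is commonly done with min-max normalization. A minimal NumPy sketch (the feature values are made up for illustration):

```python
import numpy as np

# Hypothetical feature column with values on an arbitrary scale
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Min-max normalization: rescale every value into the [0, 1] range
x_norm = (x - x.min()) / (x.max() - x.min())

print(x_norm)  # [0.   0.25 0.5  0.75 1.  ]
```

After this step every feature lies in the same range, so no single feature dominates the weight updates during training.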
6. What is a Boltzmann Machine?
• Shallow, two-layer neural nets that make stochastic decisions about whether a neuron should be on or off
• The 1st layer is the visible (input) layer and the 2nd layer is the hidden layer
• When nodes are connected to each other across layers, but no two nodes of the same layer are connected, the network is known as a Restricted Boltzmann Machine
7. What is the role of Activation Functions in a neural network?
• An activation function decides whether a neuron should be fired or not
• It accepts the weighted sum of the inputs and a bias as its input: Y = (weight × input) + bias
• Whether the node fires depends on the value of Y
• The step function, Sigmoid, ReLU, Tanh, and Softmax are some examples of activation functions
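The weighted-sum-plus-activation step above can be sketched for a single neuron; the weights, input, and bias here are made up for illustration:

```python
import numpy as np

def step(y):
    # Step function: fire (1) if the weighted sum crosses 0, otherwise stay off (0)
    return np.where(y >= 0, 1, 0)

def sigmoid(y):
    # Sigmoid squashes the weighted sum into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-y))

weights = np.array([0.5, -0.6])   # made-up weights
inputs = np.array([1.0, 2.0])     # made-up inputs
bias = 0.1

y = np.dot(weights, inputs) + bias   # Y = (weight * input) + bias
print(step(y), sigmoid(y))
```

Here Y = 0.5 − 1.2 + 0.1 = −0.6, so the step function keeps the neuron off while the sigmoid outputs a value below 0.5.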
8. What is a cost function?
• A cost function is the measure used to evaluate how well your model performs
• It is also referred to as the loss or error
• It is used to compute the error of the output layer during backpropagation
• Mean Squared Error is a popular cost function: C = ½(Ŷ − Y)², where Y is the original output and Ŷ is the predicted output
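The MSE formula above, averaged over a batch, looks like this in NumPy (the targets and predictions are made up for illustration):

```python
import numpy as np

Y = np.array([1.0, 2.0, 3.0])        # made-up original outputs
Y_hat = np.array([1.5, 1.5, 2.0])    # made-up predicted outputs

# Mean Squared Error with the slide's 1/2 factor: C = 1/2 * mean((Y_hat - Y)^2)
C = 0.5 * np.mean((Y_hat - Y) ** 2)
print(C)  # 0.25
```

The squared differences are 0.25, 0.25, and 1.0; their mean is 0.5, and the ½ factor gives C = 0.25.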
9. What is Gradient Descent?
• Gradient Descent is an optimization algorithm used to minimize the cost function in order to maximize the performance of the model
• The aim is to find the local or global minimum of a function
• The gradient determines the direction the model should take to reduce the error
(Figure: cost C plotted against weight W, with the gradient descending toward the global minimum)
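The descent loop can be sketched on a toy cost function C(w) = (w − 3)², whose global minimum sits at w = 3 (the starting point and learning rate are made up):

```python
# Gradient descent on the toy cost C(w) = (w - 3)^2
def grad(w):
    return 2 * (w - 3)           # dC/dw

w = 0.0                          # arbitrary starting point
lr = 0.1                         # learning rate
for _ in range(100):
    w = w - lr * grad(w)         # step opposite the gradient to reduce the cost

print(round(w, 4))               # converges to ~3.0
```

Each step moves w a fraction of the way toward the minimum, which is why the learning rate (next slides) controls how fast and how safely the descent proceeds.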
10. What do you understand by Backpropagation?
• A neural network technique used to minimize the cost function
• Helps to improve the performance of the network
• Propagates the error backwards and updates the weights to reduce the error
(Figure: a network with inputs X1–X3, weights W1–W26, predicted output Ŷ, and actual output Y)
11. What is the difference between a Feedforward Neural Network and a Recurrent Neural Network?
Feedforward Neural Network
• Signals travel in one direction, from input to output
• No feedback (loops)
• Considers only the current input
• Cannot memorize previous inputs (e.g., CNN)
12. Recurrent Neural Network
• Signals travel in both directions, making it a looped network
• Considers the current input along with the previously received inputs when generating the output of a layer
• Has the ability to memorize past data due to its internal memory
(Figure: an RNN cell with input x, hidden state h, and output y)
13. What are some applications of Recurrent Neural Networks?
• Sentiment analysis and text mining: e.g., classifying the sentence "Getting up early in the morning is good for health." as positive sentiment
14. Image captioning: an RNN can help you caption an image, e.g., "Kids are playing football"
15. Time-series problems, like predicting the price of stocks in a month or quarter, or the sales of products, can be solved using an RNN
16. What are Softmax and ReLU functions?
• Softmax is an activation function that generates outputs between 0 and 1
• It divides each exponentiated output by the sum of all of them, so that the outputs total 1: Softmax(L_n) = e^(L_n) / Σ_k e^(L_k)
• It is often used in the output layer
• Example: Softmax maps the logits [1.2, 0.9, 0.4] to probabilities of approximately [0.46, 0.34, 0.20]
17. ReLU
• ReLU stands for Rectified Linear Unit and is one of the most widely used activation functions
• It gives an output of X if X is positive, and 0 otherwise: ReLU(X) = max(0, X)
• It is often used in the hidden layers
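Both functions are a few lines of NumPy; this sketch reuses the slide's example logits [1.2, 0.9, 0.4]:

```python
import numpy as np

def softmax(L):
    # Divide each exponentiated output so the results sum to 1
    e = np.exp(L)
    return e / e.sum()

def relu(x):
    # Output x if x is positive, 0 otherwise
    return np.maximum(0, x)

probs = softmax(np.array([1.2, 0.9, 0.4]))
print(probs)                         # roughly [0.46, 0.34, 0.21]
print(relu(np.array([-2.0, 3.0])))   # [0. 3.]
```

Note the exact values are [0.4566, 0.3383, 0.2051]; the slide rounds the last one to 0.20. The outputs always sum to 1, which is why softmax is read as a probability distribution over classes.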
18. What are hyperparameters?
• A hyperparameter is a parameter whose value is set before the learning process begins
• Hyperparameters determine how a network is trained and the structure of the network
• The number of hidden units, the learning rate, and the number of epochs are some examples of hyperparameters
(Figure: train the model on the data, compare the results, modify the hyperparameters, and repeat until the model is optimized)
19. What will happen if the learning rate is set too low or too high?
Learning rate too low
• Training of the model will progress very slowly, as we are making very tiny updates to the weights
• It will take many updates to reach the minimum point
Learning rate too high
• Causes undesirable divergent behavior in the loss function due to drastic updates to the weights
• At times it may fail to converge, or may even diverge
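Both failure modes show up even on the toy cost C(w) = w² (gradient 2w), whose minimum is at w = 0; the three learning rates below are made up to illustrate the extremes:

```python
# Gradient descent on C(w) = w^2 with different learning rates
def run(lr, steps=20, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w       # gradient of w^2 is 2w
    return w

w_low = run(lr=0.01)   # tiny updates: after 20 steps, still far from 0
w_good = run(lr=0.1)   # converges comfortably toward 0
w_high = run(lr=1.1)   # drastic updates: |w| grows every step and diverges

print(w_low, w_good, w_high)
```

With lr = 0.01 the weight has barely moved after 20 steps, while with lr = 1.1 each update overshoots the minimum and the iterates blow up.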
20. What is Dropout and Batch Normalization?
Dropout
• Dropout is a technique of randomly dropping out hidden and visible units of a network to prevent overfitting of the data
• It doubles the number of iterations needed for the network to converge
(Figure: a standard neural network before and after applying Dropout)
21. Batch Normalization
• Batch Normalization is a technique to improve the performance and stability of a neural network
• The idea is to normalize the inputs of every layer so that they have a mean output activation of zero and a standard deviation of one
• Benefits: allows higher learning rates, makes weights easier to initialize, provides regularization, and trains the network faster
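The core normalization step can be sketched in NumPy. This is only the zero-mean/unit-variance part; real batch norm additionally learns a scale (gamma) and shift (beta) per feature, which is omitted here. The batch values are made up:

```python
import numpy as np

# A made-up batch of layer inputs: 4 samples x 3 features on wildly different scales
x = np.array([[1.0, 50.0, 0.1],
              [2.0, 60.0, 0.2],
              [3.0, 70.0, 0.3],
              [4.0, 80.0, 0.4]])

eps = 1e-5                              # avoids division by zero
mean = x.mean(axis=0)                   # per-feature mean over the batch
var = x.var(axis=0)                     # per-feature variance over the batch
x_hat = (x - mean) / np.sqrt(var + eps)

print(x_hat.mean(axis=0))  # ~[0, 0, 0]
print(x_hat.std(axis=0))   # ~[1, 1, 1]
```

After normalization every feature is on the same scale, which is what permits the higher learning rates listed above.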
22. What is the difference between Batch Gradient Descent and Stochastic Gradient Descent?
Batch Gradient Descent
• Computes the gradient using the entire dataset
• Takes time to converge, because the volume of data is huge and the weights update slowly
Stochastic Gradient Descent
• Computes the gradient using a single sample
• Converges much faster than batch gradient descent, because it updates the weights more frequently
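The difference in update frequency can be sketched on a made-up one-weight regression problem, y = 2x (the data, learning rate, and epoch count are all illustrative):

```python
import numpy as np
np.random.seed(1)

# Toy 1-D linear regression: learn the single weight w in y = w * x (true w = 2)
x = np.random.rand(100)
y = 2.0 * x

def batch_gd(w=0.0, lr=0.5, epochs=20):
    # ONE weight update per epoch, using the gradient over the ENTIRE dataset
    for _ in range(epochs):
        grad = np.mean(2 * (w * x - y) * x)
        w -= lr * grad
    return w

def sgd(w=0.0, lr=0.5, epochs=20):
    # One weight update per SAMPLE: 100x more frequent (and noisier) updates
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            grad = 2 * (w * xi - yi) * xi
            w -= lr * grad
    return w

print(batch_gd(), sgd())  # both approach the true weight 2.0
```

With the same number of epochs, SGD has performed 100 times as many weight updates, which is the mechanism behind its faster convergence on large datasets.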
23. Explain Overfitting and Underfitting and how to combat them.
Overfitting
• Overfitting happens when a model learns the details and noise in the training data to the degree that it adversely impacts the model's performance on new data
• It is more likely to occur with nonlinear models that have more flexibility when learning a target function
24. Underfitting
• Underfitting refers to a model that is neither well trained on the training data nor able to generalize to new data
• It usually happens when there is too little, or improper, data to train the model
• An underfit model has poor performance and accuracy
25. Combating Overfitting and Underfitting
• Resample the data to estimate the model's accuracy (e.g., k-fold cross-validation)
• Hold out a validation dataset to evaluate the model
(Figure: the original data is resampled into Sample 1, Sample 2, and Sample 3)
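The k-fold resampling idea can be sketched with plain NumPy; the dataset size and k are made up, and the model fitting is left as a placeholder:

```python
import numpy as np

# 10 made-up samples, split into k = 5 folds for cross-validation
data = np.arange(10)
k = 5
folds = np.array_split(data, k)

scores = []
for i in range(k):
    val = folds[i]                                           # held-out validation fold
    train = np.concatenate([folds[j] for j in range(k) if j != i])
    # ... fit the model on `train`, evaluate it on `val`, record the score ...
    scores.append(len(train))                                # stand-in for a real score

print(scores)  # every split trains on 8 samples and validates on the other 2
```

Averaging the k validation scores gives a more reliable accuracy estimate than a single train/test split, because every sample is used for validation exactly once.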
26. How are weights initialized in a network?
Initializing all weights to zero:
• All the weights are set to 0, which makes your model similar to a linear model
• All the neurons in every layer perform the same operation, giving the same output and making the deep net useless
• w = np.zeros((layer_size[l], layer_size[l-1]))
(Figure: a network in which every weight W1, W2, … is set to 0)
27. Initializing all weights randomly:
• Here, the weights are assigned randomly, by initializing them very close to zero
• This gives the model better accuracy, since every neuron performs a different computation
• w = np.random.randn(layer_size[l], layer_size[l-1])
(Figure: a network in which the weights W1, W2, … take small random values)
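The symmetry problem from slide 26 is easy to demonstrate: with zero initialization every neuron in a layer computes the identical output, while random initialization breaks the tie. The input vector and layer sizes below are made up:

```python
import numpy as np
np.random.seed(0)

x = np.array([0.5, -1.0, 2.0])    # a made-up input vector
layer_size = [3, 4]               # 3 inputs feeding 4 hidden neurons

# Zero initialization: all four neurons compute exactly the same output
w_zero = np.zeros((layer_size[1], layer_size[0]))
h_zero = np.tanh(w_zero @ x)
print(h_zero)                     # [0. 0. 0. 0.] -- identical, so identical gradients too

# Random initialization close to zero: each neuron computes something different
w_rand = np.random.randn(layer_size[1], layer_size[0]) * 0.01
h_rand = np.tanh(w_rand @ x)
print(h_rand)                     # four distinct values
```

Because identical outputs produce identical gradients, the zero-initialized neurons stay identical forever; the scaling factor 0.01 is one common (illustrative) way to keep the random weights "very close to zero".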
28. What are the different layers in a CNN?
1. Convolution Layer: performs the convolution operation
2. ReLU Layer: brings non-linearity to the network and converts all the negative pixels to 0; the output is a rectified feature map
3. Pooling Layer: a down-sampling operation that reduces the dimensionality of the feature map
4. Fully Connected Layer: recognizes and classifies the objects in the image
29. What is Pooling in a CNN and how does it work?
• Pooling is used to reduce the spatial dimensions of a CNN
• It performs a down-sampling operation to reduce the dimensionality
• It creates a pooled feature map by sliding a filter matrix over the input matrix
• Example: max pooling with 2×2 filters and stride 2 keeps the maximum of each 2×2 block of the rectified feature map, e.g., max(3, 4, 1, 2) = 4
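Max pooling with 2×2 filters and stride 2 can be sketched in NumPy with a reshape trick; the 4×4 feature map is made up for illustration:

```python
import numpy as np

# A made-up 4x4 rectified feature map
fmap = np.array([[1, 2, 4, 6],
                 [3, 4, 1, 2],
                 [5, 8, 2, 7],
                 [1, 2, 3, 1]])

# Split the map into 2x2 blocks and keep the maximum of each block.
# reshape(2, 2, 2, 2) indexes as [row-block, row-in-block, col-block, col-in-block],
# so taking the max over axes 1 and 3 pools each 2x2 block.
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[4 6]
               #  [8 7]]
```

The top-left block (1, 2, 3, 4) pools to 4, matching the slide's max(3, 4, 1, 2) = 4, and the 4×4 map shrinks to 2×2.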
30. How does an LSTM network work?
• LSTMs are a special kind of Recurrent Neural Network, capable of learning long-term dependencies; remembering information for long periods of time is their default behavior
(Figure: a chain of repeating LSTM cells with inputs x(t−1), x(t), x(t+1) and hidden states h(t−1), h(t), h(t+1))
31. There are 3 steps in the process:
• Step 1: Decide what to forget and what to remember
• Step 2: Selectively update the cell state values
• Step 3: Decide what part of the current state makes it to the output
(Figure: the cell state flowing from C(t−1) to C(t))
34. What are Vanishing and Exploding Gradients?
• While training an RNN, your slope (gradient) can become either too small or too large, and this makes training difficult
• When the slope is too small, the problem is known as a vanishing gradient: the slope decreases gradually to a very small value, and training becomes extremely slow
35. • When the slope tends to grow exponentially instead of decaying, the problem is called an exploding gradient
• Issues caused by gradient problems: long training time, poor performance, and low accuracy
36. What is the difference between Epoch, Batch, and Iteration in Deep Learning?
• Epoch: one pass over the entire dataset
• Batch: since we cannot pass the entire dataset into the neural network at once, we divide the dataset into a number of batches
• Iteration: one update on one batch; if we have 10,000 images as data and a batch size of 200, then an epoch runs 10,000 / 200 = 50 iterations
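The arithmetic in the slide's example is just integer division of the dataset size by the batch size:

```python
# The slide's example: 10,000 images split into batches of 200
dataset_size = 10_000
batch_size = 200

iterations_per_epoch = dataset_size // batch_size
print(iterations_per_epoch)  # 50
```

So training for, say, 3 epochs on this dataset would perform 3 × 50 = 150 weight-update iterations in total.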
37. Why is TensorFlow the most preferred library in Deep Learning?
• Provides both C++ and Python APIs, which makes it easier to work with
• Has a faster compilation time than other deep learning libraries like Keras and Torch
• Supports both CPU and GPU computing devices
38. What do you mean by a Tensor in TensorFlow?
• A Tensor is a mathematical object represented as an array of higher dimensions. These arrays of data, with different dimensions and ranks, that are fed as input to the neural network are called Tensors
(Figure: a 5×4 grid of values, i.e. a Tensor of dimensions [5, 4], feeding the input layer)
39. What are the programming elements in TensorFlow?
Constants
Constants are parameters whose value does not change. To define a constant, we use the tf.constant() command.
Example:
a = tf.constant(2.0, tf.float32)
b = tf.constant(3.0)
print(a, b)
Variables
Variables allow us to add new trainable parameters to the graph. To define a variable, we use the tf.Variable() command and initialize the variables before running the graph in a session.
Example:
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
41. Placeholders
Placeholders allow us to feed data to a TensorFlow model from outside the model; they permit a value to be assigned later. To define a placeholder, we use the tf.placeholder() command.
Example:
a = tf.placeholder(tf.float32)
b = a * 2
with tf.Session() as sess:
    result = sess.run(b, feed_dict={a: 3.0})
    print(result)
Session
A Session is run to evaluate the nodes; this is called the TensorFlow runtime.
Example:
a = tf.constant(2.0)
b = tf.constant(4.0)
c = a + b
# Launch the session
sess = tf.Session()
# Evaluate the tensor c
print(sess.run(c))
43. What do you understand by a Computational Graph?
• Everything in TensorFlow is based on creating a computational graph
• A computational graph is a network of nodes, where each node performs an operation
• Nodes represent mathematical operations and edges represent tensors
• Since data flows through the graph, it is also called a DataFlow Graph
45. Explain Generative Adversarial Networks (GANs) with an example.
Suppose there is a wine shop that purchases wine from dealers, which the owner will resell later.
46. But there are some malefactor dealers who sell fake wine. In this case, the shop owner should be able to distinguish between fake and authentic wines.
47. The forger will try different techniques to sell fake wine and make sure certain techniques get past the shop owner's checks.
48. The shop owner would probably get feedback from wine experts that some of his wine is not original. He would then have to improve how he determines whether a wine is fake or authentic.
49. Goal of the forger: to create wines that are indistinguishable from the authentic ones. Goal of the shop owner: to accurately tell whether a wine is real or not.
50. There are 2 main components of a GAN: the Generator and the Discriminator
• The Generator (the forger) takes a noise vector and produces generated/fake samples (the fake wine)
• The Discriminator (the shop owner) compares the generated samples with the real, authentic dataset and labels each one Real or Fake
51. The Generator is a neural network (often a CNN) that keeps producing images that are closer in appearance to the real images, while the Discriminator tries to tell the real images from the fake ones. The two are trained against each other: as the Discriminator gets better at identifying real and fake images, the ultimate aim is for the Generator to produce samples the Discriminator can no longer distinguish from the real data.
52. What is an auto-encoder?
• It is a neural network that has 3 layers: input, hidden (encoding), and output
• The network's target output is the same as its input, so the number of input neurons equals the number of output neurons
• The network is trained to reconstruct its inputs, using dimensionality reduction to restructure the input
53. It works by compressing the input to a latent-space representation and then reconstructing the output from this representation.
54. What is Bagging and Boosting?
• Bagging and Boosting are ensemble techniques in which the idea is to train multiple models using the same learning algorithm and then combine their predictions
Bagging
• Randomly sample the training data into bags, train a model on each bag separately, and then vote for the best model
(Figure: the dataset is split into train and test data; the train data is sampled into Bags 1–4, Models 1–4 are trained on them, and the best model is chosen by voting)
55. Boosting
• In Boosting, the emphasis is on selecting the data points that gave wrong predictions, in order to improve the accuracy
• Train Model 1 on Bag 1, collect the data points with wrong predictions, include them in Bag 2, train Model 2, and repeat
(Figure: the train data feeds Bag 1; Model 1's wrongly predicted points feed Bag 2; Model 2 is trained; the process repeats)