Deep learning is a subset of machine learning that uses artificial neural networks. Neural networks are composed of interconnected layers of nodes that process input data. Activation functions introduce non-linearity between layers, increasing the model's ability to learn complex patterns. Models are trained via backpropagation, which minimizes a loss function by adjusting the weights so that predictions better match the actual outputs. Overfitting can occur if the model becomes too complex for the data.
2. • Artificial Neural Networks (ANN): Deep learning is a subset of machine learning that uses artificial neural networks to learn from data.
• Layers: ANNs are composed of layers of interconnected nodes, or neurons. The input layer receives the data, the output layer produces the predictions, and the hidden layers perform the computations in between (see the forward-pass sketch after this slide).
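To make the layer structure concrete, here is a minimal forward-pass sketch in plain NumPy. The layer sizes (4 inputs, 3 hidden units, 2 outputs) and the random weights are illustrative assumptions, not values from the slides.

import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=4)            # input layer: 4 features
W1 = rng.normal(size=(3, 4))      # weights: input -> hidden
b1 = np.zeros(3)                  # hidden-layer biases
W2 = rng.normal(size=(2, 3))      # weights: hidden -> output
b2 = np.zeros(2)                  # output-layer biases

h = np.maximum(0.0, W1 @ x + b1)  # hidden layer: weighted sum + ReLU
y = W2 @ h + b2                   # output layer: raw predictions
print(y)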
3. • Activation functions: Activation functions are applied to the outputs of each layer to introduce non-linearity and increase the model's expressiveness. Common activation functions include ReLU, sigmoid, and tanh.
• Backpropagation: This is a method for training neural networks by iteratively adjusting the weights in each layer to minimize the difference between the predicted outputs and the actual outputs. It works by propagating the error backwards from the output layer to the input layer and adjusting the weights accordingly (a worked sketch follows this slide).
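To show how the error is propagated backwards, here is a hedged sketch of one backpropagation step for the same single-hidden-layer network, using a squared-error loss. The target vector and learning rate are made-up illustration values.

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=4)            # one input example
t = np.array([1.0, 0.0])          # assumed target output

W1 = rng.normal(size=(3, 4)); b1 = np.zeros(3)
W2 = rng.normal(size=(2, 3)); b2 = np.zeros(2)

# Forward pass
z1 = W1 @ x + b1
h = np.maximum(0.0, z1)           # ReLU
y = W2 @ h + b2
loss = 0.5 * np.sum((y - t) ** 2) # squared-error loss

# Backward pass: propagate the error from the output layer toward the input
dy = y - t                        # dLoss/dy
dW2 = np.outer(dy, h); db2 = dy
dh = W2.T @ dy                    # error reaching the hidden layer
dz1 = dh * (z1 > 0)               # ReLU derivative: 1 where z1 > 0, else 0
dW1 = np.outer(dz1, x); db1 = dz1

# Adjust every weight against its gradient
lr = 0.1
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1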
4. • Loss function: The loss function measures the difference between the predicted outputs and the actual outputs. The goal of training is to minimize the loss function by adjusting the weights in each layer.
• Optimization algorithms: Optimization algorithms, such as stochastic gradient descent (SGD) and Adam, are used to adjust the weights in each layer during training to minimize the loss function (see the SGD sketch after this slide).
• Overfitting: Overfitting occurs when a model is too complex and starts to memorize the training data instead of learning the underlying patterns. This can be prevented by using techniques such as regularization, dropout, and early stopping.
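As a minimal illustration of minimizing a loss with SGD, the sketch below fits y = w*x + b to synthetic data by stepping each parameter against its gradient; the data, learning rate, and epoch count are arbitrary choices for the example.

import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=100)
Y = 3.0 * X - 0.5 + 0.1 * rng.normal(size=100)  # noisy synthetic targets

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(50):
    for x, t in zip(X, Y):        # "stochastic": one sample per update
        y = w * x + b             # prediction
        dw = (y - t) * x          # gradient of 0.5 * (y - t)**2 w.r.t. w
        db = y - t                # gradient w.r.t. b
        w -= lr * dw
        b -= lr * db
print(w, b)                       # approaches w ≈ 3.0, b ≈ -0.5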
5. Activation Functions in Artificial Neural Networks (ANNs):
• Activation functions are mathematical functions used in ANNs to introduce non-linearity into the output of a neuron or a layer.
• They are typically applied to the weighted sum of inputs and biases before the result is passed on to the next layer of the network.
• Without activation functions, an ANN would collapse into a single linear model, which can only capture linear relationships between input and output (the sketch after this slide demonstrates this).
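That claim is easy to check directly: with no activation between them, two linear layers compose into a single linear map, so stacking them adds no expressive power. A small NumPy demonstration with random (illustrative) weights:

import numpy as np

rng = np.random.default_rng(3)
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(2, 3))
x = rng.normal(size=4)

two_layers = W2 @ (W1 @ x)                 # no activation in between
one_layer = (W2 @ W1) @ x                  # single combined linear map
print(np.allclose(two_layers, one_layer))  # True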
6. • Common activation functions include Sigmoid, Tanh, ReLU, and Softmax (see the NumPy implementations after this slide).
• Sigmoid and Tanh are sigmoidal, meaning they produce an S-shaped curve. They squash the output of a neuron to a range between 0 and 1 or between -1 and 1, respectively.
• The ReLU (Rectified Linear Unit) function is non-sigmoidal and is defined as f(x) = max(0, x). It is one of the most commonly used activation functions due to its simplicity and effectiveness.
• The Softmax function is used in the output layer of a network to produce a probability distribution over multiple classes.
• Choosing the right activation function can have a significant impact on the performance of a neural network, and it is often an area of active research.
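Plain NumPy versions of the four functions named above may help; the max-subtraction in softmax is a standard numerical-stability trick, not part of the definition.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # f(x) = max(0, x)

def softmax(x):
    e = np.exp(x - np.max(x))         # subtract max for numerical stability
    return e / e.sum()                # probabilities that sum to 1

z = np.array([2.0, -1.0, 0.5])
print(sigmoid(z), tanh(z), relu(z), softmax(z), sep="\n")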
7. Non-Linearity
• In the context of machine learning, non-linearity is an important property of neural networks. Neural networks are composed of many interconnected processing units (neurons), which apply a non-linear activation function to their inputs before passing them to the next layer of the network. This non-linearity allows neural networks to model complex relationships between inputs and outputs, and to learn representations that are not directly observable in the input data.
13. • For example, a linear model can only learn a linear decision boundary between two classes, which may not be sufficient to accurately classify complex data. In contrast, a non-linear model such as a neural network can learn more complex decision boundaries that better separate the classes (the XOR sketch after this slide is the classic illustration).
• Some common non-linear activation functions used in neural networks include the Rectified Linear Unit (ReLU), sigmoid, tanh, and others. These functions introduce non-linearity into the network, allowing it to learn more complex representations of the input data.
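XOR is the textbook case of a dataset no linear decision boundary can separate. The sketch below trains a tiny one-hidden-layer network on it from scratch; the hidden width, learning rate, step count, and seed are arbitrary illustration choices, and convergence to the exact labels is typical rather than guaranteed.

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])     # XOR labels

rng = np.random.default_rng(4)
W1 = rng.normal(size=(4, 2)); b1 = np.zeros(4)
W2 = rng.normal(size=4);      b2 = 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 1.0
for step in range(5000):
    # Forward pass over the whole batch
    Z1 = X @ W1.T + b1                 # (4 samples, 4 hidden units)
    H = np.tanh(Z1)                    # the non-linearity that makes XOR learnable
    Y = sigmoid(H @ W2 + b2)           # predicted probability of class 1

    # Backward pass (cross-entropy gradient w.r.t. the logits is Y - T)
    dZ2 = (Y - T) / len(X)
    dW2 = H.T @ dZ2; db2 = dZ2.sum()
    dH = np.outer(dZ2, W2) * (1 - H ** 2)  # tanh derivative is 1 - tanh^2
    dW1 = dH.T @ X; db1 = dH.sum(axis=0)

    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(np.round(Y))                     # typically converges to [0. 1. 1. 0.]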
14. Rectified Linear Unit (ReLU) activation function:
• ReLU is a non-linear activation function commonly used in neural networks.
• It takes an input value x and returns the maximum of 0 and x as the output value.
• The formula for ReLU is f(x) = max(0, x).
• ReLU is computationally efficient, since it requires only simple thresholding of the input value, compared to other activation functions like sigmoid or tanh.
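The thresholding point is visible in code: both ReLU and its gradient reduce to a single comparison per value.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)          # f(x) = max(0, x)

def relu_grad(x):
    return (x > 0).astype(float)       # 1 for x > 0, else 0

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))                         # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))                    # [0. 0. 0. 1. 1.]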
15. Sigmoid
• Sigmoid is a non-linear activation function commonly used in neural networks.
• It takes an input value x and maps it to a range between 0 and 1 using the formula f(x) = 1 / (1 + e^-x).
• The output of the sigmoid function can be interpreted as a probability or likelihood, since it always produces a value between 0 and 1.
• Sigmoid was one of the earliest activation functions used in neural networks, and it is still used in some applications, such as logistic regression.
• Sigmoid is smooth and differentiable, which makes it useful for backpropagation and gradient descent optimization.
• One limitation of sigmoid is that it suffers from the "vanishing gradient" problem: the gradient becomes very small as the input value becomes very large or very small, making it difficult for the network to learn (the sketch after this slide shows this numerically).
• Another limitation of sigmoid is that it is not zero-centered, which can slow down the convergence of gradient descent.
• Due to these limitations, sigmoid is not as commonly used as other activation functions like ReLU or its variants.
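A quick numeric look at the vanishing-gradient point: the sigmoid derivative f'(x) = f(x) * (1 - f(x)) peaks at 0.25 and shrinks rapidly as |x| grows.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in [0.0, 2.0, 5.0, 10.0]:
    s = sigmoid(x)
    print(x, s * (1 - s))   # 0.25, ~0.105, ~0.0066, ~0.000045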
18. Tanh
• Tanh is a non-linear activation function commonly used in neural networks.
• It takes an input value x and maps it to a range between -1 and 1 using the formula f(x) = (e^x - e^-x) / (e^x + e^-x).
• Tanh is a shifted and rescaled version of the sigmoid function, with a zero-centered output (the sketch after this slide verifies this).
• Like sigmoid, tanh is smooth and differentiable, which makes it useful for backpropagation and gradient descent optimization.
• Tanh is often used in the hidden layers of neural networks, especially in recurrent neural networks (RNNs) and long short-term memory (LSTM) networks.
• One limitation of tanh is that it also suffers from the "vanishing gradient" problem: the gradient becomes very small as the input value becomes very large or very small, making it difficult for the network to learn.
• Another limitation of tanh is that it is more computationally expensive than ReLU and its variants, since it involves exponentials.
• Despite these limitations, tanh can be useful in certain situations, such as when the input data is standardized and zero-centered, or when the network needs to model both positive and negative values.
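A small check of the "shifted and rescaled sigmoid" claim, using the identity tanh(x) = 2 * sigmoid(2x) - 1; the sample points are arbitrary.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 7)
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True
print(np.tanh(x))   # zero-centered outputs in (-1, 1)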
19. Difference between Activation Function and ML Algorithm
• An activation function is a mathematical function used in artificial neural networks to introduce non-linearity into the output of a neuron.
• Activation functions are used to decide whether a neuron should be activated or not based on the input it receives.
• Common activation functions include sigmoid, ReLU, tanh, and softmax.
• A machine learning algorithm, on the other hand, is a method or set of methods used to learn patterns and relationships in data in order to make predictions or decisions. Machine learning algorithms can be supervised, unsupervised, or semi-supervised, and can be used for a wide range of tasks, such as regression, classification, clustering, and reinforcement learning.
• While activation functions are used in neural networks to introduce non-linearity and make them more expressive, machine learning algorithms are used to learn patterns and relationships in data and make predictions based on that learning.
20. Why ML Accuracy Can Be Better than DL
• Complexity: ML models are often simpler and more interpretable than ANNs, which can make them easier to train and optimize. In some cases, a simpler model may be sufficient to achieve good performance on a given task, without the need for a complex neural network.
• Data size: ANNs require large amounts of data to train effectively and may not perform well on small datasets. In contrast, some ML models, such as decision trees or logistic regression, can perform well even on smaller datasets.
• Preprocessing: ANNs typically require careful preprocessing of input data, such as scaling and encoding, which can be time-consuming and require domain expertise. In contrast, some ML models, such as decision trees or Naive Bayes, can perform well with minimal preprocessing.
21. • Model selection: Choosing the right ANN architecture and hyperparameters can be a challenging task, and may require extensive experimentation and tuning. In contrast, some ML models, such as decision trees or Naive Bayes, have fewer hyperparameters to tune and may be easier to select and optimize.
• Overfitting: ANNs are prone to overfitting, where the model becomes too complex and performs well on the training data but poorly on new, unseen data. ML models may be less prone to overfitting, especially when regularized or constrained in some way.
• Overall, the choice of model depends on the specific task and dataset at hand, and there is no one-size-fits-all solution. In some cases, an ML model may be more suitable, while in other cases, an ANN may be necessary to achieve the desired performance.