Introduction to Deep Learning

- 1. Introduction to Deep Learning RATNAKAR PANDEY
- 2. Is Artificial Intelligence, Machine Learning and Deep Learning the same thing? What about Data Science?
- 4. Artificial Intelligence • AI is any technique, code or algorithm that enables machines to develop, demonstrate and mimic human cognitive behavior or intelligence, hence the name “Artificial Intelligence” • AI doesn’t mean that machines will do everything; rather, AI is better described as “Augmented Intelligence”, i.e. Man+Machine solving business problems better and faster • AI won’t replace managers, but managers who use AI will replace those who don’t. • Some of the most successful applications of AI around us are in Robotics, Computer Vision, Virtual Reality, Speech Recognition, Automation, Gaming and so on
- 5. Machine Learning • Machine learning is a subfield of AI that gives machines the ability to improve their performance over time without explicit human intervention • In this approach, machines are shown thousands or millions of examples and trained to solve a problem correctly. • Most current applications of machine learning leverage supervised learning • Other uses of ML can be broadly classified into unsupervised learning and reinforcement learning. Source: https://hbr.org/cover-story/2017/07/the-business-of-artificial-intelligence
- 6. Data Science • Data Science is a field that intersects AI, Machine Learning and Deep Learning and enables statistically driven decision making. • Data science is the art and science of drawing actionable insights from data. • Data Science + Business Knowledge = Impact/Value Creation for the Business. • Generally speaking, Data Scientists and Analytics Professionals try to answer the following questions through their analysis- • Descriptive Analytics (What has happened?) • Diagnostic Analytics (Why has it happened?) • Predictive Analytics (What may happen in the future?) • Prescriptive Analytics (What plan of action should we follow?)
- 7. Deep Learning • Deep learning is a subfield of Machine Learning that tries to mimic the working of the human brain using artificial neurons. • These techniques focus on building Artificial Neural Networks (ANN) with several hidden layers. • There is a variety of deep learning networks, such as Multilayer Perceptron (MLP), Autoencoders (AE), Convolution Neural Network (CNN) and Recurrent Neural Network (RNN) Source: https://www.quora.com/What-are-the-types-of-deep-neural-networks-and-how-can-one-categorize-them-and-their-related-algorithms-as-either-shallow-or-deep/answer/Ratnakar-Pandey-RP
- 8. Why Deep Learning is Growing • The processing power needed for Deep Learning is becoming readily available through GPUs, distributed computing and powerful CPUs • Moreover, as the amount of data grows, Deep Learning models seem to outperform traditional Machine Learning models • Explosion of features and datasets • Focus on customization and real-time decisioning
- 9. Why Deep Learning is Growing • Uncover patterns that are hard to detect with traditional techniques, especially when the incidence rate is low • Find latent features (super variables) without significant manual feature engineering • Real-time fraud detection and self-learning models using streaming data (Kafka, MapR) • Ensure consistent customer experience and regulatory compliance • Higher operational efficiency [Figure labels: 10,000+ Features; Unstructured, Transactional, Social, Device & IP, Third Parties, Bureau]
- 10. Challenges with Deep Learning • Needs large amounts of data to work well • Some models are very hard to train and may take weeks or months • Prone to overfitting • Black box, and hence may pose regulatory challenges, particularly for BFSI (banking, financial services and insurance)
- 12. Deep Learning Building Blocks
- 13. Multilayer Perceptron (MLP) • These are the most basic networks; they feed the inputs forward to create the output. They consist of an input layer, an output layer and one or more hidden layers of interconnected neurons in between. • They generally use a non-linear activation function such as ReLU or Tanh, and measure the difference between the true output and the computed output with a loss function such as Mean Square Error (MSE) or log loss. • This loss is propagated backward to adjust the weights, so that training minimizes the loss and makes the model more accurate. [Figure: a single neuron, with Inputs, Weights w1, w2, ..., wn, Bias and Activation Function]
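The forward pass and loss described on the MLP slide can be sketched in plain Python. This is a minimal illustration rather than the author's implementation: the helper names (`neuron`, `dense`, `mse`) and all weight, bias and input values are made up for the example, and the backward pass is not shown.

```python
import math

def relu(z):
    """ReLU activation: passes positive values, clips negatives to zero."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def dense(inputs, weight_rows, biases, activation):
    """One fully connected layer: one neuron per row of weights."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]

def mse(y_true, y_pred):
    """Mean Square Error between true and computed outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative (hypothetical) values: 2 inputs -> 2 hidden neurons -> 1 output.
x = [1.0, 2.0]
hidden = dense(x, [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0], relu)
output = dense(hidden, [[1.0, 0.5]], [0.1], math.tanh)
loss = mse([1.0], output)
print(hidden, output, loss)
```

Training would repeat this forward pass, backpropagate `loss`, and adjust the weights; the sketch stops at computing the loss.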
- 14. Key Components and Hyperparameters • Number of layers- Input layer, output layer and hidden layers. The more layers, the deeper the network. • Number of Neurons- How many neurons are in each layer. The number of input layer neurons depends on the number of features, the number of output layer neurons on the number of outputs, and the number of hidden layer neurons needs to be optimized • Weights- Importance given to each input in computing the output. Typically initialized randomly and then optimized using backward propagation. • Activation Function- Function applied to the weighted sum of a neuron’s inputs plus the bias to generate the neuron’s output • Forward Propagation- Starting from the initial weights, the output of each layer is fed forward to the next layer to make predictions and compute the error. • Loss Function- Computes the error between actual and predicted values and so measures the model’s performance. Weights are optimized, and hyperparameters fine-tuned, to minimize the loss function. Some common loss functions are Mean Square Error and log loss (cross entropy).
- 15. Popular Activation Functions Most activation functions are non-linear, because most real-world problems are non-linear Source: https://en.wikipedia.org/wiki/Activation_function
- 16. Key Components and Hyperparameters • Backpropagation- Propagates the error backward, starting from the output layer, to each previous layer and updates the weights • Gradient Descent and Optimization Algorithms- Used to optimize the weights based on the backpropagated error signal, with gradients computed via the chain rule • Epochs- One epoch is one complete pass of forward and backward propagation over the entire training dataset. • Batch Size- Number of input observations processed in one iteration, i.e. before each weight update. • Dropout- x% of nodes are randomly dropped during training to regularize the network and reduce overfitting, leveraging the community effect of neurons rather than dependence on a few players • Optimizer and Learning Rate- Optimizers such as Stochastic Gradient Descent (SGD) update the weights to find the best solution, and the learning rate sets the size of each update. If the learning rate is too high, the network may settle on suboptimal solutions; if it is too low, the network will take very long to train. Common optimizers are Adam, SGD, RMSprop etc.
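How epochs, batch size and learning rate interact can be seen in a small mini-batch gradient descent loop. The sketch below is an assumption-laden toy: it fits a single linear neuron (no hidden layers, no dropout) to made-up data generated from y = 2x + 1, and the function name `train` and its default values are hypothetical.

```python
def train(xs, ys, learning_rate=0.05, epochs=300, batch_size=2):
    """Fit y = w*x + b by mini-batch gradient descent on the MSE loss."""
    w, b = 0.0, 0.0  # simple fixed starting point for reproducibility
    for _ in range(epochs):                         # one epoch = one pass over all data
        for start in range(0, len(xs), batch_size):  # one iteration per mini-batch
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            # Forward pass: prediction errors on this batch.
            errors = [(w * x + b) - y for x, y in zip(bx, by)]
            # Gradients of the batch MSE with respect to w and b.
            grad_w = 2 * sum(e * x for e, x in zip(errors, bx)) / len(bx)
            grad_b = 2 * sum(errors) / len(bx)
            # Weight update: step against the gradient, scaled by the learning rate.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train(xs, ys)
print(round(w, 3), round(b, 3))
```

With 4 observations and a batch size of 2, each epoch performs two weight updates. Raising `learning_rate` too far makes the updates overshoot, while a very small value needs many more epochs to converge.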
- 17. Autoencoders • Autoencoders follow “Representation Learning” • The concept of the AE is quite simple- the input vectors are used to compute the output vectors, and the target output is the input itself, so the network learns to reconstruct its input. • The reconstruction error is computed, and data points with a higher reconstruction error are treated as likely outliers • AE are used for unsupervised learning, feature reduction, speech and image recognition.
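The reconstruction-error idea from the autoencoder slide can be illustrated with a deliberately tiny linear autoencoder that compresses 2-D points to a 1-D code. Everything specific here is assumed for illustration: the weights `w` are hand-picked rather than learned, the encoder and decoder share them, and the `threshold` is arbitrary. A real autoencoder would learn its weights by minimizing reconstruction error on training data.

```python
def encode(x, w):
    """Encoder: project the 2-D input onto a 1-D code."""
    return sum(wi * xi for wi, xi in zip(w, x))

def decode(code, w):
    """Decoder: map the 1-D code back to 2-D (tied weights)."""
    return [wi * code for wi in w]

def reconstruction_error(x, w):
    """Squared distance between the input and its reconstruction."""
    x_hat = decode(encode(x, w), w)
    return sum((xi - hi) ** 2 for xi, hi in zip(x, x_hat))

# Assumption: w stands in for weights already learned on "normal" data,
# which here lies along the direction (0.6, 0.8).
w = [0.6, 0.8]
points = [[3.0, 4.0], [6.0, 8.0], [-3.0, -4.0], [4.0, -3.0]]
errors = [reconstruction_error(p, w) for p in points]

threshold = 1.0  # hypothetical cut-off for flagging outliers
outliers = [p for p, e in zip(points, errors) if e > threshold]
print(outliers)
```

The first three points lie along the direction the code captures and are reconstructed almost exactly; the last point is perpendicular to it, reconstructs poorly, and is flagged.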
- 18. Convolution Neural Network (CNN) • Convolution Neural Networks (CNN) significantly enhance the capabilities of feed-forward networks such as MLP by inserting convolution layers. • They are particularly suitable for spatial data, object recognition and image analysis, using multidimensional neuron structures. • CNNs use convolutions (a linear operation) rather than the general matrix multiplication used in MLP • Typically a CNN will have three stages- convolution stage, detector layer (non-linear activation) and pooling layer
- 19. Convolution Neural Network (CNN) • Convolution Layer- The most important component of the CNN. The layer has kernels (learnable filters) that are convolved with the input along its x and y dimensions (a dot product at each position) to generate a feature map • Detector Layer- The feature maps are passed through a non-linear activation function such as ReLU to accentuate their non-linear components • Pooling Layer- A pooling layer such as “max pooling” summarizes (sub-samples) the responses from several inputs of the previous layer and reduces the size of the spatial representation, allowing the next layer to look at a bigger region Source: MIT Deeplearningbook
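The three stages on the CNN slides (convolution, detector, pooling) can be traced on a tiny example. The sketch below is an illustration under assumptions: the 5x5 image and the 2x2 edge-detecting kernel are invented, the kernel is fixed rather than learned, and, like most deep learning libraries, the "convolution" slides the kernel without flipping it (cross-correlation), with no padding and stride 1.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image; each output is a dot product."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu_map(feature_map):
    """Detector stage: apply ReLU to every element of the feature map."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Pooling stage: keep the maximum of each non-overlapping size x size window."""
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]

# Hypothetical 5x5 image with a bright vertical stripe in columns 2 and 3.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
# Hypothetical kernel that responds to dark-to-bright transitions (left to right).
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)   # 4x4, rows are [0, 2, 0, -2]
activated = relu_map(feature_map)     # negative responses clipped to 0
pooled = max_pool(activated)          # 2x2 summary: [[2, 0], [2, 0]]
print(pooled)
```

The pooled output is a quarter of the size of the activated feature map but still records where the rising edge was found, which is what lets the next layer look at a bigger region of the image.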
- 20. Recurrent Neural Network (RNN) • RNNs extend feed-forward networks with recurrent memory loops, which take input from the previous and/or same layers or states. • This gives them the capability to model along the time dimension and handle arbitrary sequences of events and inputs. • RNNs are used for sequential data analysis such as time series, sentiment analysis, NLP, language translation, speech recognition, image captioning and script recognition, among other things. • They are also called networks with memory, as previous inputs or states may persist (be stored) in the model for sequential analysis. These memories become an input as well
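The "memory becomes an input" idea can be sketched with a single recurrent unit, where each new state depends on the current input and the previous state. This is a simplified scalar illustration, not an LSTM: the function name `rnn_forward` and the weight values `w_x`, `w_h` and `b` are assumptions chosen for the example, not trained parameters.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a single-unit RNN over a sequence and return every hidden state."""
    h = h0
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state (the memory).
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Hypothetical weights; the two sequences hold the same values in a different order.
early = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
late = rnn_forward([0.0, 0.0, 1.0], w_x=1.0, w_h=0.5, b=0.0)
print(early[-1], late[-1])
```

The two sequences contain the same values, yet the final states differ: a signal seen early has partly faded by the last step, while a signal seen last is still strong. This sensitivity to order is what a plain feed-forward network over unordered inputs lacks.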
- 21. Recurrent Neural Network (RNN) • Long Short-Term Memory (LSTM) is one of the most frequently used RNN models • These sorts of models help us overcome NLP challenges that can’t be solved by “Bag of Words” analysis, e.g. “The flight was good, not bad at all” vs “The flight was bad, not good at all”
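The limitation of "Bag of Words" on the two example sentences is easy to check directly. The sketch below uses a simple regular-expression tokenizer, which is an assumption made for illustration (real pipelines typically use a proper tokenizer); it shows the failure case only, not an LSTM.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(text):
    """Word counts only; word order is discarded."""
    return Counter(tokenize(text))

s1 = "The flight was good, not bad at all"
s2 = "The flight was bad, not good at all"

same_bag = bag_of_words(s1) == bag_of_words(s2)   # True: identical word counts
same_sequence = tokenize(s1) == tokenize(s2)      # False: the order differs
print(same_bag, same_sequence)
```

Both sentences produce exactly the same bag, so any model that sees only word counts must give them the same sentiment. A sequence model such as an LSTM reads the tokens in order and can therefore tell them apart.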