1. The document discusses Convolutional Neural Networks (CNNs) for object recognition and scene understanding. It covers the biological inspiration from the human visual cortex, classical computer vision techniques, and the foundations of CNNs including LeNet and learning visual features.
2. CNNs apply successive layers of convolutions, nonlinear activations, and pooling to learn hierarchical representations of images. Modern CNN architectures have millions of parameters and dozens of layers to learn increasingly complex features.
3. CNNs have countless applications in areas like image classification, segmentation, detection, generation, and more due to their general architecture for learning spatial hierarchies of features from data.
A comprehensive tutorial on Convolutional Neural Networks (CNNs) that covers the motivation behind CNNs and deep learning in general, followed by a description of the various components involved in a typical CNN layer. It explains the theory behind the different variants used in practice and also gives a big picture of the whole network by putting everything together.
Next, there's a discussion of the various state-of-the-art frameworks being used to implement CNNs to tackle real-world classification and regression problems.
Finally, the implementation of CNNs is demonstrated by implementing the paper 'Age and Gender Classification Using Convolutional Neural Networks' by Levi and Hassner (2015).
We'll classify images from the CIFAR-10 dataset, which consists of airplanes, dogs, cats, and other objects. We'll preprocess the images, then train a convolutional neural network on all the samples. The images need to be normalized and the labels need to be one-hot encoded.
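As a quick illustration of the preprocessing just described, here is a minimal NumPy sketch of pixel normalization and one-hot encoding; the function names and example labels are illustrative, not taken from the original notebook.

```python
import numpy as np

def normalize(images):
    # Scale pixel values from [0, 255] to [0, 1].
    return images.astype(np.float32) / 255.0

def one_hot_encode(labels, num_classes=10):
    # Turn integer labels (0..9) into one-hot row vectors.
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Example: three CIFAR-10 labels (e.g. airplane=0, cat=3, dog=5).
print(one_hot_encode(np.array([0, 3, 5])))
```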
Summary:
There are three parts in this presentation.
A. Why do we need Convolutional Neural Network
- Problems we face today
- Solutions for problems
B. LeNet Overview
- The origin of LeNet
- The result after using LeNet model
C. LeNet Techniques
- LeNet structure
- Function of every layer
In the GitHub link below, there is a repository in which I rebuilt LeNet without any deep learning package. I hope this helps you better understand the basics of Convolutional Neural Networks (a minimal layer-by-layer sketch of LeNet-5 follows the links below).
Github Link : https://github.com/HiCraigChen/LeNet
LinkedIn : https://www.linkedin.com/in/YungKueiChen
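For readers who want to connect the LeNet structure outlined above to code, here is a minimal tf.keras sketch of the LeNet-5 layer stack (two convolution/pooling stages followed by three fully connected layers). It approximates the 1998 design only loosely: ReLU and max pooling are substituted for the original tanh activations and average pooling, and it is not the from-scratch NumPy rebuild in the linked repository.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lenet5(input_shape=(32, 32, 1), num_classes=10):
    # Conv -> pool -> conv -> pool -> three dense layers, as in LeNet-5.
    model = models.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(6, kernel_size=5, activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(16, kernel_size=5, activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(120, activation="relu"),
        layers.Dense(84, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model

build_lenet5().summary()
```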
In machine learning, a convolutional neural network is a class of deep, feed-forward artificial neural networks that have been successfully applied to analyzing visual imagery.
The document discusses convolutional neural networks (CNNs). It begins with an introduction and overview of CNN components like convolution, ReLU, and pooling layers. Convolution layers apply filters to input images to extract features, ReLU introduces non-linearity, and pooling layers reduce dimensionality. CNNs are well-suited for image data since they can incorporate spatial relationships. The document provides an example of building a CNN using TensorFlow to classify handwritten digits from the MNIST dataset.
This document provides an agenda for a presentation on deep learning, neural networks, convolutional neural networks, and interesting applications. The presentation will include introductions to deep learning and how it differs from traditional machine learning by learning feature representations from data. It will cover the history of neural networks and breakthroughs that enabled training of deeper models. Convolutional neural network architectures will be overviewed, including convolutional, pooling, and dense layers. Applications like recommendation systems, natural language processing, and computer vision will also be discussed. There will be a question and answer section.
This document discusses convolutional neural networks (CNNs). It explains that CNNs were inspired by research on the human visual system and take a similar approach to teach computers to identify objects in images. The document outlines the key components of CNNs, including convolutional and pooling layers to extract features from images, as well as fully connected layers to classify objects. It also notes that CNNs take pixel data as input and use many examples to generalize and make predictions, similar to how humans learn visual recognition.
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...Simplilearn
This Deep Learning presentation will help you understand what Deep Learning is, why we need it, and its applications, along with a detailed explanation of Neural Networks and how they work. Deep learning is inspired by the workings of the human brain, specifically artificial neural networks. These networks, which model the brain's decision-making process, use complex algorithms that process data in a non-linear way, learning in an unsupervised manner to make choices based on the input. This Deep Learning tutorial is ideal for professionals with beginner to intermediate levels of experience. Now, let us dive deep into this topic and understand what Deep Learning actually is.
Below topics are explained in this Deep Learning Presentation:
1. What is Deep Learning?
2. Why do we need Deep Learning?
3. Applications of Deep Learning
4. What is Neural Network?
5. Activation Functions
6. Working of Neural Network
Simplilearn’s Deep Learning course will transform you into an expert in deep learning techniques using TensorFlow, the open-source software library designed to conduct machine learning & deep neural network research. With our deep learning course, you’ll master deep learning and TensorFlow concepts, learn to implement algorithms, build artificial neural networks and traverse layers of data abstraction to understand the power of data and prepare you for your new role as deep learning scientist.
Why Deep Learning?
TensorFlow is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in deep learning are being seen in smartphone applications, creating efficiencies in the power grid, driving advancements in healthcare, improving agricultural yields, and helping us find solutions to climate change. With this Tensorflow course, you’ll build expertise in deep learning models, learn to operate TensorFlow to manage neural networks and interpret the results.
You can gain in-depth knowledge of Deep Learning by taking our Deep Learning certification training course. With Simplilearn’s Deep Learning course, you will prepare for a career as a Deep Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms.
There is booming demand for skilled deep learning engineers across a wide range of industries, making this deep learning course with TensorFlow training well-suited for professionals at the intermediate to advanced level of experience. We recommend this deep learning online course particularly for the following professionals:
1. Software engineers
2. Data scientists
3. Data analysts
4. Statisticians with an interest in deep learning
Residual neural networks (ResNets) solve the vanishing gradient problem through shortcut connections that allow gradients to flow directly through the network. The ResNet architecture consists of repeating blocks with convolutional layers and shortcut connections. These connections perform identity mappings and add the outputs of the convolutional layers to the shortcut connection. This helps networks converge earlier and increases accuracy. Variants include basic blocks with two convolutional layers and bottleneck blocks with three layers. Parameters like number of layers affect ResNet performance, with deeper networks showing improved accuracy. YOLO is a variant that replaces the softmax layer with a 1x1 convolutional layer and logistic function for multi-label classification.
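A minimal sketch of the basic residual block described above, written with the tf.keras functional API as an assumption; the filter counts and the 1x1 projection rule are illustrative rather than taken from a specific ResNet implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def basic_block(x, filters, stride=1):
    # Two 3x3 convolutions on the main path.
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # Project the shortcut with a 1x1 convolution when shapes differ.
    if stride != 1 or shortcut.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(shortcut)
        shortcut = layers.BatchNormalization()(shortcut)
    # Adding the shortcut to the convolution output lets gradients
    # flow directly through the addition.
    y = layers.Add()([y, shortcut])
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = basic_block(inputs, filters=64)
model = tf.keras.Model(inputs, outputs)
model.summary()
```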
Convolutional neural networks (CNNs) are a type of neural network designed to process images. CNNs use a series of convolution and pooling layers to extract features from images. Convolution multiplies the image with filters to produce feature maps, while pooling reduces the size of the representation to reduce computation. This process allows the network to learn increasingly complex features from the input image and classify it. CNNs have applications in areas like facial recognition, document analysis, and image classification.
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
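Since the abstract highlights dropout as the key regularizer in the fully connected layers, here is a minimal NumPy sketch of (inverted) dropout; it illustrates the idea only and is not the authors' code.

```python
import numpy as np

def dropout(activations, keep_prob=0.5, training=True):
    # During training, randomly zero each unit with probability 1 - keep_prob
    # and rescale the survivors so the expected activation is unchanged.
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) < keep_prob).astype(activations.dtype)
    return activations * mask / keep_prob

hidden = np.random.randn(4, 8).astype(np.float32)
print(dropout(hidden, keep_prob=0.5))
```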
Convolutional Neural Network (CNN) is a type of neural network that can take in an input image, assign importance to areas in the image, and distinguish objects in the image. CNNs use convolutional layers and pooling layers, which help introduce translation invariance to allow the network to recognize patterns and objects regardless of their position in the visual field. CNNs have been very effective for tasks involving visual imagery like image classification but may be less effective for natural language processing tasks that rely more on word order and sequence. Recurrent neural networks (RNNs) that can model sequential data may perform better than CNNs for some natural language processing tasks like text classification.
This document provides an overview of deep learning, including definitions of AI, machine learning, and deep learning. It discusses neural network models like artificial neural networks, convolutional neural networks, and recurrent neural networks. The document explains key concepts in deep learning like activation functions, pooling techniques, and the inception model. It provides steps for fitting a deep learning model, including loading data, defining the model architecture, adding layers and functions, compiling, and fitting the model. Examples and visualizations are included to demonstrate how neural networks work.
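A minimal tf.keras sketch of the fitting steps listed above (load data, define the architecture, add layers, compile, fit); the dataset choice and layer sizes are assumptions made for the example.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# 1. Load data (MNIST ships with Keras).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# 2. Define the model architecture and add layers.
model = models.Sequential([
    tf.keras.Input(shape=(28, 28)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# 3. Compile with an optimizer, loss, and metric.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# 4. Fit, then evaluate on held-out data.
model.fit(x_train, y_train, epochs=2, batch_size=64, validation_split=0.1)
model.evaluate(x_test, y_test)
```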
This document provides an overview of convolutional neural networks (CNNs). It defines CNNs as multiple layer feedforward neural networks used to analyze visual images by processing grid-like data. CNNs recognize images through a series of layers, including convolutional layers that apply filters to detect patterns, ReLU layers that apply an activation function, pooling layers that detect edges and corners, and fully connected layers that identify the image. CNNs are commonly used for applications like image classification, self-driving cars, activity prediction, video detection, and conversion applications.
Autoencoders Tutorial | Autoencoders In Deep Learning | Tensorflow Training |...Edureka!
The document discusses autoencoders, an unsupervised machine learning technique where the target values are equal to the inputs. It covers the need for autoencoders, their properties and training, different types including convolutional, sparse, and deep autoencoders, and applications such as dimensionality reduction, denoising images, and watermark removal.
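A minimal tf.keras sketch of a dense autoencoder in which the target equals the input, as described; the 784-32-784 layer sizes assume flattened 28x28 images and are purely illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Encoder compresses 784 inputs to a 32-dimensional code; decoder reconstructs them.
autoencoder = models.Sequential([
    tf.keras.Input(shape=(784,)),
    layers.Dense(32, activation="relu"),      # encoder / bottleneck
    layers.Dense(784, activation="sigmoid"),  # decoder
])
autoencoder.compile(optimizer="adam", loss="mse")

# Unsupervised training: the input is also the target.
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
autoencoder.fit(x_train, x_train, epochs=1, batch_size=256)
```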
This document provides an introduction to deep learning in medical imaging. It explains that artificial neural networks are modeled after biological neurons and use multiple hidden layers to approximate complex functions. Convolutional neural networks are commonly used for image data, applying filters over images to extract features. Modern deep learning platforms perform cross-correlation instead of convolution for efficiency. The key process for improving deep learning models is backpropagation, which calculates the gradient of the loss function to update weights and biases in a direction that reduces loss. Deep learning has applications in medical imaging modalities like MRI, ultrasound, CT, and PET.
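To make the convolution-versus-cross-correlation point concrete, here is a minimal NumPy sketch: deep learning frameworks slide the kernel without flipping it (cross-correlation), while true convolution flips the kernel first. The example image and kernel are illustrative.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    # Slide the kernel over the image without flipping it (what "conv" layers actually do).
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    # True convolution flips the kernel in both axes first.
    return cross_correlate2d(image, kernel[::-1, ::-1])

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # simple vertical-edge detector
print(cross_correlate2d(image, edge_kernel))
print(convolve2d(image, edge_kernel))
```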
This Edureka Recurrent Neural Networks tutorial will help you understand why we need Recurrent Neural Networks (RNNs) and what exactly they are. It also explains a few issues with training a Recurrent Neural Network and how to overcome those challenges using LSTMs (a minimal sketch follows the topic list below). The last section includes a use-case of LSTM to predict the next word using a sample short story.
Below are the topics covered in this tutorial:
1. Why Not Feedforward Networks?
2. What Are Recurrent Neural Networks?
3. Training A Recurrent Neural Network
4. Issues With Recurrent Neural Networks - Vanishing And Exploding Gradient
5. Long Short-Term Memory Networks (LSTMs)
6. LSTM Use-Case
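A minimal tf.keras sketch of the LSTM use-case idea (predicting the index of the next word from the preceding words); the vocabulary size, sequence length, and random training data are assumptions for illustration only, not the tutorial's actual story data.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

vocab_size, seq_len = 50, 5  # toy values for illustration

# Embedding -> LSTM -> softmax over the vocabulary for the next word.
model = models.Sequential([
    tf.keras.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 16),
    layers.LSTM(32),
    layers.Dense(vocab_size, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# x: sequences of word indices; y: index of the word that follows each sequence.
x = tf.random.uniform((100, seq_len), maxval=vocab_size, dtype=tf.int32)
y = tf.random.uniform((100,), maxval=vocab_size, dtype=tf.int32)
model.fit(x, y, epochs=1, verbose=0)
```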
Artificial Neural Network seminar presentation using ppt.Mohd Faiz
- Artificial neural networks are inspired by biological neural networks and learning processes. They attempt to mimic the workings of the brain using simple units called artificial neurons that are connected in networks.
- Learning in neural networks involves modifying the synaptic strengths between neurons through mathematical optimization techniques. The goal is to minimize an error function that measures how well the network can approximate or complete a task.
- Neural networks can learn complex nonlinear functions through training algorithms like backpropagation that determine how to adjust the synaptic weights to improve performance on the learning task.
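A minimal NumPy sketch of backpropagation on a tiny two-layer network, showing how the error gradient is propagated backward and used to adjust the synaptic weights; the network size, learning rate, and toy data are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.random((8, 2))                                   # 8 examples, 2 inputs
y = (X.sum(axis=1, keepdims=True) > 1.0).astype(float)   # toy binary target

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))       # hidden-layer weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))       # output-layer weights
lr = 0.5

for step in range(2000):
    # Forward pass through the two layers.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Backward pass: error gradients for each layer (squared-error loss).
    d_output = (output - y) * output * (1 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)

    # Adjust the synaptic weights in the direction that reduces the error.
    W2 -= lr * hidden.T @ d_output
    b2 -= lr * d_output.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0, keepdims=True)

print(np.round(output, 2))  # predictions move toward the 0/1 targets
```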
Handwritten digit recognition uses convolutional neural networks to recognize handwritten digits from images. The MNIST dataset, containing 60,000 training images and 10,000 test images of handwritten digits, is used to train models. Convolutional neural network architectures for this task typically involve convolutional layers to extract features, followed by flatten and dense layers to classify digits. When trained on the MNIST dataset, convolutional neural networks can accurately recognize handwritten digits in test images.
Recurrent Neural Networks have been shown to be very powerful models, as they can propagate context over several time steps. Because of this, they can be applied effectively to several problems in Natural Language Processing, such as language modelling, tagging problems, and speech recognition. In this presentation we introduce the basic RNN model and discuss the vanishing gradient problem. We describe LSTM (Long Short-Term Memory) and Gated Recurrent Units (GRU). We also discuss the Bidirectional RNN with an example. RNN architectures can be considered deep learning systems in which the number of time steps plays the role of the depth of the network. It is also possible to build an RNN with multiple hidden layers, each having recurrent connections from the previous time steps, representing abstraction both in time and space.
This document provides an overview of convolutional neural networks and summarizes four popular CNN architectures: AlexNet, VGG, GoogLeNet, and ResNet. It explains that CNNs are made up of convolutional and subsampling layers for feature extraction followed by dense layers for classification. It then briefly describes key aspects of each architecture like ReLU activation, inception modules, residual learning blocks, and their performance on image classification tasks.
The document provides an overview of convolutional neural networks (CNNs) and their layers. It begins with an introduction to CNNs, noting they are a type of neural network designed to process 2D inputs like images. It then discusses the typical CNN architecture of convolutional layers followed by pooling and fully connected layers. The document explains how CNNs work using a simple example of classifying handwritten X and O characters. It provides details on the different layer types, including convolutional layers which identify patterns using small filters, and pooling layers which downsample the inputs.
Deep learning is introduced along with its applications and key players in the field. The document discusses the problem space of inputs and outputs for deep learning systems. It describes what deep learning is, providing definitions and explaining the rise of neural networks. Key deep learning architectures like convolutional neural networks are overviewed along with a brief history and motivations for deep learning.
This document provides an overview of convolutional neural networks (CNNs). It explains that CNNs are a type of neural network that has been successfully applied to analyzing visual imagery. The document then discusses the motivation and biology behind CNNs, describes common CNN architectures, and explains the key operations of convolution, nonlinearity, pooling, and fully connected layers. It provides examples of CNN applications in computer vision tasks like image classification, object detection, and speech recognition. Finally, it notes several large tech companies that utilize CNNs for features like automatic tagging, photo search, and personalized recommendations.
This presentation is Part 2 of my September Lisp NYC presentation on Reinforcement Learning and Artificial Neural Nets. We will continue from where we left off by covering Convolutional Neural Nets (CNN) and Recurrent Neural Nets (RNN) in depth.
Time permitting I also plan on having a few slides on each of the following topics:
1. Generative Adversarial Networks (GANs)
2. Differentiable Neural Computers (DNCs)
3. Deep Reinforcement Learning (DRL)
Some code examples will be provided in Clojure.
After a very brief recap of Part 1 (ANN & RL), we will jump right into CNN and their appropriateness for image recognition. We will start by covering the convolution operator. We will then explain feature maps and pooling operations and then explain the LeNet 5 architecture. The MNIST data will be used to illustrate a fully functioning CNN.
Next we cover Recurrent Neural Nets in depth and describe how they have been used in Natural Language Processing. We will explain why gated networks and LSTM are used in practice.
Please note that some exposure or familiarity with Gradient Descent and Backpropagation will be assumed. These are covered in the first part of the talk for which both video and slides are available online.
A lot of material will be drawn from the new Deep Learning book by Goodfellow & Bengio, as well as Michael Nielsen's online book on Neural Networks and Deep Learning, as well as several other online resources.
Bio
Pierre de Lacaze has over 20 years industry experience with AI and Lisp based technologies. He holds a Bachelor of Science in Applied Mathematics and a Master’s Degree in Computer Science.
https://www.linkedin.com/in/pierre-de-lacaze-b11026b/
This provides end-to-end coverage of neural networks, CNN internals, TensorFlow and Keras basics, intuition on object detection and face recognition, and AI on Android x86.
build a Convolutional Neural Network (CNN) using TensorFlow in PythonKv Sagar
1. The document discusses CNN architecture and concepts like convolution, pooling, and fully connected layers.
2. Convolutional layers apply filters to input images to generate feature maps, capturing patterns like edges. Pooling layers downsample these to reduce parameters.
3. Fully connected layers at the end integrate learned features for classification tasks like image recognition. CNNs exploit spatial structure in images unlike regular neural networks.
- Artificial neural networks are computational models inspired by the human brain that are composed of interconnected nodes (neurons) that can learn from examples.
- Knowledge is acquired by adjusting the synaptic connections between neurons based on a learning process. These connections store the knowledge gained during learning.
- There are different network architectures including single-layer feedforward networks, multi-layer feedforward networks, and recurrent networks. Activation functions determine whether a neuron is active.
- Applications of ANNs include pattern recognition, function approximation, and associative memory. Common current models are deep learning architectures and multilayer perceptrons.
Deep learning is a type of machine learning that uses neural networks with multiple layers to progressively extract higher-level features from raw input. Lower layers may identify simple elements like edges in images while higher layers identify more complex concepts like digits or faces. Deep learning models learn representations of data by using backpropagation to indicate how a machine should change its internal parameters to best fit the training data. Convolutional neural networks are a type of deep learning model that use convolution operations to identify patterns in grid-like data like images or text.
The document discusses convolutional neural networks (CNNs) and object detection. It provides an overview of CNN architecture including convolutional layers, pooling layers, and fully connected layers. It covers practical aspects like initialization, regularization, and transfer learning. It then discusses object detection techniques, including sliding windows, region proposals, IoU for evaluating detections, and non-maximum suppression. Finally, it gives examples of one-stage detectors like YOLO and two-stage detectors like R-CNN, Fast R-CNN, and Faster R-CNN.
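A minimal NumPy sketch of two of the detection utilities mentioned: intersection over union (IoU) between boxes, and greedy non-maximum suppression that keeps the highest-scoring box and drops boxes that overlap it too much; the box format and thresholds are assumptions for the example.

```python
import numpy as np

def iou(a, b):
    # Boxes are (x1, y1, x2, y2). IoU = overlap area / union area.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    # Greedily keep the best-scoring box, discard boxes that overlap it too much.
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < iou_threshold])
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(non_max_suppression(boxes, scores))   # -> [0, 2]
```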
This document provides an overview and introduction to deep learning. It discusses motivations for deep learning such as its powerful learning capabilities. It then covers deep learning basics like neural networks, neurons, training processes, and gradient descent. It also discusses different network architectures like convolutional neural networks and recurrent neural networks. Finally, it describes various deep learning applications, tools, and key researchers and companies in the field.
This document provides an introduction to neural networks. It discusses how neural networks have recently achieved state-of-the-art results in areas like image and speech recognition and how they were able to beat a human player at the game of Go. It then provides a brief history of neural networks, from the early perceptron model to today's deep learning approaches. It notes how neural networks can automatically learn features from data rather than requiring handcrafted features. The document concludes with an overview of commonly used neural network components and libraries for building neural networks today.
Scene classification using Convolutional Neural Networks - Jayani WithanawasamWithTheBest
The document discusses scene classification using convolutional neural networks (CNNs). It begins with an outline of the topic, then provides background on computer vision as an AI problem and the importance and challenges of scene classification. It introduces CNNs as a deep learning technique for visual pattern recognition, describing their hierarchical organization and components like convolution and pooling layers. The document also discusses traditional machine learning approaches versus deep learning for scene classification and frameworks like Caffe that can be used to implement CNNs.
Deep Learning AtoC with Image PerspectiveDong Heon Cho
Deep learning models like CNNs, RNNs, and GANs are widely used for image classification and computer vision tasks. CNNs are commonly used for tasks like classification, detection, segmentation through learning hierarchical image features. Fully convolutional networks with encoder-decoder architectures like SegNet and Mask R-CNN can perform pixel-level semantic segmentation and instance segmentation by combining classification and bounding box detection. Deep learning has achieved state-of-the-art performance on many image applications due to its ability to learn powerful visual representations from large datasets.
Soft computing (ANN and Fuzzy Logic) : Dr. Purnima PanditPurnima Pandit
The document discusses soft computing and its techniques, including artificial neural networks (ANN). It provides an overview of ANN, including how biological neurons inspired the basic ANN model. A neuron has inputs, outputs, weights, and an activation function. Networks can be single or multilayer. Learning involves updating weights to minimize error, with backpropagation commonly used for multilayer networks. Applications include pattern recognition, function approximation, and parameter estimation. A simple example is provided to estimate the slope and intercept of a line using ANN.
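Echoing the simple line-fitting example mentioned above, here is a minimal NumPy sketch that estimates the slope and intercept of a line with a single linear neuron trained by gradient descent; the synthetic data and the "true" parameter values are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100)
y = 2.5 * x + 0.7 + rng.normal(scale=0.05, size=100)   # noisy line: slope 2.5, intercept 0.7

w, b, lr = 0.0, 0.0, 0.1          # one linear neuron: prediction = w*x + b
for _ in range(500):
    pred = w * x + b
    err = pred - y
    # Gradient of the mean squared error with respect to the weight and bias.
    w -= lr * np.mean(err * x)
    b -= lr * np.mean(err)

print(round(w, 2), round(b, 2))    # approaches the true slope and intercept
```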
Principles of Hierarchical Temporal Memory - Foundations of Machine IntelligenceNumenta
This document provides an overview of a workshop on hierarchical temporal memory (HTM) held by Numenta on October 17, 2014. The key points discussed include:
1) Numenta's mission is to discover the operating principles of the neocortex and create machine intelligence technology based on these principles.
2) HTM is based on theories about how the neocortex works, and models the neocortex as a hierarchical system that learns sequences and makes predictions.
3) Numenta's research roadmap focuses on developing HTM applications for tasks like prediction, anomaly detection, and goal-oriented behavior.
Neural networks are computing systems inspired by biological neural networks in the brain. They are composed of interconnected artificial neurons that process information using a connectionist approach. Neural networks can be used for applications like pattern recognition, classification, prediction, and filtering. They have the ability to learn from and recognize patterns in data, allowing them to perform complex tasks. Some examples of neural network applications discussed include face recognition, handwritten digit recognition, fingerprint recognition, medical diagnosis, and more.
Automatic Attendance using convolutional neural network Face Recognitionvatsal199567
Automatic Attendance System will recognize the face of the student through the camera in the class and mark the attendance. It was built in Python with Machine Learning.
This document provides an introduction to deep learning. It discusses the history of machine learning and how neural networks work. Specifically, it describes different types of neural networks like deep belief networks, convolutional neural networks, and recurrent neural networks. It also covers applications of deep learning, as well as popular platforms, frameworks and libraries used for deep learning development. Finally, it demonstrates an example of using the Nvidia DIGITS tool to train a convolutional neural network for image classification of car park images.
This document introduces data mining and knowledge discovery. It discusses how the growth of data from various sources has led to a data flood that makes it impossible for humans to analyze on their own. Data mining aims to discover useful patterns in large data sets through techniques like classification, clustering, and association rule mining. Major application areas are described such as customer relationship management, fraud detection, and bioinformatics. The document also covers controversies around using data mining for security and privacy concerns.
IP Security (IPsec) provides authentication, integrity, and confidentiality for network traffic at the IP layer. It can be implemented in firewalls and routers to securely encrypt all network traffic. IPsec uses the Authentication Header (AH) and Encapsulating Security Payload (ESP) protocols to provide integrity and encryption. AH provides integrity and authentication while ESP adds confidentiality. IPsec can operate in transport mode for host-to-host traffic or tunnel mode to create virtual private networks.
This document provides guidance on how to write a scientific paper. It discusses the key components of a scientific paper, including the title, abstract, introduction, methods, results, discussion, and references. The most common structure for a scientific paper is IMRAD, which stands for introduction, methods, results, and discussion. The methods section should provide enough detail that other researchers could replicate the experiments. Tables and figures should supplement the text in the results section. The discussion section interprets the results and relates them to previous research. References must be cited in the text and listed at the end following a standardized format. Proper scientific writing aims to clearly communicate new findings to peers.
The document provides 7 steps for writing a research paper: 1) Choose a topic and narrow it down, 2) Gather materials and develop a focus question, 3) Take notes from sources and organize them, 4) Decide on a thesis statement that makes an argument, 5) Create an outline to develop the thesis, 6) Write drafts while focusing on the thesis and referencing sources, 7) Revise drafts based on feedback and refine the paper. The steps guide choosing a topic, researching sources, developing a thesis, outlining an argument, drafting and revising the paper for a clear and coherent final product.
King Harald Bluetooth unified warring Viking tribes in the 10th century. In the 21st century, a wireless Bluetooth network is named after him. Bluetooth allows for personal ad-hoc networks, cable replacement, and landline data/voice access through access points. It operates in the unlicensed 2.4GHz band using frequency hopping and supports data rates up to 1Mbps.
This document provides guidance on writing abstracts. It defines an abstract as a short, 150-250 word summary that states the purpose, methods, results, and conclusions of a research project. The document outlines the components an abstract should contain and provides a step-by-step process for writing each section. It emphasizes that abstracts should be concise, clear, cohesive and complete. Tips are provided on revising, editing and improving abstract writing skills.
Intruders accessing computer systems are a significant security issue. Chapter 18 discusses intrusion techniques, password guessing, password capture, and approaches to intrusion detection. Intrusion detection systems use statistical anomaly detection, rule-based anomaly detection, or rule-based penetration identification to detect intrusions by analyzing audit records and comparing activities to rules. Distributed intrusion detection has systems working together to detect intrusions across networked systems. Managing passwords through policies, education, and password checking helps strengthen defenses against intruders.
Digital signatures provide message authentication and verification of authorship. They depend on the message content, use unique sender information, and are difficult to forge. Authentication protocols are used to establish identity and exchange session keys securely. The Digital Signature Standard (DSS) is the approved digital signature algorithm used in the US. It creates 320-bit signatures using the SHA hash algorithm and signing and verifying processes based on discrete logarithms.
Bluetooth technology allows for short-range wireless connectivity between various digital devices. It was named after the 10th century Danish king Harald Bluetooth who united Denmark and Norway. The Bluetooth Special Interest Group was founded in 1998 by several large technology companies to develop the open standard. Bluetooth operates using frequency hopping spread spectrum in the 2.4GHz band to connect up to eight devices in a short-range personal area network. While offering connectivity and replacing cables, Bluetooth also has some security weaknesses that can allow for attacks like bluesnarfing, bluejacking, and issues arise from its use of short PINs and cryptographic algorithms like E0 that are not as secure as originally intended.
PGP and S/MIME are protocols that provide security enhancements for email such as confidentiality, authentication, integrity, and non-repudiation. PGP uses public/private key encryption and a "web of trust" model where users can sign each other's keys, while S/MIME uses X.509 certificates and a hybrid PKI/web of trust approach. Both protocols generate session keys to encrypt email contents and attach digital signatures to authenticate senders and detect modifications. PGP and S/MIME transform encrypted data into ASCII format for transmission over standard email protocols.
The document discusses convolutional neural networks (CNNs) for image recognition. It provides 3 key properties of images that CNNs exploit: 1) Some patterns are much smaller than the whole image so neurons can detect local patterns; 2) The same patterns appear in different image regions so filters can have shared parameters; 3) Subsampling pixels does not change objects so the image can be downsampled to reduce parameters. It then explains the basic CNN architecture including convolution, max pooling, and fully connected layers. Convolution applies filters to extract features, max pooling downsamples, and fully connected layers perform classification.
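A minimal NumPy sketch of the 2x2 max-pooling (subsampling) step described, which downsamples a feature map while keeping the strongest activation in each window; the input values are illustrative.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    # Keep the largest value in each non-overlapping size x size window.
    h, w = feature_map.shape
    pooled = feature_map[:h - h % size, :w - w % size]
    pooled = pooled.reshape(h // size, size, w // size, size)
    return pooled.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 2],
                 [0, 2, 8, 5],
                 [3, 1, 4, 7]], dtype=float)
print(max_pool2d(fmap))   # [[6. 2.] [3. 8.]]
```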
This document summarizes key points from Chapter 7 of Andrew Tanenbaum's book "Computer Networks" which covers the application layer of the OSI model. It discusses the Domain Name System (DNS) which maps domain names to IP addresses, electronic mail (email) protocols and formats, and the basic architecture of email systems including user agents and message transfer. Real applications like the World Wide Web and multimedia are also mentioned.
Here is a short note on Bluetooth technology:
Bluetooth is a wireless technology standard for exchanging data over short distances between devices like mobile phones, laptops, personal computers, printers, digital cameras, and video game consoles. It was developed in 1994 by telecom vendor Ericsson and was originally conceived as a wireless alternative to RS-232 data cables.
Some key aspects of Bluetooth technology:
- It operates in the unlicensed 2.4 GHz short-range radio frequency band, allowing devices to connect within a range of around 30 feet.
- It uses a frequency-hopping spread spectrum technique to change between 79 designated frequencies in the 2.4 GHz band, avoiding interference and jamming.
Bluetooth is a wireless technology standard that allows short-range exchange of data between fixed and mobile devices over short distances using radio transmissions. It was originally developed in 1994 as a cable replacement technology. The Bluetooth specifications have evolved through several versions with improvements in speed and functionality. Bluetooth devices operate within piconets, and multiple piconets can be interconnected to form scatternets. The Bluetooth protocol stack includes various layers like the radio, baseband, link manager, and L2CAP layers to manage connections and transfer of data packets. Bluetooth provides advantages like wireless connectivity and ease of use but also has limitations such as short range and potential security issues.
This document contains a format for applicants to compile their academic and research scores according to UGC guidelines. It includes sections to document details of publications, books, conferences, research projects, patents, awards, and more. Applicants must provide supporting evidence for their claims. Scores are calculated based on the type of work, authorship status, impact factor, and other criteria. There are notes on calculating joint work and capping scores from certain categories at 30% of the total. The format allows applicants to self-assess their research scores which are then reviewed and final scores awarded.
This PowerPoint presentation introduces educational technology. It defines educational technology as utilizing modern machines and devices to increase learning and develop student interest. It discusses the historical development of educational technology from ancient teachers to modern devices. It also outlines the roles of educational technology in learning, including as a tool to support knowledge construction, as an information vehicle, and to support learning by doing and conversing. Overall, the presentation provides an overview of key concepts in educational technology.
Arrays are linear arrangements of elements of the same type stored in contiguous memory locations. An array must be declared with the maximum number of elements. Elements are accessed using indexes from 0 to size-1. Arrays can be passed to functions, but the array itself is not copied. Common array operations include insertion, deletion, retrieval of elements, and traversing the entire array.
This document discusses Python functions. It covers built-in functions, user-defined functions, fruitful and void functions, parameters and arguments, recursion, and strings. Specifically, it provides examples and explanations of various math functions like abs(), ceil(), exp(), and more. It also covers defining new functions, importing functions from modules, boolean functions, and using for loops to traverse strings.
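A small Python sketch of the items mentioned: math functions such as abs(), ceil(), and exp(), a user-defined (fruitful) function, and traversing a string with a for loop.

```python
import math

# Built-in and math-module functions mentioned above.
print(abs(-4.2))        # 4.2
print(math.ceil(3.1))   # 4
print(math.exp(1))      # 2.718...

# A user-defined "fruitful" function: it returns a value.
def is_vowel(ch):
    return ch.lower() in "aeiou"

# Traversing a string with a for loop.
word = "Python"
vowels = [ch for ch in word if is_vowel(ch)]
print(vowels)           # ['o']
```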
This document discusses operators, operands, expressions, and statements in C programming. It defines key terms like operators, operands, and expressions. It explains that operators take operands to perform computations and that expressions combine operands and operators. Expressions can evaluate to values and produce side effects. The document provides numerous examples of different types of C operators, including arithmetic, relational, logical, assignment, and more. It explains the behavior and usage of unary, binary, and ternary operators.
The document discusses C arrays, including:
1) C arrays allow storing a collection of related data elements of the same type in contiguous memory locations. This simplifies declaring and accessing multiple variables compared to individual declarations.
2) Arrays can be initialized during declaration by providing an initializer list of values. Multi-dimensional arrays can store elements in rows and columns accessed using multiple indices.
3) Examples demonstrate declaring, initializing, and accessing one-dimensional and two-dimensional arrays. Operations like summing elements, finding minimum/maximum values, and searching for a value in an array are also illustrated.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to precisely delineate tumor boundaries from magnetic resonance imaging (MRI) scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The model is rigorously trained and evaluated, exhibiting remarkable performance metrics, including an impressive global accuracy of 99.286%, a high class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of our proposed model. These findings underscore the model's competence in precise brain tumor localization, underscoring its potential to revolutionize medical image analysis and enhance healthcare outcomes. This research paves the way for future exploration and optimization of advanced CNN models in medical imaging, emphasizing addressing false positives and resource efficiency.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
cnn.pptx: Convolutional neural network used for image classification
CNN Algorithm
1. 6.874, 6.802, 20.390, 20.490, HST.506
Computational Systems Biology
Deep Learning in the Life Sciences
Lecture 3:
Convolutional Neural Networks
Prof. Manolis Kellis
http://mit6874.github.io
Slides credit: 6.S191, Dana Erlich, Param Vir Singh,
David Gifford, Alexander Amini, Ava Soleimany
2. Today: Convolutional Neural Networks (CNNs)
1. Scene understanding and object recognition for machines (and humans)
– Scene/object recognition challenge. Illusions reveal primitives, conflicting info
– Human neurons/circuits. Visual cortex layers==abstraction. General cognition
2. Classical machine vision foundations: features, scenes, filters, convolution
– Spatial structure primitives: edge detectors & other filters, feature recognition
– Convolution: basics, padding, stride, object recognition, architectures
3. CNN foundations: LeNet, de novo feature learning, parameter sharing
– Key ideas: learn features, hierarchy, re-use parameters, back-prop filter learning
– CNN formalization: representations(Conv+ReLU+Pool)*N layers + Fully-connected
4. Modern CNN architectures: millions of parameters, dozens of layers
– Feature invariance is hard: apply perturbations, learn for each variation
– ImageNet progression of best performers
– AlexNet: First top performer CNN, 60M parameters (from 60k in LeNet-5), ReLU
– VGGNet: simpler but deeper (8→19 layers), 140M parameters, ensembles
– GoogleNet: new primitive=inception module, 5M params, no FC, efficiency
– ResNet: 152 layers, fits residuals to overcome vanishing gradients and enable learning
5. Countless applications: General architecture, enormous power
– Semantic segmentation, facial detection/recognition, self-driving, image
colorization, optimizing pictures/scenes, up-scaling, medicine, biology, genomics
3. 1a. What do you see, and how?
Can we teach machines to see?
6. What computers ‘see’: Images as Numbers
What the computer "sees" (image credit: Levin, Image Processing & Computer Vision)
An image is just a matrix of numbers in [0, 255], i.e., 1080x1080x3 for an RGB image.
Question: is this Lincoln? Washington? Jefferson? Obama?
How can the computer answer this question?
Panel labels: what you see; input image; input image + values; pixel intensity values ("pix-el" = picture element); what you both see
Can I just do classification on the 1,166,400-long image vector directly?
No. Instead: exploit image spatial structure. Learn patches. Build them up.
8. Inspiration: human/animal visual cortex
• Layers of neurons: pixels, edges, shapes, primitives, scenes
• E.g. Layer 4 responds to bands w/ given slant, contrasting edges
9. Primitives: Neurons & action potentials
• Chemical accumulation across dendritic connections
• Pre-synaptic axon → post-synaptic dendrite → neuronal cell body
• Each neuron receives multiple signals from its many dendrites
• When a threshold is crossed, it fires; its axon then sends an outgoing signal to downstream neurons
• Weak stimuli are ignored; sufficiently strong stimuli cross the activation threshold
• Non-linearity within each neuronal level
• Neurons connected into circuits (neural networks): emergent properties, learning, memory
• Simple primitives arranged in simple, repetitive, and extremely large networks
• 86 billion neurons, each connecting to ~10k neurons: ~1 quadrillion (10^15) connections
10. Abstraction layers: edges, bars, directions, shapes, objects, scenes
LGN: small dots
V1: orientation, disparity, some color
V4: color, basic shapes, 2D/3D, curvature
VTC (ventral temporal cortex): complex features and objects
• Abstraction layers ≈ visual cortex layers
• Complex concepts built from simple parts, hierarchically
• Primitives of visual concepts are encoded in neuronal connections in early cortical layers
11. General "learning machine", reused widely
• Massive recent expansion of the human brain has re-used a relatively simple but general learning architecture
• Hearing, taste, smell, sight, touch all re-use a similar learning architecture (motor cortex, visual cortex, ...)
• Interchangeable circuitry: auditory cortex learns to ‘see’ if sent visual signals; tasks from an injured area shift to uninjured areas
• Not fully-general learning, but well-adapted to our world
• Humans co-opted this circuitry for many new applications
• Modern tasks accessible to any homo sapiens (<70k years)
• ML primitives not too different from animals': more to come?
(Figure: human vs. chimp brain size, "hardware expansion")
12. Today: Convolutional Neural Networks (CNNs) (outline repeated from slide 2; next up: part 2, classical machine vision foundations)
14. Using Spatial Structure
Input: 2D image, an array of pixel values.
Idea: connect patches of the input to neurons in the hidden layer. Each neuron is connected to a region of the input and only "sees" these values.
15. Using Spatial Structure
Connect patch in input layer to a single neuron in subsequent layer.
Use a sliding window to define connections.
How can we weight the patch to detect particular features?
16. Feature Extraction with Convolution
- Filter of size 4x4: 16 different weights
- Apply this same filter to 4x4 patches in input
- Shift by 2 pixels for next patch
This "patchy" operation is convolution (see the sketch below).
1) Apply a set of weights – a filter – to extract local features
2) Use multiple filters to extract different features
3) Spatially share parameters of each filter
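Here is a minimal NumPy sketch of that operation (not from the slides; array names and sizes are illustrative): one 4x4 filter, shifted 2 pixels at a time, with the same 16 weights shared across every patch.

import numpy as np

def apply_filter(image, filt, stride=2):
    """Slide one shared filter over the image and return the feature map."""
    fh, fw = filt.shape
    out_h = (image.shape[0] - fh) // stride + 1
    out_w = (image.shape[1] - fw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + fh, j * stride:j * stride + fw]
            feature_map[i, j] = np.sum(patch * filt)   # same 16 weights reused for every patch
    return feature_map

image = np.random.rand(10, 10)        # toy grayscale input
filt = np.random.rand(4, 4)           # one filter = 16 shared weights
print(apply_filter(image, filt).shape)  # (4, 4)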
17. Fully Connected Neural Network
Input: 2D image, flattened to a vector of pixel values.
Fully connected: each neuron in the hidden layer is connected to all neurons in the input layer.
• No spatial information
• Many, many parameters
Key idea: use the spatial structure in the input to inform the architecture of the network.
18. High Level Feature Detection
Let's identify key features in each image category:
Wheels, license plate, headlights
Door, windows, steps
Nose, eyes, mouth
25. X or X?
An image is represented as a matrix of pixel values… and computers are literal!
We want to be able to classify an X as an X even if it's shifted, shrunk, rotated, or deformed.
(Rohrer, "How do CNNs work?")
27. Zero Padding Controls Output Size (Goodfellow 2016)
• Full convolution: zero pad the input so an output is produced whenever an output value contains at least one input value (expands output)
• Valid-only convolution: output only where the entire kernel is contained in the input (shrinks output)
• Same convolution: zero pad the input so the output has the same size as the input dimensions
x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
• The TF convolution operator takes the stride and zero-fill option as parameters
• Stride is the distance between kernel applications in each dimension
• Padding can be SAME or VALID
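As a quick sanity check (a sketch, not part of the slides), the output sizes produced by the two TF padding modes can be computed directly, assuming a square input of size N, kernel size F, and stride s:

import math

def conv_output_size(n, f, stride, padding):
    """Spatial output size of a convolution, following TF's 'SAME'/'VALID' rules."""
    if padding == 'SAME':            # zero pad so every input position is covered
        return math.ceil(n / stride)
    elif padding == 'VALID':         # only positions where the kernel fits entirely
        return math.ceil((n - f + 1) / stride)
    raise ValueError(padding)

print(conv_output_size(28, 5, 1, 'SAME'))   # 28: output same size as input
print(conv_output_size(28, 5, 1, 'VALID'))  # 24: output shrinks by F-1
print(conv_output_size(28, 5, 2, 'VALID'))  # 12: stride further downsamples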
29. Today: Convolutional Neural Networks (CNNs) (outline repeated from slide 2; next up: part 3, CNN foundations: LeNet, de novo feature learning, parameter sharing)
31. Key idea: learn hierarchy of features directly from the data (rather than hand-engineering them)
Low-level features (edges, dark spots) → mid-level features (eyes, ears, nose) → high-level features (facial structure)
Lee+ ICML 2009
32. Key idea: re-use parameters
Convolution shares parameters
Example 3x3 convolution on a 5x5 image
33. Feature Extraction with Convolution
1) Apply a set of weights – a filter – to extract local features
2) Use multiple filters to extract different features
3) Spatially share parameters of each filter
34. LeNet-5
• Gradient Based Learning Applied To Document Recognition -
Y. Lecun, L. Bottou, Y. Bengio, P. Haffner; 1998
• Helped establish how we use CNNs today
• Replaced manual feature extraction
[LeCun et al., 1998]
35. LeNet-5
32×32×1 → conv 5×5, s=1 → 28×28×6 → avg pool f=2, s=2 → 14×14×6 → conv 5×5, s=1 → 10×10×16 → avg pool f=2, s=2 → 5×5×16 → FC 120 → FC 84 → ŷ (10 outputs)
Reminder: output size = (N + 2P − F)/stride + 1
[LeCun et al., 1998]
This slide is taken from Andrew Ng
36. LeNet-5
• Only 60K parameters
• As we go deeper in the network: N_H ↓, N_W ↓, N_C ↑
• General structure: conv → pool → conv → pool → FC → FC → output
• Different filters look at different channels
• Sigmoid and tanh nonlinearities
[LeCun et al., 1998]
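A minimal Keras sketch of the LeNet-5 layout above (an approximation for illustration; as noted on the slide, the original used sigmoid/tanh units and average pooling):

import tensorflow as tf

lenet5 = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, kernel_size=5, activation='tanh',
                           input_shape=(32, 32, 1)),              # 32x32x1 -> 28x28x6
    tf.keras.layers.AveragePooling2D(pool_size=2, strides=2),     # -> 14x14x6
    tf.keras.layers.Conv2D(16, kernel_size=5, activation='tanh'), # -> 10x10x16
    tf.keras.layers.AveragePooling2D(pool_size=2, strides=2),     # -> 5x5x16
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='tanh'),
    tf.keras.layers.Dense(84, activation='tanh'),
    tf.keras.layers.Dense(10, activation='softmax'),              # 10 output classes
])
lenet5.summary()   # roughly 60K parameters, matching the count above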
40. Representation Learning in Deep CNNs
Conv Layer 1: low-level features (edges, dark spots)
Conv Layer 2: mid-level features (eyes, ears, nose)
Conv Layer 3: high-level features (facial structure)
Lee+ ICML 2009
41. CNNs for Classification
1. Convolution: apply filters to generate feature maps. (tf.keras.layers.Conv2D)
2. Non-linearity: often ReLU. (tf.keras.activations.*)
3. Pooling: downsampling operation on each feature map. (tf.keras.layers.MaxPool2D)
Train model with image data. Learn weights of filters in convolutional layers.
43. Convolutional Layers: Local Connectivity
For a neuron in the hidden layer:
- Take inputs from a patch
- Compute a weighted sum
- Apply a bias
tf.keras.layers.Conv2D
44. Convolutional Layers: Local Connectivity
For a neuron in the hidden layer:
• Take inputs from a patch
• Compute a weighted sum
• Apply a bias
4x4 filter: matrix of weights w_ij. For neuron (p, q) in the hidden layer:
1) apply a window of weights
2) compute a linear combination
3) activate with a non-linear function
tf.keras.layers.Conv2D
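Written out as a formula (a restatement of the three steps above, not text from the slide), the hidden-layer neuron at (p, q) computes, for the 4x4 window of shared weights w_ij and bias b:

h_{p,q} = \sigma\left( \sum_{i=1}^{4} \sum_{j=1}^{4} w_{ij} \, x_{p+i,\, q+j} + b \right)

where σ is the non-linear activation (e.g., ReLU).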
45. CNNs: Spatial Arrangement of Output Volume
Layer dimensions: h × w × d, where h and w are spatial dimensions and d (depth) = number of filters.
Stride: filter step size.
Receptive field: locations in the input image that a node is path-connected to.
tf.keras.layers.Conv2D( filters=d, kernel_size=(h,w), strides=s )
46. Introducing Non-Linearity
Rectified Linear Unit (ReLU)
- Apply after every convolution operation (i.e., after convolutional layers)
- ReLU: pixel-by-pixel operation that replaces all negative values by zero
- Non-linear operation
tf.keras.layers.ReLU
(Karn, Intuitive CNNs)
48. The Rectified Linear Unit (ReLU) is a common non-linear detector stage after convolution
x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
x = tf.nn.bias_add(x, b)
x = tf.nn.relu(x)
f(x) = max(0, x)
When will we backpropagate through this?
Once it "dies", what happens to it?
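A small sketch (not from the slides) answering those questions: gradients flow through a ReLU only where its input is positive, so a unit whose pre-activation stays negative receives zero gradient and stops updating ("dies").

import tensorflow as tf

x = tf.Variable([-2.0, 3.0])        # one "dead" input, one active input
with tf.GradientTape() as tape:
    y = tf.nn.relu(x)
    loss = tf.reduce_sum(y)
grad = tape.gradient(loss, x)
print(grad.numpy())                  # [0., 1.]: no gradient flows through the negative input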
49. Pooling reduces dimensionality by giving up spatial location
• Max pooling reports the maximum output within a defined neighborhood
• Padding can be SAME or VALID
x = tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
Tensor layout: [batch, height, width, channels]; ksize defines the pooling neighborhood over height and width.
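A tiny numeric example of the call above (toy values, assumed for illustration): a 4x4 single-channel input pooled with k = 2 halves each spatial dimension.

import tensorflow as tf

x = tf.reshape(tf.constant([[1., 2., 5., 6.],
                            [3., 4., 7., 8.],
                            [9., 1., 2., 3.],
                            [4., 5., 6., 7.]]), [1, 4, 4, 1])   # [batch, H, W, channels]
k = 2
pooled = tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
print(tf.reshape(pooled, [2, 2]).numpy())
# [[4. 8.]
#  [9. 7.]]  <- the maximum of each 2x2 neighborhood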
51. CNNs for Classification: Feature Learning
1. Learn features in the input image through convolution
2. Introduce non-linearity through an activation function (real-world data is non-linear!)
3. Reduce dimensionality and preserve spatial invariance with pooling
52. CNNs for Classification: Class Probabilities
- CONV and POOL layers output high-level features of the input
- A fully connected layer uses these features to classify the input image
- Express the output as the probability of the image belonging to a particular class
53. Putting it all together
import tensorflow as tf

def generate_model():
    model = tf.keras.Sequential([
        # first convolutional layer
        tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        # second convolutional layer
        tf.keras.layers.Conv2D(64, kernel_size=3, activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        # fully connected classifier
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')  # 10 output classes
    ])
    return model
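As a usage sketch (not on the original slide), the model above could be compiled and trained on a dataset such as MNIST; the grayscale images need a trailing channel dimension and scaling:

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # add channel dimension, scale to [0, 1]
x_test = x_test[..., None] / 255.0

model = generate_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',   # integer labels, 10 classes
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=64, epochs=3,
          validation_data=(x_test, y_test))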
54. Today: Convolutional Neural Networks (CNNs) (outline repeated from slide 2; next up: part 4, modern CNN architectures)
57. How can computers recognize objects?
Challenge:
• Objects can be anywhere in the scene, in any orientation, rotation, color hue, etc.
• How can we overcome this challenge?
Answer:
• Learn a ton of features (millions) from the bottom up
• Learn the convolutional filters, rather than pre-computing them
60. LeNet-5
• Gradient Based Learning Applied To Document Recognition -
Y. Lecun, L. Bottou, Y. Bengio, P. Haffner; 1998
• Helped establish how we use CNNs today
• Replaced manual feature extraction
[LeCun et al., 1998]
61. ImageNet Large Scale Visual Recognition
Challenge (ILSVRC) winners
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
62. AlexNet
• ImageNet Classification with Deep Convolutional Neural Networks - Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton; 2012
• Facilitated by GPUs, a highly optimized convolution implementation, and large datasets (ImageNet)
• One of the largest CNNs to date
• Has 60 million parameters compared to the 60k parameters of LeNet-5
[Krizhevsky et al., 2012]
63. ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners
• The annual "Olympics" of computer vision.
• Teams from across the world compete to see who has the best computer vision model for tasks such as classification, localization, detection, and more.
• 2012 marked the first year a CNN was used to achieve a top-5 test error rate of 15.3%.
• The next best entry achieved an error of 26.2%.
64. ImageNet Large Scale Visual Recognition
Challenge (ILSVRC) winners
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
65. AlexNet
[Krizhevsky et al., 2012]
Architecture: CONV1 → MAX POOL1 → NORM1 → CONV2 → MAX POOL2 → NORM2 → CONV3 → CONV4 → CONV5 → MAX POOL3 → FC6 → FC7 → FC8
• Input: 227x227x3 images (224x224 before padding)
• First layer: 96 11x11 filters applied at stride 4
• Output volume size? (N-F)/s+1 = (227-11)/4+1 = 55 -> [55x55x96]
• Number of parameters in this layer? (11*11*3)*96 = 35K
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
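The two calculations above follow from the formulas on the slide; a small helper (illustrative, not from the slides) reproduces them for CONV1:

def conv_layer_stats(n, c_in, f, stride, pad, num_filters):
    """Output width and weight count of a square convolutional layer."""
    out = (n + 2 * pad - f) // stride + 1
    params = (f * f * c_in) * num_filters + num_filters   # weights + biases
    return out, params

out, params = conv_layer_stats(n=227, c_in=3, f=11, stride=4, pad=0, num_filters=96)
print(out, params)   # 55, ~35K -> output volume 55x55x96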
67. AlexNet
[Krizhevsky et al., 2012]
• Input: 227x227x3 images (224x224 before padding)
• After CONV1: 55x55x96
• Second layer: 3x3 filters applied at stride 2
• Output volume size? (N-F)/s+1 = (55-3)/2+1 = 27 -> [27x27x96]
• Number of parameters in this layer? 0!
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
(Architecture sidebar repeated from slide 65: CONV1 through FC8.)
68. AlexNet
227×227×3 → conv 11×11, s=4, P=0 → 55×55×96 → max pool 3×3, s=2 → 27×27×96 → conv 5×5, s=1, P=2 → 27×27×256 → max pool 3×3, s=2 → 13×13×256 → conv 3×3, s=1, P=1 → 13×13×384 → conv 3×3, s=1, P=1 → 13×13×384 → conv 3×3, s=1, P=1 → 13×13×256 → max pool 3×3, s=2 → 6×6×256 → …
[Krizhevsky et al., 2012]
This slide is taken from Andrew Ng
69. AlexNet
… → FC 4096 → FC 4096 → Softmax 1000
[Krizhevsky et al., 2012]
This slide is taken from Andrew Ng
70. AlexNet
[Krizhevsky et al., 2012]
Details/Retrospectives:
• first use of ReLU
• used Norm layers (not common anymore)
• heavy data augmentation
• dropout 0.5
• batch size 128
• 7 CNN ensemble
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
71. AlexNet
[Krizhevsky et al., 2012]
• Trained on GTX 580 GPU with only 3 GB of memory.
• Network spread across 2 GPUs, half the neurons (feature
maps) on each GPU.
• CONV1, CONV2, CONV4, CONV5:
Connections only with feature maps on same GPU.
• CONV3, FC6, FC7, FC8:
Connections with all feature maps in preceding layer,
communication across GPUs.
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
72. AlexNet
AlexNet was the coming out party for CNNs in the computer
vision community. This was the first time a model performed
so well on a historically difficult ImageNet dataset. This
paper illustrated the benefits of CNNs and backed them up
with record breaking performance in the competition.
[Krizhevsky et al., 2012]
73. ImageNet Large Scale Visual Recognition
Challenge (ILSVRC) winners
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
74. ImageNet Large Scale Visual Recognition
Challenge (ILSVRC) winners
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
75. VGGNet
• Very Deep Convolutional Networks For Large Scale
Image Recognition - Karen Simonyan and Andrew
Zisserman; 2015
• The runner-up at the ILSVRC 2014 competition
• Significantly deeper than AlexNet
• 140 million parameters
[Simonyan and Zisserman, 2014]
76. VGGNet
• Smaller filters
Only 3x3 CONV filters, stride 1, pad 1
and 2x2 MAX POOL , stride 2
• Deeper network
AlexNet: 8 layers
VGGNet: 16 - 19 layers
• ZFNet: 11.7% top 5 error in ILSVRC’13
• VGGNet: 7.3% top 5 error in ILSVRC’14
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9. [Simonyan and Zisserman, 2014]
Input
3x3 conv, 64
3x3 conv, 64
Pool 1/2
3x3 conv, 128
3x3 conv, 128
Pool 1/2
3x3 conv, 256
3x3 conv, 256
Pool 1/2
3x3 conv, 512
3x3 conv, 512
3x3 conv, 512
Pool 1/2
3x3 conv, 512
3x3 conv, 512
3x3 conv, 512
Pool 1/2
FC 4096
FC 4096
FC 1000
Softmax
77. VGGNet
[Simonyan and Zisserman, 2014]
• Why use smaller filters? (3x3 conv)
Stack of three 3x3 conv (stride 1) layers has the same effective
receptive field as one 7x7 conv layer.
• What is the effective receptive field of three 3x3 conv (stride
1) layers?
7x7
But deeper, more non-linearities
And fewer parameters: 3 × (3²C²) vs. 7²C² for C channels per layer
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
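Plugging in a concrete channel count (say C = 256, chosen here only for illustration) makes the parameter saving explicit:

C = 256
stacked_3x3 = 3 * (3 * 3 * C * C)   # three stacked 3x3 conv layers
single_7x7 = 7 * 7 * C * C          # one 7x7 conv layer with the same receptive field
print(stacked_3x3, single_7x7)      # 1,769,472 vs 3,211,264 weights (biases ignored)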
78. VGGNet
[Simonyan and Zisserman, 2014]
VGG16:
TOTAL memory: 24M * 4 bytes ~= 96MB / image
TOTAL params: 138M parameters
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
Input
3x3 conv, 64
3x3 conv, 64
Pool
3x3 conv, 128
3x3 conv, 128
Pool
3x3 conv, 256
3x3 conv, 256
3x3 conv, 256
Pool
3x3 conv, 512
3x3 conv, 512
3x3 conv, 512
Pool
3x3 conv, 512
3x3 conv, 512
3x3 conv, 512
Pool
FC 4096
FC 4096
FC 1000
Softmax
80. VGGNet
[Simonyan and Zisserman, 2014]
Details/Retrospectives :
• ILSVRC’14 2nd in classification, 1st in localization
• Similar training procedure as AlexNet
• No Local Response Normalisation (LRN)
• Use VGG16 or VGG19 (VGG19 only slightly better, more
memory)
• Use ensembles for best results
• FC7 features generalize well to other tasks
• Trained on 4 Nvidia Titan Black GPUs for two to three weeks.
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
81. VGGNet
VGG Net reinforced the notion that convolutional neural
networks have to have a deep network of layers in order for
this hierarchical representation of visual data to work.
Keep it deep.
Keep it simple.
[Simonyan and Zisserman, 2014]
82. ImageNet Large Scale Visual Recognition
Challenge (ILSVRC) winners
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
83. GoogleNet
• Going Deeper with Convolutions - Christian Szegedy et
al.; 2015
• ILSVRC 2014 competition winner
• Also significantly deeper than AlexNet
• 12x fewer parameters than AlexNet
• Focused on computational efficiency
[Szegedy et al., 2014]
84. GoogleNet
• 22 layers
• Efficient “Inception” module - strayed from
the general approach of simply stacking conv
and pooling layers on top of each other in a
sequential structure
• No FC layers
• Only 5 million parameters!
• ILSVRC’14 classification winner (6.7% top 5
error)
[Szegedy et al., 2014]
85. GoogleNet
“Inception module”: design a good local network topology (network within
a network) and then stack these modules on top of each other
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9. [Szegedy et al., 2014]
Inception module: the previous layer feeds four parallel branches (a 1x1 convolution; a 1x1 convolution followed by a 3x3 convolution; a 1x1 convolution followed by a 5x5 convolution; and a 3x3 max pooling followed by a 1x1 convolution) whose outputs are joined by filter concatenation.
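A minimal Keras sketch of one such module (functional-API code; the branch widths are illustrative choices, not necessarily the paper's exact values):

import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1=64, f3_reduce=96, f3=128, f5_reduce=16, f5=32, fpool=32):
    """Four parallel branches whose outputs are concatenated along the channel axis."""
    b1 = layers.Conv2D(f1, 1, padding='same', activation='relu')(x)
    b2 = layers.Conv2D(f3_reduce, 1, padding='same', activation='relu')(x)
    b2 = layers.Conv2D(f3, 3, padding='same', activation='relu')(b2)
    b3 = layers.Conv2D(f5_reduce, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(f5, 5, padding='same', activation='relu')(b3)
    b4 = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    b4 = layers.Conv2D(fpool, 1, padding='same', activation='relu')(b4)
    return layers.Concatenate()([b1, b2, b3, b4])   # "filter concatenation"

inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = inception_module(inputs)
print(tf.keras.Model(inputs, outputs).output_shape)   # (None, 28, 28, 256)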
86. GoogleNet
Details/Retrospectives :
• Deeper networks, with computational efficiency
• 22 layers
• Efficient “Inception” module
• No FC layers
• 12x less params than AlexNet
• ILSVRC’14 classification winner (6.7% top 5 error)
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9. [Szegedy et al., 2014]
87. GoogleNet
GoogLeNet introduced the idea that CNN layers don't always have to be stacked up sequentially. With the Inception module, the authors showed that a creative structuring of layers can lead to improved performance and computational efficiency.
[Szegedy et al., 2014]
88. ImageNet Large Scale Visual Recognition
Challenge (ILSVRC) winners
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
89. ResNet
• Deep Residual Learning for Image Recognition -
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun;
2015
• Extremely deep network – 152 layers
• Deeper neural networks are more difficult to train.
• Deep networks suffer from vanishing and
exploding gradients.
• Present a residual learning framework to ease the
training of networks that are substantially deeper
than those used previously.
[He et al., 2015]
90. ResNet
• ILSVRC'15 classification winner (3.57% top-5 error; humans generally hover around a 5-10% error rate)
• Swept all classification and detection competitions in ILSVRC'15 and COCO'15!
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9. [He et al., 2015]
91. ResNet
• What happens when we continue stacking deeper layers on a
convolutional neural network?
• 56-layer model performs worse on both training and test error
-> The deeper model performs worse (not caused by overfitting)!
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9. [He et al., 2015]
92. ResNet
• Hypothesis: The problem is an optimization problem. Very
deep networks are harder to optimize.
• Solution: Use network layers to fit residual mapping instead
of directly trying to fit a desired underlying mapping.
• We will use skip connections allowing us to take the activation
from one layer and feed it into another layer, much deeper
into the network.
• Use layers to fit residual F(x) = H(x) – x
instead of H(x) directly
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9. [He et al., 2015]
93. ResNet
Residual Block
Input x goes through conv-relu-conv series and gives us F(x).
That result is then added to the original input x. Let’s call that
H(x) = F(x) + x.
In traditional CNNs, H(x) would just be equal to F(x). So, instead
of just computing that transformation (straight from x to F(x)),
we’re computing the term that we have to add, F(x), to the
input, x.
[He et al., 2015]
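A minimal Keras sketch of that residual block (the filter count and the placement of the final ReLU are assumptions for illustration):

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    """Compute F(x) with two 3x3 convs, then output H(x) = F(x) + x."""
    fx = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    fx = layers.Conv2D(filters, 3, padding='same')(fx)     # no activation yet
    hx = layers.Add()([fx, x])                              # skip connection adds the input back
    return layers.Activation('relu')(hx)                    # ReLU applied after the addition

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs)
print(tf.keras.Model(inputs, outputs).output_shape)   # (None, 32, 32, 64)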
95. ResNet
Full ResNet architecture:
• Stack residual blocks
• Every residual block has two 3x3 conv layers
• Periodically, double # of filters and
downsample spatially using stride 2 (in each
dimension)
• Additional conv layer at the beginning
• No FC layers at the end (only FC 1000 to
output classes)
[He et al., 2015]
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
96. ResNet
• Total depths of 34, 50, 101, or 152 layers for
ImageNet
• For deeper networks (ResNet-50+), use
“bottleneck” layer to improve efficiency
(similar to GoogLeNet)
[He et al., 2015]
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
97. ResNet
Experimental Results:
• Able to train very deep networks without degrading
• Deeper networks now achieve lower training errors as
expected
[He et al., 2015]
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
98. ResNet
The best CNN architecture we currently have, and a great innovation for the idea of residual learning.
Even better than human performance on ILSVRC top-5 error!
[He et al., 2015]
99. Accuracy comparison
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
100. Forward pass time and power consumption
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
101. ImageNet Large Scale Visual Recognition
Challenge (ILSVRC) winners
Slide taken from Fei-Fei & Justin Johnson & Serena Yeung. Lecture 9.
102. Today: Convolutional Neural Networks (CNNs) (outline repeated from slide 2; next up: part 5, countless applications)
108. Self-Driving Cars: Navigation from Visual Perception
Inputs: raw perception I (e.g., camera) and coarse maps M (e.g., GPS) → possible control commands.
Amini+ ICRA 2019
109. End-to-End Framework for Autonomous Navigation
Entire model trained end-to-end
without any human labelling or annotations
Amini+ ICRA 2019
114. Breast Cancer Screening
Breast cancer case missed by radiologists but detected by AI.
A CNN-based system outperformed expert radiologists at detecting breast cancer from mammograms.