I wanted to give you a quick reminder of what we've learnt so far. It will help us throughout the presentation and make it more understandable.
This is what a real neural network looks like. In our brains, billions upon billions of neurons are spread all over and connected to each other. These lights show the pulses passed between them.
But for us, as engineers, this is the form of neural network we are interested in: a series of neurons connected to each other.
We mimic the human brain in our artificial neural networks.
So what’s a neuron then?
It's both of these. The one below is the model we've found to represent the original one above.
It takes inputs, applies some procedure, and gives some output.
The activation function decides whether the neuron should fire or not.
It differs depending on the use case.
The most basic activation function gives either 0 or 1.
Threshold
- if x >= 0 then 1, else 0
Sigmoid
- gives a value between 0 and 1, which can be read as a probability
- practical: the derivative is easy to take: y(1-y)
Rectifier
- more on this in the Convolutional NN part.
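To make these concrete, here is a minimal sketch of the three activation functions above in Python with NumPy; the function names are mine, just for illustration.

```python
import numpy as np

def threshold(x):
    """Basic step activation: 1 if x >= 0, else 0."""
    return np.where(x >= 0, 1, 0)

def sigmoid(x):
    """Squashes x into (0, 1); the output can be read as a probability."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(y):
    """Given y = sigmoid(x), the derivative is simply y * (1 - y)."""
    return y * (1 - y)

def relu(x):
    """Rectifier: f(x) = max(0, x)."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(threshold(x))  # [0 0 1 1]
print(sigmoid(x))    # values between 0 and 1
print(relu(x))       # [0.  0.  0.  1.5]
```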
The weights represent the relation between neurons.
Changing them to minimize the error is called learning.
Do you know what this is? Any guesses?
This is a section of our brains called the cerebellum, responsible for motor skills, our balance, and some language skills.
And the little black things, like little ants, are neurons. There are billions and billions of them, all connected. We are trying to mimic, copy, recreate that structure for our learning purposes.
neurons look like ants
billions of neurons connected
trying to recreate this
---
So do you think this would be enough?
Is this enough?
We need this!
And this is called a Deep Neural Network: a network with many hidden layers.
Define learning: we defined learning as the weights, which represent the connections between neurons, changing in order to minimize the error in the output.
Backprop: this is done by applying backpropagation.
Adding a layer between the input and output nodes adds the real meaning to a network, allowing it to adjust the connections between the features (the input layer) and the output; a sketch of this follows below.
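To ground this, here is a minimal sketch of a one-hidden-layer network learning by backpropagation, assuming sigmoid activations, squared error, and toy XOR data; every name in it is mine, just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # targets (XOR)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 1.0
for _ in range(5000):
    # Forward pass: the signal only moves forward through the layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Error between prediction and target; learning = changing the
    # weights (the connections between neurons) to minimize this error.
    err = out - y

    # Backward pass: propagate the error back through the network.
    d_out = err * out * (1 - out)          # sigmoid derivative: y(1 - y)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Nudge every weight against its gradient.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should end up close to [[0], [1], [1], [0]]
```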
-----
Why not have lots of them then?
Why didn't we do it earlier?
-----
1989: Scientists were able to create algorithms that used deep neural networks, but training times for the systems were measured in days, making them impractical for real-world use.
1 petaflop means a quadrillion calculations per second. A quadrillion is a number with 15 zeroes.
Our everyday computers can do, let's say, a billion (9 zeroes) per second, so this machine is a million times quicker than ours.
Say “Hi!” to Sunway TaihuLight
Here is an infographic I made based on the information we have on the history of Deep Learning.
Mention Geoffrey Hinton and Yann LeCun as the fathers of Deep Learning.
Now in Google DeepMind.
Why do we call it Deep Learning?
lots of hidden layers -> having more hidden layers increases the quality of learning.
One layer learns one dimension of the problem; another layer learns another dimension.
The learning goes deeper and deeper as the number of layers in the hidden section increases. Hence the name: deep learning.
shallow network -> fewer hidden layers.
A shallow network can still fit any function, but it will need to be really wide.
That causes the number of parameters to increase a lot.
brain gets dazzled -> when you suddenly look over your shoulder and try to recognise something, you may fail while deciding what you actually see.
Look at these pictures. These are common hypnotizing, dazzling pictures from all around the internet.
Try to recognize what you see.
I know, your eyes try to match the eyes with the face, the nose with the mouth.
They keep going up and down to match the key features, the features you have learnt that all human faces need to have.
Your brain gets dazzled a bit while deciding what you actually see, just like when you suddenly look over your shoulder, try to recognise something, and fail at that point.
That is because we cannot catch the key features in that amount of time.
Google Trends
Convolutional NNs are taking over Artificial Intelligence. It's a field of study that can be classified as Deep Learning.
It is basically used for image analysis.
Convolution combines two functions to derive another function.
It acts as a filter.
This enables us to pick out the key features of the input.
In 1D this input is audio; in 2D this input is an image.
We are going to cover the 2D part mainly.
Image input looks like this for a CNN.
Depending on the properties of the image, the input can change.
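A rough sketch of what that input looks like in code, assuming 0-255 pixel values:

```python
import numpy as np

# A grayscale image is a 2D array of pixel intensities (height x width);
# a colour image adds a third axis for the RGB channels.
gray = np.array([[0, 255],
                 [255, 0]], dtype=np.uint8)   # shape (2, 2)
rgb = np.zeros((2, 2, 3), dtype=np.uint8)     # shape (2, 2, 3)

# Pixel values are usually scaled to the 0-1 range before training.
gray_scaled = gray / 255.0
```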
- History
Based on the cat's visual cortex (Hubel and Wiesel): layered, overlapping receptive fields cover the entire visual field, efficient for carrying signals.
In the late 1950s and early 1960s, two researchers, Hubel and Wiesel, examined a cat's visual cortex and discovered that its receptive field consisted of sub-regions which were layered over each other to cover the entire visual field. These layers act as filters that process input images, which are then passed on to subsequent layers. This is thought to be a more efficient way to carry signals.
- Image
We will come back to this image at the end.
- Comparison
In a normal NN:
the signal is only passed forward, which is called feed-forward;
the signal is not allowed to loop back into the network.
This requires the nodes (neurons) to be fully connected.
And this created the basis of the Convolutional NN.
Let’s understand CNN step by step!
1- Convolution
The given image, represented as an input series of 0s and 1s, is overlaid by feature-detector matrices to detect the matching inputs.
The filtered results create new layers, called convolved layers.
These are LITERALLY the FEATURES of the image, the things we use in an image to recognize it.
This simplifies the image down to its key features.
We apply multiple filters to create lots of feature maps -> together these are called convolution layers.
Greater values -> at that point, the similarity to the feature is higher.
This filter is an edge filter applied to an image.
It sharpens edges.
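Here is a minimal sketch of that convolution step, assuming a grayscale image, a 3x3 edge-detection kernel, stride 1, and no padding; the helper name is mine.

```python
import numpy as np

def convolve2d(image, kernel):
    # Strictly, CNN layers compute cross-correlation: the kernel is
    # slid over the image without being flipped.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Overlay the feature detector on this patch and sum the products.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A common edge-detection kernel; a high output value means a strong
# edge (a good feature match) at that point.
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

image = np.random.rand(6, 6)      # stand-in for a real grayscale image
feature_map = convolve2d(image, edge_kernel)
print(feature_map.shape)          # (4, 4)
```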
ReLU - Rectified Linear Unit
This is part of the convolution step.
Go from the formula.
It removes all negative values, since we are only interested in the matching features, which are the positive values.
This is another activation function, with the basic formula f(x) = max(0, x).
We apply this function to our convolutional layer, so it basically removes all the negative values from it.
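In code this is a single operation over the feature map; a tiny self-contained illustration:

```python
import numpy as np

# ReLU applied to a toy feature map: negatives (non-matches) become 0,
# positive matches are kept as they are.
feature_map = np.array([[-3.0, 1.5],
                        [ 0.2, -0.7]])
rectified = np.maximum(0, feature_map)
print(rectified)  # [[0.  1.5] [0.2 0. ]]
```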
2- Pooling
Here there are different positions, sizes, rotated and squashed images of the same cheetah, and we want our NN to be able to recognize all of them.
Actually, we want it to recognise all types of cheetahs.
The process is like feature detection: we use 2x2 squares to do it. This can change, by the way; there is no rule for it, but 2x2 is the most commonly used size overall. (A code sketch follows after the notes below.)
* Usability
Even if the image is rotated or squashed, the same feature will end up at the same point after we do max pooling.
For example, if the 4 here shows where the nose of a cat is: even if the image was rotated and the 4 ended up somewhere else in the feature map, taking the max over n x n squares would still let us locate the 4 near where it originally was.
* Benefits
We are again removing information while preserving the important features, so it helps our model avoid overfitting to the input data.
Less data with the same meaning -> easier to compute.
* Note
There's also sub-sampling, where you take the average instead of the max (average pooling), and other methods.
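Here is the minimal sketch of 2x2 max pooling with stride 2 on a toy feature map; the helper name is mine.

```python
import numpy as np

def max_pool(fmap, size=2):
    h, w = fmap.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Keep only the strongest response in each size x size square.
            out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

fmap = np.array([[1, 0, 2, 3],
                 [4, 6, 6, 8],
                 [3, 1, 1, 0],
                 [1, 2, 2, 4]], dtype=float)
print(max_pool(fmap))
# [[6. 8.]
#  [3. 4.]]
```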
3- Flattening
We flatten the pooled arrays into a 1 x n vector,
to be able to feed it as input to our ANN.
After all those steps, we feed our flattened input vector into an ANN.
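As a sketch, flattening is just a reshape:

```python
import numpy as np

# Flattening the pooled feature map into a 1 x n vector for the ANN.
pooled = np.array([[6.0, 8.0],
                   [3.0, 4.0]])
flat = pooled.reshape(1, -1)   # shape (1, 4): ready to feed into the ANN
print(flat)  # [[6. 8. 3. 4.]]
```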
We have something different here!
Here we may have, and most of the time will have, two or more outputs when doing classification. It is usually enough to use one output if there are two classes, but for the sake of the example we will be using two outputs here.
When training, we know which image is a dog and which is not; depending on the values, the weights between the outputs and the hidden layer are updated.
Let's say the first output neuron represents a dog and the second one a cat. These neurons are both fully connected to the previous hidden layer's neurons, so how do we show the importance of each neuron here? We do it as we always do. Say some hidden neurons have high values like 1 and 0.9, and those values are passed to both output neurons. Each output neuron checks whether that feature belongs to it: at the end we know which image is what, so if the image is a dog, the dog neuron detects "I need to fire, so these are my key neurons, the neurons that affect me"; if it is the cat neuron that needs to fire, the cat neuron claims those neurons instead. To mark that those hidden neurons belong to one of them, the network increases the weights between those neurons and that output. A higher weight value shows that a neuron has a strong effect on that result.

For example, say the hidden neurons with values 1 and 0.9 represent the features "big ears" and "big nose" - definitely a dog. When 1 and 0.9 are sent to both dog and cat, the network knows this needs to be a dog, so it fires the dog neuron and updates the weights between the dog output neuron and these hidden neurons to higher values; for the cat, those weights are decreased.
And if the features were "small nose" and "small tongue", the weights between the cat neuron and those hidden neurons would be increased instead. This process is part of learning and is run over many, many iterations.
When prediction time comes, the output neurons do not know whether the image is a dog or a cat; finding that out is the goal of the prediction. But they do know which neurons to listen to.
And here the dog will have a high prediction value, let's say 0.87, while the cat gets 0.13, and the winner is clear.
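Values like 0.87 and 0.13 typically come from a softmax over the output neurons' raw scores; here is a sketch with made-up scores:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.1, 0.2])   # raw outputs for [dog, cat], made up here
probs = softmax(scores)
print(probs.round(2))           # [0.87 0.13] -> the dog wins
```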
Now you all know these steps and what they do.
As a result, we can say that a CNN is basically used to extract the real, definitive features of an input and feed them into an ANN.
So a twisted, blurred, squashed, or resized image of an object is first analyzed into its key features, and then an ANN is trained with those.
http://www.cs.cmu.edu/~aharley/vis/conv/flat.html
First step -> convolved image, filtered.
Second step -> downsampling -> pooling: even though the data in this layer looks like less, the features are preserved while we have less data in total, which is a great benefit of pooling.
So what is next?
Emotions; the Tesla story about hitting a girl on the road in order to save the family in the car.
Lack of Moral Decisions - Empathy
http://moralmachine.mit.edu/
There are some scenarios where you choose what the car should do.