The document provides an overview of machine learning concepts including linear regression, artificial neural networks, and convolutional neural networks. It discusses how artificial neural networks are inspired by biological neurons and can learn relationships in data. The document uses the MNIST dataset example to demonstrate how a neural network can be trained to classify images of handwritten digits using backpropagation to adjust weights to minimize error. TensorFlow is introduced as a popular Python library for building machine learning models, enabling flexible creation and training of neural networks.
3. Linear regression
Finding the relation between age and salary.
Predicting the salary for any given age.
[Figure: historical data points plotted as Salary vs. Experience]
4. Minimize the error
The error (or residual) is the offset of the observed dependent variable from the value predicted by the line.
The goal of any regression is to minimize the error for the training data and to FIND THE OPTIMAL LINE (or curve, in the case of non-linear regression).
[Figure: historical data points plotted as Salary (dependent) vs. Experience (independent), with the error shown as the vertical distance from each point to the line]
6. Minimize the error with Stochastic Gradient Descent (SGD)
Error = (1/N) * Σ_{i=1}^{N} (y_i - ŷ_i)^2
N -> number of historical data points
1. Initialize some value for the slope and intercept.
2. Find the current value of the error function.
3. Find the slope at the current point (the partial derivative) and move slightly downhill in that direction.
4. Repeat until you reach a minimum, or stop after a certain number of iterations.
[Figure: error surface plotted over slope and intercept]
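The four steps above can be sketched in plain Python. This is a minimal illustration on made-up (experience, salary) data points, using batch gradient descent (true SGD would update from one random sample at a time); the learning rate and data are illustrative only.

```python
# Minimal sketch of the four steps above, on made-up (experience, salary) data.
data = [(1, 30), (2, 35), (3, 41), (4, 44), (5, 50)]

slope, intercept = 0.0, 0.0                  # step 1: initialize
learning_rate = 0.01

def error(slope, intercept):
    # step 2: the current value of the error function, (1/N) * sum of squared residuals
    return sum((y - (slope * x + intercept)) ** 2 for x, y in data) / len(data)

for _ in range(5000):                        # step 4: repeat a fixed number of times
    n = len(data)
    # step 3: partial derivatives of the error w.r.t. slope and intercept
    d_slope = sum(-2 * x * (y - (slope * x + intercept)) for x, y in data) / n
    d_intercept = sum(-2 * (y - (slope * x + intercept)) for x, y in data) / n
    slope -= learning_rate * d_slope         # move slightly downhill
    intercept -= learning_rate * d_intercept

print(round(slope, 2), round(intercept, 2))  # converges to the least-squares line
```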
8. Multiple Linear Regression
• Simple linear regression:
Y = b0 + b1*x1
• Multiple linear regression:
Y = b0 + b1*x1 + b2*x2 + … + bn*xn
Important note:
You need to exclude variables that distort the prediction and keep only the ones
that actually help predict the desired result.
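The multiple regression formula above can be shown as a short sketch. The coefficients and features here are made up for illustration; in practice they would come from training.

```python
# Hypothetical sketch of a multiple linear regression prediction,
# Y = b0 + b1*x1 + b2*x2 + ... + bn*xn. Coefficients are illustrative only.
def predict(coefficients, features):
    b0, rest = coefficients[0], coefficients[1:]
    return b0 + sum(b * x for b, x in zip(rest, features))

# e.g. salary from (years of experience, number of certifications)
coeffs = [25.0, 4.0, 1.5]       # b0, b1, b2 - made-up values
print(predict(coeffs, [3, 2]))  # 25 + 4*3 + 1.5*2 = 40.0
```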
11. “Traditional” ML vs. “Representation” ML
• “Traditional” ML based systems rely on experts to decide what features to pay
attention to.
• “Representation” ML based systems figure out by themselves what features to pay
attention to.
• The most common representation ML algorithm is the Artificial Neural Network (ANN).
• ANNs are commonly used for:
• Image/video/audio processing
• Speech recognition
• Natural language processing (NLP)
• Games
13. Artificial Neural Networks - ANN
• Inspired by the neurons in the human brain.
• Can learn from and organize data, and thus build an understanding of relationships.
14. Artificial Neuron
[Diagram: input signals X1, X2, …, Xn, each multiplied by a weight W1, W2, …, Wn, feed into a neuron that produces one output signal]
• The input signals are the independent variables; the output signal is the dependent variable.
• The output can be:
• Continuous (price)
• Binary (Yes/No)
• Categorical
• The neuron behaves like a function.
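The neuron above can be sketched in a few lines. Sigmoid is used here as one common activation choice, and the input/weight values are made up for illustration.

```python
import math

# Sketch of the artificial neuron: multiply each input Xi by its weight Wi,
# add a bias, and pass the sum through an activation function.
def neuron(inputs, weights, bias=0.0):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))    # sigmoid squashes the sum into (0, 1)

out = neuron([0.5, 0.3], [0.8, -0.2], bias=0.1)
print(0 < out < 1)  # True - a sigmoid output is always between 0 and 1
```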
15. The neural network flow
In neural networks, the activation functions are non-linear.
17. MNIST Example
• NIST = US National Institute of Standards and Technology
• MNIST – a subset of NIST’s handwritten digit
data set
• Consists of a training set of 60,000 samples and
a test set of 10,000 samples.
• 28x28 pixels grayscale images and digit labels
for each image.
• http://Yann.lecun.com/exdb/mnist
21. Using “softmax” activation function
• In this example we will use the “softmax” activation function:
• Good for classification problems.
• Amplifies the differences between scores, so each output gets pushed closer to 1 or closer to 0, and the outputs sum to 1.
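A standard softmax implementation (not from the deck) shows the behavior described above: raw scores become a probability distribution in which the differences are amplified.

```python
import math

def softmax(scores):
    m = max(scores)                          # subtract the max for numeric stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))  # the largest score dominates; the outputs sum to 1
```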
23. Minimize the error with Gradient Descent
Optimization Function
Error = (1/N) * Σ_{i=1}^{N} e_i^2
N -> number of historical datapoints
1. Initialize some value for the slope and intercept.
2. Find the current value of the error function.
3. Find the slope at the current point (the partial derivative) and move slightly downhill in that direction.
4. Repeat until you reach a minimum, or stop after a certain number of iterations.
[Figure: error surface plotted over slope and intercept]
24. Training the neural network
• How can we know what the weights and biases should be?
• Through training the network.
• The code will figure out the correct values BY ITSELF.
• How does the training work?
1. Starting with zero weights and biases, we multiply the input values by the weights and add the biases.
2. We get an incorrect output - but we know what the correct output should be.
3. The system measures the difference between the incorrect output and the correct output. This is called the “loss measurement function”; it calculates how big the error is.
4. The system then changes the weights and biases to minimize the error. This is called the “optimization function”. The flow goes back to step 1 and repeats until the error cannot be reduced any more.
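The loop described above can be sketched for a single weight and bias. The "loss measurement" and "optimization" names mirror the slide; the data (correct outputs for y = 2x + 1) is made up for illustration.

```python
samples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]   # correct outputs are known

def loss_measurement(w, b):
    # step 3: how big is the error between predicted and correct outputs?
    return sum(((w * x + b) - y) ** 2 for x, y in samples) / len(samples)

def optimization(w, b, lr=0.05):
    # step 4: change the weight and bias in the direction that reduces the error
    n = len(samples)
    dw = sum(2 * x * ((w * x + b) - y) for x, y in samples) / n
    db = sum(2 * ((w * x + b) - y) for x, y in samples) / n
    return w - lr * dw, b - lr * db

w, b = 0.0, 0.0                    # step 1: start with zero weight and bias
for _ in range(5000):              # steps 2-4: predict, measure, optimize, repeat
    w, b = optimization(w, b)
print(round(w, 2), round(b, 2))    # approaches w = 2, b = 1
```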
25. Back propagation - adjusting the weights
Get input values → Multiply the input values by the weights and add the biases → Run the activation function and get predictions → Calculate the distance from the correct results → Apply optimization on the weights to reduce the error
26.-30. Back propagation - adjusting the weights
The same flow repeats on every pass (slides 26-30 in the original deck). The correct label is the digit 8, encoded one-hot, and the predicted scores move closer to the target on each iteration:

Digit:   9    8    7    6    5    4    3    2    1    0
Target:  0    1    0    0    0    0    0    0    0    0
Pass 1:  0.2  0.4  0.1  0.3  0.1  0.8  0.2  0.6  0.1  0.2
Pass 2:  0.2  0.5  0.1  0.3  0.1  0.7  0.2  0.4  0.1  0.1
Pass 3:  0.1  0.6  0.1  0.2  0.1  0.6  0.2  0.3  0.1  0.1
Pass 4:  0.1  0.7  0.1  0.2  0.1  0.4  0.1  0.2  0.1  0.1
Pass 5:  0.1  0.9  0.1  0.1  0.0  0.1  0.1  0.1  0.1  0.1  → Correct!
33. Tensor
• An n-dimensional array or list used to represent data
• Defined by the 3 properties:
• Rank: Scalar (number), Vector (1-dim array), Matrix (2-dim array), Cube, etc.
• Shape
• Type
Example                                           Rank        Shape      Type
1                                                 0 (scalar)  []         Int32
[1, 5, 3, 6, 2]                                   1 (vector)  [5]        Int32
[[1, 5, 3, 8, 4], [3, 2, 6, 4, 7]]                2 (matrix)  [2, 5]     Int32
[[[1, 6, 3], [2, 4, 3]],
 [[2, 6, 2], [3, 7, 4]],
 [[1, 9, 2], [4, 8, 3]]]                          3 (cube)    [3, 2, 3]  Int32
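The rank/shape idea in the table can be sketched with plain nested lists; the helper name is made up for illustration.

```python
# Sketch: recovering the shape of nested-list "tensors" like those in the table.
def shape(tensor):
    if not isinstance(tensor, list):
        return []                   # a scalar has shape [] and rank 0
    return [len(tensor)] + shape(tensor[0])

scalar = 1
vector = [1, 5, 3, 6, 2]
matrix = [[1, 5, 3, 8, 4], [3, 2, 6, 4, 7]]

print(shape(scalar), shape(vector), shape(matrix))  # [] [5] [2, 5]
print(len(shape(matrix)))                           # the rank is the length of the shape: 2
```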
34. What is TensorFlow
• The most popular Python library for building machine learning models - mainly neural networks.
• Initially developed by Google; today it is open source.
• Provides a library of predefined versions of many common ML algorithms, but also lets you flexibly create your own.
• Can harness GPUs.
• Scalable - using an “execution master” you can run on a laptop as well as on a large-scale cluster of remote servers.
35. Tensor Features and Tools
• Name property - used to identify elements in the graph
• Name Scope property – used for grouping elements (like “conv1” for 1st conv layer)
• Summary class – has methods for writing summaries to log files. Can capture how
elements change over time.
• TensorBoard – A web server that uses the log files to visualize the computation
graph and training progress. Can be used from remote desktops.
• Common add-ons (for easier development):
• TFLearn - simplifies the use of TensorFlow only, and works directly with TF data types.
• Keras - a simplification layer that supports multiple frameworks (including Microsoft CNTK).
36. Training neural networks with TensorFlow
With TensorFlow you need to define the following:
1. The input data:
• “Placeholders” – The input training data.
• “Variables” – What we ask TF to compute through training. With a neural network, these are the weights and biases.
2. The inference function (which is applied on the weights and biases).
3. Loss/error measurement function (example: “Cross Entropy”)
4. Optimization function to minimize loss (example: “Gradient Descent”)
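Item 3 above names cross entropy as an example loss. Here is an illustrative sketch of that loss measurement function on a one-hot label, with made-up probability values; it is not TensorFlow's implementation.

```python
import math

# Cross entropy between a one-hot correct label and predicted probabilities.
def cross_entropy(target, predicted, eps=1e-12):
    # eps guards against log(0)
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

target = [0, 1, 0]          # the correct class is the middle one
vague = [0.4, 0.3, 0.3]     # unsure prediction  -> larger loss
sharp = [0.05, 0.9, 0.05]   # confident, correct -> smaller loss
print(cross_entropy(target, vague) > cross_entropy(target, sharp))  # True
```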
37. TensorFlow - MNIST demo
Concept                      Implementation
Prepared data                MNIST data
Inference                    Sum(X * weight) + bias -> activation
Loss measurement             Cross entropy
Optimize to minimize loss    Gradient descent optimizer
39. Why Convolutional Neural Networks (CNN)
• Problem – Flattening the images caused us to lose the shape information.
• When we see a digit, we recognize the lines and curves.
• We need to “zoom out” slowly from the picture.
42. Deep Learning
• The use of multi-layered neural networks is called Deep Learning.
• Some applications:
• Natural language processing (NLP)
• Face recognition
• Image analysis (what’s in the picture)
• Image search
• Voice analysis
• Video analysis
56. Filtering: The math behind the match
1. Line up the feature and the image patch.
2. Multiply each image pixel by the corresponding feature pixel.
3. Add them up.
4. Divide by the total number of pixels in the feature.
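The four steps above can be sketched on tiny 3x3 patches with pixel values of 1 and -1, as in the X/O example; the feature values are illustrative.

```python
def match_score(feature, patch):
    total, count = 0, 0
    for f_row, p_row in zip(feature, patch):
        for f, p in zip(f_row, p_row):
            total += f * p        # steps 1-2: line up and multiply pixel by pixel
            count += 1
    return total / count          # steps 3-4: add up, divide by the pixel count

diagonal = [[ 1, -1, -1],
            [-1,  1, -1],
            [-1, -1,  1]]         # a diagonal-line feature

print(match_score(diagonal, diagonal))  # a perfect match scores 1.0
```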
76. Pooling: Shrinking the image stack
1. Pick a window size (usually 2 or 3).
2. Pick a stride (usually 2). The stride is the step size.
3. Walk your window across your filtered images.
4. From each window, take the maximum value.
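The pooling steps above can be sketched directly: a 2x2 window walked with stride 2, keeping the maximum of each window. The image values are made up.

```python
def max_pool(image, size=2, stride=2):
    pooled = []
    for r in range(0, len(image) - size + 1, stride):      # step 3: walk the window
        row = []
        for c in range(0, len(image[0]) - size + 1, stride):
            window = [image[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))                        # step 4: take the maximum
        pooled.append(row)
    return pooled

img = [[0.1, 0.9, 0.3, 0.2],
       [0.4, 0.2, 0.8, 0.1],
       [0.5, 0.6, 0.1, 0.7],
       [0.2, 0.1, 0.4, 0.3]]
print(max_pool(img))  # [[0.9, 0.8], [0.6, 0.7]] - a 4x4 image shrinks to 2x2
```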
94.-95. Fully connected layer
• The vote depends on how strongly a value predicts X or O.
[Diagram, across slides 94-95: each feature value is connected to the X and O outputs with a different strength (e.g. 1.00 or 0.55)]
96.-101. Fully connected layer
• Feature values vote on X or O.
[Diagram, repeated across slides 96-101: the feature values 0.9, 0.65, 0.45, 0.87, 0.96, 0.73, 0.23, 0.63, 0.44, 0.89, 0.94, 0.53 cast weighted votes for X or O, highlighting one connection at a time]
102. Fully connected layer
• A list of feature values becomes a list of votes.
[Diagram: the feature values 0.9, 0.65, 0.45, 0.87, 0.96, 0.73, 0.23, 0.63, 0.44, 0.89, 0.94, 0.53 feed into votes for X and O]
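The "voting" can be sketched as a weighted sum per class: each feature value is multiplied by a per-class weight and the products are summed into one vote per class. The weights here are made up; in a real network they are learned.

```python
def votes(features, class_weights):
    return {cls: sum(f * w for f, w in zip(features, ws))
            for cls, ws in class_weights.items()}

features = [0.9, 0.65, 0.45, 0.87]
class_weights = {
    "X": [1.0, 0.2, 0.1, 0.9],   # values that predict X strongly get big X weights
    "O": [0.1, 0.9, 0.8, 0.2],
}
result = votes(features, class_weights)
print(max(result, key=result.get))  # "X" wins the vote for these values
```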
103. Putting it all together
• A set of pixels becomes a set of votes.
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
X
O
[Diagram: the pixel grid flows through Layer 1 to Layer 5, producing votes for X and O]
104. Gradient descent
• For each feature pixel and voting weight, adjust it up and down a bit and see how the error changes.
[Graph: error plotted as a function of weight]
105. Gradient descent
• For each feature pixel and voting weight, adjust it up and down a bit and see how the error changes.
[Graph: error plotted as a function of weight]
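The "adjust it up and down a bit" idea is a numeric slope estimate. This sketch uses a toy error curve with its minimum at w = 3 (the function and constants are made up for illustration).

```python
def error(w):
    return (w - 3.0) ** 2 + 1.0   # toy error curve, minimum at w = 3

w, lr, delta = 0.0, 0.1, 1e-4
for _ in range(200):
    # adjust the weight up and down a bit and see how the error changes
    slope = (error(w + delta) - error(w - delta)) / (2 * delta)
    w -= lr * slope               # move against the slope
print(round(w, 3))  # 3.0 - the weight settles at the error minimum
```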
106. Tuning the CNN
• Architecture
• How many of each type of layer?
• In what order?
• Convolution
• Number of features
• Size of features
• Pooling
• Window size
• Window stride
• Fully Connected
• Number of neurons
107. CNN - Not just for images
Things closer together are more closely related than things far away:
• 2D Images.
• 3D Images.
• Audio
• Video
• Signal processing
• NLP – semantic parsing, sentence modelling and more.
• Drug discovery - chemical interactions.
109. Machine Learning in the near future
There is a lot of research around ML in academia and in commercial companies,
and a lot of money is being invested there.
• ML will be adopted at a much greater scale across almost every industry.
• ML will be embedded everywhere
• Specialized hardware for ML will enable deeper and faster learning
• Machine Learning as a Service (MLaaS) market will grow substantially.
• ML will save more lives.
• ML will automate more repetitive tasks.
110. Why should developers/data
engineers/DBAs invest time in ML?
• Data is the fuel of every ML system - and it comes from the data platforms DBAs manage.
• The data preparation before training is the most time-consuming part.
• DBAs can definitely assist here.
• ML – not just for data scientists (up to a certain level)
• Developers already use ML
• Data engineers use ML.
• ML can be used by DBAs too – why not?
• ML will become easier and easier to use:
• Azure ML
• AWS ML