This document provides an introduction to machine learning and decision trees. It defines key concepts like deep learning, artificial intelligence, and machine learning. It then discusses different machine learning algorithms like supervised learning, unsupervised learning, and decision trees. The document explains how decision trees are built by choosing features to split on at each node based on metrics like information gain and entropy. It provides an example of calculating entropy and information gain to select the best feature to split the root node on.
3. Before We Begin...
• Deep Learning (Subset of ML) - Uses Deep Neural Networks (a shallow network has one hidden
layer, a deep network has more than one) to learn features of the data in a hierarchical manner (e.g.
pixels from one layer recombine to form a line in the next layer)
– computer vision
– speech recognition
– natural language processing
• Artificial Intelligence – Basically a computer program doing something “smart”
– A bunch of if-then statements
– Machine Learning
• Machine Learning (Subset of AI) – A broad umbrella term for the technology that finds patterns in your
existing data, and uses them to make predictions on new data points
– Fraud Detection
– Deep Learning
4. AI | ML | DL – Maybe a picture is better?
Great Resource:
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
by Pedro Domingos
5. Timeline Of Machine Learning
1950 – The Learning Machine (Alan Turing)
1952 – Machine playing checkers (Arthur Samuel)
1957 – Perceptron (Frank Rosenblatt)
1979 – Stanford Cart
1986 – Backpropagation (D. Rumelhart, G. Hinton, R. Williams)
1997 – Deep Blue beats Kasparov
2011 – Watson wins Jeopardy!
2012 – Google NN recognizing cats in YouTube videos
2014 – Facebook DeepFace, Amazon Echo, Turing Test passed
2016 – DeepMind's AlphaGo wins at Go
6. Explosion in AI and ML Use Cases
Image recognition and tagging for photo organization
Object detection, tracking and navigation for Autonomous Vehicles
Speech recognition & synthesis in Intelligent Voice Assistants
Algorithmic trading strategy performance improvement
Sentiment analysis for targeted advertisements
17. Supervised Learning – How Machines Learn
Human intervention and validation required
(e.g. photo classification and tagging)
[Diagram: a labelled training example (input: photo of a Labrador; label: "Labrador") is fed to the machine learning algorithm. The algorithm makes a prediction ("Cat"), the prediction is compared with the label, and the model is adjusted.]
18. Unsupervised Learning (learning without labels)
No human intervention required
(e.g. Customer segmentation)
[Diagram: unlabelled inputs are fed to the machine learning algorithm, which produces predictions (groupings) without any labels.]
19. Machine Learning Use Cases
Supervised Learning
• Classification
– Spam detection
– Customer churn prediction
• Regression
– House price prediction
– Demand forecasting
Unsupervised Learning
• Clustering
– Customer segmentation
There are other types as well (Reinforcement Learning, for example), but these two are the primary areas today.
20. There are Lots of Machine Learning Algorithms
machinelearningmastery.com
22. Some Dataset (Color and Size are input features; Fruit is the target label)
Color Size Fruit
Red Big Apple
Red Small Apple
Yellow Small Lemon
Red Big Apple
Green Big Apple
Yellow Big Lemon
Green Small Lemon
Red Big Apple
Yellow Big Lemon
Green Big Apple
23. Decision Tree might look like …
[Diagram: the root node asks "Color of the fruit?". Branch Red → leaf Apple; branch Yellow → leaf Lemon; branch Green → interior node "Size of the fruit?" with branch Big → leaf Apple and branch Small → leaf Lemon. Annotations label the root, the branches, a leaf, and a splitting point.]
26. But the question is… given a dataset, how can we build a tree like this?
[Same decision tree diagram as above.]
29. General DT structure
[Diagram: a general decision tree. The root splits into interior nodes, interior nodes split further, and every path ends in a leaf. The fruit tree from above is shown alongside as a concrete example.]
30. Training flow of a Decision Tree
• Prepare the labelled data set
• Try to pick the best feature as the root node
• Grow the tree until a stopping criterion is met
• Pass the prediction query through the tree until we arrive at some leaf (see the sketch below)
• Once we reach the leaf node, we have the prediction!! :)
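As a minimal sketch of the prediction pass, assuming the trained tree is stored as nested dicts of the form {feature: {branch value: subtree or leaf label}} (a hypothetical representation, not from the slides):

```python
def predict(tree, query):
    node = tree
    while isinstance(node, dict):             # interior node: keep descending
        feature = next(iter(node))            # feature tested at this node
        node = node[feature][query[feature]]  # follow the matching branch
    return node                               # leaf reached -> the prediction

# Example: the fruit tree from the earlier slides
fruit_tree = {"Color": {"Red": "Apple",
                        "Yellow": "Lemon",
                        "Green": {"Size": {"Big": "Apple", "Small": "Lemon"}}}}
print(predict(fruit_tree, {"Color": "Green", "Size": "Small"}))  # Lemon
```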
31. Training data: Feature 1 to Feature 4 and the Target Label are all known.
[Diagram: the training data is used to grow the tree (Root → Interior nodes → Leaves).]
33. Prediction data: only Feature 1 to Feature 4 are known; the Target Label is UNKNOWN (???).
34. Send the query/inference down the tree.
35. Follow the branches until a leaf is reached and get the prediction.
37. Entropy
Entropy captures the notion of the impurity of the data. But what is this new term, "impurity of the data"?
[Figure: three samples of increasing impurity – pure (all one class), less pure (mostly one class), impure (classes evenly mixed).]
41. Entropy
H(X) = - ∑ P(k) * log2(P(k)), where k ranges from 1 through n
H(X) = entropy of the random variable X
P(k) = probability that X = k
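A minimal sketch of this formula in Python, using only the standard library (entropy() is a name chosen here, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy: H = -sum_k P(k) * log2(P(k)) over the class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
```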
43. Outlook Temperature Humidity Windy Play ball
Rainy Hot High FALSE No
Rainy Hot High TRUE No
Overcast Hot High FALSE Yes
Sunny Mild High FALSE Yes
Sunny Cool Normal FALSE Yes
Sunny Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Rainy Mild High FALSE No
Rainy Cool Normal FALSE Yes
Sunny Mild Normal FALSE Yes
Rainy Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Sunny Mild High TRUE No
Dataset – D
44. Let X = “Play Ball” (the target label of dataset D above).
P(k=Yes) => 9/14 = 0.64
P(k=No) => 5/14 = 0.36
log2(0.64) = -0.64
log2(0.36) = -1.47
H(“Play Ball”) = -(0.64 × (-0.64) + 0.36 × (-1.47)) = 0.41 + 0.53 = 0.94
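Checking this with the entropy() sketch from above:

```python
# Target label column of dataset D ("Play Ball"): 9 Yes, 5 No
play_ball = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
             "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
print(round(entropy(play_ball), 2))  # 0.94
```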
53. Information Gain (IG)
How much does the entropy drop when we split dataset D (above) on a feature? That drop is the Information Gain.
54. Splitting dataset D on Outlook gives three sub-datasets:

Sub-Dataset – D1 (Outlook = Rainy)
Outlook Temperature Humidity Windy Play ball
Rainy Hot High FALSE No
Rainy Hot High TRUE No
Rainy Mild High FALSE No
Rainy Cool Normal FALSE Yes
Rainy Mild Normal TRUE Yes
HD1(“Play Ball”) = 0.97

Sub-Dataset – D2 (Outlook = Overcast)
Outlook Temperature Humidity Windy Play ball
Overcast Hot High FALSE Yes
Overcast Cool Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
HD2(“Play Ball”) = 0

Sub-Dataset – D3 (Outlook = Sunny)
Outlook Temperature Humidity Windy Play ball
Sunny Mild High FALSE Yes
Sunny Cool Normal FALSE Yes
Sunny Cool Normal TRUE No
Sunny Mild Normal FALSE Yes
Sunny Mild High TRUE No
HD3(“Play Ball”) = 0.97

Weighted Entropy = 5/14 × 0.97 + 4/14 × 0 + 5/14 × 0.97 = 0.69
IGOutlook = Entropy(D) - Weighted Entropy = 0.94 - 0.69 = 0.25
60. IGOutlook = HD(“Play Ball”) - Weighted Entropy after splitting the dataset on Outlook
= 0.94 - 0.69
= 0.25
IGTemperature = HD(“Play Ball”) - Weighted Entropy after splitting the dataset on Temperature
= 0.94 - 0.91
= 0.03
IGHumidity = HD(“Play Ball”) - Weighted Entropy after splitting the dataset on Humidity
= 0.94 - 0.79
= 0.15
IGWindy = HD(“Play Ball”) - Weighted Entropy after splitting the dataset on Windy
= 0.94 - 0.90
= 0.04
62. Maximum IG? Outlook, so Outlook is chosen for the root node.
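These numbers can be reproduced with a small extension of the entropy() sketch above (information_gain() and the dict-based dataset are illustrative constructions, not from the slides; the printed values match the slides up to rounding):

```python
cols = ["Outlook", "Temperature", "Humidity", "Windy", "Play ball"]
data = [
    ("Rainy", "Hot", "High", False, "No"),       ("Rainy", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),   ("Sunny", "Mild", "High", False, "Yes"),
    ("Sunny", "Cool", "Normal", False, "Yes"),   ("Sunny", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"), ("Rainy", "Mild", "High", False, "No"),
    ("Rainy", "Cool", "Normal", False, "Yes"),   ("Sunny", "Mild", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", True, "Yes"),    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"), ("Sunny", "Mild", "High", True, "No"),
]
D = [dict(zip(cols, row)) for row in data]

def information_gain(rows, feature, target="Play ball"):
    """IG = H(D) minus the weighted entropy of the sub-datasets for `feature`."""
    total = entropy([r[target] for r in rows])
    weighted = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return total - weighted

for f in cols[:-1]:
    print(f, round(information_gain(D, f), 2))
# Outlook 0.25, Temperature 0.03, Humidity 0.15, Windy 0.05 (slide rounds to 0.04)
```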
66. Here are the algorithmic steps:
1. First, the entropy of the total dataset is calculated for the target label/class.
2. The dataset is then split on the different features:
a) The entropy of each branch is calculated, then added proportionally to get the total weighted entropy for the split.
b) The resulting entropy is subtracted from the entropy before the split.
c) The result is the Information Gain.
3. The feature that yields the largest IG is chosen for the decision node.
4. Repeat steps #2 and #3 for each subset of the data (for each internal node) until:
a) all the features are exhausted, or
b) the stopping criteria are met.
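Put together, these steps amount to a small recursion. A compact ID3-style sketch, reusing the entropy() and information_gain() helpers from above (the nested-dict tree representation is an assumption of this sketch):

```python
def build_tree(rows, features, target="Play ball"):
    labels = [r[target] for r in rows]
    # Stopping criteria: the node is pure, or no features are left -> make a leaf
    if len(set(labels)) == 1 or not features:
        return max(set(labels), key=labels.count)   # majority label
    # Steps 2-3: choose the feature with the largest information gain
    best = max(features, key=lambda f: information_gain(rows, f, target))
    remaining = [f for f in features if f != best]
    # Step 4: repeat for each subset of the data (one branch per feature value)
    return {best: {value: build_tree([r for r in rows if r[best] == value],
                                     remaining, target)
                   for value in {r[best] for r in rows}}}

tree = build_tree(D, cols[:-1])   # the root comes out as "Outlook"
```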
67. Thankfully, we do not have to do all of this (calculating Entropy, IG, etc.) by hand; there are lots of libraries/packages available in Python that we can use to solve a problem with a decision tree.
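For example, a minimal sketch using scikit-learn's DecisionTreeClassifier on the fruit dataset from earlier (the one-hot encoding step is an assumption of this sketch, since scikit-learn needs numeric features):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "Color": ["Red", "Red", "Yellow", "Red", "Green",
              "Yellow", "Green", "Red", "Yellow", "Green"],
    "Size":  ["Big", "Small", "Small", "Big", "Big",
              "Big", "Small", "Big", "Big", "Big"],
    "Fruit": ["Apple", "Apple", "Lemon", "Apple", "Apple",
              "Lemon", "Lemon", "Apple", "Lemon", "Apple"],
})
X = pd.get_dummies(df[["Color", "Size"]])               # one-hot encode the features
clf = DecisionTreeClassifier(criterion="entropy").fit(X, df["Fruit"])

query = pd.get_dummies(pd.DataFrame({"Color": ["Green"], "Size": ["Small"]}))
query = query.reindex(columns=X.columns, fill_value=0)  # align with training columns
print(clf.predict(query))   # e.g. ['Lemon']
```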
69. AWS ML Stack – Broadest and most complete set of Machine Learning capabilities

AI Services:
• Vision – Amazon Rekognition
• Speech – Amazon Polly, Amazon Transcribe (+ Medical)
• Text – Amazon Textract, Amazon Comprehend, Amazon Translate
• Chatbots – Amazon Lex
• Personalization – Amazon Personalize
• Forecasting – Amazon Forecast

Amazon SageMaker:
• Ground Truth data labelling, ML Marketplace
• SageMaker Studio IDE – SageMaker Notebooks, Experiments, Debugger, Autopilot, Model Monitor
• Built-in algorithms, model training, model tuning, model hosting, SageMaker Neo

ML Frameworks & Infrastructure:
• Deep Learning AMIs & Containers
• GPUs and CPUs, Inferentia, Elastic Inference, FPGA