Machine Learning : why we should know and how it works

Machine Learning –
Why we should know
and how it works
Kevin Lee, AVP of ML & AI Consultant

The Agenda
➢ Why Machine Learning?
➢ Introduction of Machine Learning
– Supervised, Unsupervised, Deep
Neural Network
➢ Machine Learning
Implementation
➢ PhUSE Machine Learning Working
Group
➢ Questions and Discussion

Kevin, do you
know about
Machine
Learning?

Why did people ask / expect me if I
know about Machine Learning?
• Programming
• Statistics / modeling
• Working with data all the times

What is ML?
An application of
artificial intelligence
(AI) that provides
systems the ability to
automatically learn
and improve from
experience without
being explicitly
programmed.

Explicit
Programing
Machine Learning

How does Human Learn?
- Experience

How does Machine learn?
Algorithm
Input
Data

How does Machine learn?
Algorithm
Test Data (cats)
Real Data
cat

How does Machine Learning
work?
X0 X1 X2 … Xn Y
• Hypothesis Function
- hθ(x) = θx + b
• Minimize Cost
Function,
J(θ) = hθ(x) – Y,
using Gradient
Descent
Input data
Algorithm

Machine Learning Algorithms
• Hypothesis function
• Model for data
• Y = θ0x0 + θ1x1 + θ2x2 + θ3x3 + …. + θnxn (Y = 2X + 30)
• Cost function
• measures how well hypothesis function fits into data.
• Difference between actual data point and
hypothesized data point.
• Gradient Descent
• Engine that minimizes cost function

Machine Learning Training Process
1. Select Data and hypothesis function
2. Start with random parameters of hypothesis function
3. Process first sets of data in hypothesis function
4. Find predicted values from hypothesis function
5. Calculate cost function ( actual – predicted)
6. Minimize cost function using Gradient Descent
7. Update parameters of hypothesis function based on cost
function
8. Repeat steps ( 3 and 7) with next sets of data until the cost
function becomes 0 or at the lowest point

Machine finds best model
using input data
X
Y
hθ(x) = x + 30
hθ(x) = 2x + 30

Model building in Machine Learning
Hypothesis
Function
New
Iteration
Cost
Function
Calculation
Applying
Gradient
Descent
Updating
Parameters

Best model can provide best
predicted value
X
Y
Xi
Yi

More data,
Better model,
Better prediction
X
Y
X
Y
Different data,
Different model,
Different prediction

X
Y
Garbage
in
Garbage
out
Bad data, Bad model, Bad prediction

Machine
Learning Type
• Supervised - we know
the correct answers
• Unsupervised – no
answers
• Artificial Neural
Network – like human
neural network

Machine Learning Type
• Supervised
• Input data labeled : has correct
answers
• Specific purpose
• Types
• Classification for distinct output
values
• Regression for continuous output
values
• Unsupervised
• Input data not-labeled : No correct
answers
• Exploratory
• Types : Clustering (clusters)
X0 X1 X2 … Xn Y
X0 X1 X2 … Xn

Classification
X1
• Categorical target
• Binary or Multiple
Classification
• Example : Yes/No, 0 to 9,
mild/moderate/severe
• Logistic Regression, SVM,
K-nearest Neighbor, Naïve
Bayes, Decision Tree,
Random Forests, XGBoost
X2

Support Vector Machine (SVM)
• SVM is one of the most powerful classification
model (both binary & multi), especially for
complex, but small/mid-sized datasets.
• Best for binary classification
• Easy to train
• It finds out a line/ hyper-plane (in multidimensional
space that separate outs classes) that maximizes
margin between two classes.
• parameters – Kernel (linear, polynomial), gamma,
margin(a separation of line to the closest class
points)
*** SAS codes;
proc svmachine data=x_train C=1.0; kernel linear;
input x1 x2 x3 x4 / level=interval;
target y;
run;

SVM in SAS Visual
Data Mining and
Machine Learning
SAS Machine
Learning portal
can provide an
interactive
modeling.

SVM in SAS Visual Data
Mining and Machine
Learning
SVM in SAS Visual Data Mining and
Machine Learning

Python codes for SVM
#import ML algorithm
from sklearn.svm import SVC
#prepare train and test datasets
x_train = …
y_train = ….
x_test = ….
#select and train model
svm = SVC(kernel=‘linear’, C=1.0, random_state=1)
svm.fit(x_train, y_train)
#predict output
predicted = svm.predict(x_test)

Decision Trees
The widely used
machine learning
algorithm that use tree
to visually and explicitly
represent decision and
its conditions.
It is used for both
classification (more
popular) and regression.
Why Decision Trees?
• Easy to understand and interpret
• Require a very little data preparation
• Possible to validate model using statistical tests.

Python codes for Decision Tree
from sklearn.tree import DecisionClassifier
x_train = …
y_train = ….
x_test = ….
d_tree = DecisionClassifier(max_depth=4)
d_tree.fit(x_train, y_train)
#predict output
predicted = d_tree.predict(x_test)

Decision Tree in SAS Visual Data
Mining and Machine Learning

Python in Machine Learning
Most popular programming language in Machine
Learning
• Scikit-Learn
• simple and efficient ML tool, open-source
• Classification, Regression, SVM, Clustering,
Dimensionality Reduction, Decision Trees,
Random Forests
• TensorFlow
• developed by Google, open source since
Nov.’2015
• Artificial Neural Network, Convolutional NN,
Recurrent NN, Autoencoder, Reinforcement
Learning

Logistic Regression
• Output: Binary target (Yes/No)
• Input: Continuous/categorical variables
• Example : predicting if the tumor is malignant based on tumor size
Tumor size
(mm)
Yes
(1)
No (0)
Yes
(1)
No (0) Tumor size
(mm)
Malignant?Malignant?
Threshold : 0.5

Logistic Regression Architecture
• Hypothesis function : hθ(x) = 1 / (1 + e-z)
where z= ( θ0x0 + θ1x1 + θ2x2 + θ3x3 +,, + θnxn )
x1
x2
x3
xn
∑ f(z)
θ1
θ2
θ3
θn
x0
z hθ(x)
• What are the parameters?θ0

Python codes for Logistic Regression
from sklearn.linear_model import LogisticRegression
x_train = …
y_train = ….
x_test = ….
lr = LogisticRegression()
lr.fit(x_train, y_train)
#predict output
predicted = lr.predict(x_test)

Regression
x
• To predict values of a desired target
quantity when the target quantity is
continuous.
• Output: Continuous variables
• Example : predicting house price per sqft
• Types: Linear Regression, Polynomial
Regression, Decision Tree, Random Forest
• Hypothesis function:
• hθ(x) = θ0x0 + θ1x1 + θ2x2 + θ3x3 +,, + θnxn
• y = 10 + 2x
• Accuracy of Regression model
• Square Error(SE): sum (y – pred)2
• Mean Square Error (MSE) = (1/n)*SE
• Root Mean Square Error (RMSE) = square
root of MSE
• Relative MSE (rMSE) = MSE / Var(y)
• R2 = 1 - rMSE
y

Python codes for ML Linear Regression
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
x_train = …
y_train = ….
x_test = ….
linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)
#predict output
predicted = linear.predict(x_test)
#evaluate model (MSE)
Mean_squiared_error(y_test, predicted)

Unsupervised Machine Learning
• Not-labeled input data : no correct answers
• Exploratory
• Clustering : the assignment of set of observations into subsets
(clusters) so that data in the same clusters are more similar to each
other than to those from different clusters.
• Algorithms : k-means
• Industry implementation - the grouping of documents, music, movies
by different topics, finding customers that share similar interests based
on common purchase behaviors.
• Customer segmentation
• Document clustering
• Image segmentation
• Recommendation engines

k-means clustering
• The simplest and popular unsupervised machine learning algorithm
• To group similar data points together and discover underlying patterns. To
achieve this objective, K-means looks for a fixed number (k) of clusters in a
dataset.
• the K-means algorithm identifies k number of centroids, and then allocates
every data point to the nearest cluster, while keeping the centroids as
small as possible.
How it works
1. Randomly pick k centroids as initial
cluster centers.
2. Assign each sample to the nearest
centroid.
3. Move the centroids to the center of the
samples that were assigned to it.
4. Repeat 2 and 3 until the cluster
assignments do not change under a
tolerance.

Python codes for k-means
from sklearn.cluster import KMeans
#prepare datasets of X whose size is (100,2)
X = …
# build k means model with 2 clusters
kmeans = KMeans(n_clusters = 2, tol=0.0001, max_iter=100)
kmeans.fit(X)
#print the results of the value of centroids
print(kmeans.cluster_centers_) # two centroids
---- array([[-0.94665068, -0.97138368],[ 2.01559419, 2.02597093]])
#print the results of the value of centroids
print(kmeans.labels_) # 100 labels based on two centroids.
---- array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

Artificial Neural Network (ANN)
• Most powerful ML
algorithm
• Game Changer
• Works very similar to
human brain – Neural
Network
• Types – DNN, CNN, RNN
• Architecture
• Artificial Neurons
• Input Layers
• Hidden Layers
• Output Layers

Human Neural Network vs Artificial
Neural Network (ANN)
➢ Human
Neurons :
Neural
Network – 100
billions
➢ Artificial
Neurons :
Artificial Neural
Networks

Artificial Neuron
• Input (x1, x2, x3,,,,xn) – numeric values that goes into neuron model
• Weight (w1, w2, w3,,,,wn) – weight of each input values
• Bias (b1) – bias of (input * weight)
• Artificial neuron
• Model (input values * weights + bias): z1 = x1w1,1 + x2w1,2 +
x3w1,3 +,,,, + xnw1,n + b1
• Activation function : a1 = f(z1)
x1
x2
x3
xn
∑
Activation
function
W1,1
W1,2
W1,3
W1,n
b1
z1 a1

Neural Network in SAS using
proc nnet
Proc nnet data=Train;
architecture mlp;
hidden 4;
hidden 2;
input x1 x2 x3 ;
target Y;
Run;

Neural Network in SAS Visual Data
Mining and Machine Learning

Python codes for DNN
#import ANN - TensorFlow
Import tensorflow as tf
X = tf.placeholder(..)
Y = tf.placeholder(..)
hidden1 = tf.layer.dense(X, 4, activation=tf.nn.relu)
hidden2 = tf.layer.dense(hidden1, 2, activation=tf.nn.relu)
logits = neuron_layer(hidden2, 2)
….
loss = tf.reduce_mean(….)
optimizer = tf.train.GradientDescentOptimezer(0.1)
traing_op = optimizer.minimizer(loss)
tf.Session.run(training_op, feed_dict={X:x_train, Y:y_train})

Tensor Flow Demo
http://playground.tensorflow.org

Where is SAS in ML?
SAS Visual Data Mining and Machine Learning
• Linear Regression
• Logistic Regression
• Support Vector Machine
• Deep Neural Networks ( limited layers)

How ML is
being used in
our daily life

Recommendation
• Amazon
• Netflix
• Spotify

Customer
Service
• Online
Chatting
• Call

Amazon Go – Cashless
Shopping

Why is AI(ML) so popular now?
• Cost effective
• Automate a lot of works
• Can replace or enhance human labors
• “Pretty much anything that a normal person
can do in <1 sec, we can now automate with
AI” Andrew Ng
• Accurate
• Better than humans
• Can solve a lot of complex business
problems

Now, how Pharma goes into AI/ML
market
• GSK sign $43 million contract with
Exscientia to speed drug discovery
• aiming ¼ time and ¼ cost
• identifying a target for disease intervention to a
molecule from 5.5 to 1 year
• J&J
• Surgical Robotics – partners with Google.
Leverage AI/ML to help surgeons by interpreting
what they see or predict during surgery

market
• Roche
• With GNS Healthcare, use ML to find novel targets
for cancer therapy using cancer patient data
• Pfizer
• With IBM, utilize Watson for drug discovery
• Watson has accumulated data from 25 million
articles compared to 200 articles a human
researcher can read in a year.

• Novartis
• With IBM Watson, developing a cognitive
solution using real-time data to gain better
insights on the expected outcomes.
• With Cota Healthcare, aiming to accelerate
clinical development of new therapies for
breast cancer.
• New CEO claimed Novartis as Medicine
and Data Science company.
market

• First FDA approval for medical
device using AI/ML
• FDA is open for machine learning
algorithm based on devices.
How FDA sees ML/AL

ML application in Pharma R&D
• Drug discovery
• Drug candidate selection
• Clinical system optimization
• Medical image recognition
• Medical diagnosis
• Optimum site selection /
recruitment
• Data anomality detection
• Personized medicine

Adoption of AI/ML in Pharma
• Slow
• Regulatory restriction
• Machine Learning Black Box challenge
– need to build ML models, statistically
or mathematically proven and
validated, to explain final results.
• Big investment in Healthcare and a lot
of AI Start up aiming Pharma.

Healthcare AI/ML market
• US - 320 million in 2016
• Europe – 270 million in 2016
• 40% annual rate
• 10 billion in 2024
• Short in talents

PhUSE Machine Learning & AI Project
Goals
• To raise awareness in Machine Learning in
Pharma Industry
• To get more helps (volunteers and expertise)
• To Introduce Machine Learning Education
• To curate Machine Learning Materials
• White papers
• Projects

Machine Learning & AI Team
• PhUSE Wiki in
http://www.phusewiki.org/wiki/index.php?titl
e=Machine_Learning_/_Artificial_Intelligence
• Project of “Educating For the Future”

Why Machine
Learning & AI
Team?

• Kevin Lee at
kevin.kyosun.lee@gmail.com
• Nicolas Dupuis at
ndupuis@protonmail.com
• Wendy Dobson at wendy@phuse.com
How to Volunteer

Kevin, do
you know
about
Machine
Learning?

Thanks!!!
Please contact
kevin.kyosun.lee@gmail.com
https://www.linkedin.com/in/
HelloKevinLee/

Machine Learning : why we should know and how it works

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie Machine Learning : why we should know and how it works

Ähnlich wie Machine Learning : why we should know and how it works (20)

Mehr von Kevin Lee

Mehr von Kevin Lee (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Machine Learning : why we should know and how it works