Machine learning

Machine
Learning
Andrea Iacono
https://github.com/andreaiacono/MachineLearning

Machine Learning: Intro

What is Machine Learning?
[Wikipedia]: a branch of artificial intelligence
that allows the construction and the study of
systems that can learn from data


Some approaches:
- Regression analysis
- Similarity and metric learning
- Decision tree learning
- Association rule learning
- Artificial neural networks
- Genetic programming
- Support vector machines
(classification and regression analysis)
- Clustering
- Bayesian networks


Supervised learning
vs
Unsupervised learning
Machine learning
vs
Data mining

Machine Learning: Regression analysis

Regression Analysis
A statistical technique for estimating
the relationships among a dependent
variable and independent variables


Prediction of house prices
Size (x)

Price (y)

0.80

70

0.90

83

1.00

74

1.10

93

1.40

89

1.40

58

1.50

85

1.60

114

1.80

95

2.00

100

2.40

138

2.50

111

2.70

124

3.20

172

3.50

172



Hypothesis:

h θ ( x )=θ0 + θ1 x


Hypothesis:

h θ (x )=θ0 + θ1 x

Cost function for linear regression:
m
1
J (θ 0, θ1 )=
(h θ (x (i) )− y(i ) )2
∑
2m i=1


Hypothesis:

h θ (x )=θ0 + θ1 x

Cost function for linear regression:
m
1
J (θ 0, θ1 )=
(h θ (x (i) )− y(i ) )2
∑
2m i=1

Gradient Descent
repeat until convergence :
m
1
(i )
(i )
θ 0=θ 0−α ∑ (hθ ( x )− y )
m i =1
m
1
θ1 =θ1 −α ∑ [(h θ (x (i) )− y(i )) x (i) ]
m i =1


Iterative minimization of cost function
with gradient descent


Hands on


Regression analysis
- one / multiple variables
- linear / higher order curves
- several optimization algorithms
- linear regression
- logistic regression
- simulated annealing
- ...


Overfitting vs underfitting

Machine Learning: Similarity and metric learning

Similarity and metric learning
- concept of distance


Euclidean distance

euclidean distance (p , q )=

√

n

∑ (p i −q i )2

i =1


Manhattan distance

n

manhattan distance (p , q )=∑ ∣(p i −q i )∣
i =1


Pearson's correlation

n

n

∑ pi ∑ qi

n

∑ (p i q i )− i =1
Pearson ' s correlation ( p , q )=

i =1

√

n

n

2
i

(∑ p −
i =1

i =1

n
n

2

(∑ p i )
i =1

n

2

n

(∑ qi )

i =1

n

)( ∑ q 2 −
i

i =1

)


Collaborative filtering
Searches a large group of users for finding a
small subset that have tastes like yours.
Based on what this subset likes or dislikes
the system can recommend you other items.
Two main approaches:
- User based filtering
- Item based filtering


User based filtering
- based on ratings given to
the items, we can measure
the distance among users
- we can recommend to the
user the items that have
the highest ratings among
the closest users


Hands on


Is user based filtering good for
- scalability?
- sparse data?
- quickly changing data?


Is user based filtering good for
- scalability?
- sparse data?
- quickly changing data?

No, it's better to use item
based filtering


Euclidean distance for item based filtering:
nothing has changed!
- based on ratings got from
the users, we can measure
the distance among items
- we can recommend an
item to a user, getting the
items that are closer to
the highest rated by the
user

Machine Learning: Bayes' classifier

Bayes' theorem
P ( A∣B )=

P (B∣A)P (A )
P (B )

Example: given a company where 70% of developers use Java and 30%
use C++, and knowing that half of the Java developers always use
enhanced for loop, if you look at the snippet:
for (int j=0; j<100; j++) {
t = tests[j];
}
which is the probability that the developer who wrote it uses Java?


Bayes' theorem
P ( A∣B )=

P (B∣A)P (A )
P (B )

for (int j=0; j<100; j++) {
t = tests[j];
}
Hint:
A = developer uses Java
B = developer writes old for loops


Bayes' theorem
P ( A∣B )=

P (B∣A)P (A )
P (B )

for (int j=0; j<100; j++) {
t = tests[j];
}
Solution:
A = developer uses Java
B = developer writes old for loops
P(A) = prob. that a developer uses Java = 0.7
P(B) = prob. that any developer uses old for loop = 0.3 + 0.7*0.5 = 0.65
P(B|A) = prob. that a Java developer uses old for loop = 0.5
P (B∣A)P (A) 0.5⋅0.7
P (A∣B )=
=
=0.54
P (B )
0.65


Naive Bayes' classifier
- supervised learning
- trained on a set of known classes
- computes probabilities of elements to be in a class
- smoothing required
n

∏ P (c∣w i )
P c (w 1 , .... , w n )=

i =1

n

n

i =1

i =1

∏ P (c∣w i )+ ∏ (1−P (c∣w i ))


Naive Bayes' classifier
Example
- we want a classifier for Twitter messages
- define a set of classes: {art, tech, home, events,.. }
- trains the classifier with a set of alreay classified tweets
- when a new tweet arrives, the classifier will (hopefully)
tell us which class it belongs to


Hands on


Sentiment analysis
- define two classes: { +, - }
- define a set of words: { like, enjoy, hate, bore, fun, …}
- train a NBC with a set of known +/- comments
- let NBC classify any new comment to know if +/- performance is related to quality of training set

Machine Learning: Clustering

Clustering
- Unsupervised learning
- Different algorithms:
- Hierarchical clustering
- K-Means clustering
- ...
Common use cases:
- navigation habits
- online commerce
- social/political attitudes
- ...


K-Means clustering
K-Means aims at identifying
cluster centroids, such that an
item belonging to a cluster X,
is closer to the centroid of
cluster X than to the centroid
of any other cluster.


K-Means clustering
The algorithm requires a
number of clusters to start, in
this case 3. The centroids are
placed in the item space,
typically in random locations.


K-Means clustering
The algorithm will then assign
to each centroid all items that
are closer to it than to any
other centroid.


K-Means clustering
The centroids are then moved
to the center of mass of the
items in the clusters.


K-Means clustering
A new iteration occurs, taking
into account the new centroid
positions.


K-Means clustering
The centroids are again moved
to the center of mass of the
items in the clusters.


K-Means clustering
Another iteration occurs,
taking into account the new
centroid positions.


K-Means clustering
Another iteration occurs,
taking into account the new
centroid positions. Note that
this
time
the
cluster
membership did not change.
The cluster centers will not
move anymore.


K-Means clustering
The solution is found.


Hands on

Machine Learning: Neural networks

Neural networks
A logical calculus of the ideas immanent in nervous activity
by McCulloch and Pitts in 1943


Neural networks
Feedforward Perceptron


Neural networks
Logic operators with neural networks:
Threshold = 0
X0
-10
-10
-10
-10

X1
0
0
20
20

X2
0
20
0
20

Σ
-10
10
10
30

Result
0
1
1
1

OR operator


Neural networks
Threshold = 0
X0
-30
-30
-30
-30

X1
0
0
20
20

X2
0
20
0
20

Σ

Result

which operator?


Neural networks
Threshold = 0
X0
-30
-30
-30
-30

X1
0
0
20
20

X2
0
20
0
20

Σ
-30
-10
-10
10

Result
0
0
0
1

AND operator


Hands on


Neural networks
Backpropagation
Phase 1: Propagation
- Forward propagation of a training pattern's input
through the neural network in order to generate
the propagation's output activations
- Backward propagation of the propagation's output
activations through the neural network using the
training pattern target in order to generate the
deltas of all output and hidden neurons
Phase 2: Weight update
- Multiply its output delta and input activation to
get the gradient of the weight
- Bring the weight in the opposite direction of the
gradient by subtracting a ratio of it from the weight


Neural networks
Multilayer perceptrons

Machine Learning: Genetic algorithms

Genetic algorithms
GA is a programming technique that mimics
biological evolution as a problem-solving strategy

Steps
- maps the variables of the problem into a sequence of
bits, a chromosome

Chromosome
- creates a random population of chromosomes
- let evolve the population using evolution laws:
- the higher the fitness, the higher the chance of breeding
- crossover of chromosomes
- mutation in chromosomes
- if otpimal solution is found or after n steps the
process is stopped


Genetic algorithms
Mutation

Crossover


Hands on

Machine Learning

Thanks!
The code is available on:
https://github.com/andreaiacono/MachineLearning

Machine learning

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Andere mochten auch

Andere mochten auch (20)

Ähnlich wie Machine learning

Ähnlich wie Machine learning (20)

Mehr von Andrea Iacono

Mehr von Andrea Iacono (6)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Machine learning