3. Today let’s try to demystify machine learning
Focus less on glorifying machine learning and
more on the technical details
Demystify and Glorify
4. The slides and the talk solely represent the speakers’
personal views
5. In its general form, Machine Learning means
teaching a computational machine how to solve a
problem by giving it examples
Machine learning algorithms then automatically
infer rules that associate inputs with the corresponding
outputs
Machine Learning
8. Real Example: Character Recognition
Letters have predefined shapes: we can measure and quantify
their relative proportions
9. In the real world, rules are hard to formulate
Even if we had a comprehensive set of rules, it would
be hard to scale them and make them robust across different
writing styles
Handwritten Character Recognition in
the Real World
10. [Diagram: Input + Output → Machine Learning → Learned Program]
Machine Learning: Inputs and Outputs
We will focus on Supervised Learning i.e.,
learning associations between Inputs and
Outputs
There exist other types of learning:
Unsupervised: learning associations
among inputs only
Semi-supervised: learning from few
outputs and a large amount of inputs
Reinforcement: giving rewards for good
associations
11. Inputs are generally represented by features:
characteristic and meaningful measures computed on raw data
they provide domain information from human to machine
they make the learning process easier
Machine Learning: Inputs
12. Signal processing: from sound we can extract frequency, maximum
amplitude, power spectrum, etc.
Probability and statistics: from text we can compute probability
distributions of common words, word co-occurrences, etc.
Nevertheless, many inputs already come structured in feature (numeric)
format
For example: customer information such as income range and payment
dues are possible features for credit-risk profiling.
What are Features?
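As a minimal sketch of the credit-risk example above (the field names and encodings here are hypothetical, not from the talk), feature extraction just maps a raw record to a numeric vector:

```python
# Minimal sketch: turning a raw customer record into numeric features
# for credit-risk profiling. Field names and encodings are made up.

def extract_features(customer):
    """Map a raw customer dict to a numeric feature vector."""
    income_buckets = {"low": 0, "medium": 1, "high": 2}
    return [
        income_buckets[customer["income_range"]],  # ordinal encoding
        customer["payment_dues"],                  # already numeric
        1 if customer["has_mortgage"] else 0,      # boolean -> 0/1
    ]

raw = {"income_range": "medium", "payment_dues": 2, "has_mortgage": True}
print(extract_features(raw))  # [1, 2, 1]
```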
14. What Learning looks like in a Feature
space
Using features, we can use linear algebra as the main tool for the learning process
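To illustrate the point (a toy sketch with made-up data, not the talk's example): once inputs are feature vectors, the data set is a matrix and fitting a model is a linear-algebra problem, e.g. least squares:

```python
import numpy as np

# Toy sketch: inputs as a feature matrix X, outputs as a vector y,
# and learning as solving a least-squares problem. Data is made up.

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])         # desired outputs

Xb = np.hstack([X, np.ones((len(X), 1))])  # add a bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None) # solve min ||Xb w - y||^2

preds = (Xb @ w > 0.5).astype(int)         # threshold the linear scores
print(preds)  # [0 0 1 1]
```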
15. [Diagram: Inputs and Outputs feed a Machine Learning Model built on Signal Processing, Probability and Statistics, and Linear Algebra; a New Input yields a Predicted Output]
…the final ingredients
23. Non-linear SVMs: The Kernel Trick
Φ: x → φ(x)
The feature space can always be mapped to some higher-dimensional
space where the training set is linearly separable
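A minimal illustration of the idea behind Φ (a hand-picked feature map on XOR-labeled toy data, not an actual kernelized SVM): the four XOR points are not linearly separable in 2D, but adding the product feature x1·x2 makes them separable in 3D:

```python
import numpy as np

# XOR data: not linearly separable in the original 2D space.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])  # class labels

def phi(x):
    """Map to a 3D space by adding the product feature x1*x2."""
    return np.array([x[0], x[1], x[0] * x[1]])

# A separating hyperplane in the mapped space (found by hand).
w = np.array([-1.0, -1.0, 3.0])
b = 0.5

scores = np.array([phi(x) @ w + b for x in X])
preds = (scores < 0).astype(int)  # negative side -> class 1
print(preds)  # [0 0 1 1]
```

Note that a real kernel SVM never computes φ(x) explicitly; it only evaluates inner products K(x, x') = φ(x)·φ(x'), which is what makes very high-dimensional mappings affordable.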
33. [Diagram: original training data D → Step 1: create multiple data sets D1, D2, …, Dt-1, Dt → Step 2: build multiple classifiers C1, C2, …, Ct-1, Ct → Step 3: combine classifiers into C*]
Ensemble Methods: General Idea
• Construct a set of classifiers from the training data
• Predict class label of previously unseen records by aggregating
predictions made by multiple classifiers
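The three steps above can be sketched with bagging (bootstrap aggregating) over decision stumps; a toy implementation on made-up 1D data, using only the standard library:

```python
import random

# Toy bagging sketch: Step 1 resample the data, Step 2 train one weak
# classifier per sample, Step 3 combine them by majority vote.
# The weak learner is a 1D decision stump; the data is made up.

random.seed(0)
data = [(0.1, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]

def train_stump(sample):
    """Pick the threshold that best splits the bootstrap sample."""
    best_t, best_err = 0.0, len(sample) + 1
    for t in [x for x, _ in sample]:
        err = sum((x > t) != bool(label) for x, label in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Step 1 + Step 2: bootstrap samples -> one stump each
stumps = []
for _ in range(25):
    sample = [random.choice(data) for _ in data]
    stumps.append(train_stump(sample))

# Step 3: majority vote of the 25 stumps
def predict(x):
    votes = sum(x > t for t in stumps)
    return int(votes > len(stumps) / 2)

print([predict(x) for x, _ in data])
```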
34. • Suppose there are 25 base classifiers
• Each classifier has error rate ε = 0.35
• Assume the classifiers are independent
• The majority vote is wrong only when at least 13 of the 25
classifiers are wrong, so the probability that the ensemble
makes a wrong prediction is:
Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^(25−i) ≈ 0.06
Why does it work?
This is the only formula in this presentation.
Enjoy it!!
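The sum above can be checked numerically with the standard library:

```python
from math import comb

# Probability that a majority (>= 13 of 25) of independent base
# classifiers, each with error rate eps = 0.35, are wrong.
eps = 0.35
p_wrong = sum(comb(25, i) * eps**i * (1 - eps)**(25 - i)
              for i in range(13, 26))
print(round(p_wrong, 2))  # 0.06
```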
36. Deep Learning: the new Frontier
Brief history of Neural Nets: born in the 70s, developed in the 80s, dead
in the 90s, forgotten in the 2000s, state of the art in the 2010s
38. Real-world problems: Unpredictable size
The volume and velocity of the input data can be very large
For example:
a single wearable device streams out more than a million
messages per day
Scalable Deployment