Holistic approach to machine learning

@SrcMinistry @MariuszGil
Holistic approach to Machine Learning
Data processing

Use DDD/OOP/AOP/
SOLID/GRASP/XYZ

Solve problems by writing
code, to make users happy
and make money

and make money
Solve problems

and make money
Solve

and make money
problems

Mapping all problems to
DDD/OOP/AOP/SOLID/
GRASP/XYZ

Tens of thousands
historical transactions

A computer program is said to learn from experience E
with respect to some class of tasks T and performance
measure P if its performance at tasks in T, as measured
by P, improves with experience E
Tom M. Mitchell

Typical ML techniques
Classification
Regression
Clustering
Dimensionality reduction
Association learning

o
oo
o
oo
o
oo
o o
o
o oo
o
o
o
o oo
o
oo
o
o
o
feature 1
feature2

Typical ML paradigms
Supervised learning
Unsupervised learning
Reinforcement learning

+-------+--------+------+--------+---------+-------+
| brand | model | year | milage | service | price |
+-------+--------+------+--------+---------+-------+
| ford | mondeo | 2005 | 123000 | 9900 | 67000 |
+-------+--------+------+--------+---------+-------+
| ford | mondeo | 2005 | 175000 | 9900 | 30000 |
+-------+--------+------+--------+---------+-------+
| ford | focus | 2010 | 45000 | 6700 | 30000 |
+-------+--------+------+--------+---------+-------+
…

Learning Data
Algorithm Learning
Classifier ModelReal Data Classification

+-------+--------+------+--------+---------+--------+-------+
| brand | model | year | milage | service | repair | price |
+-------+--------+------+--------+---------+--------+-------+
| ford | mondeo | 2005 | 123000 | 9000 | 900 | 67000 |
+-------+--------+------+--------+---------+--------+-------+
| ford | mondeo | 2005 | 175000 | 900 | 9000 | 30000 |
+-------+--------+------+--------+---------+--------+-------+
| ford | focus | 2010 | 45000 | 3700 | 3000 | 30000 |
+-------+--------+------+--------+---------+--------+-------+
…

+-------+--------+------+--------+---------+--------+-------+
| brand | model | year | milage | service | repair | price |
+-------+--------+------+--------+---------+--------+-------+
| ford | mondeo | 2005 | 123000 | 9000 | 900 | 67000 |
+-------+--------+------+--------+---------+--------+-------+
| ford | mondeo | 2005 | 175000 | 900 | 9000 | 30000 |
+-------+--------+------+--------+---------+--------+-------+
| ford | mondeo | 2005 | 175000 | 900 | 9000 | 45000 |
+-------+--------+------+--------+---------+--------+-------+
| ford | focus | 2010 | 45000 | 3700 | 3000 | 30000 |
+-------+--------+------+--------+---------+--------+-------+
…

+-------+--------+-----+------+--------+---------+--------+-------+
| brand | model | gen | year | milage | service | repair | price |
+-------+--------+-----+------+--------+---------+--------+-------+
| ford | mondeo | 4 | 2005 | 123000 | 9000 | 900 | 67000 |
+-------+--------+-----+------+--------+---------+--------+-------+
| ford | mondeo | 3 | 2005 | 175000 | 900 | 9000 | 30000 |
+-------+--------+-----+------+--------+---------+--------+-------+
| ford | mondeo | 4 | 2005 | 175000 | 900 | 9000 | 45000 |
+-------+--------+-----+------+--------+---------+--------+-------+
| ford | focus | 4 | 2010 | 45000 | 3700 | 3000 | 30000 |
+-------+--------+-----+------+--------+---------+--------+-------+
…

+-------+--------+-----+------+--------+---------+--------+------+---------------+-------+
| brand | model | gen | year | milage | service | repair | igla | crying German | price |
+-------+--------+-----+------+--------+---------+--------+------+---------------+-------+
| ford | mondeo | 4 | 2005 | 123000 | 9000 | 900 | 0 | 0 | 67000 |
+-------+--------+-----+------+--------+---------+--------+------+---------------+-------+
| ford | mondeo | 3 | 2005 | 175000 | 900 | 9000 | 1 | 1 | 30000 |
+-------+--------+-----+------+--------+---------+--------+------+---------------+-------+
| ford | mondeo | 4 | 2005 | 175000 | 900 | 9000 | 0 | 0 | 45000 |
+-------+--------+-----+------+--------+---------+--------+------+---------------+-------+
| ford | focus | 4 | 2010 | 45000 | 3700 | 3000 | 1 | 0 | 30000 |
+-------+--------+-----+------+--------+---------+--------+------+---------------+-------+
…

http://blogs.adobe.com/digitalmarketing/wp-content/uploads/2013/08/aq2.jpg

Raw Data Collection
Pre-processing
Sampling
Training Dataset
Algorithm Training
Optimization
Post-processing
Final model
Pre-processingFeature Selection
Feature Scaling
Dimensionality Reduction
Performance Metrics
Model Selection
Test Dataset
CrossValidation
Final Model 
Evaluation
Pre-processing Classification
Missing Data
Feature Extraction
Data 
Split
Data

Raw Data Collection
Pre-processing
Sampling
Training Dataset
Algorithm Training
Optimization
Final model
Pre-processingFeature Selection
Feature Scaling
Dimensionality Reduction
Performance Metrics
Model Selection
Test Dataset
CrossValidation
Final Model 
Evaluation
Pre-processing Classification
Missing Data
Feature Extraction
Data 
Split
Post-processing
Data

Classiﬁcation algorithms
Linear Classification
Logistic Regression
Linear Discriminant Analysis
PLS Discriminant Analysis
Non-Linear Classification
Mixture Discriminant Analysis
Quadratic Discriminant Analysis
Regularized Discriminant Analysis
Neural Networks
Flexible Discriminant Analysis
Support Vector Machines
k-Nearest Neighbor
Naive Bayes
Decission Trees for Classification
Classification and Regression Trees
C4.5
PART
Bagging CART
Random Forest
Gradient Booster Machines
Boosted 5.0

Regression algorithms
Linear Regiression
Ordinary Least Squares Regression
Stepwise Linear Regression
Prinicpal Component Regression
Partial Least Squares Regression
Non-Linear Regression /
Penalized Regression
Ridge Regression
Least Absolute Shrinkage
ElasticNet
Multivariate Adaptive Regression
Support Vector Machines
k-Nearest Neighbor
Neural Network
Decission Trees for Regression
Classification and Regression Trees
Conditional Decision Tree
Rule System
Bagging CART
Random Forest
Gradient Boosted Machine
Cubist

Algorithm is only
element in the ML chain

Everything may be
important for ML

Does it do well on 
the training data?
Does it do well on 
the test data?
Better features / 
Better parameters
More data
Done!
No No
Yes
by Andrew Ng
Yes

Calculate, measure,
apply later

import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.util.MLUtils
// Load training data in LIBSVM format.
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
// Split data into training (60%) and test (40%).
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)
// Run training algorithm to build the model
val numIterations = 100
val model = SVMWithSGD.train(training, numIterations)
// Clear the default threshold.
model.clearThreshold()
// Compute raw scores on the test set.
val scoreAndLabels = test.map { point =>
val score = model.predict(point.features)
(score, point.label)
}
// Get evaluation metrics.
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
val auROC = metrics.areaUnderROC()
println("Area under ROC = " + auROC)
// Save and load model
model.save(sc, "myModelPath")
val sameModel = SVMModel.load(sc, "myModelPath")

Art of asking right
questions related to
right data

@SrcMinistry
Thanks!
@MariuszGil

Holistic approach to machine learning

Holistic approach to machine learning

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Ähnlich wie Holistic approach to machine learning

Ähnlich wie Holistic approach to machine learning (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Holistic approach to machine learning