How to Use SVM for Classification Problems
Yiwei Chen
2016.10
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Load the data and hold out 10% as a stratified test set.
dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target,
    test_size=0.1, stratify=dataset.target)

# Scale each feature to [0, 1]; the scaler is fitted on the training data only.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)

# Search C and gamma on a log-scale grid with 5-fold cross-validation.
param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2)
}
grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)
grid.fit(X_scaled, y_train)

# Train the final model with the best parameters.
clf = SVC(kernel="rbf",
          C=grid.best_params_["C"],
          gamma=grid.best_params_["gamma"],
          max_iter=10000000)
clf.fit(X_scaled, y_train)

# Predict a novel sample, scaled with the same scaler.
novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
novel_X_scaled = scaler.transform(novel_X)
print(novel_X_scaled)
print(clf.predict(novel_X_scaled))

# Evaluate on the held-out test set.
X_test_scaled = scaler.transform(X_test)
print(clf.predict(X_test_scaled))
print(clf.score(X_test_scaled, y_test))
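If you understand the previous two slides,
you can skip the rest of this deck
There are many ways to learn,
and the goals of learning differ too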
[Figure: items labeled "sweet" and "not sweet"]
Learn Mother Nature's hidden rules
from experience
This deck focuses on
supervised classification
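[Figure: Mother Nature labels some items as sweet or not sweet; the label of a new item (??) is unknown.]
[Figure: train: labeled items (sweet, not sweet, not sweet, sweet) → model]
[Figure: predict: new item (??) → model → sweet / not sweet?]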
Supervised Classification
● You have training data: some items/events + their classes
● You train a model so that, later, when a new item
comes in, the model predicts its class
There can be two classes (sweet / not sweet, binary classification)
or more (Taiwan / Japan / Korea, multi-class classification)
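Support Vector Machine (SVM)
● You have training data: vectors + their classes
● You train a model, which is a function, so that
later, when a new vector comes in, it predicts the class
There can be two classes (sweet / not sweet, binary classification)
or more (Taiwan / Japan / Korea, multi-class classification)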
Training data (vector, class):
(1.2, 0, 0, 1, …, 57) O
(8.7, 1, 0, 0, …, -3) X
(2.4, 1, 0, 0, …, 22) O
(0.3, 0, 1, 0, …, 33) X
⋮
train: training data → model ƒ: vector → class
predict: (1.2, 0, 1, …, 8) → model ƒ → O or X
Feature engineering
● Convert every item into a vector in the same, consistent way
● Size: 8cm or 80mm?
● red/yellow/green: (1,0,0)/(0,1,0)/(0,0,1)
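The two bullets above (consistent units, one-hot colors) can be sketched as follows. The items, the `to_vector` helper, and the choice of centimetres as the common unit are assumptions made for illustration; they are not from the slides.

```python
import numpy as np

# Hypothetical raw items: (size, unit, color). Not taken from the slides.
raw_items = [
    (8.0, "cm", "red"),
    (80.0, "mm", "red"),
    (5.5, "cm", "green"),
]

COLORS = ["red", "yellow", "green"]  # fixed order, reused for every item
TO_CM = {"cm": 1.0, "mm": 0.1}       # convert every size to centimetres


def to_vector(size, unit, color):
    """Return [size_cm, is_red, is_yellow, is_green] for one item."""
    one_hot = [1.0 if color == c else 0.0 for c in COLORS]
    return [size * TO_CM[unit]] + one_hot


X = np.array([to_vector(*item) for item in raw_items])
print(X)
```

With a consistent unit, the first two items (8 cm and 80 mm, both red) map to the same vector, which is the point of the "8cm or 80mm?" question.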
There are many methods for solving supervised classification problems
● SVM
● Decision trees
● Neural networks
● Deep learning
● …
They can solve supervised classification problems,
but that does not mean it is the only thing they can solve
Agenda
● Supervised classification
● Support Vector Machine
● Software environment
● Use Support Vector Machines
Support Vector Machine ??
Example: two-dimensional vectors, two classes
[Figure: scatter plot with axes Feature 1 and Feature 2; train: the points of the two classes → Model (function)]
[Figure: predict: new points marked "?" → Model → class]
Maximum Margin
Characteristics of SVM
● Distance related
● Maximum margin: separate the classes as widely as possible
● Parameterized
○ The boundary may be curved
○ Misclassification is allowed, but it is penalized
Training with different parameters gives different results ...
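A small sketch of "different parameters, different results", using the iris data that appears elsewhere in this deck. The three (C, gamma) pairs are illustrative assumptions, not values from the slides; in scikit-learn's `SVC`, `C` sets the penalty for misclassification and `gamma` controls how curved the RBF boundary can be.

```python
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Illustrative (C, gamma) pairs; these values are assumptions.
settings = [(0.01, 0.01), (1.0, 0.1), (1000.0, 10.0)]

results = {}
for C, gamma in settings:
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    training_accuracy = clf.score(X, y)
    n_support_vectors = int(clf.n_support_.sum())
    results[(C, gamma)] = (training_accuracy, n_support_vectors)
    print(C, gamma, training_accuracy, n_support_vectors)
```

Training accuracy alone does not say which setting is best; it only shows that the fitted model changes with the parameters.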
If you use Python
[Figure: software stack]
● scikit-learn (sklearn): SVM, decision trees, ...
● numpy: arrays, ...
● scipy: variance, ...
● python
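Anaconda: everything you wish for, in one go
● An open-source scientific platform running on Python
○ Linux / OSX / Windows
● Installs everything you can think of for you
● Fast. No thinking required.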
● https://www.continuum.io/anaconda-overview
General workflow
Decide on evaluation criteria + baseline predictor → Train → Predict in production
Evaluation
● Accuracy
○ Training accuracy
○ Testing accuracy
● precision, recall, Type I / Type II error, AUC, …
Before doing any training, decide how you will evaluate the results!
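A minimal sketch of a few of the metrics listed above, computed with `sklearn.metrics`. The label vectors are made up for illustration (1 = sweet, 0 = not sweet); counting false positives as Type I errors and false negatives as Type II errors assumes "sweet" is the positive class.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical true and predicted labels: 1 = sweet, 0 = not sweet.
y_true = [1, 0, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

# For binary labels, ravel() yields (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("accuracy:", accuracy)          # 0.75
print("precision:", precision)        # 0.75
print("recall:", recall)              # 0.75
print("Type I errors (fp):", fp)      # 1
print("Type II errors (fn):", fn)     # 1
```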
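Baseline predictor
● Simple and easy: guess with your eyes closed
● Used for comparison (do you know whether your model is worse than the baseline?)
[Figure: baseline example, labeled "train" and "ALL"]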
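One possible baseline, sketched with scikit-learn's `DummyClassifier`: always predict the most frequent training class. Using this particular strategy, and fixing `random_state=0`, are assumptions for the example; the slides do not name a specific baseline.

```python
from sklearn import datasets
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.1,
    stratify=iris.target, random_state=0)

# Ignore the features and always predict the most frequent training class.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)

baseline_accuracy = baseline.score(X_test, y_test)
print(baseline_accuracy)
```

Because iris has three equally sized classes and the split is stratified, this baseline gets one third of the test set right; a trained model should do clearly better than that.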
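SVM workflow
Decide on evaluation criteria + baseline predictor
Training: prepare data → scale features → search for the best parameters → train model
Prediction: prepare data → scale features → predict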
1. Data preparation
● Transform object → vector
● Whole training data at once
○ X in numpy.array (2-D) or scipy.sparse.csr_matrix
○ y in numpy.array
(2.4, 1, -3) O
(8.7, 1, 22) X
(1.2, 0, 57) O
X = np.array([[2.4, 1, -3],
              [8.7, 1, 22],
              [1.2, 0, 57]])
y = np.array([1, 0, 1])
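A short sketch of the two input formats named above. It reuses the slide's three vectors; the linear kernel and the variable names `X_sparse`, `dense_pred`, and `sparse_pred` are choices made for this example only.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.svm import SVC

# Dense 2-D array: one row per item, one column per feature.
X = np.array([[2.4, 1, -3],
              [8.7, 1, 22],
              [1.2, 0, 57]])
y = np.array([1, 0, 1])  # one class label per row of X

# The same data as a sparse matrix; only non-zero entries are stored.
X_sparse = csr_matrix(X)
print(X.shape, X_sparse.shape, X_sparse.nnz)

# SVC.fit accepts either representation.
dense_pred = SVC(kernel="linear").fit(X, y).predict(X)
sparse_pred = SVC(kernel="linear").fit(X_sparse, y).predict(X_sparse)
```

The sparse form mainly pays off when most feature values are zero, for example after one-hot encoding many categories.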
2. Feature Scaling
Before scaling:
(1.2, 0, 0, …) O
(8.7, 1, 0, …) X
(2.4, 1, 0, …) O
(0.3, 0, 1, …) X
⋮
Per-feature scaling:
Feature 1: range 0.3 ~ 10.3, (n − 0.3) × 0.1 → 0 ~ 1
Feature 2: range 0 ~ 1, (n + 0) × 1 → 0 ~ 1
After scaling:
(0.09, 0, 0, …) O
(0.84, 1, 0, …) X
(0.21, 1, 0, …) O
(0 , 0, 1, …) X
⋮
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
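To see what `MinMaxScaler` computes, here is a sketch on the first feature of the slide's vectors. The row holding 10.3 is a hypothetical addition so that the training range is 0.3 ~ 10.3, as on the slide; the `manual` array applies the slide's formula (n − 0.3) × 0.1 directly.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# First feature of the slide's vectors, plus a hypothetical row with the
# maximum 10.3 so that the range matches the slide's 0.3 ~ 10.3.
X = np.array([[1.2], [8.7], [2.4], [0.3], [10.3]])

# Manual min-max scaling: (n - min) / (max - min) = (n - 0.3) * 0.1
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.ravel())  # approximately [0.09, 0.84, 0.21, 0.0, 1.0]
```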
3. Search for the best parameter
param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2)
}
grid = GridSearchCV(
    estimator=SVC(kernel="rbf",
                  max_iter=10000000),
    param_grid=param_grid, cv=5)
grid.fit(X_scaled, y_train)
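3. Search for the best (??) C and γ
3. What is "best"?
[Figure: train: labeled items (sweet, not sweet, not sweet, sweet) → model; the labels of new items (??) are unknown]
You don't know yet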
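3. Search for the best - validation
[Figure: train the model on part of the labeled items (sweet, not sweet, not sweet, sweet) and validate on the rest]
Treat the validation part as new, unseen data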
3. Search for the best - cross-validation
Cross-validation (CV): each fold validates in turn
[Figure: the data is split into folds; in each round one fold is used to validate and the others to train]
Given C=12, γ=34, the validation accuracy=0.56
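A sketch of measuring the validation accuracy for one fixed parameter pair with `cross_val_score`. The values C=8.0 and gamma=0.5 are arbitrary assumptions, and the data is scaled before cross-validation in the same way as this deck's example code.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_scaled = MinMaxScaler().fit_transform(iris.data)

# One fixed, illustrative parameter pair (an assumption).
clf = SVC(kernel="rbf", C=8.0, gamma=0.5)

# cv=5: each of the 5 folds is used once for validation,
# while the other 4 folds are used for training.
fold_scores = cross_val_score(clf, X_scaled, iris.target, cv=5)
validation_accuracy = fold_scores.mean()

print(fold_scores)
print(validation_accuracy)
```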
3. Search for the best parameter - Grid
[Figure: grid of candidate parameter values, with C on one axis]
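Searching on a grid can be written as two nested loops: try every (C, gamma) pair, measure its cross-validation accuracy, and keep the best. The sketch below spells that out by hand with the same grid values as the deck's `param_grid`; it is meant to illustrate the idea and is not a description of how `GridSearchCV` is implemented.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_scaled = MinMaxScaler().fit_transform(iris.data)
y = iris.target

C_values = np.logspace(-5, 15, num=6, base=2)      # 2^-5, 2^-1, 2^3, ..., 2^15
gamma_values = np.logspace(-13, 3, num=5, base=2)  # 2^-13, 2^-9, ..., 2^3

best_score, best_params = -1.0, None
for C in C_values:
    for gamma in gamma_values:
        clf = SVC(kernel="rbf", C=C, gamma=gamma, max_iter=10000000)
        score = cross_val_score(clf, X_scaled, y, cv=5).mean()
        if score > best_score:
            best_score, best_params = score, {"C": C, "gamma": gamma}

print(best_params, best_score)
```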
4. Train Model
Use the best parameters found by CV to train
clf = SVC(kernel="rbf",
          C=grid.best_params_["C"],
          gamma=grid.best_params_["gamma"],
          max_iter=10000000)
clf.fit(X_scaled, y_train)
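As a side note, `GridSearchCV` refits the best parameter combination on all the data it was given by default (`refit=True`), and exposes the result as `best_estimator_`. The sketch below trains on the full iris data, which is a simplification compared with the deck's train/test split, and checks that the explicit retraining and `best_estimator_` predict the same classes.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_scaled = MinMaxScaler().fit_transform(iris.data)
y = iris.target

param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2),
}
grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)
grid.fit(X_scaled, y)

# Explicit retraining, as on the slide.
clf = SVC(kernel="rbf",
          C=grid.best_params_["C"],
          gamma=grid.best_params_["gamma"],
          max_iter=10000000)
clf.fit(X_scaled, y)

# Model that GridSearchCV already refitted with the best parameters.
refit_clf = grid.best_estimator_

same_predictions = bool(np.array_equal(
    clf.predict(X_scaled), refit_clf.predict(X_scaled)))
print(grid.best_params_, same_predictions)
```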
Predict novel data
● Scaling
● Predict
novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
novel_X_scaled = scaler.transform(novel_X)
print(clf.predict(novel_X_scaled))
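One way to avoid forgetting the scaling step at prediction time is to bundle the scaler and the classifier in a scikit-learn pipeline. This is an alternative to the two explicit calls above, not something the slides use; the C and gamma values are untuned placeholders.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

iris = datasets.load_iris()

# C and gamma are placeholders, not tuned values.
model = make_pipeline(MinMaxScaler(), SVC(kernel="rbf", C=8.0, gamma=0.5))
model.fit(iris.data, iris.target)

# The pipeline scales novel_X with the fitted scaler before predicting.
novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
prediction = model.predict(novel_X)
print(prediction)
```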
Scale Training Data
Before scaling:
(1.2, 0, 0, …) O
(8.7, 1, 0, …) X
(2.4, 1, 0, …) O
(0.3, 0, 1, …) X
⋮
Per-feature scaling:
Feature 1: range 0.3 ~ 10.3, (n − 0.3) × 0.1 → 0 ~ 1
Feature 2: range 0 ~ 1, (n + 0) × 1 → 0 ~ 1
After scaling:
(0.09, 0, 0, …) O
(0.84, 1, 0, …) X
(0.21, 1, 0, …) O
(0 , 0, 1, …) X
⋮
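Scale Testing Data
Before scaling:
(2.3, 0, 0, …) O
(-0.7, 1, 1, …) X
(1.3, 1, 1, …) O
(100, 0, 0, …) X
⋮
Same per-feature scaling as for the training data:
Feature 1: (n − 0.3) × 0.1
Feature 2: (n + 0) × 1
After scaling:
(0.20, 0, 0, …) O
(-0.1, 1, 1, …) X
(0.10, 1, 1, …) O
(9.97, 0, 0, …) X
⋮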
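The numbers above can be reproduced by fitting the scaler on the training feature only and then calling `transform` on the test feature. As before, the training row 10.3 is a hypothetical value added so that the training range is 0.3 ~ 10.3.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# First feature of the training vectors; 10.3 is a hypothetical row that
# gives the training range 0.3 ~ 10.3 used on the slide.
train_feature = np.array([[1.2], [8.7], [2.4], [0.3], [10.3]])
test_feature = np.array([[2.3], [-0.7], [1.3], [100.0]])

# Fit on the training data only, then reuse the same transform for test data.
scaler = MinMaxScaler().fit(train_feature)
test_scaled = scaler.transform(test_feature)

print(test_scaled.ravel())  # approximately [0.2, -0.1, 0.1, 9.97]
```

Scaled test values can fall outside 0 ~ 1 (here -0.1 and 9.97) because the test data is not used to compute the range.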
Takeaway…
[Figure: train: labeled items (sweet, not sweet, not sweet, sweet) → model]
[Figure: predict: new item (??) → model → sweet / not sweet?]
SVM workflow
Evaluation criteria + Baseline predictor
Training: prepare data → scale features → search best param (CV on grid) → train model
Prediction: prepare data → scale features → predict
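Once you know how to use the microwave correctly...
● Data collection (gathering ingredients)
● Model evaluation monitoring (are the customers satisfied?)
● Feature engineering (preparing ingredients)
● Model update from novel data (keeping up with the times)
● Training / prediction in large scale (lots of ingredients)
● A robust pipeline that integrates these altogether (running a restaurant)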
Happy Training!
More materials
“Support” Vectors?
Maximum Margin
Why scaling?
Model Serialization
http://scikit-learn.org/stable/modules/model_persistence.html
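A minimal sketch of serializing a trained model with Python's standard `pickle` module. Saving the scaler together with the classifier is a design choice made here so that novel data can be scaled the same way after loading; the parameter values and the dictionary layout are assumptions for the example.

```python
import pickle

import numpy as np
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(iris.data)
clf = SVC(kernel="rbf", C=8.0, gamma=0.5).fit(X_scaled, iris.target)

# Serialize the scaler and the classifier together, then restore them.
blob = pickle.dumps({"scaler": scaler, "clf": clf})
restored = pickle.loads(blob)

novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
original_pred = clf.predict(scaler.transform(novel_X))
restored_pred = restored["clf"].predict(restored["scaler"].transform(novel_X))
print(original_pred, restored_pred)
```

Only unpickle data from sources you trust, and load a model with the same scikit-learn version that saved it where possible.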

Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 

How to use SVM for data classification

  • 2.
      import numpy as np
      from sklearn import datasets
      from sklearn.model_selection import GridSearchCV
      from sklearn.model_selection import train_test_split
      from sklearn.preprocessing import MinMaxScaler
      from sklearn.svm import SVC
      # Load iris and hold out 10% as a stratified test set
      dataset = datasets.load_iris()
      X_train, X_test, y_train, y_test = train_test_split(
          dataset.data, dataset.target,
          test_size=0.1, stratify=dataset.target)
      # Fit the scaler on the training data only
      scaler = MinMaxScaler()
      X_scaled = scaler.fit_transform(X_train)
      # Candidate C and gamma values, evenly spaced in log2
      param_grid = {
          "C": np.logspace(-5, 15, num=6, base=2),
          "gamma": np.logspace(-13, 3, num=5, base=2)
      }
      # 5-fold cross-validation over the grid
      grid = GridSearchCV(
          estimator=SVC(kernel="rbf", max_iter=10000000),
          param_grid=param_grid, cv=5)
      grid.fit(X_scaled, y_train)
  • 3.
      # Train the final model with the best parameters found
      clf = SVC(kernel="rbf",
                C=grid.best_params_["C"],
                gamma=grid.best_params_["gamma"],
                max_iter=10000000)
      clf.fit(X_scaled, y_train)
      # Scale a novel sample with the same scaler, then predict
      novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
      novel_X_scaled = scaler.transform(novel_X)
      print(novel_X_scaled)
      print(clf.predict(novel_X_scaled))
      # Scale the test set the same way and report accuracy
      X_test_scaled = scaler.transform(X_test)
      print(clf.predict(X_test_scaled))
      print(clf.score(X_test_scaled, y_test))
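  • 9. Mother Nature: sweet, not sweet, not sweet, sweet, ??
  • 10. ?? Sweet or not sweet? Train a model ("sweet / not sweet?") from labeled examples: sweet, not sweet, not sweet, sweet
  • 11. ?? Sweet or not sweet? The model predicts: sweet. Labeled examples: sweet, not sweet, not sweet, sweet
  • 12. Supervised Classification
      ● Given training data: some items/events + their classes
      ● You train a model; afterwards, when a new item comes in, the model predicts its class
      There can be two classes (sweet / not sweet, binary classification)
      or more (Taiwan / Japan / Korea, multi-class classification)
  • 13. Support Vector Machine (SVM)
      ● Given training data: vectors + their classes
      ● You train a model, which is a function; afterwards, when a new vector comes in, the model predicts its class
      There can be two classes (sweet / not sweet, binary classification)
      or more (Taiwan / Japan / Korea, multi-class classification)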
  • 14. Labeled vectors → train → model ƒ: vector → class
      (1.2, 0, 0, 1, …, 57) O
      (8.7, 1, 0, 0, …, -3) X
      (2.4, 1, 0, 0, …, 22) O
      (0.3, 0, 1, 0, …, 33) X
      ⋮
  • 15. Labeled vectors → train → model ƒ: vector → class
      (1.2, 0, 0, 1, …, 57) O
      (8.7, 1, 0, 0, …, -3) X
      (2.4, 1, 0, 0, …, 22) O
      (0.3, 0, 1, 0, …, 33) X
      ⋮
      New vector (1.2, 0, 1, …, 8) → predict → X / O
  • 16. Feature engineering
      ● Convert every item into a vector in the same way
      ● Size: 8cm or 80mm?
      ● red/yellow/green: (1,0,0)/(0,1,0)/(0,0,1)
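A minimal sketch of the two points on this slide: sizes are converted to one unit, and a color becomes a one-hot triple. The fruit record layout and the helper names `to_cm`, `one_hot_color` and `to_vector` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Fixed order, shared by training and prediction (an assumption of this sketch).
COLORS = ["red", "yellow", "green"]

def to_cm(value, unit):
    """Convert a size to centimetres so that 8 cm and 80 mm become the same number."""
    if unit == "cm":
        return float(value)
    if unit == "mm":
        return value / 10.0
    raise ValueError(f"unknown unit: {unit}")

def one_hot_color(color):
    """Encode a color as (1,0,0) / (0,1,0) / (0,0,1)."""
    return [1.0 if color == c else 0.0 for c in COLORS]

def to_vector(size, unit, color):
    """Turn one item into a feature vector: [size in cm, red, yellow, green]."""
    return np.array([to_cm(size, unit)] + one_hot_color(color))

a = to_vector(8, "cm", "red")
b = to_vector(80, "mm", "red")
print(a, b)
```

Both calls describe the same item, so they should yield the same vector regardless of the unit used when the data was recorded.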
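  • 17. There are many ways to solve supervised classification problems
      ● SVM
      ● Decision trees
      ● Neural networks
      ● Deep learning
      ● …
      That they can solve supervised classification problems does not mean that is all they can solve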
  • 18. Agenda ● Supervised classification ● Support Vector Machine ● Software environment ● Use Support Vector Machines
  • 19. Labeled vectors → train → model ƒ: vector → class
      (1.2, 0, 0, 1, …, 57) O
      (8.7, 1, 0, 0, …, 22) X
      (2.4, 1, 0, 0, …, -3) O
      (0.3, 0, 1, 0, …, 33) X
      ⋮
      New vector (1.2, 0, 1, …, 8) → predict → X / O
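  • 20. Support Vector Machine ?? Example: 2-D vectors, two classes. [Figure: axes Feature 1 and Feature 2; train → Model (function)]
  • 21. Support Vector Machine ?? Example: 2-D vectors, two classes. [Figure: the Model predicts the class of new points marked "?"]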
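  • 23. Characteristics of SVM
      ● Distance related
      ● The wider the separation, the better (Maximum margin)
  • 24. Characteristics of SVM
      ● Distance related
      ● The wider the separation, the better (Maximum margin)
      ● Parameterized
        ○ The boundary may be curved
        ○ Misclassification is allowed, but it is penalized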
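To make "maximum margin" concrete, here is a small sketch on a made-up 2-D data set (not from the slides). It fits a linear SVM with a very large `C`, so that misclassification is penalized heavily, and computes the margin width as 2 / ‖w‖, the standard formula for a linear SVM.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (an assumption for illustration): class 0 sits at x=0, class 1 at x=2.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# Very large C: training errors are penalized so heavily that none are tolerated here.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                      # normal vector of the separating line
margin = 2.0 / np.linalg.norm(w)      # distance between the two margin boundaries
print("w =", w, "b =", clf.intercept_[0])
print("margin width =", margin)
```

The two classes are 2 units apart along the first feature, so the widest possible gap between them is about 2. A smaller `C` would trade some of that strictness for tolerance to noisy points.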
  • 26. Agenda ● Supervised classification ● Support Vector Machine ● Software environment ● Use Support Vector Machines
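  • 27. If you use Python
      ● scikit-learn (sklearn): SVM, decision trees, ...
      ● numpy: arrays, ...
      ● scipy: variance, ...
      ● all on top of python
  • 28. Anaconda: every wish granted at once
      ● An open-source scientific platform that runs on Python
        ○ Linux / OSX / Windows
      ● Installs everything you can think of
      ● Fast. No thinking required.
      ● https://www.continuum.io/anaconda-overview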
  • 29. Agenda ● Supervised classification ● Support Vector Machine ● Software environment ● Use Support Vector Machines
  • 30. Labeled vectors → train → model ƒ: vector → class
      (1.2, 0, 0, 1, …, 57) O
      (8.7, 1, 0, 0, …, 22) X
      (2.4, 1, 0, 0, …, -3) O
      (0.3, 0, 1, 0, …, 33) X
      ⋮
      New vector (1.2, 0, 1, …, 8) → predict → X / O
  • 32. Evaluation
      ● Accuracy
        ○ Training accuracy
        ○ Testing accuracy
      ● precision, recall, Type I / Type II error, AUC, …
      Before doing any training, decide how you will evaluate the results!
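As an illustration of how some of these measures differ, the sketch below computes accuracy, precision and recall with `sklearn.metrics` on made-up labels (the labels are an assumption, not data from the slides).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical ground truth and predictions (1 = sweet, 0 = not sweet).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

accuracy = accuracy_score(y_true, y_pred)    # correct predictions / all predictions
precision = precision_score(y_true, y_pred)  # true positives / predicted positives
recall = recall_score(y_true, y_pred)        # true positives / actual positives

print(accuracy, precision, recall)
```

Here there are 3 true positives, 2 false positives, 1 false negative and 2 true negatives, so the three measures come out different (5/8, 3/5 and 3/4). Which one matters depends on the application, which is why the choice should be made up front.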
  • 33. Baseline predictor
      ● Simple and easy: guess with your eyes closed
      ● Used for comparison (do you know whether your model is actually worse than the baseline?)
      [Figure: train, ALL]
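One possible baseline, sketched here with scikit-learn's `DummyClassifier`, always predicts the most frequent training class. The slide does not say which baseline it has in mind, so this choice and the tiny data set are assumptions.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Made-up data; the features are ignored by this baseline.
X_train = np.zeros((6, 1))
y_train = np.array([0, 0, 0, 0, 1, 1])
X_test = np.zeros((4, 1))
y_test = np.array([0, 0, 1, 0])

# Always predict the class seen most often during training.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_accuracy = baseline.score(X_test, y_test)

print(baseline.predict(X_test), baseline_accuracy)
```

Any trained model should be compared against this number; an SVM that scores below it is worse than guessing the majority class.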
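  • 34. Workflow for using SVM
      First: decide on the evaluation criteria + baseline predictor
      Training: prepare data → scale features → search for the best parameters → train model
      Prediction: prepare data → scale features → predict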
  • 35.
      # Load iris and hold out 10% as a stratified test set
      dataset = datasets.load_iris()
      X_train, X_test, y_train, y_test = train_test_split(
          dataset.data, dataset.target,
          test_size=0.1, stratify=dataset.target)
      # Fit the scaler on the training data only
      scaler = MinMaxScaler()
      X_scaled = scaler.fit_transform(X_train)
      # Candidate C and gamma values, evenly spaced in log2
      param_grid = {
          "C": np.logspace(-5, 15, num=6, base=2),
          "gamma": np.logspace(-13, 3, num=5, base=2)
      }
      # 5-fold cross-validation over the grid
      grid = GridSearchCV(
          estimator=SVC(kernel="rbf", max_iter=10000000),
          param_grid=param_grid, cv=5)
      grid.fit(X_scaled, y_train)
      # Train the final model with the best parameters found
      clf = SVC(kernel="rbf",
                C=grid.best_params_["C"],
                gamma=grid.best_params_["gamma"],
                max_iter=10000000)
      clf.fit(X_scaled, y_train)
  • 36.
      # Scale a novel sample with the same scaler, then predict
      novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
      novel_X_scaled = scaler.transform(novel_X)
      print(novel_X_scaled)
      print(clf.predict(novel_X_scaled))
      # Scale the test set the same way and report accuracy
      X_test_scaled = scaler.transform(X_test)
      print(clf.predict(X_test_scaled))
      print(clf.score(X_test_scaled, y_test))
  • 37. 1. Data preparation
      ● Transform object → vector
      ● Whole training data at once
        ○ X in numpy.array (2-D) or scipy.sparse.csr_matrix
        ○ y in numpy.array
      (1.2, 0, 57) O
      (8.7, 1, 22) X
      (2.4, 1, -3) O
      X = np.array([[2.4, 1, -3], [8.7, 1, 22], [1.2, 0, 57]])
      y = np.array([1, 0, 1])
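The slide mentions `scipy.sparse.csr_matrix` as an alternative container for X. A brief sketch, reusing the slide's three rows, of passing a sparse matrix to `SVC` (the default `SVC` settings here are an assumption made for brevity):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.svm import SVC

# The slide's data: O is encoded as 1, X as 0.
X = np.array([[2.4, 1, -3], [8.7, 1, 22], [1.2, 0, 57]])
y = np.array([1, 0, 1])

# Same data in compressed sparse row format; only non-zero entries are stored.
X_sparse = csr_matrix(X)

clf = SVC(kernel="rbf")
clf.fit(X_sparse, y)
pred = clf.predict(X_sparse)
print(X_sparse.nnz, pred)
```

A sparse container mainly pays off when most entries are zero, for example with many one-hot encoded features.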
  • 38. 2. Feature Scaling
      Feature with range 0.3 ~ 10.3: (n−0.3) × 0.1 → 0 ~ 1
      Feature with range 0 ~ 1: (n+0) × 1 → 0 ~ 1
      (1.2, 0, 0, …) O → scale → (0.09, 0, 0, …) O
      (8.7, 1, 0, …) X → scale → (0.84, 1, 0, …) X
      (2.4, 1, 0, …) O → scale → (0.21, 1, 0, …) O
      (0.3, 0, 1, …) X → scale → (0, 0, 1, …) X
      ⋮
  • 39. 2. Feature Scaling
      (1.2, 0, 0, …) O → scale → (0.09, 0, 0, …) O
      (8.7, 1, 0, …) X → scale → (0.84, 1, 0, …) X
      (2.4, 1, 0, …) O → scale → (0.21, 1, 0, …) O
      (0.3, 0, 1, …) X → scale → (0, 0, 1, …) X
      ⋮
      scaler = MinMaxScaler()
      X_scaled = scaler.fit_transform(X)
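The arithmetic on the slide can be checked with a short sketch. It uses the first feature of the four rows shown, plus one assumed row with value 10.3 standing in for the elided rows, since the slide gives that feature's range as 0.3 ~ 10.3.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# First feature of the slide's rows; the last value (10.3) is an ASSUMED maximum.
X_train = np.array([[1.2], [8.7], [2.4], [0.3], [10.3]])

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)

# The slide's formula, applied by hand: (n - 0.3) * 0.1
manual = (X_train - 0.3) * 0.1

print(X_scaled.ravel())
print(scaler.data_min_, scaler.scale_)
```

`MinMaxScaler` learns the minimum (0.3) and the factor (0.1) from the training data, which reproduces the slide's values 0.09, 0.84, 0.21 and 0.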
  • 40. 3. Search for the best parameter
      param_grid = {
          "C": np.logspace(-5, 15, num=6, base=2),
          "gamma": np.logspace(-13, 3, num=5, base=2)
      }
      grid = GridSearchCV(
          estimator=SVC(kernel="rbf", max_iter=10000000),
          param_grid=param_grid, cv=5)
      grid.fit(X_scaled, y_train)
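To see what this grid actually contains, the following sketch lists the exponents produced by the two `np.logspace` calls and counts the candidate combinations with `ParameterGrid`. The variable names other than `param_grid` are illustrative.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2),
}

# Recover the powers of two that were generated.
C_exponents = np.log2(param_grid["C"])
gamma_exponents = np.log2(param_grid["gamma"])

# Every (C, gamma) pair is one candidate; each is trained once per CV fold.
n_candidates = len(ParameterGrid(param_grid))
n_fits = n_candidates * 5  # cv=5

print(C_exponents, gamma_exponents, n_candidates, n_fits)
```

C ranges over 2^-5, 2^-1, 2^3, 2^7, 2^11, 2^15 and gamma over 2^-13, 2^-9, 2^-5, 2^-1, 2^3, giving 30 candidates and 150 cross-validation fits.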
  • 41. 3. Search for best (??) C and γ
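  • 42. 3. What is "best"? Labeled examples (sweet, not sweet, not sweet, sweet) → train → model. For the new item (??), you don't know the answer yet.
  • 43. 3. Search for the best - validation. Train the model on part of the labeled examples (sweet, not sweet, not sweet, sweet); treat the rest as new, unseen data and validate on it.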
  • 44. 3. Search for the best - cross-validation
      Cross-validation (CV): each fold validates in turn
      [Figure: the folds take turns as train / validate]
      Given C=12, γ=34, the validation accuracy=0.56
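A sketch of cross-validation for a single (C, γ) pair using `cross_val_score` on the iris data. The values `C=8.0` and `gamma=0.5` are arbitrary choices for illustration, not recommendations from the slides.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

dataset = datasets.load_iris()
X_scaled = MinMaxScaler().fit_transform(dataset.data)

# One candidate parameter pair (arbitrary illustrative values).
clf = SVC(kernel="rbf", C=8.0, gamma=0.5)

# 5 folds: each fold is held out once for validation while the other 4 train.
scores = cross_val_score(clf, X_scaled, dataset.target, cv=5)
mean_accuracy = scores.mean()

print(scores, mean_accuracy)
```

The mean of the five fold accuracies is the single validation number attached to this parameter pair; grid search repeats this for every pair and keeps the best.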
  • 45. 3. Search for the best parameter - Grid [Figure: grid over C and γ]
  • 46. 3. Search for the best parameter
      param_grid = {
          "C": np.logspace(-5, 15, num=6, base=2),
          "gamma": np.logspace(-13, 3, num=5, base=2)
      }
      grid = GridSearchCV(
          estimator=SVC(kernel="rbf", max_iter=10000000),
          param_grid=param_grid, cv=5)
      grid.fit(X_scaled, y_train)
  • 47. 4. Train Model
      Use the best parameters found in CV to train
      clf = SVC(kernel="rbf",
                C=grid.best_params_["C"],
                gamma=grid.best_params_["gamma"],
                max_iter=10000000)
      clf.fit(X_scaled, y_train)
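As a side note, `GridSearchCV` defaults to `refit=True`, which retrains a model with the best parameters on all the data passed to `fit` and exposes it as `best_estimator_`. The sketch below shows this shortcut as an alternative to the explicit retraining on the slide; the smaller grid is an assumption made only to keep the example quick.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

dataset = datasets.load_iris()
X_scaled = MinMaxScaler().fit_transform(dataset.data)
y = dataset.target

# A reduced grid (illustrative values), not the one from the slides.
param_grid = {"C": [0.5, 8.0, 128.0], "gamma": [0.03125, 0.5]}
grid = GridSearchCV(SVC(kernel="rbf"), param_grid=param_grid, cv=5)
grid.fit(X_scaled, y)

# Already trained on all of X_scaled with the best parameters.
clf = grid.best_estimator_
print(grid.best_params_, clf.score(X_scaled, y))
```

Retraining by hand, as the slide does, makes the step explicit; using `best_estimator_` avoids repeating the parameter plumbing.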
  • 48. Predict novel data
      ● Scaling
      ● Predict
      novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
      novel_X_scaled = scaler.transform(novel_X)
      print(clf.predict(novel_X_scaled))
  • 49. Scale Training Data
      Feature with range 0.3 ~ 10.3: (n−0.3) × 0.1 → 0 ~ 1
      Feature with range 0 ~ 1: (n+0) × 1 → 0 ~ 1
      (1.2, 0, 0, …) O → scale → (0.09, 0, 0, …) O
      (8.7, 1, 0, …) X → scale → (0.84, 1, 0, …) X
      (2.4, 1, 0, …) O → scale → (0.21, 1, 0, …) O
      (0.3, 0, 1, …) X → scale → (0, 0, 1, …) X
      ⋮
  • 50. Scale Testing Data
      Same formulas as for the training data: (n−0.3) × 0.1 and (n+0) × 1
      (2.3, 0, 0, …) O → scale → (0.20, 0, 0, …) O
      (-0.7, 1, 1, …) X → scale → (-0.1, 1, 1, …) X
      (1.3, 1, 1, …) O → scale → (0.10, 1, 1, …) O
      (100, 0, 0, …) X → scale → (9.97, 0, 0, …) X
      ⋮
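The point of these two slides is that the scaler is fitted once on the training data and then reused unchanged, so test values may land outside 0 ~ 1. A sketch with the first feature of the slides' rows follows; the training value 10.3 is an assumed maximum standing in for the elided rows.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Training values of the first feature; 10.3 is an ASSUMED maximum (slide range 0.3 ~ 10.3).
X_train = np.array([[1.2], [8.7], [2.4], [0.3], [10.3]])
# Test values of the first feature, as on the slide.
X_test = np.array([[2.3], [-0.7], [1.3], [100.0]])

# Fit on training data only, then apply the same (n - 0.3) * 0.1 to the test data.
scaler = MinMaxScaler().fit(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_test_scaled.ravel())
```

The results match the slide (0.20, -0.1, 0.10, 9.97). Calling `fit_transform` on the test set instead would rescale it to 0 ~ 1 with different constants, so the model would see features on a different scale than it was trained on.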
  • 51.
      # Load iris and hold out 10% as a stratified test set
      dataset = datasets.load_iris()
      X_train, X_test, y_train, y_test = train_test_split(
          dataset.data, dataset.target,
          test_size=0.1, stratify=dataset.target)
      # Fit the scaler on the training data only
      scaler = MinMaxScaler()
      X_scaled = scaler.fit_transform(X_train)
      # Candidate C and gamma values, evenly spaced in log2
      param_grid = {
          "C": np.logspace(-5, 15, num=6, base=2),
          "gamma": np.logspace(-13, 3, num=5, base=2)
      }
      # 5-fold cross-validation over the grid
      grid = GridSearchCV(
          estimator=SVC(kernel="rbf", max_iter=10000000),
          param_grid=param_grid, cv=5)
      grid.fit(X_scaled, y_train)
      # Train the final model with the best parameters found
      clf = SVC(kernel="rbf",
                C=grid.best_params_["C"],
                gamma=grid.best_params_["gamma"],
                max_iter=10000000)
      clf.fit(X_scaled, y_train)
  • 52.
      # Scale a novel sample with the same scaler, then predict
      novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
      novel_X_scaled = scaler.transform(novel_X)
      print(novel_X_scaled)
      print(clf.predict(novel_X_scaled))
      # Scale the test set the same way and report accuracy
      X_test_scaled = scaler.transform(X_test)
      print(clf.predict(X_test_scaled))
      print(clf.score(X_test_scaled, y_test))
  • 53. Agenda ● Supervised classification ● Support Vector Machine ● Software environment ● Use Support Vector Machines Takeaway…
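  • 54. ?? Sweet or not sweet? Train a model ("sweet / not sweet?") from labeled examples: sweet, not sweet, not sweet, sweet
  • 55. ?? Sweet or not sweet? The model predicts: sweet. Labeled examples: sweet, not sweet, not sweet, sweet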
  • 56. Workflow for using SVM
      First: evaluation criteria + baseline predictor
      Training: prepare data → scale features → search best param: CV on grid → train model
      Prediction: prepare data → scale features → predict
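The workflow can also be bundled into a scikit-learn `Pipeline`, so that scaling is re-fitted inside each cross-validation fold and applied automatically at prediction time. This is a swapped-in alternative to the slides' step-by-step code, not the author's method; the `random_state=0` argument and the omission of `max_iter` are assumptions of this sketch.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target,
    test_size=0.1, stratify=dataset.target, random_state=0)

# scale features -> SVM, treated as one estimator
pipeline = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))

# make_pipeline names each step after its lowercased class name, hence "svc__".
param_grid = {
    "svc__C": np.logspace(-5, 15, num=6, base=2),
    "svc__gamma": np.logspace(-13, 3, num=5, base=2),
}

# search best param: CV on grid, then refit on all training data
grid = GridSearchCV(pipeline, param_grid=param_grid, cv=5)
grid.fit(X_train, y_train)

# scale features + predict happen inside the pipeline
test_accuracy = grid.score(X_test, y_test)
print(grid.best_params_, test_accuracy)
```

The evaluation criterion and baseline still have to be decided beforehand; the pipeline only automates the scale, search, train and predict steps.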
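  • 57. Once you know how to use the microwave correctly...
      ● Data collection (getting the ingredients)
      ● Model evaluation monitoring (are the customers satisfied?)
      ● Feature engineering (preparing the ingredients)
      ● Model update from novel data (keeping up with the times)
      ● Training / prediction in large scale (lots of ingredients)
      ● A robust pipeline that integrates these altogether (opening a restaurant)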