37. 1. Data preparation
● Transform each object → feature vector
● Pass the whole training data at once
○ X as a 2-D numpy.array or a scipy.sparse.csr_matrix
○ y as a 1-D numpy.array
(1.2, 0, 57) → O
(8.7, 1, 22) → X
(2.4, 1, -3) → O
X = np.array([[2.4, 1, -3],
              [8.7, 1, 22],
              [1.2, 0, 57]])
y = np.array([1, 0, 1])
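A minimal sketch of the two input formats named above, using the three example objects from this slide. The label encoding (1 for O, 0 for X) is inferred from the slide's `y`, and the sparse conversion is only an illustration of the alternative format.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Each row is one object, each column one numeric feature.
X = np.array([[2.4, 1, -3],
              [8.7, 1, 22],
              [1.2, 0, 57]])
# One label per row (assumed encoding: 1 for "O", 0 for "X").
y = np.array([1, 0, 1])

# The same data as a sparse matrix, which stores only the non-zero entries.
X_sparse = csr_matrix(X)

print(X.shape, y.shape)
print(X_sparse.nnz)  # number of stored non-zero values
```

The sparse format mainly pays off when most feature values are zero, for example with bag-of-words features.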
42. 3. What is "best"?
Labeled data: sweet, not sweet, not sweet, sweet → train → model
New data: ?? (you don't know the answer yet)
43. 3. Search for the best - validation
Labeled data: sweet, not sweet, not sweet, sweet
train → model
validate: treated as new, unseen data
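The hold-out idea can be sketched as follows. The iris dataset, the 75/25 split, and the values of `C` and `gamma` are all assumptions chosen for illustration; they are not taken from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in labeled data (an assumption; the slides use their own dataset).
X, y = load_iris(return_X_y=True)

# Hold out part of the labeled data and pretend it is new and unseen.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Train only on the training part (parameter values are arbitrary here).
model = SVC(kernel="rbf", C=1.0, gamma=0.1)
model.fit(X_train, y_train)

# Because the validation labels are known, the model can be scored on them.
val_accuracy = model.score(X_val, y_val)
print(val_accuracy)
```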
44. 3. Search for the best - cross-validation
Cross-validation (CV): each fold validates in turn
Round 1: train | train | validate
Round 2: train | validate | train
Round 3: validate | train | train
Given C=12, γ=34, the validation accuracy = 0.56
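A rough sketch of "each fold validates in turn", written as an explicit loop. The dataset (iris), the three folds, and the parameter values are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Stand-in labeled data (an assumption).
X, y = load_iris(return_X_y=True)

# Split the data into 3 folds; each fold serves as the validation set once.
folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
fold_accuracies = []
for train_idx, val_idx in folds.split(X, y):
    # One fixed parameter setting is evaluated across all folds.
    model = SVC(kernel="rbf", C=1.0, gamma=0.1)
    model.fit(X[train_idx], y[train_idx])
    fold_accuracies.append(model.score(X[val_idx], y[val_idx]))

# The CV accuracy for this (C, gamma) pair is the mean over the folds.
cv_accuracy = float(np.mean(fold_accuracies))
print(fold_accuracies, cv_accuracy)
```

In practice `sklearn.model_selection.cross_val_score` performs the same loop in one call.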
46. 3. Search for the best parameter
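import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Candidate values on a log scale: C in 2^-5 .. 2^15, gamma in 2^-13 .. 2^3
param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2),
}
# Evaluate every (C, gamma) pair with 5-fold cross-validation
grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)
grid.fit(X_scaled, y_train)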
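A self-contained sketch of the same grid search, followed by a look at what it found. The iris dataset and the `StandardScaler` are stand-ins (assumptions) for the data and scaler that the slides prepare earlier; `best_params_`, `best_score_`, and `cv_results_` are attributes of a fitted `GridSearchCV`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed stand-in data and scaler.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
X_scaled = StandardScaler().fit_transform(X_train)

param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2),
}
grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)
grid.fit(X_scaled, y_train)

# The grid has 6 x 5 = 30 candidate settings.
n_candidates = len(grid.cv_results_["params"])
print(n_candidates)
print(grid.best_params_)  # the (C, gamma) pair with the highest mean CV accuracy
print(grid.best_score_)   # that mean CV accuracy
```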
47. 4. Train Model
Use the best parameters found by CV to train on the whole training set
clf = SVC(kernel="rbf",
          C=grid.best_params_["C"],
          gamma=grid.best_params_["gamma"],
          max_iter=10000000)
clf.fit(X_scaled, y_train)
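As a side note, `GridSearchCV` uses `refit=True` by default, so after fitting it already holds a model retrained on the whole training set with the best parameters, available as `best_estimator_`. The sketch below compares that model with a manually retrained one; the dataset, scaler, and the small grid are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed stand-in data and scaler.
X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# A small illustrative grid (not the grid from the slides).
grid = GridSearchCV(
    estimator=SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=5)  # refit=True is the default
grid.fit(X_scaled, y)

# Manual retraining with the best parameters, as on the slide.
manual = SVC(kernel="rbf",
             C=grid.best_params_["C"],
             gamma=grid.best_params_["gamma"])
manual.fit(X_scaled, y)

# The automatically refitted model should give the same predictions.
same = bool((manual.predict(X_scaled)
             == grid.best_estimator_.predict(X_scaled)).all())
print(same)
```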
48. Predict novel data
● Scale it with the scaler fitted on the training data
● Predict
novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
novel_X_scaled = scaler.transform(novel_X)
print(clf.predict(novel_X_scaled))
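A short sketch of why the novel sample is passed through `scaler.transform` rather than a freshly fitted scaler. It assumes a `StandardScaler` and the iris data as stand-ins; the slides may use a different scaler.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Assumed stand-in training data.
X_train, _ = load_iris(return_X_y=True)
scaler = StandardScaler().fit(X_train)

novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])

# Correct: reuse the mean and standard deviation learned from the training data.
novel_X_scaled = scaler.transform(novel_X)

# Incorrect: fitting a new scaler on the single novel sample centres the
# sample on itself, so every feature becomes 0 and all information is lost.
wrongly_scaled = StandardScaler().fit_transform(novel_X)

print(novel_X_scaled)
print(wrongly_scaled)
```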
56. The workflow for using SVM
Evaluation criteria + Baseline predictor
Training: prepare data → scale features → search best param (CV on grid) → train model
Prediction: prepare data → scale features → predict
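The whole workflow, including a baseline predictor, might look roughly like this. Everything specific here is an assumption for illustration: the iris data, accuracy as the evaluation criterion, a most-frequent-class baseline, `StandardScaler`, and the small parameter grid.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Prepare data (assumed stand-in dataset) and keep a test set aside.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Baseline predictor: always answer with the most frequent training class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_accuracy = baseline.score(X_test, y_test)

# Scale features + SVM in one pipeline, so scaling is refitted inside each CV fold.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10, 100],
              "svc__gamma": [0.001, 0.01, 0.1, 1]}

# Search the best parameters by CV on the grid, then train on all training data.
search = GridSearchCV(pipeline, param_grid=param_grid, cv=5)
search.fit(X_train, y_train)

# Scale + predict on the held-out data and compare with the baseline.
svm_accuracy = search.score(X_test, y_test)
print(baseline_accuracy, svm_accuracy)
```

Comparing against the baseline shows whether the tuned SVM actually adds value under the chosen evaluation criterion.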
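57. Once you know how to use the microwave oven properly...
● Data collection (preparing the ingredients)
● Model evaluation monitoring (are the customers satisfied?)
● Feature engineering (processing the ingredients)
● Model update from novel data (keeping up with the times)
● Training / prediction at large scale (large quantities of ingredients)
● A robust pipeline that integrates all of these
(opening a restaurant)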