Skorch - A Union of Scikit-learn and PyTorch
Thomas Fan
github.com/thomasjpfan/pydata2018_dc_skorch
PyData DC 2018
Scikit-Learn API
clf = SGDClassifier(alpha=0.01)
clf.fit(X, y)
y_pred = clf.predict(X)
clf.partial_fit(X, y)
clf.set_params(alpha=0.1)
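All five calls are shown in a runnable sketch below; the synthetic data from make_classification is an assumption added here only to make the example self-contained:

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# synthetic data, assumed here for illustration
X, y = make_classification(n_samples=1000, random_state=0)

clf = SGDClassifier(alpha=0.01)
clf.fit(X, y)              # train from scratch
y_pred = clf.predict(X)    # predict class labels
clf.partial_fit(X, y)      # continue training on more data
clf.set_params(alpha=0.1)  # update a hyperparameter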
PyTorch Training - Training
for epoch in range(10):
    net.train()
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        with torch.set_grad_enabled(True):
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
PyTorch Training - Recording Metrics
train_losses = []
for epoch in range(10):
    running_loss = 0.0
    for inputs, labels in train_loader:
        ...
        running_loss += loss.item() * inputs.size(0)
    epoch_loss = running_loss / len(train_loader.dataset)
    train_losses.append(epoch_loss)
PyTorch Training - Validation
net.eval()
with torch.set_grad_enabled(False):
    for data in valid_loader:
        inputs, labels = data
        outputs = net(inputs)
        loss = criterion(outputs, labels)
PyTorch Training - The Rest
• Recording validation losses
• Saving the best performing model
• Recording other metrics
• Logging
• ...
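Each of these items means more hand-written loop code. As one example, a hedged sketch of "saving the best performing model", assuming a net and a valid_loss computed as in the validation loop above:

import torch

best_loss = float('inf')
for epoch in range(10):
    ...  # train and validate as in the previous slides
    if valid_loss < best_loss:
        best_loss = valid_loss
        # keep the weights of the best model seen so far
        torch.save(net.state_dict(), 'best_model.pt')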
1. A Scikit-Learn compatible neural network library that wraps PyTorch.
2. Abstracts away the training loop.
3. Reduces the amount of boilerplate code with callbacks.
Skorch NeuralNet
from skorch import NeuralNet
net = NeuralNet(
    module,
    criterion=...,
    callbacks=[...])
Exploring Skorch's API
1. MNIST
2. Ants and Bees
3. 2018 Kaggle Data Science Bowl
MNIST - Data
print(X.shape, y.shape)
# (70000, 784) (70000,)
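The loading step is not shown on the slides; a common route (an assumption here) is fetch_openml, casting to the dtypes PyTorch expects:

from sklearn.datasets import fetch_openml
import numpy as np

X, y = fetch_openml('mnist_784', return_X_y=True)
X = X.astype(np.float32)  # nn.Linear expects float32 inputs
y = y.astype(np.int64)    # integer labels for CrossEntropyLoss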
MNIST - Data Code
from sklearn.model_selection import train_test_split
X_scaled = X / X.max()
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.25, random_state=42)
MNIST - Neural Network Module
import torch.nn as nn

class SimpleFeedforward(nn.Module):
    def __init__(self, dropout=0.5):
        super().__init__()
        self.module = nn.Sequential(
            nn.Linear(784, 98),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(98, 10))

    def forward(self, X):
        return self.module(X)
MNIST - Loss Function
from skorch import NeuralNet
net = NeuralNet(
    SimpleFeedforward,
    criterion=nn.CrossEntropyLoss,
    max_epochs=10,
    lr=0.3,
    device='cuda',  # comment out this line to train on the CPU
)
MNIST - Fitting
_ = net.fit(X_train, y_train)
MNIST - Continue Training
net.set_params(max_epochs=5)
_ = net.partial_fit(X_train, y_train)
MNIST - History
len(net.history)
# 15
net.history[-1, 'valid_loss']
# 0.10163110941932314
net.history[-2:, 'train_loss']
# [0.13314295971961249,
#  0.1330454680351984]
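The history also supports slicing across epochs, which makes pulling out whole loss curves a one-liner (a small usage sketch):

train_losses = net.history[:, 'train_loss']
valid_losses = net.history[:, 'valid_loss']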
MNIST - Accuracy Score
from sklearn.metrics import make_scorer
import numpy as np

def accuracy_argmax(y_true, y_pred):
    return np.mean(y_true == np.argmax(y_pred, -1))

accuracy_argmax_scorer = make_scorer(accuracy_argmax)
MNIST - EpochScoring
from skorch.callbacks import EpochScoring
epoch_acc = EpochScoring(
    accuracy_argmax_scorer,
    name='valid_acc',
    lower_is_better=False)
net = NeuralNet(...,
    callbacks=[epoch_acc]
)
MNIST - Fitting With EpochScoring
_ = net.fit(X, y)
MNIST - Prediction
y_pred = net.predict(X_test)
print('test accuracy:', accuracy_argmax(y_test, y_pred))
# test accuracy: 0.9634857142857143
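NeuralNet.predict returns the module's raw outputs, here logits over the ten classes, which is why accuracy_argmax applies an argmax. Recovering hard class labels works the same way:

import numpy as np
y_classes = np.argmax(y_pred, axis=-1)  # logits -> predicted class ids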
MNIST - Scikit-Learn Integration
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
pipe = Pipeline([
    ("min_max", MinMaxScaler()),
    ("net", net)])
_ = pipe.fit(X_train, y_train)
MNIST - Grid Search
from sklearn.model_selection import GridSearchCV
param_grid = {"net__module__dropout": [0.2, 0.5, 0.8]}
gs = GridSearchCV(pipe, param_grid, cv=3,
                  scoring=accuracy_argmax_scorer)
_ = gs.fit(X, y)
print("best score:", gs.best_score_)
# best score: 0.9651
print("best_params", gs.best_params_)
# best_params {'net__module__dropout': 0.2}
Ants and Bees - Folder Structure
datasets/hymenoptera_data/
├── train
│ ├── ants
│ └── bees
└── val
├── ants
└── bees
Ants and Bees - ImageFolder Init
from torchvision.datasets import ImageFolder
train_tfms = ...
val_tfms = ...
train_ds = ImageFolder(
    "datasets/hymenoptera_data/train", train_tfms)
val_ds = ImageFolder(
    "datasets/hymenoptera_data/val", val_tfms)
Ants and Bees - ImageFolder Class
Subclass of torch.utils.data.Dataset
print(len(train_ds), len(val_ds))
# 244 153
img, target = train_ds[0]
print(img.shape, target)
# torch.Size([3, 224, 224]) 0

# class_to_idx is specific to ImageFolder:
print(train_ds.class_to_idx)
# {'ants': 0, 'bees': 1}
Ants and Bees - ImageFolder Transformations
import torchvision.transforms as tfms
train_tfms = tfms.Compose([
    tfms.RandomResizedCrop(224),
    tfms.RandomHorizontalFlip(),
    tfms.ToTensor(),
    tfms.Normalize([0.485, 0.456, 0.406],
                   [0.229, 0.224, 0.225]),
])
train_ds = ImageFolder(
    "datasets/hymenoptera_data/train", train_tfms)
Ants and Bees - ImageNet
• 1000 classes
• 1300 images for each class
• Mean of ImageNet: [0.485, 0.456, 0.406]
• Standard Deviation of ImageNet: [0.229, 0.224, 0.225]
Ants and Bees - ResNet Model
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of CVPR, pages
770–778, 2016. arxiv.org/abs/1512.03385
Ants and Bees - ResNet Model Code
from torchvision.models import resnet18
import torch.nn as nn

class PretrainedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = resnet18(pretrained=True)
        self.model.fc = nn.Linear(512, 2)

    def forward(self, X):
        return self.model(X)
Ants and Bees - Freezer
from skorch.callbacks import Freezer
# freeze every parameter except the replaced final layer, model.fc
freezer = Freezer(
    lambda name: not name.startswith("model.fc"))
Ants and Bees - Learning Rate Scheduler
from skorch.callbacks import LRScheduler
# decay the learning rate by gamma every step_size epochs
lr_scheduler = LRScheduler(
    policy="StepLR",
    step_size=7,
    gamma=0.1)
Ants and Bees - Checkpoints
from skorch.callbacks import Checkpoint
epoch_acc = EpochScoring(..., name='valid_acc',
                         lower_is_better=False)
checkpoint = Checkpoint(
    dirname="exp_01_bee_vs_ant", monitor="valid_acc_best")
Ants and Bees - Skorch NeuralNet
from skorch.helper import predefined_split
net = NeuralNet(
    PretrainedModel,
    lr=0.001, batch_size=4,
    train_split=predefined_split(val_ds),
    callbacks=[freezer, lr_scheduler,
               epoch_acc, checkpoint],
    ...
)
Ants and Bees - Fitting
_ = net.fit(train_ds)
Ants and Bees - Checkpoint Files
exp_01_bee_vs_ant
├── history.json
├── optimizer.pt
└── params.pt
Ants and Bees - Loading from Checkpoint
# net.fit(...) has already been called
net.load_params(checkpoint=checkpoint)
val_output = net.predict(val_ds)
Ants and Bees - Prediction
checkpoint = Checkpoint(...,
    dirname="exp_01_bee_vs_ant",
    monitor="valid_acc_best")
net = NeuralNet(PretrainedModel, ...)
net.initialize()
net.load_params(checkpoint=checkpoint)
val_pred = net.predict(val_ds)
Ants and Bees - Prediction Numpy
print(X_numpy.shape)
# (1, 3, 224, 224)
X_pred = net.predict(X_numpy)
print(X_pred)
# [[ 0.4966519 -0.9894746]]
# softmax here could be, e.g., scipy.special.softmax
print(softmax(X_pred))
# [[0.8154962 0.18450384]]
Nuclei Image Segmentation - Dataset
train_cell_ds = CellsDataset(...)
valid_cell_ds = CellsDataset(...)
print(train_cell_ds[0])
# (<PIL.Image.Image>,
#  <PIL.PngImagePlugin.PngImageFile>)
O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in
MICCAI, pp. 234–241, Springer, 2015. arxiv.org/abs/1505.04597
Nuclei Image Segmentation - Freezer
from skorch.callbacks import Freezer
# freeze all parameters whose names match 'conv*'
freezer = Freezer('conv*')
Nuclei Image Segmentation - PatchedDataset
Nuclei Image Segmentation - PatchedDataset Code
train_ds = PatchedDataset(
    train_cell_ds, patch_size=(256, 256),
    padding=16, random_flips=True)
val_ds = PatchedDataset(
    valid_cell_ds, patch_size=(256, 256),
    padding=16, random_flips=False)
Nuclei Image Segmentation - IOU
Nuclei Image Segmentation - IOU Metric
def approximate_iou_metric(
        true_masks, predicted_logit_masks, padding=16):
    ...  # returns the metric

iou_scoring = make_scorer(approximate_iou_metric)
iou_scoring = EpochScoring(
    iou_scoring, name='valid_iou', lower_is_better=False)
best_cp = Checkpoint(
    dirname="kaggle_seg_exp01", monitor="valid_iou_best")
Nuclei Image Segmentation - Custom Loss
class BCEWithLogitsLossPadding(nn.Module):
    def __init__(self, padding):
        super().__init__()
        self.padding = padding
    ...

net = NeuralNet(
    UNet,
    criterion=BCEWithLogitsLossPadding,
    criterion__padding=16,
    ...
)
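The forward pass is elided above; the idea is to crop the padded border before an ordinary BCE-with-logits loss. A minimal sketch under that assumption:

import torch.nn as nn

class BCEWithLogitsLossPadding(nn.Module):
    def __init__(self, padding):
        super().__init__()
        self.padding = padding
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, input, target):
        p = self.padding
        # score only the center region; the border is padding
        return self.bce(input[..., p:-p, p:-p],
                        target[..., p:-p, p:-p].float())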
Nuclei Image Segmentation - Cyclic LR Scheduler
cyclicLR = LRScheduler(
    policy="CyclicLR",
    base_lr=0.002,
    max_lr=0.2,
    step_size_up=550,
    step_size_down=550)
Nuclei Image Segmentation - NeuralNet
net = NeuralNet(
    UNet,
    criterion=BCEWithLogitsLossPadding,
    criterion__padding=16,
    batch_size=32,
    max_epochs=20,
    train_split=predefined_split(val_ds),
    callbacks=[freezer, cyclicLR, iou_scoring, best_cp],
    ...
)
Nuclei Image Segmentation - NeuralNet DataLoader
PyTorch's DataLoader defaults to DataLoader(pin_memory=False, num_workers=0, ...); skorch forwards iterator_train__* and iterator_valid__* parameters to the train and validation loaders:
net = NeuralNet(...,
    iterator_train__shuffle=True,
    iterator_train__num_workers=4,
    iterator_train__pin_memory=True,
    iterator_valid__shuffle=False,
    iterator_valid__num_workers=4,
    iterator_valid__pin_memory=True)
_ = net.fit(train_ds)
Nuclei Image Segmentation - Predict on Validation
net.load_params(checkpoint=best_cp)
val_masks = net.predict(val_ds)
print(val_masks.shape)
# (468, 1, 288, 288)
# convert the mask logits to probabilities
val_prob_masks = numpy_stable_sigmoid(val_masks.squeeze(1))
print(val_prob_masks.shape)
# (468, 288, 288)
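numpy_stable_sigmoid is a small project helper that turns the mask logits into probabilities; a minimal sketch of such a numerically stable sigmoid (an assumption, not necessarily the repository's exact code):

import numpy as np

def numpy_stable_sigmoid(x):
    # evaluate the sigmoid without overflowing exp() for large |x|
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out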
Skorch - Closing 1
1. A Scikit-Learn compatible neural network library that wraps PyTorch.
• net.fit(X, y)
• net.partial_fit(X, y)
• net.predict(X)
• net.set_params(...)
2. Abstracts away the training loop.
Skorch - Closing 2
3. Reduces the amount of boilerplate code with callbacks.
• EpochScoring
• Freezer
• Checkpoint
• LRScheduler
• skorch.readthedocs.io/en/stable/user/callbacks.html
Skorch - What's Next
• skorch.readthedocs.io
• skorch Tutorials
• github.com/dnouri/skorch
• github.com/thomasjpfan/pydata2018_dc_skorch
Appendix Nuclei Image Segmentation - Cyclic LR Scheduler
• Number of training samples: len(train_ds) = 1756
• max_epochs = 20
• batch_size = 32
• Training iterations per epoch: ceil(1756/32) = 55
• Total number of iterations: 55*20 = 1100
With step_size_up=550 and step_size_down=550, the learning rate ramps up for 550 iterations and back down for 550, completing exactly one full cycle over the 1100 training iterations.
Appendix Ants and Bees - Saving and Loading
from skorch.callbacks import TrainEndCheckpoint
from skorch.callbacks import LoadInitState
def run(max_epochs):
    best_cp = Checkpoint(dirname="exp_02", ...)
    train_end_cp = TrainEndCheckpoint(
        dirname="exp_02", fn_prefix="train_end_")
    load_state = LoadInitState(train_end_cp)
    net = NeuralNet(...,
        max_epochs=max_epochs,
        callbacks=[..., best_cp, train_end_cp, load_state]
    ).fit(train_ds)
Appendix Ants and Bees - Saving and Loading Checkpoints
exp_02
├── history.json
├── optimizer.pt
├── params.pt
├── train_end_history.json
├── train_end_optimizer.pt
└── train_end_params.pt
Appendix Ants and Bees - Saving and Loading First Run
run(max_epochs=10)
Appendix Ants and Bees - Saving and Loading Second Run
run(max_epochs=5)