Deep Learning with Spark

Anastasia Lieva
Fuzzy Humanist, Data-Scientist
@lievAnastazia

BigDL
High-level deep learning library
Intel MKL
Scale-out w/ Spark

BigDL: Deep Learning on Spark

API:
Scala and Python
BUT
the disadvantage of all Python APIs
is
that they are written in Python

API:
Scala a̶̶̶n̶̶̶d̶̶̶ ̶̶̶P̶̶̶y̶̶̶t̶̶̶h̶̶̶o̶̶̶n̶̶̶

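The same configs as Spark
import com.intel.analytics.bigdl.utils.Engine
import org.apache.spark.sql.SparkSession

// BigDL builds a SparkConf pre-populated with the settings it needs
val conf = Engine.createSparkConf()
  .setAppName("DeepLearningOnSpark")
  .setMaster("local[3]")
val sparkSession = SparkSession.builder()
  .config(conf).getOrCreate()
val sqlContext = sparkSession.sqlContext
val sparkContext = sparkSession.sparkContext
// Initialize the BigDL engine once the Spark context exists
Engine.init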

Model Architecture
[Diagram: Input Data → Layer 1 → Layer 2 → Layer 3 → Layer 4 → Layer 5]

DATA
Tensor (source: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/linear_algebra.html)
Sparse Tensor: Tensor(indices, values, shape)
Table: Lua / Torch Tables
Sample: (Tensor of Features, Tensor of Targets)
Mini-batch: Batch of Samples
DataSet: For advanced applications only
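
How these containers relate can be sketched in plain Scala. The case classes below are hypothetical stand-ins written for this illustration, not BigDL's own classes; they only show how a sparse tensor stores (indices, values, shape) and how samples are grouped into mini-batches.

```scala
// Hypothetical stand-ins for illustration only; these are NOT BigDL classes.
object DataSketch {
  // Dense tensor: flat storage in row-major order, plus a shape.
  final case class DenseTensor(values: Array[Double], shape: Seq[Int]) {
    require(values.length == shape.product, "storage must match shape")
  }

  // Sparse tensor: only the non-zero entries are stored.
  final case class SparseTensor(indices: Seq[Seq[Int]], values: Seq[Double], shape: Seq[Int]) {
    require(indices.length == values.length, "one index tuple per value")

    // Expand to the dense form to show that both describe the same data.
    def toDense: DenseTensor = {
      val dense = Array.fill(shape.product)(0.0)
      indices.zip(values).foreach { case (idx, v) =>
        // Row-major offset: fold the index tuple over the shape.
        val offset = idx.zip(shape).foldLeft(0) { case (acc, (i, dim)) => acc * dim + i }
        dense(offset) = v
      }
      DenseTensor(dense, shape)
    }
  }

  // Sample: one (features, target) pair.
  final case class Sample(features: DenseTensor, target: DenseTensor)

  // Mini-batch: a fixed-size group of samples (the last group may be smaller).
  def miniBatches(samples: Seq[Sample], batchSize: Int): Seq[Seq[Sample]] =
    samples.grouped(batchSize).toSeq
}
```

A 2 x 3 sparse tensor with two non-zero entries then needs two index pairs and two values instead of six stored numbers.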

Layers
More than 100 layers!
Embedding
Pooling
Convolution
Normalization
Recurrent
DropOut
Sparse
… and others

Learning by Backpropagation
[Diagram: Input Data → Layer 1 → Layer 2 → Layer 3 → Layer 4 → Layer 5 → Prediction]
Prediction vs. ground truth (expected) → Error
Update weights in every layer w/ an optimization algorithm
Retry prediction with updated weights
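
The predict, measure, update, retry cycle can be sketched on the smallest possible model. This is a plain-Scala illustration, not BigDL code: it assumes a single linear layer (one weight, one bias), a mean squared error, and plain gradient descent, so the backward pass reduces to two hand-written gradients. The toy data and all names are made up for the example.

```scala
// Minimal sketch: one weight, one bias, squared error, plain gradient descent.
object BackpropSketch {
  // Toy data generated by y = 2x + 1 (an assumption for illustration).
  val inputs: Seq[Double]  = Seq(0.0, 1.0, 2.0, 3.0)
  val targets: Seq[Double] = inputs.map(x => 2.0 * x + 1.0)

  // Forward pass: the prediction for one input.
  def predict(w: Double, b: Double, x: Double): Double = w * x + b

  // Mean squared error between predictions and ground truth.
  def error(w: Double, b: Double): Double =
    inputs.zip(targets).map { case (x, y) => math.pow(predict(w, b, x) - y, 2) }.sum / inputs.size

  // One update step: compute the gradients of the error, then move against them.
  def step(w: Double, b: Double, learningRate: Double): (Double, Double) = {
    val n = inputs.size
    val gradW = inputs.zip(targets).map { case (x, y) => 2.0 * (predict(w, b, x) - y) * x }.sum / n
    val gradB = inputs.zip(targets).map { case (x, y) => 2.0 * (predict(w, b, x) - y) }.sum / n
    (w - learningRate * gradW, b - learningRate * gradB)
  }

  // Repeat the step, starting from zero weights.
  def train(epochs: Int, learningRate: Double): (Double, Double) =
    (1 to epochs).foldLeft((0.0, 0.0)) { case ((w, b), _) => step(w, b, learningRate) }
}
```

Calling `BackpropSketch.train(2000, 0.05)` moves the weight and bias close to the 2 and 1 used to generate the toy data.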


Losses
More than 30 criteria:
mean squared error,
binary cross entropy,
negative log likelihood criterion,
KL-divergence of the Gaussian distribution...
Optimization algorithms
Most popular gradient descent algorithms:
SGD, Adam, Adagrad, Adadelta, AdaMax
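
For reference, two of the losses named above can be written out in a few lines. These are the textbook definitions in plain Scala, offered as a sketch; they are not taken from BigDL's source, and the object and function names are made up for the example.

```scala
// Textbook definitions, for illustration only (not BigDL's implementation).
object LossSketch {
  // Mean squared error: average of squared differences.
  def mse(pred: Seq[Double], target: Seq[Double]): Double =
    pred.zip(target).map { case (p, t) => (p - t) * (p - t) }.sum / pred.size

  // Binary cross entropy; predictions must be probabilities strictly inside (0, 1).
  def bce(pred: Seq[Double], target: Seq[Double]): Double = {
    val total = pred.zip(target).map { case (p, t) =>
      t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    }.sum
    -total / pred.size
  }
}
```

Both return a single number that shrinks as predictions approach the targets, which is what the optimization algorithm then minimizes.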

Let’s predict something!
Good, More Or Less, Bad

Preprocess unstructured data
Spark MLlib: RegexTokenizer() → Word2Vec()
BigDL: Tensor[Vector] → Sample(featureTensor, label)
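
What the two preprocessing steps do to a piece of text can be sketched without Spark. The code below is a plain-Scala analogue, not the MLlib API: `tokenize` stands in for a regex tokenizer, and `embed` looks words up in a hypothetical, hand-made embedding table instead of a trained Word2Vec model. The fixed sequence length and zero-padding are assumptions made for the example.

```scala
// Plain-Scala analogue for illustration; this is NOT the Spark MLlib API.
object PreprocessSketch {
  // Lower-case, split on runs of characters that are neither letters nor digits,
  // and drop empty tokens. Unicode classes keep accented letters intact.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^\\p{L}\\p{N}]+").filter(_.nonEmpty).toSeq

  // Look up each token in a (hypothetical) embedding table; unknown words map
  // to a zero vector. Truncate or zero-pad to exactly seqLength rows.
  def embed(tokens: Seq[String], table: Map[String, Array[Double]],
            dim: Int, seqLength: Int): Array[Array[Double]] = {
    val rows = tokens.take(seqLength).map(t => table.getOrElse(t, Array.fill(dim)(0.0)))
    (rows ++ Seq.fill(seqLength - rows.size)(Array.fill(dim)(0.0))).toArray
  }
}
```

The result is a seqLength x dim matrix per document, which is the kind of shape a feature tensor for a temporal convolution expects.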

Convolutional Neural Network
(source: http://intellabs.github.io/RiverTrail/tutorial/)

Hello, we are hiring in Montpellier (#system, network, #Devops, #Linux).
Don't hesitate to apply and to share, thanks a lot.
PS: we are not an IT services company (SSII)

Montpellier
#system, network, #Devops, #Linux
not an SSII
$$$$$ ?
Bad

Model Architecture
Temporal Conv → ReLU → Temporal MaxPool → Linear → Dropout → ReLU → Linear → LogSoftMax

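Model Architecture in BigDL
// Assumed imports: BigDL layers and the implicit numeric type for Double
import com.intel.analytics.bigdl.nn._
import com.intel.analytics.bigdl.numeric.NumericDouble

// Sizes are placeholders; concrete values depend on the input data
val model = Sequential[Double]()
  .add(TemporalConvolution(inputSize, outputSizeTempConv, kernelSize))
  .add(ReLU())
  .add(TemporalMaxPooling(outputSizeMaxPooling))
  .add(Linear(inputSizeLinearLayer, outputSizeLinearLayer))
  .add(Dropout(0.1))
  .add(ReLU())
  .add(Linear(inputSizeLinearLayer2, outputSizeLinearLayer2))
  .add(LogSoftMax())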

Training model in BigDL
val criterion = new ClassNLLCriterion[Double]
val optimizer = Optimizer(model, trainData, criterion, batchSize)
optimizer
  .setOptimMethod(new Adagrad(learningRate, learningRateDecay))
  .optimize()
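
The model ends with LogSoftMax and is trained with a negative log likelihood criterion; together the two amount to a cross-entropy loss over the classes. The plain-Scala sketch below shows the arithmetic only and is not BigDL code. It uses 0-based class indices, which is an assumption of this example; check which label convention your library expects.

```scala
// Textbook arithmetic for illustration; not BigDL's implementation.
object NllSketch {
  // Numerically stable log-softmax: subtract the max before exponentiating.
  def logSoftMax(scores: Seq[Double]): Seq[Double] = {
    val m = scores.max
    val logSum = m + math.log(scores.map(s => math.exp(s - m)).sum)
    scores.map(_ - logSum)
  }

  // Negative log likelihood of the true class (0-based index in this sketch).
  def nll(logProbs: Seq[Double], target: Int): Double = -logProbs(target)
}
```

With three equal scores every class gets probability 1/3, so the loss is log(3); it falls toward zero as the score of the true class grows relative to the others.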

Config for TensorBoard
val optimizer = Optimizer.apply(model, trainData, criterion, 6)
val logdir = "mylogdir"
val appName = "job-offers-filter"
val trainSummary = TrainSummary(logdir, appName)
val validationSummary = ValidationSummary(logdir, appName)
optimizer.setTrainSummary(trainSummary)
optimizer.setValidationSummary(validationSummary)
optimizer
  .setOptimMethod(new Adagrad(learningRate = 0.01, learningRateDecay = 0.0002))
  .optimize()
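
To give a feel for what the two Adagrad parameters do, here is a plain-Scala sketch of one textbook Adagrad update with a Torch-style learning-rate decay. It is not BigDL's implementation: the exact decay formula, the epsilon value, and the `State` type are assumptions made for this illustration.

```scala
// Textbook Adagrad step for illustration; not BigDL's implementation.
object AdagradSketch {
  // Weights, per-weight sum of squared gradients, and number of steps taken.
  final case class State(weights: Vector[Double], sumSqGrad: Vector[Double], step: Int)

  def update(s: State, grad: Vector[Double], learningRate: Double,
             learningRateDecay: Double, eps: Double = 1e-10): State = {
    // Decayed learning rate: shrinks as the number of steps grows.
    val lr = learningRate / (1.0 + s.step * learningRateDecay)
    // Accumulate squared gradients per weight.
    val acc = s.sumSqGrad.zip(grad).map { case (a, g) => a + g * g }
    // Scale each weight's step by the inverse root of its accumulated gradient.
    val w = s.weights.indices.toVector.map { i =>
      s.weights(i) - lr * grad(i) / (math.sqrt(acc(i)) + eps)
    }
    State(w, acc, s.step + 1)
  }
}
```

Weights that keep receiving large gradients accumulate a large sum and therefore take smaller steps over time, while the decay term lowers the base rate for all weights.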

Spark MLlib
RegexTokenizer()
Word2Vec()
Dataframe
  .select("features", "label")

Spark Integration
val model = Sequential[Double]()
  .add(TemporalConvolution(100, 20, 5))
  .add(ReLU())
  .add(TemporalMaxPooling(96))
  .add(Linear(20, 100))
  .add(Dropout(0.1))
  .add(ReLU())
  .add(Linear(100, 3))
  .add(LogSoftMax())
val criterion = new ClassNLLCriterion[Double]
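
The sizes in this model fit together if each document is a sequence of 100 word vectors of dimension 100. The 100-token length is an inference from the value 96 passed to TemporalMaxPooling, not something the slides state. The arithmetic can be checked with a small plain-Scala sketch that uses the usual sliding-window formula; the names are made up for the example.

```scala
// Shape arithmetic only; no BigDL calls are made here.
object ShapeSketch {
  // Number of frames produced by a window of size `kernel` sliding over a sequence.
  def outputFrames(inputFrames: Int, kernel: Int, stride: Int = 1): Int =
    (inputFrames - kernel) / stride + 1

  val seqLength = 100                                               // assumption: 100 tokens per document
  val afterConv = outputFrames(seqLength, kernel = 5)               // frames after the convolution
  val afterPool = outputFrames(afterConv, kernel = 96, stride = 96) // frames after max pooling
}
```

A kernel of 5 over 100 frames leaves 96 frames of 20 features each; pooling over all 96 leaves a single 20-feature vector, which matches the input size of Linear(20, 100). The final Linear(100, 3) matches the three classes.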

Spark Integration
// Estimator
val estimator = new DLEstimator(model, criterion, featureSize, labelSize)
  .setLearningRate(0.01)
  .setBatchSize(6)
// fit() returns the trained model, which is a Transformer
val trainedModel = estimator.fit(trainDataframe)
val predictions = trainedModel.transform(testDataframe)
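
The Estimator / Transformer split is the general Spark ML pattern: an estimator learns from data and returns a fitted object that can transform new data. The sketch below reproduces the pattern in plain Scala with hypothetical minimal traits; it does not use Spark's or BigDL's classes, and mean-centering is chosen only because it is easy to verify by hand.

```scala
// Hypothetical minimal interfaces mirroring the Estimator/Transformer split.
object PipelineSketch {
  trait Transformer { def transform(data: Seq[Double]): Seq[Double] }
  trait Estimator   { def fit(data: Seq[Double]): Transformer }

  // Learns the mean of the training data; the fitted model subtracts it.
  object MeanCentering extends Estimator {
    def fit(data: Seq[Double]): Transformer = {
      val mean = data.sum / data.size
      new Transformer {
        def transform(d: Seq[Double]): Seq[Double] = d.map(_ - mean)
      }
    }
  }
}
```

Usage follows the same two steps as on the slide: `val fitted = PipelineSketch.MeanCentering.fit(train)` and then `fitted.transform(test)`.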

Interoperability
Your model
BigDL, Torch, TensorFlow, Caffe, Keras

Post more job offers on comm-montpellier.slack!
https://bit.ly/comm-mtp
Offers properly qualified in terms of domain,
technologies, and salary range. Or at the very least
with a funny pitch ;)
