Deep Learning
with Spark
Anastasia Lieva
Fuzzy Humanist, Data-Scientist
@lievAnastazia
Spark is a new Hero
Deep Learning is a new Hero
BigDL is a new epic story
BigDL
High-level deep learning library
Intel MKL
Scale-out w/ Spark
BigDL : Deep Learning on Spark
API:
Scala and Python
BUT
the disadvantage of all Python APIs is that they are written in Python
API:
Scala ("and Python" struck through)
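import com.intel.analytics.bigdl.utils.Engine
import org.apache.spark.sql.SparkSession

// Engine.createSparkConf() returns a regular SparkConf with BigDL's settings added
val conf = Engine.createSparkConf()
  .setAppName("DeepLearningOnSpark")
  .setMaster("local[3]")
val sparkSession = SparkSession.builder()
  .config(conf).getOrCreate()
val sqlContext = sparkSession.sqlContext
val sparkContext = sparkSession.sparkContext
// Initialise the BigDL engine once the Spark context exists
Engine.init
The same configs as Spark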
[Figure: Input Data → Layer 1 → Layer 2 → Layer 3 → Layer 4 → Layer 5]
Model Architecture
Tensor
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/linear_algebra.html
DATA
Tensor
Sparse Tensor: Tensor(indices, values, shape)
Table: Lua / Torch Tables
Sample: (Tensor of Features, Tensor of Targets)
Mini-batch: Batch of Samples
DataSet: For advanced applications only
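The (indices, values, shape) form of a sparse tensor can be illustrated with a minimal sketch in plain Scala. This is not BigDL's own sparse tensor class: `SparseTensorSketch`, `Sparse2D` and `toDense` are hypothetical names chosen for illustration, and the sketch is limited to two dimensions.

```scala
// Hypothetical sketch, not BigDL's API: a 2-D sparse tensor in
// coordinate form, stored as (indices, values, shape).
object SparseTensorSketch {
  final case class Sparse2D(
      indices: Seq[(Int, Int)], // (row, column) of each non-zero entry
      values: Seq[Double],      // value stored at the matching index
      shape: (Int, Int)         // (rows, columns) of the dense equivalent
  ) {
    require(indices.length == values.length, "one value per index")

    // Expand to a dense row-major matrix, filling the gaps with zeros.
    def toDense: Vector[Vector[Double]] = {
      val (rows, cols) = shape
      val lookup = indices.zip(values).toMap
      Vector.tabulate(rows, cols)((r, c) => lookup.getOrElse((r, c), 0.0))
    }
  }

  def main(args: Array[String]): Unit = {
    // Two non-zero entries in a 3 x 2 matrix.
    val t = Sparse2D(Seq((0, 1), (2, 0)), Seq(3.0, 5.0), (3, 2))
    t.toDense.foreach(row => println(row.mkString(" ")))
  }
}
```

Only the non-zero entries are stored, which is why this layout suits inputs such as bag-of-words features where most positions are zero.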
More than 100 layers!
Embedding
Pooling
Convolution
Normalization
Recurrent
DropOut
Sparse
… and others
Layers
[Figure: Input Data → Layer 1 → Layer 2 → Layer 3 → Layer 4 → Layer 5 → Prediction]
Compare the Prediction with the Ground truth (the expected output) to get the Error
Update weights in every layer w/ an optimization algorithm
Retry prediction with updated weights
Learning by Backpropagation
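The predict / error / update / retry cycle can be sketched in plain Scala. This is an illustration under simplifying assumptions, not BigDL code: the "network" is a single linear layer, so propagating the error back is one application of the chain rule, and the names `TrainingLoopSketch`, `Params`, `step` and `train` are hypothetical.

```scala
// Hypothetical sketch: gradient descent on one linear layer,
// following the predict -> error -> update -> retry cycle.
object TrainingLoopSketch {
  // One linear "layer": prediction = w * x + b.
  final case class Params(w: Double, b: Double)

  def predict(p: Params, x: Double): Double = p.w * x + p.b

  // Mean squared error between predictions and ground truth.
  def error(p: Params, data: Seq[(Double, Double)]): Double =
    data.map { case (x, y) => math.pow(predict(p, x) - y, 2) }.sum / data.size

  // One pass: predict, take the gradient of the error with respect to
  // each weight, and return the updated weights.
  def step(p: Params, data: Seq[(Double, Double)], lr: Double): Params = {
    val n = data.size
    val gradW = data.map { case (x, y) => 2 * (predict(p, x) - y) * x }.sum / n
    val gradB = data.map { case (x, y) => 2 * (predict(p, x) - y) }.sum / n
    Params(p.w - lr * gradW, p.b - lr * gradB)
  }

  // Retry prediction with updated weights for a fixed number of iterations.
  def train(data: Seq[(Double, Double)], lr: Double, iterations: Int): Params =
    (1 to iterations).foldLeft(Params(0.0, 0.0))((p, _) => step(p, data, lr))

  def main(args: Array[String]): Unit = {
    val data = Seq((0.0, 1.0), (1.0, 3.0), (2.0, 5.0)) // generated from y = 2x + 1
    val trained = train(data, lr = 0.1, iterations = 2000)
    println(s"w = ${trained.w}, b = ${trained.b}, error = ${error(trained, data)}")
  }
}
```

In a deeper network the same gradient is propagated through every layer in turn, which is the part a library such as BigDL automates.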
Losses
More than 30 criteria:
mean squared error,
binary cross entropy,
negative log likelihood criterion,
KL-divergence of the Gaussian distribution...
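Three of the criteria named above can be written out in a few lines of plain Scala. These are the textbook formulas, not BigDL's implementations, and `LossSketch` and its function names are hypothetical. The class label is 0-based here for simplicity.

```scala
// Hypothetical sketch: textbook definitions of three loss functions.
object LossSketch {
  // Mean squared error over paired predictions and targets.
  def meanSquaredError(pred: Seq[Double], target: Seq[Double]): Double =
    pred.zip(target).map { case (p, t) => (p - t) * (p - t) }.sum / pred.size

  // Binary cross entropy; predictions are probabilities strictly between
  // 0 and 1, targets are 0.0 or 1.0.
  def binaryCrossEntropy(pred: Seq[Double], target: Seq[Double]): Double = {
    val total = pred.zip(target).map { case (p, t) =>
      t * math.log(p) + (1 - t) * math.log(1 - p)
    }.sum
    -total / pred.size
  }

  // Negative log likelihood for one sample: logProbs holds one
  // log-probability per class (for example a log-softmax output),
  // label is the 0-based index of the true class.
  def negativeLogLikelihood(logProbs: Seq[Double], label: Int): Double =
    -logProbs(label)

  def main(args: Array[String]): Unit = {
    println(meanSquaredError(Seq(1.0, 2.0), Seq(1.0, 4.0)))
    println(binaryCrossEntropy(Seq(0.5), Seq(1.0)))
    println(negativeLogLikelihood(Seq(math.log(0.25), math.log(0.75)), 1))
  }
}
```

The negative log likelihood pairs naturally with a LogSoftMax output layer, since that layer already produces log-probabilities.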
Optimization algorithms
Most popular gradient descent algorithms:
SGD, Adam, Adagrad, Adadelta, AdaMax
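As an illustration of one of these algorithms, here is a minimal plain-Scala sketch of an Adagrad update. It follows the textbook rule (divide each gradient by the square root of its accumulated squared history) and assumes a learning rate decay of the form rate / (1 + steps * decay). That schedule, like the names `AdagradSketch`, `State` and `step`, is an assumption for illustration and not taken from BigDL's source.

```scala
// Hypothetical sketch of Adagrad, not BigDL's implementation.
object AdagradSketch {
  // State carried between steps: weights, accumulated squared gradients, step count.
  final case class State(weights: Vector[Double], sumSq: Vector[Double], steps: Int)

  def init(weights: Vector[Double]): State =
    State(weights, Vector.fill(weights.size)(0.0), 0)

  def step(s: State, grad: Vector[Double], learningRate: Double,
           learningRateDecay: Double, eps: Double = 1e-10): State = {
    // Assumed decay schedule: the base rate shrinks as 1 / (1 + steps * decay).
    val rate = learningRate / (1 + s.steps * learningRateDecay)
    // Accumulate the squared gradient of every weight.
    val sumSq = s.sumSq.zip(grad).map { case (a, g) => a + g * g }
    // Each weight gets its own effective rate, scaled by its gradient history.
    val weights = s.weights.zip(grad).zip(sumSq).map {
      case ((w, g), a) => w - rate * g / (math.sqrt(a) + eps)
    }
    State(weights, sumSq, s.steps + 1)
  }

  def main(args: Array[String]): Unit = {
    val s1 = step(init(Vector(1.0, 1.0)), Vector(2.0, 20.0),
      learningRate = 0.01, learningRateDecay = 0.0002)
    println(s1.weights.mkString(", "))
  }
}
```

On the first step both weights move by about the same amount even though one gradient is ten times larger, which is the per-parameter scaling that distinguishes Adagrad from plain SGD.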
Let’s predict something!
Good, More Or Less, Bad
Spark MLlib: RegexTokenizer(), Word2Vec()
BigDL: Tensor[Vector], Sample(featureTensor, label)
Preprocess unstructured data
http://intellabs.github.io/RiverTrail/tutorial/
Convolutional Neural Network
Hello, we are hiring in Montpellier (#systeme, reseau , #Devops, #Linux ).
Feel free to apply and to share, thank you very much.
PS: we are not an SSII (IT services company)
Montpellier
#systeme, reseau , #Devops, #Linux
not an SSII
$$$$$ ?
Bad
[Figure: Temporal Conv → ReLU → Temporal MaxPool → Linear → Dropout → ReLU → Linear → LogSoftMax]
Model Architecture
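// Requires the BigDL layers (com.intel.analytics.bigdl.nn._) and an implicit
// TensorNumeric[Double] in scope
val model = Sequential[Double]()
  .add(TemporalConvolution(inputSize, outputSizeTempConv, kernelSize))
  .add(ReLU())
  // the argument of TemporalMaxPooling is the width of the pooling window
  .add(TemporalMaxPooling(outputSizeMaxPooling))
  .add(Linear(inputSizeLinearLayer, outputSizeLinearLayer))
  .add(Dropout(0.1))
  .add(ReLU())
  .add(Linear(inputSizeLinearLayer2, outputSizeLinearLayer2))
  .add(LogSoftMax())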
Model Architecture
In BigDL
val criterion = new ClassNLLCriterion[Double]
val optimizer = Optimizer(model, trainData, criterion, batchSize)
optimizer
  .setOptimMethod(
    new Adagrad(learningRate, learningRateDecay))
  .optimize()
Training model
In BigDL
import com.intel.analytics.bigdl.visualization.{TrainSummary, ValidationSummary}

val optimizer = Optimizer.apply(model, trainData, criterion, 6)
val logdir = "mylogdir"
val appName = "job-offers-filter"
// Summaries are written under logdir/appName, where TensorBoard can read them
val trainSummary = TrainSummary(logdir, appName)
val validationSummary = ValidationSummary(logdir, appName)
optimizer.setTrainSummary(trainSummary)
optimizer.setValidationSummary(validationSummary)
optimizer
  .setOptimMethod(
    new Adagrad(learningRate = 0.01, learningRateDecay = 0.0002))
  .optimize()
Config for TensorBoard
BigDL & TensorBoard
Spark Pipelines Integration
Preprocess unstructured data
Spark MLlib: RegexTokenizer(), Word2Vec()
DataFrame.select("features", "label")
val model = Sequential[Double]()
  .add(TemporalConvolution(100, 20, 5))
  .add(ReLU())
  .add(TemporalMaxPooling(96))
  .add(Linear(20, 100))
  .add(Dropout(0.1))
  .add(ReLU())
  .add(Linear(100, 3))
  .add(LogSoftMax())
val criterion = new ClassNLLCriterion[Double]
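The sizes in this model fit together if each document is a sequence of 100 frames. That sequence length is an assumption, since the slides do not state it, but it is the value for which a kernel of 5 leaves 96 frames and a pooling window of 96 reduces them to one. The sketch below checks that arithmetic in plain Scala, assuming a convolution stride of 1 with no padding and a pooling stride equal to the window. `ShapeSketch`, `convFrames` and `poolFrames` are hypothetical helper names.

```scala
// Hypothetical sketch: frame-count arithmetic for a temporal (1-D)
// convolution followed by temporal max pooling.
object ShapeSketch {
  // Output frames of a temporal convolution without padding.
  def convFrames(inputFrames: Int, kernelSize: Int, stride: Int = 1): Int =
    (inputFrames - kernelSize) / stride + 1

  // Output frames of temporal max pooling; stride assumed equal to the window.
  def poolFrames(inputFrames: Int, window: Int): Int =
    (inputFrames - window) / window + 1

  def main(args: Array[String]): Unit = {
    val sequenceLength = 100 // assumed number of frames (tokens) per document
    val afterConv = convFrames(sequenceLength, kernelSize = 5)
    val afterPool = poolFrames(afterConv, window = 96)
    println(s"frames after convolution: $afterConv, after pooling: $afterPool")
  }
}
```

Under those assumptions the pooled output is a single frame of 20 features, which matches the input size of Linear(20, 100); the final Linear(100, 3) yields one score per class.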
Spark Integration
// DLEstimator is a Spark ML Estimator
val estimator = new DLEstimator(model, criterion, featureSize, labelSize)
  .setLearningRate(0.01)
  .setBatchSize(6)
// fit() returns a Spark ML Transformer
val trainedModel = estimator.fit(trainDataframe)
val predictions = trainedModel.transform(testDataframe)
Interoperability
Your model: BigDL, Torch, TensorFlow, Caffe, Keras
Post more job offers on comm-montpellier.slack!
https://bit.ly/comm-mtp
Offers properly qualified by domain, technologies, and salary range. Or at least with a funny pitch ;)
