Deep learning models can be distributed across a cluster to reduce training time and handle large datasets. Deeplearning4j is an open-source deep learning library for Java that runs on Spark, allowing models to be trained in a distributed fashion across a Spark cluster. Training a model this way means distributing stochastic gradient descent (SGD) across nodes, with the key challenge being efficient all-reduce communication between them. Engineering high-performance distributed training, for example with parameter servers, is important to reduce communication bottlenecks.
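The parameter-averaging scheme Deeplearning4j uses on Spark can be sketched in plain Java: each worker fits a copy of the model on its data shard, and the driver averages the resulting parameter vectors before the next round. This is a minimal illustration of the averaging step only; the class and method names are hypothetical, not part of the DL4J API.

```java
import java.util.Arrays;

public class ParameterAveraging {
    // Average the parameter vectors produced by each worker after a local
    // SGD round (illustrative of what a ParameterAveragingTrainingMaster does).
    static double[] average(double[][] workerParams) {
        int dim = workerParams[0].length;
        double[] avg = new double[dim];
        for (double[] params : workerParams) {
            for (int i = 0; i < dim; i++) avg[i] += params[i];
        }
        for (int i = 0; i < dim; i++) avg[i] /= workerParams.length;
        return avg;
    }

    public static void main(String[] args) {
        // Two workers, each with a 2-dimensional parameter vector
        double[][] workers = { {1.0, 2.0}, {3.0, 4.0} };
        System.out.println(Arrays.toString(average(workers))); // prints [2.0, 3.0]
    }
}
```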
1. Deep learning on a mixed cluster with Deeplearning4j and Spark
Barcelona Spark meetup, Dec 9, 2016
(right after NIPS)
francois@garillot.net @huitseeker
4. The bad thing about doing a talk right after NIPS
You guys are scary.
5. The good thing about doing a talk right after NIPS
You guys don't need to be told SkyNet is a fantasy (for now).
6. Paying algorithms
Anomaly detection in many forms (bad guys / predictive maintenance / market rally)
Fraud detection
Network intrusion
Fintech securities churn prediction
Video object detection (security)
7. Models that are being neglected in benchmarks and implementation efforts
LSTMs
Autoencoders
8. How to deal with this in the Spark world?
Experiment with trained-model application: TensorFrames
Which deep learning frameworks let you train on Spark?
17. Cluster training in the enterprise
It's really about multi-tenancy and economies of scale:
a big pool of machines shared among everybody amortizes better,
if only because you can reuse it for other workloads.
Minor reason: enterprises may not have GPUs.
20. Cluster training in your (experimenter) case?
It's a fun problem: AllReduce.
Ultimately solved for people with a large amount of images,
but that solution is not open-source (it lives inside Facebook, Google, Amazon, Microsoft¹, Baidu).
¹: 1-bit SGD is under a non-commercial license in CNTK 2.0
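For intuition, ring all-reduce (the collective at the heart of those closed-source solutions) can be simulated in plain Java: each of N workers only ever exchanges 1/N-sized chunks with its ring neighbour, yet every worker ends up holding the full elementwise sum of all N gradient vectors. This is a toy single-process sketch under that assumption; the names are illustrative, not any library's API.

```java
import java.util.Arrays;

public class RingAllReduce {
    // Simulate ring all-reduce over n workers; on return every row of grads
    // equals the elementwise sum of the original rows.
    static void allReduce(double[][] grads) {
        int n = grads.length;
        int chunk = grads[0].length / n;   // assumes vector length divisible by n
        // Phase 1: scatter-reduce. After n-1 steps, worker w holds the fully
        // reduced chunk (w+1) mod n, having moved only one chunk per step.
        for (int s = 0; s < n - 1; s++) {
            double[][] snap = snapshot(grads);            // simultaneous exchange
            for (int w = 0; w < n; w++) {
                int c = ((w - 1 - s) % n + n) % n;        // chunk received this step
                int left = (w - 1 + n) % n;               // ring neighbour that sent it
                for (int j = c * chunk; j < (c + 1) * chunk; j++)
                    grads[w][j] += snap[left][j];
            }
        }
        // Phase 2: all-gather. The fully reduced chunks circulate around the
        // ring until every worker has every chunk.
        for (int s = 0; s < n - 1; s++) {
            double[][] snap = snapshot(grads);
            for (int w = 0; w < n; w++) {
                int c = ((w - s) % n + n) % n;
                int left = (w - 1 + n) % n;
                for (int j = c * chunk; j < (c + 1) * chunk; j++)
                    grads[w][j] = snap[left][j];          // overwrite with reduced chunk
            }
        }
    }

    static double[][] snapshot(double[][] a) {
        double[][] copy = new double[a.length][];
        for (int i = 0; i < a.length; i++) copy[i] = a[i].clone();
        return copy;
    }

    public static void main(String[] args) {
        double[][] grads = { {1, 1, 0, 0}, {0, 2, 1, 0} };
        allReduce(grads);
        System.out.println(Arrays.toString(grads[0])); // prints [1.0, 3.0, 1.0, 0.0]
    }
}
```

The bandwidth win is that each worker sends 2(N-1)/N times the vector size in total, independent of N, instead of the full vector to every peer.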
27. Scientific computing on the JVM
libnd4j: vectorization, 32-bit addressing, linalg (BLAS!)
JavaCPP: generates JNI bindings to your C++ libs
ND4J: numpy for the JVM, native superfast arrays
DataVec: one-stop interface to an NDArray
DeepLearning4J: orchestration, backprop, layer definition
ScalNet: gateway drug, inspired by (and closely following) Keras
RL4J: reinforcement learning for the JVM
28. With Spark
JavaSparkContext sc = ...;
JavaRDD<DataSet> trainingData = ...;
MultiLayerConfiguration networkConfig = ...;
//Create the TrainingMaster instance
int examplesPerDataSetObject = 1;
TrainingMaster trainingMaster =
    new ParameterAveragingTrainingMaster.Builder(examplesPerDataSetObject)
        //(other configuration options)
        .build();
//Create the SparkDl4jMultiLayer instance
SparkDl4jMultiLayer sparkNetwork =
    new SparkDl4jMultiLayer(sc, networkConfig, trainingMaster);
//Fit the network using the training data:
sparkNetwork.fit(trainingData);
30. Even if you don't care about deep learning
(from Kazuaki Ishizaki @ IBM Japan)
SPARK-6442: better linear algebra than Breeze
ND4J will have sparse representations soon
31. Even if you don't care about deep learning II
Meta-RDDs
32. Killing the bottlenecks
Spark has already changed its networking backend once.
Better support is needed for parameter servers and their fault tolerance.
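At its core, a parameter server reduces to a push/pull pattern: workers push gradients, the server applies them to a shared parameter table, and workers pull fresh values before the next step. A hypothetical minimal sketch in plain Java, assuming server-side SGD updates; this is not DL4J's or Spark's implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TinyParameterServer {
    // Shared parameter table; ConcurrentHashMap gives atomic per-key updates
    // so concurrent workers can push without a global lock.
    private final Map<String, Double> params = new ConcurrentHashMap<>();
    private final double learningRate;

    TinyParameterServer(double learningRate) { this.learningRate = learningRate; }

    // A worker pushes a gradient for one parameter; the SGD step is applied
    // server-side: param <- param - lr * gradient.
    void push(String key, double gradient) {
        params.merge(key, -learningRate * gradient, Double::sum);
    }

    // A worker pulls the current value before computing its next gradient.
    double pull(String key) { return params.getOrDefault(key, 0.0); }

    public static void main(String[] args) {
        TinyParameterServer ps = new TinyParameterServer(0.1);
        ps.push("w0", 2.0);   // worker A's gradient
        ps.push("w0", -1.0);  // worker B's gradient
        System.out.println(ps.pull("w0")); // net of both SGD steps, about -0.1
    }
}
```

The fault-tolerance question the slide raises is exactly what this sketch omits: replicating the parameter table and recovering in-flight pushes when a server shard dies.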
33. A Last Word (from Andrew Y. Ng)
Get involved!
Don't just read papers: reproduce research results.
Also: we're happy to mentor contributions, and there's a book!