DeepLearning4J (DL4J) is a powerful Open Source distributed framework that brings Deep Learning to the JVM (it can serve as a DIY tool for Java, Scala, Clojure and Kotlin programmers). It runs on distributed GPUs and CPUs and integrates with Hadoop and Apache Spark. ND4J is an Open Source, distributed, GPU-enabled library that brings the intuitive scientific computing tools of the Python community to the JVM. Training neural network models using DL4J, ND4J and Spark is a powerful combination, but it presents some unexpected issues that can compromise performance and nullify the benefits of well-written code and good model design. In this talk I will walk through some of those problems and present some best practices to prevent them, based on lessons learned when putting things into production.
2. Guglielmo Iozzia, MSD
Deep Learning with DL4J
on Apache Spark: Yeah
it's Cool, but are You
Doing it the Right Way?
#UnifiedDataAnalytics #SparkAISummit
3. About Me
Currently at
Previously at
Author at Packt Publishing
Got some awards lately
Champion
I love cooking
4. MSD in Ireland
50+ years
Approx. 2,000 employees
$2.5 billion investment to date
Approx. 50% of MSD’s top 20 products manufactured here
Exports to 60+ countries
€6.1 billion turnover in 2017
2017: 300+ jobs & €280m investment
MSD Biotech, Dublin, coming in 2021
6. Deep Learning
It is a subset of machine learning where
artificial neural networks, algorithms
inspired by the human brain, learn from
large amounts of data.
9. Some practical applications of
Deep Learning
• Computer vision
• Text generation
• NLP and NLU
• Autonomous cars
• Robotics
• Gaming
• Quantitative finance
• Manufacturing
10. Challenges of Training MNNs
(Multilayer Neural Networks) in Spark
• Different execution models between Spark and
the DL frameworks
• GPU configuration and management
• Performance
• Accuracy
14. ND4J
It is an Open Source linear algebra and
matrix manipulation library which supports
n-dimensional arrays and is integrated
with Apache Hadoop and Spark.
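As a minimal illustration of what ND4J looks like in practice, the sketch below creates a small matrix and applies an element-wise operation; the values and shapes are arbitrary examples, not from the talk.

```scala
import org.nd4j.linalg.factory.Nd4j

object Nd4jSketch extends App {
  // Create a 2x2 matrix from a flat array (row-major order);
  // the data itself lives off-heap, managed by native (C++) code
  val x = Nd4j.create(Array(1.0, 2.0, 3.0, 4.0), Array(2, 2))

  // Element-wise multiply; executed by the native backend
  val y = x.mul(2.0)
  println(y)
}
```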
16. Why Distributed MNN Training
with DL4J and Apache Spark?
Why is this a powerful combination?
17. DL4J + Apache Spark
• DL4J provides a high-level API to design, configure, train and
evaluate MNNs.
• Spark’s performance is excellent, in particular for ETL/streaming,
but in terms of computation, in an MNN training context, some data
transformations/aggregations need to be done using a low-level
language.
• DL4J uses ND4J, a C++-backed library that provides a high-level
Scala API to developers.
18. DL4J + Apache Spark
Model Parallelization vs. Data Parallelization
19. How Training Happens in Spark
with DL4J?
Parameter Averaging
(DL4J 1.0.0-alpha)
Asynchronous SGD
(DL4J 1.0.0-beta+)
20. So: What could possibly go wrong?
23. Memory Management in DL4J
Memory allocations can be managed using two different approaches:
• JVM GC and WeakReference tracking
• MemoryWorkspaces
The idea behind both is the same:
once an INDArray is no longer required, the off-heap memory associated
with it should be released so that it can be reused.
24. Memory Management in DL4J
The difference between the two approaches is:
• JVM GC: when an INDArray is collected by the garbage collector, its
off-heap memory is deallocated, on the assumption that it is not
used elsewhere.
• MemoryWorkspaces: when an INDArray leaves the workspace
scope, its off-heap memory may be reused, without deallocation
and reallocation.
Better performance for training and inference.
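The workspace pattern can be sketched as below: arrays created while a workspace is active draw from its pre-allocated off-heap buffer, and closing the workspace makes that memory reusable without a deallocate/reallocate cycle. The workspace id and array shape here are arbitrary examples.

```scala
import org.nd4j.linalg.factory.Nd4j

object WorkspaceSketch extends App {
  // Activate a workspace ("TRAINING_WS" is an arbitrary id chosen here)
  val ws = Nd4j.getWorkspaceManager.getAndActivateWorkspace("TRAINING_WS")
  try {
    // This array is backed by the workspace's off-heap buffer
    val activations = Nd4j.create(128, 256)
    // ... computations using the array ...
  } finally {
    // Leaving scope: the buffer becomes reusable, not deallocated
    ws.close()
  }
}
```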
25. Memory Management in DL4J
Remember that, when a training process uses workspaces,
periodic GC calls need to be disabled in order to get the
most from this approach:
Nd4j.getMemoryManager.togglePeriodicGc(false)
or their frequency needs to be reduced:
val gcInterval = 10000 // In milliseconds
Nd4j.getMemoryManager.setAutoGcWindow(gcInterval)
27. Root Cause and Potential
Solutions
A dependency conflict between the DL4J UI library and
Apache Spark when running in the same JVM.
Two alternatives are available:
• Collect and save the relevant training stats at runtime,
and then visualize them offline later.
• Run the UI and its remote functionality in a separate
JVM (server). Metrics are uploaded from
the Spark master to the UI server.
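The second alternative can be sketched roughly as follows. Note the package paths vary across DL4J versions, and `sparkNet` (a `SparkDl4jMultiLayer` built elsewhere) and the UI server URL are assumptions for illustration.

```scala
import java.util.Collections
import org.deeplearning4j.api.storage.impl.RemoteUIStatsStorageRouter
import org.deeplearning4j.ui.stats.StatsListener
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer

object RemoteUiSketch {
  def attachRemoteUi(sparkNet: SparkDl4jMultiLayer): Unit = {
    // The UI server runs in its own JVM; the address is a placeholder
    val remoteRouter = new RemoteUIStatsStorageRouter("http://ui-server-host:9000")
    // Training stats are routed from the Spark master to the UI server
    sparkNet.setListeners(remoteRouter,
      Collections.singletonList(new StatsListener(null)))
  }
}
```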
28. Serialization & ND4J
Data serialization is the process of converting
in-memory objects to another format that can be
used to store them or send them over the network.
Two options are available in Spark:
• Java (default)
• Kryo
29. Do You Opt for Kryo?
Kryo doesn’t work well with
off-heap data structures.
30. How to Use Kryo Serialization
with ND4J?
1. Add the ND4J-Kryo dependency to the project
2. Configure the Spark application to use the ND4J Kryo
Registrator:
val sparkConf = new SparkConf
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkConf.set("spark.kryo.registrator", "org.nd4j.Nd4jRegistrator")
31. Spark and Large Off-heap
Objects
Spark has problems handling Java objects with
large off-heap components, in particular when
caching or persisting them.
When working with DL4J this is a frequent case,
as DataSet and INDArray objects are involved.
32. Spark and Large Off-heap
Objects
Spark drops part of an RDD based on the estimated size of
each block, and it estimates a block's size depending on
the selected persistence level.
With MEMORY_ONLY or MEMORY_AND_DISK,
the estimate is done by walking the Java object graph.
This process doesn't take into account the off-heap
memory used by DL4J and ND4J, so Spark underestimates
the true size of objects like DataSets or INDArrays.
Out of Memory Exception!
33. Spark and Large Off-heap
Objects
It is therefore good practice to use MEMORY_ONLY_SER or
MEMORY_AND_DISK_SER when persisting an
RDD<DataSet> or an RDD<INDArray>.
This way Spark stores blocks on the JVM heap in
serialized form. Because there is no off-heap memory for
the serialized objects, Spark can accurately estimate their
size, thus avoiding out-of-memory issues.
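Choosing the serialized storage level is a one-line change at the persist call. A minimal sketch, assuming a pre-built `RDD[DataSet]`:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import org.nd4j.linalg.dataset.DataSet

object PersistSketch {
  // Persist in serialized form so Spark can estimate block sizes
  // accurately (no hidden off-heap component in the stored blocks)
  def persistForTraining(trainingData: RDD[DataSet]): RDD[DataSet] =
    trainingData.persist(StorageLevel.MEMORY_ONLY_SER)
}
```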
34. Configuring the Memory Limits
Java command-line arguments available:
• -Xms: initial JVM heap size
• -Xmx: maximum JVM heap size
• -Dorg.bytedeco.javacpp.maxbytes: to specify the off-
heap memory limit
• -Dorg.bytedeco.javacpp.maxphysicalbytes: (optional)
to specify the maximum bytes for the entire process
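Putting the flags together, a launch command might look like the sketch below. All the values and the jar name are placeholder examples, not recommendations; note the heap limit is kept well below the off-heap limit, in line with the guidance that follows.

```shell
# Example for a machine with plenty of RAM; tune values to your hardware
java -Xms4G -Xmx8G \
     -Dorg.bytedeco.javacpp.maxbytes=16G \
     -Dorg.bytedeco.javacpp.maxphysicalbytes=26G \
     -jar my-dl4j-app.jar
```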
35. Configuring the Memory Limits
Caveat:
In limited-memory environments it’s a bad idea to use a
high -Xmx value together with the -Xms option, because
not enough memory would be left off-heap.
Example:
• A system with 32 GB of RAM.
• -Xmx28G
• Only 4 GB of RAM left for the off-heap memory, the OS
and everything else running on the machine.
36. Configuring the Memory Limits
General best practice:
Typically in DL4J applications you need less RAM for the
JVM heap and more off-heap, since all INDArrays are
stored there.
If too much is allocated to the JVM heap, there will not be
enough memory left off-heap.
37. Importing Python Models into DL4J
TensorFlow
DL4J Memory Management Applies Here
38. You can find
more details on
DL4J and Spark
in my Book
http://tinyurl.com/y9jkvtuy
39. Thank You!
Any Questions?
You can find me at
@guglielmoiozzia
https://ie.linkedin.com/in/giozzia
googlielmo.blogspot.com