DeepLearning4J (DL4J) is a powerful Open Source distributed framework that brings Deep Learning to the JVM (it can serve as a DIY tool for Java, Scala, Clojure and Kotlin programmers). It runs on distributed GPUs and CPUs and integrates with Hadoop and Apache Spark. ND4J is an Open Source, distributed, GPU-enabled library that brings the intuitive scientific computing tools of the Python community to the JVM. Training neural network models using DL4J, ND4J and Spark is a powerful combination, but it presents some unexpected issues that can compromise performance and nullify the benefits of well-written code and good model design. In this talk I will walk through some of those problems and present some best practices to prevent them, based on lessons learned when putting things into production.
2. Guglielmo Iozzia, MSD
Deep Learning with DL4J
on Apache Spark: Yeah
it's Cool, but are You
Doing it the Right Way?
#UnifiedDataAnalytics #SparkAISummit
3. About Me
Currently at
Previously at
Author at Packt Publishing
Got some awards lately
Champion
I love cooking
4. MSD in Ireland
50+ years
Approx. 2,000 employees
$2.5 billion investment to date
Approx. 50% of MSD’s top 20 products manufactured here
Exports to 60+ countries
€6.1 billion turnover in 2017
2017: 300+ jobs & €280m investment
MSD Biotech, Dublin, coming in 2021
6. Deep Learning
It is a subset of machine learning where
artificial neural networks, algorithms
inspired by the human brain, learn from
large amounts of data.
9. Some practical applications of
Deep Learning
• Computer vision
• Text generation
• NLP and NLU
• Autonomous cars
• Robotics
• Gaming
• Quantitative finance
• Manufacturing
10. Challenges of Training MNNs
(Multilayer Neural Networks) in Spark
• Different execution models between Spark and
the DL frameworks
• GPU configuration and management
• Performance
• Accuracy
14. ND4J
It is an Open Source linear algebra and
matrix manipulation library which supports
n-dimensional arrays and is integrated
with Apache Hadoop and Spark.
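As a minimal illustration of what ND4J looks like in practice, the sketch below creates a small matrix and applies an element-wise operation; the values and shapes are arbitrary examples, not from the talk.

```scala
import org.nd4j.linalg.factory.Nd4j

object Nd4jSketch extends App {
  // Create a 2x2 matrix from a flat array (row-major order);
  // the data itself lives off-heap, managed by native (C++) code
  val x = Nd4j.create(Array(1.0, 2.0, 3.0, 4.0), Array(2, 2))

  // Element-wise multiply; executed by the native backend
  val y = x.mul(2.0)
  println(y)
}
```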
16. Why Distributed MNN Training
with DL4J and Apache Spark?
Why is this a powerful combination?
17. DL4J + Apache Spark
• DL4J provides a high-level API to design, configure, train and
evaluate MNNs.
• Spark’s performance is excellent, in particular for ETL/streaming,
but in terms of computation, in an MNN training context, some data
transformations/aggregations need to be done using a low-level
language.
• DL4J uses ND4J, a C++-backed library that provides a high-level
Scala API to developers.
18. DL4J + Apache Spark
Model Parallelization vs. Data Parallelization
19. How Training Happens in Spark
with DL4J?
Parameter Averaging
(DL4J 1.0.0-alpha)
Asynchronous SGD
(DL4J 1.0.0-beta+)
20. So: What could possibly go wrong?
23. Memory Management in DL4J
Memory allocations can be managed using two different approaches:
• JVM GC and WeakReference tracking
• MemoryWorkspaces
The idea behind both is the same:
once an INDArray is no longer required, the off-heap memory associated
with it should be released so that it can be reused.
24. Memory Management in DL4J
The difference between the two approaches is:
• JVM GC: when an INDArray is collected by the garbage collector, its
off-heap memory is deallocated, on the assumption that it is not
used elsewhere.
• MemoryWorkspaces: when an INDArray leaves the workspace
scope, its off-heap memory may be reused, without deallocation
and reallocation.
Better performance for training and inference.
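The workspace pattern can be sketched as below: arrays created while a workspace is active draw from its pre-allocated off-heap buffer, and closing the workspace makes that memory reusable without a deallocate/reallocate cycle. The workspace id and array shape here are arbitrary examples.

```scala
import org.nd4j.linalg.factory.Nd4j

object WorkspaceSketch extends App {
  // Activate a workspace ("TRAINING_WS" is an arbitrary id chosen here)
  val ws = Nd4j.getWorkspaceManager.getAndActivateWorkspace("TRAINING_WS")
  try {
    // This array is backed by the workspace's off-heap buffer
    val activations = Nd4j.create(128, 256)
    // ... computations using the array ...
  } finally {
    // Leaving scope: the buffer becomes reusable, not deallocated
    ws.close()
  }
}
```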
25. Memory Management in DL4J
Remember that, when a training process uses workspaces,
periodic GC calls need to be disabled in order to get the
most from this approach:
Nd4j.getMemoryManager.togglePeriodicGc(false)
or their frequency needs to be reduced:
val gcInterval = 10000 // In milliseconds
Nd4j.getMemoryManager.setAutoGcWindow(gcInterval)
27. Root Cause and Potential
Solutions
A dependency conflict between the DL4J UI library and
Apache Spark when running in the same JVM.
Two alternatives are available:
• Collect and save the relevant training stats at runtime,
and then visualize them offline later.
• Run the UI and its remote functionality in a separate
JVM (server). Metrics are uploaded from
the Spark master to the UI server.
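The second alternative can be sketched roughly as follows. Note the package paths vary across DL4J versions, and `sparkNet` (a `SparkDl4jMultiLayer` built elsewhere) and the UI server URL are assumptions for illustration.

```scala
import java.util.Collections
import org.deeplearning4j.api.storage.impl.RemoteUIStatsStorageRouter
import org.deeplearning4j.ui.stats.StatsListener
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer

object RemoteUiSketch {
  def attachRemoteUi(sparkNet: SparkDl4jMultiLayer): Unit = {
    // The UI server runs in its own JVM; the address is a placeholder
    val remoteRouter = new RemoteUIStatsStorageRouter("http://ui-server-host:9000")
    // Training stats are routed from the Spark master to the UI server
    sparkNet.setListeners(remoteRouter,
      Collections.singletonList(new StatsListener(null)))
  }
}
```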
28. Serialization & ND4J
Data serialization is the process of converting
in-memory objects to another format that can be
used to store them or send them over the network.
Two options are available in Spark:
• Java (default)
• Kryo
29. Do You Opt for Kryo?
Kryo doesn’t work well with
off-heap data structures.
30. How to Use Kryo Serialization
with ND4J?
1. Add the ND4J-Kryo dependency to the project
2. Configure the Spark application to use the ND4J Kryo
Registrator:
val sparkConf = new SparkConf
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkConf.set("spark.kryo.registrator", "org.nd4j.Nd4jRegistrator")
31. Spark and Large Off-heap
Objects
Spark has problems handling Java objects with
large off-heap components, in particular when
caching or persisting them.
When working with DL4J this is a frequent case,
as DataSet and INDArray objects are involved.
32. Spark and Large Off-heap
Objects
Spark drops part of an RDD based on the estimated size of
each block, and it estimates a block's size depending on
the selected persistence level.
With MEMORY_ONLY or MEMORY_AND_DISK,
the estimate is done by walking the Java object graph.
This process doesn't take into account the off-heap
memory used by DL4J and ND4J, so Spark underestimates
the true size of objects like DataSets or INDArrays.
Out of Memory Exception!
33. Spark and Large Off-heap
Objects
It is therefore good practice to use MEMORY_ONLY_SER or
MEMORY_AND_DISK_SER when persisting an
RDD<DataSet> or an RDD<INDArray>.
This way Spark stores blocks on the JVM heap in
serialized form. Because there is no off-heap memory for
the serialized objects, Spark can accurately estimate their
size, thus avoiding out-of-memory issues.
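Choosing the serialized storage level is a one-line change at the persist call. A minimal sketch, assuming a pre-built `RDD[DataSet]`:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import org.nd4j.linalg.dataset.DataSet

object PersistSketch {
  // Persist in serialized form so Spark can estimate block sizes
  // accurately (no hidden off-heap component in the stored blocks)
  def persistForTraining(trainingData: RDD[DataSet]): RDD[DataSet] =
    trainingData.persist(StorageLevel.MEMORY_ONLY_SER)
}
```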
34. Configuring the Memory Limits
Java command-line arguments available:
• -Xms: initial JVM heap size
• -Xmx: maximum JVM heap size
• -Dorg.bytedeco.javacpp.maxbytes: to specify the off-
heap memory limit
• -Dorg.bytedeco.javacpp.maxphysicalbytes: (optional)
to specify the maximum bytes for the entire process
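Putting the flags together, a launch command might look like the sketch below. All the values and the jar name are placeholder examples, not recommendations; note the heap limit is kept well below the off-heap limit, in line with the guidance that follows.

```shell
# Example for a machine with plenty of RAM; tune values to your hardware
java -Xms4G -Xmx8G \
     -Dorg.bytedeco.javacpp.maxbytes=16G \
     -Dorg.bytedeco.javacpp.maxphysicalbytes=26G \
     -jar my-dl4j-app.jar
```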
35. Configuring the Memory Limits
Caveat:
In limited-memory environments it’s a bad idea to use a
high -Xmx value together with the -Xms option, because
not enough memory would be left off-heap.
Example:
• A system with 32 GB of RAM.
• -Xmx28G
• Only 4 GB of RAM left for the off-heap memory, the OS
and everything else running on the machine.
36. Configuring the Memory Limits
General best practice:
Typically in DL4J applications you need less RAM for the
JVM heap and more off-heap, since all INDArrays are
stored there.
If too much is allocated to the JVM heap, there will not be
enough memory left off-heap.
37. Importing Python Models into DL4J
TensorFlow
DL4J Memory Management Applies Here
38. You can find
more details on
DL4J and Spark
in my Book
http://tinyurl.com/y9jkvtuy
39. Thank You!
Any Questions?
You can find me at
@guglielmoiozzia
https://ie.linkedin.com/in/giozzia
googlielmo.blogspot.com