SlideShare a Scribd company logo
1 of 38
Deep Learning with Apache
Flink and DeepLearning4J
Flink Forward 2016,
Berlin, Germany
Suneel Marthi
@suneelmarthi
About me
•Senior Principal Software Engineer, Office of Technology, Red Hat Inc.
•Member of the Apache Software Foundation
•PMC member on Apache Mahout, Apache Pirk, Apache Incubator
•PMC Chair, Apache Mahout (April 2015 - April 2016)
Outline
● What is Deep Learning?
● Overview of DeepLearning4J Ecosystem
● Deep Learning Workflows
● ETL & Vectorization with DataVec
● Apache Flink and DL4J
What is Deep Learning?
Handwriting Recognition
Face Recognition (Facebook)
Image Generation
Self-Driving Cars
DL has been very successful with Image Classification
Dogs v/s Cats
https://www.kaggle.com/c/dogs-vs-cats
● Deep Learning is a series of steps for automated feature extraction
o Based on techniques that have been around for several years
o Several techniques chained together to automate feature engineering
o “Deep” due to several interconnected layers of nodes stacked together
between the input and the output.
“Deep learning will make you acceptable to the learned; but it is only
an obliging and easy behaviour, and entertaining conversation, that
will make you agreeable to all companies”
- James Burgh
Popular Deep Neural Networks
● Deep Belief Networks
o Most popular architecture
● Convolutional Neural Networks
o Successful in image classification
● Recurrent Networks
o Time series Analysis
o Sequence Modelling
Deep Learning in Enterprise
● Ability to work with small and big data easily
o Don’t want to change tooling because we moved to Hadoop
● Ability to not get caught up in things like vectorization and ETL
o Need to focus on better models
o Understanding your data is very important
● Ability to experiment with lots of models
DeepLearning4J
● “The Hadoop of Deep Learning”
o Command line driven
o Java and Scala APIs
o ASF 2.0 Licensed
● Java implementation
o Parallelization
o GPU support
 Support for multi-GPU per host
● Runtime Neutral
o Local, Spark, Flink
o AWS
DL4J Suite of Tools
● DeepLearning4J
o Main library for deep learning
● DataVec
o Extract, Transform, Load (ETL) and Vectorization library
● ND4J
o Linear Algebra framework
o Swappable backends (JBLAS, GPUs)
o Think NumPy on the JVM
● Arbiter
o Model evaluation, Hyperparameter Search and testing platform
DL4J: DataVec for Data Ingest and Vectorization
● Uses an Input/Output format
● Supports all major types of Input data (Text, Images, Audio, Video,
SVMLight)
● Extensible for Specialized Input Formats
● Interfaces with Apache Kafka
DL4J: ND4J
● Scientific computing library on JVM (think NumPy on JVM)
● Supports N-dimensional vector computations
● Supports GPUs via CUDA and Native JBlas
Learning Progressive Layers
Deep Learning Workflows
● Data Ingestion and storage.
● Data cleansing and transformation.
● Split the dataset into Training, Validation and Test Data sets
- Apache Flink DataSet API for Data Ingestion and Transformation
Data Ingestion and Munging
DL Model Building
● Build Deep Learning Network and Train with Training Data
● Parameter Averaging
● Test and Validate the Model
● Repeat until satisfied
● Persist and Deploy the Model in Production
Prediction and Scoring
Deployed Model used to make predictions against Streaming data
-- Streaming Predictors using Apache Flink DataStream API
DL4J API Example
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(12345)
.iterations(1)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.learningRate(0.05)
.l2(0.001)
.list(4)
.layer(0, new DenseLayer.Builder().nIn(784).nOut(250)
.weightInit(WeightInit.XAVIER)
.updater(Updater.ADAGRAD)
.activation("relu").build())
.layer(1, new DenseLayer.Builder().nIn(250).nOut(10)
.weightInit(WeightInit.XAVIER)
.updater(Updater.ADAGRAD)
.activation("relu").build())
.layer(2, new DenseLayer.Builder().nIn(10).nOut(250)
.weightInit(WeightInit.XAVIER)
.updater(Updater.ADAGRAD)
.activation("relu").build())
.layer(3, new OutputLayer.Builder().nIn(250).nOut(784)
.weightInit(WeightInit.XAVIER)
.updater(Updater.ADAGRAD)
.activation("relu").lossFunction(LossFunctions.LossFunction.MSE)
.build())
.pretrain(false).backprop(true)
.build();
Building Deep Learning Workflows
● Flexibility to build / apply the model
o Local
o AWS, Spark, Flink (WIP)
● Convert data from a raw format into a baseline raw vector
o Model the data
o Evaluate the Model
● Traditionally all of these are tied together in one tool
o But this is a monolithic pattern
Load Existing Models in DL4J
String jsonModelConfig = loadTextFileFromDisk( pathToModelJSON );
MultiLayerConfiguration configFromJson = MultiLayerConfiguration.fromJson( jsonModelConfig );
FSDataInputStream hdfsInputStream_ModelParams = hdfs.open(new Path(
hdfsPathToModelParams ));
try (DataInputStream dis = new DataInputStream( hdfsInputStream_ModelParams )) {
INDArray newParams = Nd4j.read( dis );
}
MultiLayerNetwork network = new MultiLayerNetwork( configFromJson );
network.init();
network.setParameters(newParams);
Vectorizing Data - Iris Data Set
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
vectorized to
0.0 1:0.1666666666666665 2:1.0 3:0.021276595744680823 4:0.0
0.0 1:0.08333333333333343 2:0.5833333333333334 3:0.021276595744680823 4:0.0
0.0 1:0.0 2:0.7500000000000002 3:0.0 4:0.0
1.0 1:0.9583333333333335 2:0.7500000000000002 3:0.723404255319149 4:0.5217391304347826
DataVec - Command Line Vectorization
● Library of tools to vectorize - Audio, Video, Image, Text, CSV, SVMLight
● Convert the input data into vectors in a standardized format (SVMLight,
Text, CSV etc)
o Adaptable with custom input/output formats
● Open Source, ASF 2.0 Licensed
o https://github.com/deeplearning4j/DataVec
o Part of DL4J suite
Workflow Configuration (iris_conf.txt)
canova.input.header.skip=false
canova.input.statistics.debug.print=false
canova.input.format=org.canova.api.formats.input.impl.LineInputFormat
canova.input.directory=src/test/resources/csv/data/uci_iris_sample.txt
canova.input.vector.schema=src/test/resources/csv/schemas/uci/iris.txt
canova.output.directory=/tmp/iris_unit_test_sample.txt
canova.output.format=org.canova.api.formats.output.impl.SVMLightOutputFormat
Iris Canova Vector Schema
@RELATION UCIIrisDataset
@DELIMITER ,
@ATTRIBUTE sepallength NUMERIC !NORMALIZE
@ATTRIBUTE sepalwidth NUMERIC !NORMALIZE
@ATTRIBUTE petallength NUMERIC !NORMALIZE
@ATTRIBUTE petalwidth NUMERIC !NORMALIZE
@ATTRIBUTE class STRING !LABEL
Model Iris using Canova Command Line
./bin/canova vectorize -conf /tmp/iris_conf.txt
Output vectors written to: /tmp/iris_svmlight.txt
./bin/dl4j train –conf /tmp/iris_conf.txt
[ …log output… ]
./bin/arbiter evaluate –conf /tmp/iris_conf.txt
[ …log output… ]
DL4J + Apache Flink
• Apache Flink support for Dl4J : DataVec (In progress)
• Streaming Predictors using Flink : Kafka (In progress)
• Possible
Present DL4J – Flink work in progress
• Support for DL4J : DataVec
• Streaming Predictions with Apache Flink
Future Work
• Flink support for DL4J: Arbiter for Hyperparameter Search
• Flink support for DeepLearning4J to be able to build MultiLayer DL
configurations.
https://github.com/deeplearning4j
Credits
Skymind.io Team
• Adam Gibson
• Chris V. Nicholson
• Josh Patterson
Questions ???

More Related Content

What's hot

Fabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFlink Forward
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsFlink Forward
 
Fabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFlink Forward
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaFlink Forward
 
Scaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersScaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersJen Aman
 
SICS: Apache Flink Streaming
SICS: Apache Flink StreamingSICS: Apache Flink Streaming
SICS: Apache Flink StreamingTuri, Inc.
 
STORM as an ETL Engine to HADOOP
STORM as an ETL Engine to HADOOPSTORM as an ETL Engine to HADOOP
STORM as an ETL Engine to HADOOPDataWorks Summit
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkFlink Forward
 
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink TensorflowFlink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink TensorflowFlink Forward
 
Stream Processing use cases and applications with Apache Apex by Thomas Weise
Stream Processing use cases and applications with Apache Apex by Thomas WeiseStream Processing use cases and applications with Apache Apex by Thomas Weise
Stream Processing use cases and applications with Apache Apex by Thomas WeiseBig Data Spain
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Spark Summit
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamFlink Forward
 
Slim Baltagi – Flink vs. Spark
Slim Baltagi – Flink vs. SparkSlim Baltagi – Flink vs. Spark
Slim Baltagi – Flink vs. SparkFlink Forward
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Julian Hyde
 
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...Flink Forward
 
Resource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache SparkResource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache SparkDatabricks
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinFlink Forward
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 

What's hot (20)

Fabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and Bytes
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
 
Fabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on Flink
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
 
Scaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersScaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of Parameters
 
SICS: Apache Flink Streaming
SICS: Apache Flink StreamingSICS: Apache Flink Streaming
SICS: Apache Flink Streaming
 
STORM as an ETL Engine to HADOOP
STORM as an ETL Engine to HADOOPSTORM as an ETL Engine to HADOOP
STORM as an ETL Engine to HADOOP
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
 
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink TensorflowFlink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
 
Stream Processing use cases and applications with Apache Apex by Thomas Weise
Stream Processing use cases and applications with Apache Apex by Thomas WeiseStream Processing use cases and applications with Apache Apex by Thomas Weise
Stream Processing use cases and applications with Apache Apex by Thomas Weise
 
Machine Learning in the IoT with Apache NiFi
Machine Learning in the IoT with Apache NiFiMachine Learning in the IoT with Apache NiFi
Machine Learning in the IoT with Apache NiFi
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
 
Slim Baltagi – Flink vs. Spark
Slim Baltagi – Flink vs. SparkSlim Baltagi – Flink vs. Spark
Slim Baltagi – Flink vs. Spark
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
 
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...
 
Resource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache SparkResource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache Spark
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 

Similar to Suneel Marthi - Deep Learning with Apache Flink and DL4J

Guglielmo iozzia - Google I/O extended dublin 2018
Guglielmo iozzia - Google  I/O extended dublin 2018Guglielmo iozzia - Google  I/O extended dublin 2018
Guglielmo iozzia - Google I/O extended dublin 2018Guglielmo Iozzia
 
Why scala for data science
Why scala for data scienceWhy scala for data science
Why scala for data scienceGuglielmo Iozzia
 
Building Deep Learning Workflows with DL4J
Building Deep Learning Workflows with DL4JBuilding Deep Learning Workflows with DL4J
Building Deep Learning Workflows with DL4JJosh Patterson
 
Distributed Deep Learning with Keras and TensorFlow on Apache Spark
Distributed Deep Learning with Keras and TensorFlow on Apache SparkDistributed Deep Learning with Keras and TensorFlow on Apache Spark
Distributed Deep Learning with Keras and TensorFlow on Apache SparkGuglielmo Iozzia
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016MLconf
 
Deep Learning: DL4J and DataVec
Deep Learning: DL4J and DataVecDeep Learning: DL4J and DataVec
Deep Learning: DL4J and DataVecJosh Patterson
 
Hands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4jHands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4jGuglielmo Iozzia
 
Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015Josh Patterson
 
Intro to DeepLearning4J on ApacheSpark SDS DL Workshop 16
Intro to DeepLearning4J on ApacheSpark SDS DL Workshop 16Intro to DeepLearning4J on ApacheSpark SDS DL Workshop 16
Intro to DeepLearning4J on ApacheSpark SDS DL Workshop 16Romeo Kienzler
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jDataWorks Summit
 
Enterprise Deep Learning with DL4J
Enterprise Deep Learning with DL4JEnterprise Deep Learning with DL4J
Enterprise Deep Learning with DL4JJosh Patterson
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleJim Dowling
 
New Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionNew Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionSri Ambati
 
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark ClustersTensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark ClustersDataWorks Summit
 
How to Build Deep Learning Models
How to Build Deep Learning ModelsHow to Build Deep Learning Models
How to Build Deep Learning ModelsJosh Patterson
 
Enhancing Domain Specific Language Implementations Through Ontology
Enhancing Domain Specific Language Implementations Through OntologyEnhancing Domain Specific Language Implementations Through Ontology
Enhancing Domain Specific Language Implementations Through OntologyChunhua Liao
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Deep Learning for Java Developer - Getting Started
Deep Learning for Java Developer - Getting StartedDeep Learning for Java Developer - Getting Started
Deep Learning for Java Developer - Getting StartedSuyash Joshi
 
Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production Paolo Platter
 

Similar to Suneel Marthi - Deep Learning with Apache Flink and DL4J (20)

Guglielmo iozzia - Google I/O extended dublin 2018
Guglielmo iozzia - Google  I/O extended dublin 2018Guglielmo iozzia - Google  I/O extended dublin 2018
Guglielmo iozzia - Google I/O extended dublin 2018
 
Why scala for data science
Why scala for data scienceWhy scala for data science
Why scala for data science
 
Building Deep Learning Workflows with DL4J
Building Deep Learning Workflows with DL4JBuilding Deep Learning Workflows with DL4J
Building Deep Learning Workflows with DL4J
 
Distributed Deep Learning with Keras and TensorFlow on Apache Spark
Distributed Deep Learning with Keras and TensorFlow on Apache SparkDistributed Deep Learning with Keras and TensorFlow on Apache Spark
Distributed Deep Learning with Keras and TensorFlow on Apache Spark
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
 
Deep Learning: DL4J and DataVec
Deep Learning: DL4J and DataVecDeep Learning: DL4J and DataVec
Deep Learning: DL4J and DataVec
 
Hands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4jHands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4j
 
Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015
 
Intro to DeepLearning4J on ApacheSpark SDS DL Workshop 16
Intro to DeepLearning4J on ApacheSpark SDS DL Workshop 16Intro to DeepLearning4J on ApacheSpark SDS DL Workshop 16
Intro to DeepLearning4J on ApacheSpark SDS DL Workshop 16
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4j
 
Enterprise Deep Learning with DL4J
Enterprise Deep Learning with DL4JEnterprise Deep Learning with DL4J
Enterprise Deep Learning with DL4J
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, Sunnyvale
 
New Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionNew Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 Edition
 
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark ClustersTensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters
 
How to Build Deep Learning Models
How to Build Deep Learning ModelsHow to Build Deep Learning Models
How to Build Deep Learning Models
 
Enhancing Domain Specific Language Implementations Through Ontology
Enhancing Domain Specific Language Implementations Through OntologyEnhancing Domain Specific Language Implementations Through Ontology
Enhancing Domain Specific Language Implementations Through Ontology
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
eScience Cluster Arch. Overview
eScience Cluster Arch. OvervieweScience Cluster Arch. Overview
eScience Cluster Arch. Overview
 
Deep Learning for Java Developer - Getting Started
Deep Learning for Java Developer - Getting StartedDeep Learning for Java Developer - Getting Started
Deep Learning for Java Developer - Getting Started
 
Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production
 

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

More from Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Recently uploaded

modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 

Recently uploaded (20)

modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 

Suneel Marthi - Deep Learning with Apache Flink and DL4J

  • 1. Deep Learning with Apache Flink and DeepLearning4J Flink Forward 2016, Berlin, Germany Suneel Marthi @suneelmarthi
  • 2. About me •Senior Principal Software Engineer, Office of Technology, Red Hat Inc. •Member of the Apache Software Foundation •PMC member on Apache Mahout, Apache Pirk, Apache Incubator •PMC Chair, Apache Mahout (April 2015 - April 2016)
  • 3. Outline ● What is Deep Learning? ● Overview of DeepLearning4J Ecosystem ● Deep Learning Workflows ● ETL & Vectorization with DataVec ● Apache Flink and DL4J
  • 4. What is Deep Learning?
  • 5. Handwriting Recognition Face Recognition (Facebook) Image Generation Self-Driving Cars
  • 6. DL has been very successful with Image Classification Dogs v/s Cats https://www.kaggle.com/c/dogs-vs-cats
  • 7.
  • 8.
  • 9. ● Deep Learning is a series of steps for automated feature extraction o Based on techniques that have been around for several years o Several techniques chained together to automate feature engineering o “Deep” due to several interconnected layers of nodes stacked together between the input and the output.
  • 10. “Deep learning will make you acceptable to the learned; but it is only an obliging and easy behaviour, and entertaining conversation, that will make you agreeable to all companies” - James Burgh
  • 11. Popular Deep Neural Networks ● Deep Belief Networks o Most popular architecture ● Convolutional Neural Networks o Successful in image classification ● Recurrent Networks o Time series Analysis o Sequence Modelling
  • 12. Deep Learning in Enterprise ● Ability to work with small and big data easily o Don’t want to change tooling because we moved to Hadoop ● Ability to not get caught up in things like vectorization and ETL o Need to focus on better models o Understanding your data is very important ● Ability to experiment with lots of models
  • 14. ● “The Hadoop of Deep Learning” o Command line driven o Java and Scala APIs o ASF 2.0 Licensed ● Java implementation o Parallelization o GPU support  Support for multi-GPU per host ● Runtime Neutral o Local, Spark, Flink o AWS
  • 15. DL4J Suite of Tools ● DeepLearning4J o Main library for deep learning ● DataVec o Extract, Transform, Load (ETL) and Vectorization library ● ND4J o Linear Algebra framework o Swappable backends (JBLAS, GPUs) o Think NumPy on the JVM ● Arbiter o Model evaluation, Hyperparameter Search and testing platform
  • 16.
  • 17. DL4J: DataVec for Data Ingest and Vectorization ● Uses an Input/Output format ● Supports all major types of Input data (Text, Images, Audio, Video, SVMLight) ● Extensible for Specialized Input Formats ● Interfaces with Apache Kafka
  • 18. DL4J: ND4J ● Scientific computing library on JVM (think NumPy on JVM) ● Supports N-dimensional vector computations ● Supports GPUs via CUDA and Native JBlas
  • 21. ● Data Ingestion and storage. ● Data cleansing and transformation. ● Split the dataset into Training, Validation and Test Data sets - Apache Flink DataSet API for Data Ingestion and Transformation Data Ingestion and Munging
  • 22. DL Model Building ● Build Deep Learning Network and Train with Training Data ● Parameter Averaging ● Test and Validate the Model ● Repeat until satisfied ● Persist and Deploy the Model in Production
  • 23. Prediction and Scoring Deployed Model used to make predictions against Streaming data -- Streaming Predictors using Apache Flink DataStream API
  • 24. DL4J API Example MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder() .seed(12345) .iterations(1) .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) .learningRate(0.05) .l2(0.001) .list(4) .layer(0, new DenseLayer.Builder().nIn(784).nOut(250) .weightInit(WeightInit.XAVIER) .updater(Updater.ADAGRAD) .activation("relu").build()) .layer(1, new DenseLayer.Builder().nIn(250).nOut(10) .weightInit(WeightInit.XAVIER) .updater(Updater.ADAGRAD) .activation("relu").build()) .layer(2, new DenseLayer.Builder().nIn(10).nOut(250) .weightInit(WeightInit.XAVIER) .updater(Updater.ADAGRAD) .activation("relu").build()) .layer(3, new OutputLayer.Builder().nIn(250).nOut(784) .weightInit(WeightInit.XAVIER) .updater(Updater.ADAGRAD) .activation("relu").lossFunction(LossFunctions.LossFunction.MSE) .build()) .pretrain(false).backprop(true) .build();
  • 25. Building Deep Learning Workflows ● Flexibility to build / apply the model o Local o AWS, Spark, Flink (WIP) ● Convert data from a raw format into a baseline raw vector o Model the data o Evaluate the Model ● Traditionally all of these are tied together in one tool o But this is a monolithic pattern
  • 26. Load Existing Models in DL4J String jsonModelConfig = loadTextFileFromDisk( pathToModelJSON ); MultiLayerConfiguration configFromJson = MultiLayerConfiguration.fromJson( jsonModelConfig ); FSDataInputStream hdfsInputStream_ModelParams = hdfs.open(new Path( hdfsPathToModelParams )); try (DataInputStream dis = new DataInputStream( hdfsInputStream_ModelParams )) { INDArray newParams = Nd4j.read( dis ); } MultiLayerNetwork network = new MultiLayerNetwork( configFromJson ); network.init(); network.setParameters(newParams);
  • 27. Vectorizing Data - Iris Data Set 5.1,3.5,1.4,0.2,Iris-setosa 4.9,3.0,1.4,0.2,Iris-setosa 4.7,3.2,1.3,0.2,Iris-setosa 7.0,3.2,4.7,1.4,Iris-versicolor vectorized to 0.0 1:0.1666666666666665 2:1.0 3:0.021276595744680823 4:0.0 0.0 1:0.08333333333333343 2:0.5833333333333334 3:0.021276595744680823 4:0.0 0.0 1:0.0 2:0.7500000000000002 3:0.0 4:0.0 1.0 1:0.9583333333333335 2:0.7500000000000002 3:0.723404255319149 4:0.5217391304347826
  • 28. DataVec - Command Line Vectorization ● Library of tools to vectorize - Audio, Video, Image, Text, CSV, SVMLight ● Convert the input data into vectors in a standardized format (SVMLight, Text, CSV etc) o Adaptable with custom input/output formats ● Open Source, ASF 2.0 Licensed o https://github.com/deeplearning4j/DataVec o Part of DL4J suite
  • 30. Iris Canova Vector Schema @RELATION UCIIrisDataset @DELIMITER , @ATTRIBUTE sepallength NUMERIC !NORMALIZE @ATTRIBUTE sepalwidth NUMERIC !NORMALIZE @ATTRIBUTE petallength NUMERIC !NORMALIZE @ATTRIBUTE petalwidth NUMERIC !NORMALIZE @ATTRIBUTE class STRING !LABEL
  • 31. Model Iris using Canova Command Line ./bin/canova vectorize -conf /tmp/iris_conf.txt Output vectors written to: /tmp/iris_svmlight.txt ./bin/dl4j train –conf /tmp/iris_conf.txt [ …log output… ] ./bin/arbiter evaluate –conf /tmp/iris_conf.txt [ …log output… ]
  • 32. DL4J + Apache Flink
  • 33. • Apache Flink support for Dl4J : DataVec (In progress) • Streaming Predictors using Flink : Kafka (In progress) • Possible
  • 34. Present DL4J – Flink work in progress • Support for DL4J : DataVec • Streaming Predictions with Apache Flink Future Work • Flink support for DL4J: Arbiter for Hyperparameter Search • Flink support for DeepLearning4J to be able to build MultiLayer DL configurations.
  • 37. Skymind.io Team • Adam Gibson • Chris V. Nicholson • Josh Patterson