Smart Data Conference: DL4J and DataVec
1. skymind.io | deeplearning4j.org | gitter.im/deeplearning4j
DL4J and DataVec
Building Production Class Deep Learning Workflows for the Enterprise
Josh Patterson / Director Field Org
Smart Data 2017 / San Francisco, CA
2. Josh Patterson
Director Field Engineering / Skymind
Co-Author: O’Reilly’s “Deep Learning: A Practitioner’s Approach”
Past:
Self-Organizing Mesh Networks / Meta-Heuristics Research
Smartgrid work / TVA + NERC
Principal Field Architect / Cloudera
3. Topics
• Deep Learning in Production for the Enterprise
• DL4J and DataVec
• Example Workflow: Modeling Sensor Data with RNNs
5. In Practice Deep Learning Is…
• Matching Input Data Type to Specific Architecture (Image -> Convolutional Network)
• Higher Parameter Counts and more Processing Power
• Moving from “Feature Engineering” to “Automated Feature Learning”
12. Automated Feature Learning
• Hand-coding features has long been standard practice in machine learning
• Deep Learning got smart about matching architectures to data types
• Going forward, hand-coded features will be considered the “technical debt of machine learning”
13. Quick Usage Guide
• If I have Timeseries or Audio Input: Use a Recurrent Neural Network
• If I have Image input: Use a Convolutional Neural Network
• If I have Video input: Use a hybrid Convolutional + Recurrent Architecture!
• Applications in NLP: Word2Vec + variants
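For the NLP bullet above, DL4J ships a Word2Vec implementation. A minimal sketch, assuming a one-sentence-per-line corpus at a hypothetical path raw_sentences.txt:

import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

//Hypothetical corpus: one sentence per line
SentenceIterator sentences = new BasicLineIterator("raw_sentences.txt");
TokenizerFactory tokenizer = new DefaultTokenizerFactory();

Word2Vec vec = new Word2Vec.Builder()
.minWordFrequency(5) //ignore very rare words
.layerSize(100) //embedding dimensionality
.windowSize(5) //context window size
.iterate(sentences)
.tokenizerFactory(tokenizer)
.build();
vec.fit();

//Inspect nearest neighbors in the learned embedding space
System.out.println(vec.wordsNearest("day", 10));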
14. The Challenge of the Fortune 500
Take a business problem and translate it into a product-izable solution
• Get data together
• Understand modeling, pull together expertise
Get the right data workflow / infra architecture to production-ize the application
• Security
• Integration
15. “Google is living a few years in the future and sending the rest of us messages”
-- Doug Cutting in 2013
However
Most organizations are not built like Google
(and Jeff Dean does not work at your company…)
Anyone building next-gen infrastructure has to weigh the production considerations that follow
16. Production Considerations
• Security – even though I can build a model, will IT let me run it?
• Data Warehouse Integration – can I easily run this in the existing IT footprint?
• Speedup – once I need to go faster, how hard is it to speed up modeling?
18. DL4J and DataVec
• DL4J – ASF 2.0-licensed JVM platform for enterprise deep learning
• DataVec – a tool for machine learning ETL (Extract, Transform, Load) operations
• Both run natively on Spark, with CPU or GPU backends
• DL4J Suite certified on CDH5, HDP2.4, and the upcoming IBM IOP platform
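To make DataVec’s ETL role concrete, here is a minimal transform sketch; the schema and column names are hypothetical, for illustration only:

import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;

//Hypothetical schema for a two-column sensor CSV
Schema schema = new Schema.Builder()
.addColumnString("timestamp")
.addColumnDouble("reading")
.build();

//Drop the timestamp column, keep only the numeric reading
TransformProcess tp = new TransformProcess.Builder(schema)
.removeColumns("timestamp")
.build();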
19. ND4J: The Need for Speed
JavaCPP
• Auto generate JNI Bindings for C++
• Allows for easy maintenance and deployment of C++ binaries in Java
CPU Backends
• OpenMP (multithreading within native operations)
• OpenBLAS or MKL (BLAS operations)
• SIMD-extensions
GPU Backends
• DL4J supports CUDA 7.5 (+cuBLAS) at the moment, and will support CUDA 8.0 as soon as it is released
• Leverages cuDNN as well
https://github.com/deeplearning4j/dl4j-benchmark
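The payoff of this backend design is that the same ND4J code runs unchanged on CPU or GPU; which backend executes it depends only on whether nd4j-native or nd4j-cuda is on the classpath. A minimal sketch (matrix sizes arbitrary):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray a = Nd4j.rand(512, 512);
INDArray b = Nd4j.rand(512, 512);
//Dispatches to OpenBLAS/MKL gemm on CPU, or cuBLAS gemm on GPU
INDArray c = a.mmul(b);
System.out.println(c.shapeInfoToString());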
20. Prepping Data is Time Consuming
http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#633ea7f67f75
23. Model Import
• Import models from: Keras
• Keras imports models from: TensorFlow, Caffe, etc.
• Example: Import VGGNet16
• Allows integration engineers to work with pre-built models
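A minimal import sketch, assuming the deeplearning4j-modelimport module and a hypothetical vgg16.h5 file saved from a Keras Sequential model:

import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

//Hypothetical path to a model saved in Keras with model.save("vgg16.h5")
MultiLayerNetwork model =
KerasModelImport.importKerasSequentialModelAndWeights("vgg16.h5");
System.out.println(model.summary());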
24. Coming Soon: DL4J as Keras Backend
• Allows data scientists to run Python Keras commands and execute them on DL4J
• Sets up the ability to run Keras jobs on Spark + Hadoop, securely
• Gives Python data scientists a better path to a production-class environment in the enterprise
26. NERC Sensor Data Collection
openPDC PMU Data Collection circa 2009
• 120 Sensors
• 30 samples/second
• 4.3B Samples/day
• Housed in Hadoop
27. Classifying UCI Sensor Data: Trends
A – Downward Trend
B – Cyclic
C – Normal
D – Upward Shift
E – Upward Trend
F – Downward Shift
28. Loading and Transforming Timeseries Data with DataVec
//Read time series features and labels from numbered CSV files (0.csv ... 449.csv)
SequenceRecordReader trainFeatures = new CSVSequenceRecordReader();
trainFeatures.initialize(new NumberedFileInputSplit(featuresDirTrain.getAbsolutePath() + "/%d.csv", 0, 449));
SequenceRecordReader trainLabels = new CSVSequenceRecordReader();
trainLabels.initialize(new NumberedFileInputSplit(labelsDirTrain.getAbsolutePath() + "/%d.csv", 0, 449));
//Pair features with labels: minibatches of 10 sequences, 6 label classes, labels aligned to sequence end
int minibatch = 10;
int numLabelClasses = 6;
DataSetIterator trainData = new SequenceRecordReaderDataSetIterator(trainFeatures, trainLabels, minibatch,
numLabelClasses, false, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);
//Normalize the training data
DataNormalization normalizer = new NormalizerStandardize();
normalizer.fit(trainData); //Collect training data statistics
trainData.reset();
trainData.setPreProcessor(normalizer); //Use previously collected statistics to normalize on-the-fly
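One step the slide leaves implicit: the test-set iterator (the testData used for evaluation later) should reuse the statistics fitted on the training data rather than being fitted separately:

//Reuse training-set statistics to normalize the test data
testData.setPreProcessor(normalizer);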
29. Configuring a Recurrent Neural Network with DL4J
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
.updater(Updater.NESTEROVS).momentum(0.9).learningRate(0.005)
//Clip gradients element-wise to stabilize RNN training
.gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
.gradientNormalizationThreshold(0.5)
.list()
//Single LSTM hidden layer: 1 input feature -> 10 hidden units
.layer(0, new GravesLSTM.Builder().activation("tanh").nIn(1).nOut(10).build())
//Softmax output with multi-class cross entropy over the 6 trend classes
.layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
.activation("softmax").nIn(10).nOut(numLabelClasses).build())
.pretrain(false).backprop(true).build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
30. Train the Network on Local Machine
int nEpochs = 40;
String str = "Test set evaluation at epoch %d: Accuracy = %.2f, F1 = %.2f";
for (int i = 0; i < nEpochs; i++) {
net.fit(trainData);
//Evaluate on the test set:
Evaluation evaluation = net.evaluate(testData);
System.out.println(String.format(str, i, evaluation.accuracy(), evaluation.f1()));
testData.reset();
trainData.reset();
}
31. Train the Network on Spark
//Parameter averaging: sync model weights across Spark workers
TrainingMaster tm = new ParameterAveragingTrainingMaster(true, executors_count, 1, batchSizePerWorker, 1, 0);
//Create Spark multi layer network from configuration
SparkDl4jMultiLayer sparkNetwork = new SparkDl4jMultiLayer(sc, net, tm);
int nEpochs = 40;
String str = "Test set evaluation at epoch %d: Accuracy = %.2f, F1 = %.2f";
for (int i = 0; i < nEpochs; i++) {
sparkNetwork.fit(trainDataRDD);
//Evaluate on the test set, using the averaged parameters held by the Spark network:
Evaluation evaluation = sparkNetwork.getNetwork().evaluate(testData);
System.out.println(String.format(str, i, evaluation.accuracy(), evaluation.f1()));
testData.reset();
}
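The positional TrainingMaster constructor above is hard to read. An equivalent Builder form spells out each setting; a sketch assuming one example per DataSet object in the RDD:

TrainingMaster tm = new ParameterAveragingTrainingMaster.Builder(1) //examples per DataSet in the RDD
.batchSizePerWorker(batchSizePerWorker)
.averagingFrequency(1) //average parameters after every iteration
.workerPrefetchNumBatches(0) //no asynchronous prefetch on workers
.saveUpdater(true) //carry updater state (momentum, etc.) across averaging
.build();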
35. Setting Up LSTM Architecture
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(rngSeed)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).learningRate(0.1)
.iterations(1)
.updater(Updater.RMSPROP).rmsDecay(0.95)
.regularization(true).l2(0.001)
.weightInit(WeightInit.XAVIER)
.list()
.layer(0, new GravesLSTM.Builder().nIn(nIn).nOut(lstmLayerSize).activation("tanh").build())
.layer(1, new GravesLSTM.Builder().nIn(lstmLayerSize).nOut(lstmLayerSize).activation("tanh").build())
.layer(2, new RnnOutputLayer.Builder(LossFunction.MCXENT)
.activation("softmax").nIn(lstmLayerSize).nOut(nOut).build())
.backpropType(BackpropType.TruncatedBPTT)
.tBPTTForwardLength(tbpttLength)
.tBPTTBackwardLength(tbpttLength)
.pretrain(false)
.backprop(true)
.build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
• Optimization: SGD with RMSProp (NOTE: can be set on a per-layer basis)
• Weight initialization and regularization: L2 weight decay (again, can be set per layer)
• Hidden layers: 2 x Graves-style LSTM layers
• Output layer: plain dense layer with softmax activation
• Loss function: cross entropy (KL divergence between character distributions: neural net vs. empirical)
• RNN-specific config for truncated backprop-through-time
36. Training Our LSTM
for (int epoch = 0; epoch < numEpochs; epoch++) {
net.fit(trainData);
/* Save model, print logging messages, etc. */
/* Compute held-out data performance. */
double cost = 0;
double count = 0;
while (heldoutData.hasNext()) {
DataSet minibatch = heldoutData.next();
cost += net.scoreExamples(minibatch, false).sumNumber().doubleValue();
count += minibatch.getLabelsMaskArray().sumNumber().doubleValue();
}
log.info(String.format("Epoch %4d test set average cost: %.4f", epoch, cost / count));
/* Reset dataset iterators. */
trainData.reset();
heldoutData.reset();
}
• Training: fit can be applied to DataSetIterator, DataSet, INDArray, etc.
• Compute performance on held-out data.
37. Generating Beer Reviews from the LSTM Model
INDArray input = Nd4j.zeros(new int[]{iter.inputColumns()});
/* Load static data into vector. */
StringBuilder sb = new StringBuilder();
int prevCharIdx = 0;
int currCharIdx = 0;
while (true) {
//Shift the one-hot input: clear the previous character, set the current one
input.putScalar(prevCharIdx, 0);
input.putScalar(currCharIdx, 1);
INDArray output = net.rnnTimeStep(input);
double[] outputProbDistribution = new double[numCharacters];
for (int j = 0; j < outputProbDistribution.length; j++)
outputProbDistribution[j] = output.getDouble(j);
prevCharIdx = currCharIdx;
currCharIdx = sampleFromDistribution(outputProbDistribution, rng);
sb.append(convertIndexToCharacter(currCharIdx));
if (currCharIdx == STOPWORD) break;
}
String reviewSample = sb.toString();
• Load input vector for a single step.
• Get a probability distribution over the next character by running the RNN for one step.
• Sample a character from the probability distribution.
• Stop if we generate STOPWORD.
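The loop calls a sampleFromDistribution helper the slide does not show; a minimal sketch of one plausible implementation (hypothetical, using java.util.Random):

//Hypothetical helper: draw an index from a discrete probability distribution
static int sampleFromDistribution(double[] distribution, java.util.Random rng) {
double d = rng.nextDouble();
double cumulative = 0.0;
for (int i = 0; i < distribution.length; i++) {
cumulative += distribution[i];
if (d <= cumulative) return i;
}
return distribution.length - 1; //guard against floating-point rounding
}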
No per-timestep alignment was attempted across records; each recorded timestep is simply indexed in order, a simpler way to surface long-term dependencies.
The alternative was (60 sec) x (60 min) x (48 h) == 172,800 timesteps per record, which is not easy to model.
Context for the openPDC setup: after major blackouts, NERC wanted visibility into the grid; SCADA coverage was limited, and large volumes of sensor data were coming in.