Spark Meet-up 23rd Jan 2015,
Dr. Vijay Srinivas Agneeswaran,
Director, Big-data Labs,
Copyright @Impetus Technologies, 2015
1. Distributed Deep Learning over Spark
• Dr. Vijay Srinivas Agneeswaran and team
2. Research Track - "Outlier Detection and KNN-Join Algorithms
over Spark"
• Ashutosh Trivedi and Kaushik Ranjan.
3. "Autoscaling in Spark"
• Rajat Gupta and team, Qubole.
Lightning Talks (Production use cases of Spark)
• ???
Distributed Deep Learning Over Spark
Dr. Vijay Srinivas Agneeswaran et al.
Director, Big-data Labs,
Spark Meet-up
23rd Jan 2015, Bangalore.
Different Shallow Architectures
Fixed Basis
Trainable Basis
Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," in Large Scale Kernel Machines, (L.
Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), MIT Press, 2007.
Linear predictor | ANN, Radial Basis Functions | Kernel Machines
DLN for Face Recognition
Deep Learning Networks: Learning
• No general learning algorithm (No-free-lunch theorem by Wolpert 1996).
• [...] for specific tasks – [...] of BP – [...] for non-[...]
• [...] deep belief networks as stack of [...] learning for [...]
Deep Belief Networks
• This is a deep neural network composed of multiple layers of latent variables (hidden units or feature detectors)
• Can be viewed as a stack of RBMs
• Hinton, along with his student, proposed that these networks can be trained greedily, one layer at a time
• A Boltzmann Machine is a specific energy model with a linear energy function.
Other DL Networks: Auto Encoders (Auto-associators or Diabolo Network)
• The aim of an auto encoder network is to learn a compressed representation for a set of data
• It is an unsupervised learning algorithm that applies back propagation, setting the target values equal to the inputs (identity function)
• A denoising auto encoder avoids learning the trivial identity function by randomly corrupting the input, which the auto encoder must then reconstruct or denoise
• Best applied when there is structure in the data
• Applications: dimensionality reduction, feature selection
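The corruption step of a denoising auto encoder can be sketched as follows. This is a minimal illustration, assuming masking noise (each component zeroed independently); the names `DenoisingSketch`, `corrupt`, `corruptionLevel` and `reconstructionError` are hypothetical and not taken from the slides.

```scala
import scala.util.Random

object DenoisingSketch {
  // Masking noise (an assumed noise model): each input component is set to 0
  // independently with probability corruptionLevel.
  def corrupt(input: Vector[Double], corruptionLevel: Double, rng: Random): Vector[Double] =
    input.map(v => if (rng.nextDouble() < corruptionLevel) 0.0 else v)

  // Squared error between the clean input and a reconstruction. The auto
  // encoder is trained to reconstruct the clean input from the corrupted one.
  def reconstructionError(clean: Vector[Double], reconstructed: Vector[Double]): Double =
    clean.zip(reconstructed).map { case (a, b) => (a - b) * (a - b) }.sum
}
```

Usage: `DenoisingSketch.corrupt(x, 0.3, new Random(42))` would produce the noisy input fed to the encoder, while the training loss compares the decoder output against the clean `x`.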
Why Deep Learning Networks are Brain-like?
[...] approach of traditional ML – SVMs or kernel [...]
• Not applicable in deep learning [...]
[...] brain – [...]
Traditional ML – lot of data munging, [...] issues (feature abstractor), before classifier can kick in.
Deep learning – allows the system to learn [...] as well
Success stories of DLNs
• Android voice recognition system – based on DLNs. Improves accuracy by 25% compared to state-of-art.
• Microsoft Skype Translate software and Digital assistant Cortana
• 1.2 million images, 1000 classes (ImageNet Data) – error rate of 15.3%, better than state of art at
Success stories of DLNs…..
Senna system – PoS tagging, chunking, NER,
semantic role labeling, syntactic parsing
Comparable F1 score with state-of-art with huge speed
advantage (5 days VS few hours).
DLNs VS TF-IDF: 1 million
documents, relevance search.
3.2ms VS 1.2s.
Robot navigation
Potential Applications of DLNs
Speech recognition/enhancement
Video sequencing
Emotion recognition (video/audio),
Malware detection,
Robotics – navigation.
multi-modal learning (text and image).
Natural Language Processing
Challenges in Realizing DLNs
Large no. of training examples – high accuracy.
• Large no. of parameters can also improve accuracy.
Inherently sequential nature – freeze up one layer for learning.
GPUs to speed up training
• Limitations – CPU-to-GPU data transfers.
Distributed DLNs – Jeffrey Dean’s work.
• Motivation
• Scalable, low latency training
• Parallelize training data and learn fast
• Jeffrey Dean’s work DistBelief
• Pseudo-centralized realization
Distributed DLNs
What is Spark?
• Spark provides a [...] abstraction that generalizes Map-Reduce.
• More powerful set of operations than just map and reduce – group by, order by, sort, reduce by key, sample, union, etc.
• Provides efficient [...] based on distributed shared memory – keep working set of data in memory.
• Shark provides Hive Query Language (HQL) interface over Spark.
What is Spark? Data Flow in Hadoop
What is Spark? Data Flow in Spark
Real world use-case example: HITS algorithm
The Hub score and Authority score for a node are calculated with the following algorithm:
• Start with each node having a hub score and authority score of 1, i.e. auth(p) = 1 and hub(p) = 1.
• Run the Authority Update Rule: update each node's Authority score to be equal to the sum of the Hub scores of each node that points to it. That is, a node is given a high authority score by being linked to by pages that are recognized as Hubs for information.
• Run the Hub Update Rule: update each node's Hub score to be equal to the sum of the Authority scores of each node that it points to. That is, a node is given a high hub score by linking to nodes that are considered to be authorities on the subject.
• Normalize the values by dividing each Hub score by the square root of the sum of the squares of all Hub scores, and dividing each Authority score by the square root of the sum of the squares of all Authority scores.
• Repeat from the second step as necessary.
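The update rules above can be sketched on a single machine with plain Scala collections. This is an illustrative sketch, not the Hadoop or Spark implementation discussed in the talk; the names `HitsSketch`, `hits` and `Scores` are hypothetical, and the graph is assumed to be given as a list of (source, target) edges.

```scala
object HitsSketch {
  type Scores = Map[String, Double]

  // Divide every score by the square root of the sum of squared scores.
  private def normalize(scores: Scores): Scores = {
    val norm = math.sqrt(scores.values.map(v => v * v).sum)
    if (norm == 0.0) scores else scores.map { case (node, v) => node -> v / norm }
  }

  // edges: (source, target) pairs. Returns (hub scores, authority scores).
  def hits(edges: Seq[(String, String)], iterations: Int): (Scores, Scores) = {
    val nodes = edges.flatMap { case (from, to) => Seq(from, to) }.distinct
    var hub: Scores = nodes.map(n => n -> 1.0).toMap
    var auth: Scores = nodes.map(n => n -> 1.0).toMap
    for (_ <- 1 to iterations) {
      // Authority update: sum of hub scores of the nodes pointing to p.
      val newAuth = nodes.map(p => p -> edges.filter(_._2 == p).map(e => hub(e._1)).sum).toMap
      // Hub update: sum of (updated) authority scores of the nodes p points to.
      val newHub = nodes.map(p => p -> edges.filter(_._1 == p).map(e => newAuth(e._2)).sum).toMap
      auth = normalize(newAuth)
      hub = normalize(newHub)
    }
    (hub, auth)
  }
}
```

Usage: `HitsSketch.hits(Seq("a" -> "c", "b" -> "c"), 5)` treats `c` as the only authority and `a` and `b` as equally good hubs. Each pass rescans the edge list, which is the iterative access pattern that motivates keeping the working set in memory.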
Solve HITS algorithm using Hadoop MR
Step 1 : auth(p) = 1 and
hub(p) = 1
Step 2 : Run Authority Update
Rule auth(p) = X
Step 3 : Run Hub Update Rule
hub(p) = Y
Step 4 : Normalize hub(p) and auth(p)
Solve HITS algorithm using Spark
Step 1 : auth(p) = 1 and
hub(p) = 1
Step 2 : Run Authority Update
Rule auth(p) = X
Step 3 : Run Hub Update Rule
hub(p) = Y
Step 4 : Normalize hub(p) and auth(p)
[MZ12] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael
J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: a fault-tolerant abstraction for in-
memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and
Implementation (NSDI'12). USENIX Association, Berkeley, CA, USA, 2-2.
Transformations/Actions | Description
map(function f1) | Pass each element of the RDD through f1 in parallel and return the resulting RDD.
filter(function f2) | Select elements of the RDD that return true when passed through f2.
flatMap(function f3) | Similar to map, but f3 returns a sequence to facilitate mapping a single input to multiple outputs.
union(RDD r1) | Returns the union of the RDD r1 with the self RDD.
sample(flag, p, seed) | Returns a randomly sampled (with seed) p percentage of the RDD.
groupByKey(noTasks) | Can only be invoked on key-value paired data – returns data grouped by key. No. of parallel tasks is given as an argument (default is 8).
reduceByKey(function f4, noTasks) | Aggregates the result of applying f4 on elements with the same key. No. of parallel tasks is the second argument.
join(RDD r2, noTasks) | Joins RDD r2 with self – computes all possible pairs for a given key.
groupWith(RDD r3, noTasks) | Joins RDD r3 with self and groups by key.
sortByKey(flag) | Sorts the self RDD in ascending or descending order based on flag.
reduce(function f5) | Aggregates the result of applying function f5 on all elements of the self RDD.
collect() | Return all elements of the RDD as an array.
count() | Count no. of elements in the RDD.
take(n) | Get the first n elements of the RDD.
first() | Equivalent to take(1).
saveAsTextFile(path) | Persists the RDD in a file in HDFS or another Hadoop supported file system at the given path.
saveAsSequenceFile(path) | Persists the RDD as a Hadoop sequence file. Can be invoked only on key-value paired RDDs that implement the Hadoop Writable interface or equivalent.
foreach(function f6) | Run f6 in parallel on elements of the self RDD.
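To get a feel for how a few of these operations compose, the sketch below uses ordinary Scala collections as a local stand-in; it does not use Spark. The collection methods (`flatMap`, `filter`, `map`, `groupBy`, `sortBy`, `take`) only mirror the RDD operations named in the table, and the object name `RddAnalogueSketch` and the sample data are hypothetical.

```scala
object RddAnalogueSketch {
  val lines: Seq[String] = Seq("spark is fast", "hadoop is batch", "spark is iterative")

  // flatMap: one input line maps to several words.
  val words: Seq[String] = lines.flatMap(_.split(" ").toSeq)

  // filter: keep only the elements for which the predicate is true.
  val kept: Seq[String] = words.filter(_ != "is")

  // map: turn each word into a key-value pair.
  val pairs: Seq[(String, Int)] = kept.map(w => (w, 1))

  // Local analogue of reduceByKey: group by key, then sum the values per key.
  val counts: Map[String, Int] =
    pairs.groupBy(_._1).map { case (key, vs) => key -> vs.map(_._2).sum }

  // Local analogue of sortByKey followed by take(2).
  val firstTwo: Seq[(String, Int)] = counts.toSeq.sortBy(_._1).take(2)
}
```

On an RDD the same pipeline would run in parallel across partitions, with the transformations evaluated lazily until an action such as `collect()` or `take(n)` is invoked.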
Berkeley Big-data Analytics Stack (BDAS)
Spark: Use Cases
• Uses Cassandra for video data aggregates VS on-the-fly queries.
• Moved to Spark for ML and computing [...]
• Moved to Shark for on-the-fly queries – C* OLAP aggregate queries on Cassandra 130 secs, 60 ms in Spark
• Uses Hive for repeatedly running ad-hoc queries on video data.
• Optimized ad-hoc queries using Spark RDDs – found Spark is 30 times faster than Hive
• ML for connection analysis and video targeting: 30K nodes on Hadoop Yarn
• Hadoop – batch processing
• Spark – iterative processing
• Storm – on-the-fly processing
• recommendation –
Spark Use Cases: Spark is good for linear algebra, optimization and N-body problems.
Computations/Operations
• Giant 1 (simple stats) is perfect for Hadoop 1.0.
• Giants 2 (linear algebra), 3 (N-body), 4 (optimization) – Spark from UC Berkeley is efficient.
• Logistic regression, kernel SVMs, conjugate gradient descent, collaborative filtering, Gibbs sampling, alternating least squares.
• Example is social group-first approach for consumer churn analysis [2]
• Interactive/On-the-fly data processing – Storm.
• OLAP – data cube operations.
• Data sets – not embarrassingly parallel
• Deep Learning: Artificial Neural Networks/Deep Belief Networks
• Machine vision from Google [3]
• Speech analysis from Microsoft
• Giant 5 – Graph processing – GraphLab, Pregel, Giraph
[1] National Research Council. Frontiers in Massive Data Analysis . Washington, DC: The National Academies Press, 2013.
[2] Richter, Yossi; Yom-Tov, Elad; Slonim, Noam: Predicting Customer Churn in Mobile Networks through Analysis of Social
Groups. In: Proceedings of SIAM International Conference on Data Mining, 2010, pp. 732-741
[3] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio
Ranzato, Andrew W. Senior, Paul A. Tucker, Ke Yang, Andrew Y. Ng: Large Scale Distributed Deep Networks. NIPS 2012:
Some Spark(ling) examples
Scala code (serial)
var count = 0
for (i <- 1 to 100000) {
  val x = Math.random * 2 - 1
  val y = Math.random * 2 - 1
  if (x*x + y*y < 1) count += 1
}
println("Pi is roughly " + 4 * count / 100000.0)
Sample random points in the square [-1, 1] x [-1, 1] and count how many fall inside the unit circle (a fraction of roughly PI/4).
Hence, you get an approximate value for PI.
With PS and PC the number of points in the square and in the circle, and AS and AC the areas of the square and the circle: PS/PC = AS/AC = 4/PI, so PI = 4 * (PC/PS).
Some Spark(ling) examples
Spark code (parallel)
val spark = new SparkContext(<Mesos master>)
var count = spark.accumulator(0)
for (i <- spark.parallelize(1 to 100000, 12)) {
  val x = Math.random * 2 - 1
  val y = Math.random * 2 - 1
  if (x*x + y*y < 1) count += 1
}
// The accumulated total is read on the driver through .value
println("Pi is roughly " + 4 * count.value / 100000.0)
Notable points:
1. Spark context created – talks to Mesos¹ master.
2. Count becomes shared variable – accumulator.
3. For loop runs over an RDD – parallelize breaks the Scala range object (1 to 100000) into 12 slices.
4. Scala translates the for loop over the parallelized range into a call to the foreach method of the RDD.
¹ Mesos is an Apache incubated clustering system –
Logistic Regression in Spark: Serial Code
// Read data file and convert it into Point objects
// (toList materializes the iterator so the points can be traversed on every iteration)
val lines = scala.io.Source.fromFile("data.txt").getLines().toList
val points = lines.map(x => parsePoint(x))
// Run logistic regression
var w = Vector.random(D)
for (i <- 1 to ITERATIONS) {
  val gradient = Vector.zeros(D)
  for (p <- points) {
    val scale = (1/(1+Math.exp(-p.y*(w dot p.x)))-1)*p.y
    gradient += scale * p.x
  }
  w -= gradient
}
println("Result: " + w)
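The serial code above leaves `Point`, `parsePoint`, `D`, `ITERATIONS` and the vector type undefined. The sketch below fills them in with plain arrays so the same gradient update can be run on its own. The input format (label first, then features, separated by whitespace), the fixed random seed and all names in `LogisticRegressionSketch` are assumptions made for illustration only.

```scala
object LogisticRegressionSketch {
  // y is the label (+1 or -1), x the feature vector.
  final case class Point(x: Array[Double], y: Double)

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  // Assumed line format: "label f1 f2 ..." separated by whitespace.
  def parsePoint(line: String): Point = {
    val tokens = line.trim.split("\\s+").map(_.toDouble)
    Point(tokens.tail, tokens.head)
  }

  def train(points: Seq[Point], dims: Int, iterations: Int, seed: Long): Array[Double] = {
    val rng = new scala.util.Random(seed)
    // Random initial weights in [-1, 1).
    var w = Array.fill(dims)(rng.nextDouble() * 2 - 1)
    for (_ <- 1 to iterations) {
      val gradient = Array.fill(dims)(0.0)
      for (p <- points) {
        // Same scale factor as on the slide.
        val scale = (1 / (1 + math.exp(-p.y * dot(w, p.x))) - 1) * p.y
        for (j <- 0 until dims) gradient(j) += scale * p.x(j)
      }
      // Plain gradient step with step size 1, as on the slide.
      w = w.zip(gradient).map { case (wj, gj) => wj - gj }
    }
    w
  }
}
```

Usage: `LogisticRegressionSketch.train(lines.map(LogisticRegressionSketch.parsePoint), dims, 10, 42L)` returns the learned weight vector. The inner loop over `points` is the part that the Spark version distributes.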
Logistic Regression in Spark
// Read data file and transform it into Point objects
val spark = new SparkContext(<Mesos master>)
val lines = spark.hdfsTextFile("hdfs://.../data.txt")
// cache() keeps the parsed points in memory across iterations
val points = lines.map(x => parsePoint(x)).cache()
// Run logistic regression
var w = Vector.random(D)
for (i <- 1 to ITERATIONS) {
  val gradient = spark.accumulator(Vector.zeros(D))
  for (p <- points) {
    val scale = (1/(1+Math.exp(-p.y*(w dot p.x)))-1)*p.y
    gradient += scale * p.x
  }
  w -= gradient.value
}
println("Result: " + w)
Deep Learning on Spark
• Fully distributed deep learning network implementation on Spark
• Spark would handle the parallelism, distribution, and fail[...]
• The input data set in HDFS, intermediate data in local file system
• [...] message passing framework built on top of Apache Spark using Akka Framework.
Thank You!
Backup Slides
Energy Based Models
• RBMs are Energy Based Models (EBMs)
• EBMs associate an energy with every configuration of a [...]
• Learning corresponds to modifying the shape of the energy function, so that it has desirable properties
• Like in physics, lower energy = more stability
• So, modify the shape of the energy function such that the desirable configurations have lower energy
Other DL networks:
Convolutional Networks
Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. 1999. Object Recognition with Gradient-Based
Learning. In Shape, Contour and Grouping in Computer Vision, David A. Forsyth, Joseph L. Mundy, Vito Di
Gesù, and Roberto Cipolla (Eds.). Springer-Verlag, London, UK, 319-.
• Recurrent Neural networks
• Long Short Term Memory (LSTM), Temporal
• Sum-product networks
• Deep architectures of sum-product networks
• Hierarchical temporal memory
• online structural and algorithmic model of the neocortex
Other Brain-like Approaches
Recurrent Neural Networks
• Connections between units form a directed cycle, i.e. typical feedback connections
• RNNs can use their internal memory to process arbitrary sequences of inputs
• RNNs cannot learn to look far back into the past
• LSTMs solve this problem by introducing memory cells
• These memory cells can remember a value for an arbitrary amount of time
Sum-Product Networks (SPN)
• An SPN is a deep network model structured as a directed acyclic graph
• These networks allow the probability of an event to be computed quickly
• SPNs try to express multilinear functions in computationally short forms, i.e. built from multiple additions and multiplications
• Leaves correspond to variables and internal nodes correspond to sums and products
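A bottom-up evaluation of such a network can be sketched as follows. This is a minimal illustration assuming Boolean variables, indicator leaves and weighted sum nodes; the names `SpnSketch`, `Leaf`, `Sum`, `Product`, `evaluate` and the example network are hypothetical.

```scala
object SpnSketch {
  sealed trait Node
  // Leaf: indicator that variable `index` takes the given value.
  final case class Leaf(index: Int, value: Boolean) extends Node
  // Sum node: weighted sum of its children.
  final case class Sum(children: Seq[(Double, Node)]) extends Node
  // Product node: product of its children.
  final case class Product(children: Seq[Node]) extends Node

  // One pass over the graph computes the value for a complete assignment.
  def evaluate(node: Node, assignment: Vector[Boolean]): Double = node match {
    case Leaf(i, v)  => if (assignment(i) == v) 1.0 else 0.0
    case Sum(cs)     => cs.map { case (w, c) => w * evaluate(c, assignment) }.sum
    case Product(cs) => cs.map(c => evaluate(c, assignment)).product
  }

  // Example: two independent variables with P(X0 = true) = 0.3 and P(X1 = true) = 0.6.
  val example: Node = Product(Seq(
    Sum(Seq(0.3 -> Leaf(0, true), 0.7 -> Leaf(0, false))),
    Sum(Seq(0.6 -> Leaf(1, true), 0.4 -> Leaf(1, false)))
  ))
}
```

Usage: `SpnSketch.evaluate(SpnSketch.example, Vector(true, false))` gives the probability of X0 = true and X1 = false under the example network, using only additions and multiplications.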
Hierarchical Temporal Memory
• Is an online machine learning model developed by Jeff Hawkins
• This model learns one instance at a time
• Best explained by an online stock model: today's situation of a stock helps in the prediction of tomorrow's
• An HTM network is a tree-shaped hierarchy of levels
• Higher hierarchy levels can use patterns learned at lower levels. This is adopted from the learning model of the brain, in the form of the neocortex
Mathematical Equations
• The Energy Function is defined as follows:
$$E(x, h) = -b'x - c'h - h'Wx$$
where b and c are the biases (' denotes transpose) and W represents the weights connecting the visible layer and the hidden layer.
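The energy function can be evaluated directly for a given pair of visible and hidden vectors. The sketch below is an illustration with plain arrays. It assumes `w(j)(i)` is the weight between hidden unit j and visible unit i, so that Wx has one entry per hidden unit; the names `RbmEnergySketch`, `dot` and `energy` are hypothetical.

```scala
object RbmEnergySketch {
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  // E(x, h) = -b'x - c'h - h'Wx
  // x: visible units, h: hidden units, b: visible biases, c: hidden biases,
  // w: one row per hidden unit, one column per visible unit (assumed layout).
  def energy(x: Array[Double], h: Array[Double],
             b: Array[Double], c: Array[Double],
             w: Array[Array[Double]]): Double = {
    val wx = w.map(row => dot(row, x)) // W x, one entry per hidden unit
    -dot(b, x) - dot(c, h) - dot(h, wx)
  }
}
```

Usage: with x = (1, 0), h = (1), b = (0.5, 0.5), c = (0.2) and W = ((1.0, 2.0)), the three terms are 0.5, 0.2 and 1.0, so the energy is their negated sum. Configurations with lower energy are the ones the model treats as more probable.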
Learning Energy Based Models
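• Energy based models can be learnt by performing gradient descent on the negative log-likelihood of the training data
• It has the following form:
$$-\frac{\partial \log p(x)}{\partial \theta} = \frac{\partial F(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \frac{\partial F(\tilde{x})}{\partial \theta}$$
The first term is the positive phase and the second term is the negative phase; F denotes the free energy and θ the model parameters.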
• ANN to Distributed Deep Learning
• Key ideas in deep learning
• Need for distributed realizations.
• DistBelief, deeplearning4j etc.
• Our work on large scale distributed deep learning
• Deep learning leads us from statistics based
machine learning towards brain inspired AI.

More Related Content

What's hot

Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLParikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLMLconf
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...Spark Summit
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Spark Summit
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...Databricks
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Spark Summit
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkCloudera, Inc.
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Spark Summit
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with SparkKrishna Sankar
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labImpetus Technologies
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016MLconf
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéJen Aman
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedTuri, Inc.
Spark and the Future of Advanced Analytics by Thomas Dinsmore
Spark and the Future of Advanced Analytics by Thomas DinsmoreSpark and the Future of Advanced Analytics by Thomas Dinsmore
Spark and the Future of Advanced Analytics by Thomas DinsmoreSpark Summit
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...Databricks
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersSaliya Ekanayake
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache SparkCloudera, Inc.
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and HadoopJosh Patterson
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...Databricks
Beyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLab
Beyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLabBeyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLab
Beyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLabVijay Srinivas Agneeswaran, Ph.D

What's hot (20)

Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATLParikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache Spark
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with Spark
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph lab
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
Spark and the Future of Advanced Analytics by Thomas Dinsmore
Spark and the Future of Advanced Analytics by Thomas DinsmoreSpark and the Future of Advanced Analytics by Thomas Dinsmore
Spark and the Future of Advanced Analytics by Thomas Dinsmore
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
Extending Spark's Ingestion: Build Your Own Java Data Source with Jean George...
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and Hadoop
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Beyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLab
Beyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLabBeyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLab
Beyond Hadoop 1.0: A Holistic View of Hadoop YARN, Spark and GraphLab

Similar to Distributed Deep Learning + others for Spark Meetup

Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Herman Wu
Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Humoyun Ahmedov
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilitiesIan Foster
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioAlluxio, Inc.
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyScaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyRohit Kulkarni
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskSaurabh Saxena
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuBuilding a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuDatabricks
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsGaignard Alban
Sjug #26 ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
Sjug #26   ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23Sjug #26   ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
Sjug #26 ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23Tomasz Sikora
Thinking in parallel ab tuladev
Thinking in parallel ab tuladevThinking in parallel ab tuladev
Thinking in parallel ab tuladevPavel Tsukanov
Standardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationStandardizing arrays -- Microsoft Presentation
Standardizing arrays -- Microsoft PresentationTravis Oliphant
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for SparkMark Kerzner
Big Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computingBig Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computingAnimesh Chaturvedi
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...MLconf

Similar to Distributed Deep Learning + others for Spark Meetup (20)

Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications Spark Based Distributed Deep Learning Framework For Big Data Applications
Spark Based Distributed Deep Learning Framework For Big Data Applications
Big data at experimental facilities
Distributed Deep Learning + others for Spark Meetup

  • 1. Spark Meet-up 23rd Jan 2015, Bangalore Dr. Vijay Srinivas Agneeswaran, Director, Big-data Labs, Impetus
  • 2. Agenda (Copyright @Impetus Technologies, 2015) 1. Distributed Deep Learning over Spark: Dr. Vijay Srinivas Agneeswaran and team. 2. Research Track, "Outlier Detection and KNN-Join Algorithms over Spark": Ashutosh Trivedi and Kaushik Ranjan. 3. "Autoscaling in Spark": Rajat Gupta and team, Qubole. Lightning Talks (production use cases of Spark): ???
  • 3. Distributed Deep Learning Over Spark. Dr. Vijay Srinivas Agneeswaran et al., Director, Big-data Labs, Impetus. Spark Meet-up, 23rd Jan 2015, Bangalore.
  • 4. Different Shallow Architectures. Figure labels: Weighted Sum (three times, once per architecture); Template matchers; Fixed Basis Functions; Simple Trainable Basis Functions. Captions: Linear predictor; ANN, Radial Basis Functions; Kernel Machines. Source: Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," in Large Scale Kernel Machines (L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), MIT Press, 2007.
  • 5. DLN for Face Recognition
  • 6. DLN for Face Recognition
  • 7. Deep Learning Networks: Learning • No general learning algorithm (no-free-lunch theorem by Wolpert, 1996). • Learning algorithms for specific tasks: perception, control, prediction, planning, reasoning, language understanding. • Limitations of BP (back propagation): local minima, optimization challenges for non-convex objective functions. • Hinton's deep belief networks as a stack of RBMs. • LeCun's energy-based learning for DBNs.
  • 8. Deep Belief Networks • A deep neural network composed of multiple layers of latent variables (hidden units or feature detectors). • Can be viewed as a stack of RBMs. • Hinton, along with his student, proposed that these networks can be trained greedily, one layer at a time. • A Boltzmann Machine is a specific energy model with a linear energy function.
  • 9. Other DL Networks: Auto Encoders (Auto-associators or Diabolo Networks) • The aim of an auto encoder network is to learn a compressed representation for a set of data. • It is an unsupervised learning algorithm that applies back propagation, setting the target values equal to the inputs (identity function). • A denoising auto encoder avoids merely learning the identity function by randomly corrupting the input, which the auto encoder must then reconstruct or denoise. • Best applied when there is structure in the data. • Applications: dimensionality reduction, feature selection.
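The corruption step of a denoising auto encoder can be sketched as follows. This is a minimal illustration, not the deck's implementation: the helper name `corrupt`, the choice of masking noise (zeroing components) and the corruption level are all assumptions made for the example.

```scala
import scala.util.Random

object DenoisingSketch {
  // Hypothetical helper: masking noise for a denoising auto encoder.
  // Each input component is independently set to 0.0 with
  // probability `corruptionLevel`; the rest pass through unchanged.
  def corrupt(input: Vector[Double], corruptionLevel: Double, rng: Random): Vector[Double] =
    input.map(v => if (rng.nextDouble() < corruptionLevel) 0.0 else v)

  def main(args: Array[String]): Unit = {
    val clean = Vector(0.9, 0.1, 0.8, 0.4)
    val noisy = corrupt(clean, corruptionLevel = 0.3, rng = new Random(42))
    // Training pair for the auto encoder: input = noisy, target = clean.
    println(s"input = $noisy, target = $clean")
  }
}
```

The network is then trained to map the corrupted vector back to the clean one, so copying the input through is no longer a valid solution.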
  • 10. Why Are Deep Learning Networks Brain-like? • Statistical approach of traditional ML: SVMs or kernel approaches. Not applicable in deep learning networks. • Human brain: trophic factors. • Traditional ML: a lot of data munging and representational issues (feature abstractor) before the classifier can kick in. • Deep learning: allows the system to learn the representations naturally as well.
  • 11. Success stories of DLNs • Android voice recognition system: based on DLNs; improves accuracy by 25% compared to the state of the art. • Microsoft Skype Translate software and digital assistant Cortana. • 1.2 million images, 1000 classes (ImageNet data): error rate of 15.3%, better than the state of the art at 26.1%.
  • 12. Success stories of DLNs (continued) • Senna system: PoS tagging, chunking, NER, semantic role labeling, syntactic parsing; comparable F1 score with the state of the art, with a huge speed advantage (5 days vs. a few hours). • DLNs vs. TF-IDF: relevance search over 1 million documents, 3.2 ms vs. 1.2 s. • Robot navigation.
  • 13. Potential Applications of DLNs • Speech recognition/enhancement • Video sequencing • Emotion recognition (video/audio) • Malware detection • Robotics: navigation • Multi-modal learning (text and image) • Natural language processing
  • 14. Challenges in Realizing DLNs • Large number of training examples: high accuracy. A large number of parameters can also improve accuracy. • Inherently sequential nature: freeze up one layer for learning. • GPUs to improve training speedup. Limitation: CPU-to-GPU data transfers. • Distributed DLNs: Jeffrey Dean's work.
  • 15. Distributed DLNs • Motivation: scalable, low latency training; parallelize training data and learn fast. • Jeffrey Dean's work: DistBelief. • Pseudo-centralized realization.
  • 16. What is Spark? • Spark provides a computing abstraction that generalizes MapReduce. • More powerful set of operations than just map and reduce: group by, order by, sort, reduce by key, sample, union, etc. • Provides an efficient execution environment based on distributed shared memory: keeps the working set of data in memory. • Shark provides a Hive Query Language (HQL) interface over Spark.
  • 17. What is Spark? Data Flow in Hadoop
  • 18. What is Spark? Data Flow in Spark
  • 19. Real world use-case example: HITS algorithm. The Hub score and Authority score for a node are calculated with the following algorithm: (1) Start with each node having a hub score and authority score of 1, i.e. auth(p) = 1 and hub(p) = 1. (2) Run the Authority Update Rule: update each node's Authority score to be equal to the sum of the Hub scores of each node that points to it. That is, a node is given a high authority score by being linked to by pages that are recognized as hubs for information. (3) Run the Hub Update Rule: update each node's Hub score to be equal to the sum of the Authority scores of each node that it points to. That is, a node is given a high hub score by linking to nodes that are considered to be authorities on the subject. (4) Normalize the values by dividing each Hub score by the square root of the sum of the squares of all Hub scores, and dividing each Authority score by the square root of the sum of the squares of all Authority scores. (5) Repeat from the second step as necessary.
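The steps above can be sketched in plain (serial) Scala. This is an illustrative sketch only, assuming a tiny in-memory edge list; the object name `HitsSketch`, the `hits` function and the example graph are hypothetical and not taken from the slides.

```scala
object HitsSketch {
  // A directed edge (from, to) of a small in-memory graph.
  type Edge = (String, String)

  // Returns (hub scores, authority scores) after the given number of rounds.
  def hits(edges: Seq[Edge], iterations: Int): (Map[String, Double], Map[String, Double]) = {
    val nodes = edges.flatMap { case (a, b) => Seq(a, b) }.distinct

    // Step 1: every node starts with hub = auth = 1.
    var hub  = nodes.map(n => n -> 1.0).toMap
    var auth = nodes.map(n => n -> 1.0).toMap

    // Step 4 helper: divide by the square root of the sum of squares.
    def normalize(scores: Map[String, Double]): Map[String, Double] = {
      val norm = math.sqrt(scores.values.map(v => v * v).sum)
      if (norm == 0.0) scores else scores.map { case (n, v) => n -> v / norm }
    }

    for (_ <- 1 to iterations) {
      // Step 2: authority of p = sum of hub scores of nodes pointing to p.
      val newAuth = nodes.map { p =>
        p -> edges.filter(_._2 == p).map(e => hub(e._1)).sum
      }.toMap
      // Step 3: hub of p = sum of authority scores of nodes p points to.
      val newHub = nodes.map { p =>
        p -> edges.filter(_._1 == p).map(e => newAuth(e._2)).sum
      }.toMap
      auth = normalize(newAuth)
      hub  = normalize(newHub)
    }
    (hub, auth)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical graph: pages a and b both link to page c.
    val (hub, auth) = hits(Seq("a" -> "c", "b" -> "c"), iterations = 5)
    println(s"hub = $hub")
    println(s"auth = $auth")
  }
}
```

Every round reads the scores produced by the previous one, which is why the algorithm is iterative and why keeping intermediate scores in memory rather than on disk matters.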
  • 20. Solve HITS algorithm using Hadoop MR. Diagram: HDFS storage; Step 1: auth(p) = 1 and hub(p) = 1; Step 2: Run Authority Update Rule, auth(p) = X; Step 3: Run Hub Update Rule, hub(p) = Y; Step 4: Normalize hub(p) and auth(p). Legend: Write, Read, Flow.
  • 21. Solve HITS algorithm using Spark. Diagram: HDFS storage; Step 1: auth(p) = 1 and hub(p) = 1; Step 2: Run Authority Update Rule, auth(p) = X; Step 3: Run Hub Update Rule, hub(p) = Y; Step 4: Normalize hub(p) and auth(p). Legend: Write, Read, Flow.
  • 22. Spark [MZ12]
      Transformations/Actions: Description
      Map(function f1): Pass each element of the RDD through f1 in parallel and return the resulting RDD.
      Filter(function f2): Select elements of the RDD that return true when passed through f2.
      flatMap(function f3): Similar to Map, but f3 returns a sequence to facilitate mapping a single input to multiple outputs.
      Union(RDD r1): Returns the result of the union of the RDD r1 with the self.
      Sample(flag, p, seed): Returns a randomly sampled (with seed) p percentage of the RDD.
      groupByKey(noTasks): Can only be invoked on key-value paired data; returns data grouped by key. No. of parallel tasks is given as an argument (default is 8).
      reduceByKey(function f4, noTasks): Aggregates the result of applying f4 on elements with the same key. No. of parallel tasks is the second argument.
      Join(RDD r2, noTasks): Joins RDD r2 with self; computes all possible pairs for a given key.
      groupWith(RDD r3, noTasks): Joins RDD r3 with self and groups by key.
      sortByKey(flag): Sorts the self RDD in ascending or descending order based on flag.
      Reduce(function f5): Aggregates the result of applying function f5 on all elements of the self RDD.
      Collect(): Returns all elements of the RDD as an array.
      Count(): Counts the no. of elements in the RDD.
      take(n): Gets the first n elements of the RDD.
      First(): Equivalent to take(1).
      saveAsTextFile(path): Persists the RDD in a file in HDFS or another Hadoop supported file system at the given path.
      saveAsSequenceFile(path): Persists the RDD as a Hadoop sequence file. Can be invoked only on key-value paired RDDs that implement the Hadoop writable interface or equivalent.
      foreach(function f6): Runs f6 in parallel on elements of the self RDD.
      [MZ12] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (NSDI'12). USENIX Association, Berkeley, CA, USA, 2-2.
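To get a feel for how several of these operations compose, the sketch below counts words using ordinary Scala collections as a local stand-in for an RDD. That substitution is an assumption made so the example runs without a cluster: `flatMap`, `filter` and `map` have same-named collection methods, while `groupByKey` and `reduceByKey` are only approximated here with `groupBy` followed by a per-key `reduce`. The names `RddOpsSketch` and `wordCount` are hypothetical.

```scala
object RddOpsSketch {
  // Word count over local collections, mirroring the shape of an RDD pipeline.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split("\\s+").toSeq) // flatMap: one line -> many words
      .filter(word => word.nonEmpty)             // filter: drop empty tokens
      .map(word => (word.toLowerCase, 1))        // map: word -> (key, value) pair
      .groupBy(pair => pair._1)                  // local stand-in for groupByKey
      .map { case (word, pairs) =>               // local stand-in for reduceByKey(_ + _)
        word -> pairs.map(_._2).reduce(_ + _)
      }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("spark spark hadoop", "Spark"))
    println(counts)
  }
}
```

On a real cluster the same chain would be applied to an RDD, with transformations evaluated lazily until an action such as Collect() or Count() is invoked.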
  • 24. Spark: Use Cases • Ooyala: uses Cassandra for video data personalization. Pre-compute aggregates vs. on-the-fly queries. Moved to Spark for ML and computing views. Moved to Shark for on-the-fly queries: C* OLAP aggregate queries take 130 secs on Cassandra, 60 ms in Spark. • Conviva: uses Hive for repeatedly running ad-hoc queries on video data. Optimized ad-hoc queries using Spark RDDs and found Spark to be 30 times faster than Hive. ML for connection analysis and video streaming optimization. • Yahoo: advertisement targeting, 30K nodes on Hadoop YARN. Hadoop for batch processing, Spark for iterative processing, Storm for on-the-fly processing. Content recommendation: collaborative filtering.
  • 25. Spark Use Cases: Spark is good for linear algebra, optimization and N-body problems. Computations/Operations: • Giant 1 (simple stats) is perfect for Hadoop 1.0. • Giants 2 (linear algebra), 3 (N-body), 4 (optimization): Spark from UC Berkeley is efficient. Logistic regression, kernel SVMs, conjugate gradient descent, collaborative filtering, Gibbs sampling, alternating least squares. An example is the social group-first approach for consumer churn analysis [2]. • Interactive/on-the-fly data processing: Storm. • OLAP, data cube operations: Dremel/Drill. • Data sets that are not embarrassingly parallel? Deep learning: artificial neural networks/deep belief networks. Machine vision from Google [3]. Speech analysis from Microsoft. • Giant 5, graph processing: GraphLab, Pregel, Giraph. References: [1] National Research Council. Frontiers in Massive Data Analysis. Washington, DC: The National Academies Press, 2013. [2] Yossi Richter, Elad Yom-Tov, and Noam Slonim. Predicting Customer Churn in Mobile Networks through Analysis of Social Groups. In Proceedings of SIAM International Conference on Data Mining, 2010, pp. 732-741. [3] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew W. Senior, Paul A. Tucker, Ke Yang, and Andrew Y. Ng. Large Scale Distributed Deep Networks. NIPS 2012.
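  • 26. Some Spark(ling) examples: Scala code (serial)
      var count = 0
      for (i <- 1 to 100000) {
        // Draw a random point in the square [-1, 1] x [-1, 1]
        val x = Math.random * 2 - 1
        val y = Math.random * 2 - 1
        // Count it if it falls inside the unit circle
        if (x*x + y*y < 1) count += 1
      }
      println("Pi is roughly " + 4 * count / 100000.0)
    Sample random points in the square [-1, 1] x [-1, 1] and count how many fall inside the unit circle (a fraction of roughly PI/4), which gives an approximate value for PI. This is based on PS/PC = AS/AC = 4/PI, where PS and PC are the numbers of points in the square and in the circle and AS and AC are the corresponding areas, so PI = 4 * (PC/PS).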
  • 27. Some Spark(ling) examples: Spark code (parallel)
      val spark = new SparkContext(<Mesos master>)
      // Shared counter that tasks can add to
      var count = spark.accumulator(0)
      // Split the range into 12 slices that are processed in parallel
      for (i <- spark.parallelize(1 to 100000, 12)) {
        val x = Math.random * 2 - 1
        val y = Math.random * 2 - 1
        if (x*x + y*y < 1) count += 1
      }
      // Read the accumulated total back on the driver
      println("Pi is roughly " + 4 * count.value / 100000.0)
    Notable points: 1. A Spark context is created; it talks to the Mesos master (footnote 1). 2. Count becomes a shared variable, an accumulator. 3. The for loop runs over an RDD: parallelize breaks the Scala range object (1 to 100000) into 12 slices. 4. The for loop over the parallelized range invokes the foreach method of the RDD. Footnote 1: Mesos is an Apache incubated clustering system –
  • 28. Logistic Regression in Spark: Serial Code
      // Read data file and convert it into Point objects
      // (materialized as a list so the points can be re-read on every iteration)
      val lines = scala.io.Source.fromFile("data.txt").getLines().toList
      val points = lines.map(x => parsePoint(x))
      // Run logistic regression
      var w = Vector.random(D)
      for (i <- 1 to ITERATIONS) {
        val gradient = Vector.zeros(D)
        for (p <- points) {
          val scale = (1/(1+Math.exp(-p.y*(w dot p.x)))-1)*p.y
          gradient += scale * p.x
        }
        w -= gradient
      }
      println("Result: " + w)
  • 29. Logistic Regression in Spark
      // Read data file and transform it into Point objects
      val spark = new SparkContext(<Mesos master>)
      val lines = spark.hdfsTextFile("hdfs://.../data.txt")
      // Cache the parsed points in memory for reuse across iterations
      val points = lines.map(x => parsePoint(x)).cache()
      // Run logistic regression
      var w = Vector.random(D)
      for (i <- 1 to ITERATIONS) {
        // The gradient is accumulated across the parallel tasks
        val gradient = spark.accumulator(Vector.zeros(D))
        for (p <- points) {
          val scale = (1/(1+Math.exp(-p.y*(w dot p.x)))-1)*p.y
          gradient += scale * p.x
        }
        w -= gradient.value
      }
      println("Result: " + w)
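The `scale` expression in both versions can be read as the gradient of the logistic loss. The derivation below is a sketch under two assumptions that the slides do not state: labels are encoded as $y_i \in \{-1, +1\}$, and the update `w -= gradient` uses an implicit step size of 1.

```latex
\ell(w) = \sum_{i} \log\left(1 + e^{-y_i\, w^\top x_i}\right),
\qquad
\nabla_w \ell(w) = \sum_{i} \left(\frac{1}{1 + e^{-y_i\, w^\top x_i}} - 1\right) y_i\, x_i
```

Each summand matches `scale * p.x` in the code, with `p.y` playing the role of $y_i$ and `p.x` the role of $x_i$. Because the gradient is a plain sum over data points, it can be computed per partition and combined through an accumulator.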
  • 30. Deep Learning on Spark • Fully distributed deep learning network implementation on Spark. • Spark would handle the parallelism, synchronization, distribution, and failover. • The input data set resides in HDFS; intermediate data is kept in the local file system. • Publish/subscribe message passing framework built on top of Apache Spark using the Akka framework.
  • 31.
  • 32. Thank You! Copyright @Impetus Technologies, 2015
  • 34. Energy Based Models • RBMs are Energy Based Models (EBMs). • An EBM associates an energy with every configuration of a system. • Learning corresponds to modifying the shape of the energy function so that it has desirable properties. • As in physics, lower energy = more stability. • So, modify the shape of the energy function such that the desirable configurations have lower energy.
  • 35. Other DL networks: Convolutional Networks. Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. 1999. Object Recognition with Gradient-Based Learning. In Shape, Contour and Grouping in Computer Vision, David A. Forsyth, Joseph L. Mundy, Vito Di Gesù, and Roberto Cipolla (Eds.). Springer-Verlag, London, UK, 319-.
  • 36. Other Brain-like Approaches • Recurrent neural networks: Long Short Term Memory (LSTM), temporal data. • Sum-product networks: deep architectures of sum-product networks. • Hierarchical temporal memory: online structural and algorithmic model of the neocortex.
  • 37. Recurrent Neural Networks • Connections between units form a directed cycle, i.e. typical feedback connections. • RNNs can use their internal memory to process arbitrary sequences of inputs. • Plain RNNs struggle to learn dependencies that reach far back into the past. • LSTM addresses this problem by introducing memory cells. • These memory cells can remember a value for an arbitrary amount of time.
  • 38. Sum-Product Networks (SPN) • An SPN is a deep network model and is a directed acyclic graph. • These networks allow the probability of an event to be computed quickly. • SPNs try to convert multilinear functions into computationally short forms, i.e. forms consisting of multiple additions and multiplications. • Leaves correspond to variables and internal nodes correspond to sums and products.
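A sum-product network of this shape can be evaluated with one bottom-up pass over the graph. The sketch below is a minimal illustration over Boolean variables; the type names (`LeafNode`, `SumNode`, `ProductNode`), the `evaluate` function and the example weights are hypothetical and chosen only for this example.

```scala
object SpnSketch {
  sealed trait Node
  // Leaf: indicator that `variable` takes the value `value`.
  final case class LeafNode(variable: String, value: Boolean) extends Node
  // Sum node: weighted mixture of its children (weights should sum to 1).
  final case class SumNode(children: Seq[(Double, Node)]) extends Node
  // Product node: product of children defined over disjoint variables.
  final case class ProductNode(children: Seq[Node]) extends Node

  // Bottom-up evaluation. Variables missing from `evidence` are
  // marginalised out because both of their indicators evaluate to 1.
  def evaluate(node: Node, evidence: Map[String, Boolean]): Double = node match {
    case LeafNode(variable, value) =>
      evidence.get(variable) match {
        case Some(observed) => if (observed == value) 1.0 else 0.0
        case None           => 1.0
      }
    case SumNode(children) =>
      children.map { case (weight, child) => weight * evaluate(child, evidence) }.sum
    case ProductNode(children) =>
      children.map(child => evaluate(child, evidence)).product
  }

  // Hypothetical example: two independent binary variables A and B.
  val example: Node = ProductNode(Seq(
    SumNode(Seq(0.3 -> LeafNode("A", true), 0.7 -> LeafNode("A", false))),
    SumNode(Seq(0.6 -> LeafNode("B", true), 0.4 -> LeafNode("B", false)))
  ))

  def main(args: Array[String]): Unit = {
    println(evaluate(example, Map("A" -> true, "B" -> false)))
    println(evaluate(example, Map("A" -> true)))
  }
}
```

Joint and marginal queries use the same pass, each visiting every edge once, which is the sense in which such networks compute probabilities quickly.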
  • 39. Hierarchical Temporal Memory • An online machine learning model developed by Jeff Hawkins. • The model learns one instance at a time. • Best explained by an online stock model: today's stock situation helps in predicting tomorrow's stock. • An HTM network is a tree-shaped hierarchy of levels. • Higher hierarchy levels can use patterns learned at lower levels. This is adopted from the learning model used by the brain in the form of the neocortex.
  • 41. Mathematical Equations • The energy function is defined as $E(x, h) = -b'x - c'h - h'Wx$, where $b$ and $c$ are the bias vectors (the prime denotes transpose) and $W$ represents the weights connecting the visible layer and the hidden layer.
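As a concrete reading of the energy function, the sketch below evaluates E(x, h) = -b'x - c'h - h'Wx for small vectors. It assumes W is stored with one row per hidden unit and one column per visible unit; the object name, function name and the numbers in the example are hypothetical.

```scala
object RbmEnergySketch {
  private def dot(u: Vector[Double], v: Vector[Double]): Double =
    u.zip(v).map { case (p, q) => p * q }.sum

  // E(x, h) = -b'x - c'h - h'Wx
  // x: visible units, h: hidden units, b: visible biases, c: hidden biases,
  // w: weights, one row per hidden unit and one column per visible unit.
  def energy(x: Vector[Double], h: Vector[Double],
             b: Vector[Double], c: Vector[Double],
             w: Vector[Vector[Double]]): Double = {
    val wx = w.map(row => dot(row, x)) // W x, one entry per hidden unit
    -dot(b, x) - dot(c, h) - dot(h, wx)
  }

  def main(args: Array[String]): Unit = {
    val x = Vector(1.0, 0.0)
    val b = Vector(0.5, -0.5)
    val c = Vector(0.2)
    val w = Vector(Vector(1.0, 2.0))
    println(energy(x, Vector(1.0), b, c, w)) // hidden unit on
    println(energy(x, Vector(0.0), b, c, w)) // hidden unit off
  }
}
```

In this example switching the hidden unit on lowers the energy, so that configuration is the more probable of the two under the model.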
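  • 42. Learning Energy Based Models • Energy based models can be learnt by performing gradient descent on the negative log-likelihood of the training data. • It has the following form: $-\frac{\partial \log p(x)}{\partial \theta} = \frac{\partial F(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \frac{\partial F(\tilde{x})}{\partial \theta}$, where the first term on the right-hand side is the positive phase and the second term is the negative phase.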
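The free energy $F(x)$ in the gradient is not defined on the slide. For an RBM with the energy $E(x, h) = -b'x - c'h - h'Wx$ and binary hidden units (an assumption, since the slides do not state the unit type), summing out $h$ gives the standard closed form below, where $W_i$ denotes the $i$-th row of $W$:

```latex
F(x) = -\log \sum_{h} e^{-E(x, h)}
     = -b'x - \sum_{i} \log\left(1 + e^{\,c_i + W_i x}\right),
\qquad
p(x) = \frac{e^{-F(x)}}{Z}, \quad Z = \sum_{\tilde{x}} e^{-F(\tilde{x})}
```

The positive phase lowers the free energy of training examples, while the negative phase raises the free energy of configurations $\tilde{x}$ that the model itself considers likely.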
  • 43. Conclusions • ANN to distributed deep learning: key ideas in deep learning; need for distributed realizations; DistBelief, deeplearning4j, etc. • Our work on large scale distributed deep learning. • Deep learning leads us from statistics-based machine learning towards brain-inspired AI.

Editor's Notes

  2. Refined by LeCun in 1989, mainly to apply CNNs to identify variability in 2D image data. Introduced in 1980 by Fukushima. A type of RBM where communication is absent across the nodes in the same layer. Nodes are not connected to every other node of the next layer. Symmetry is not there. Convolution networks learn images by pieces rather than learning as a whole (RBM does this). Designed to use minimal amounts of preprocessing.