Parallel Machine Learning
Janani Chakkaradhari
Information Technology for Business Intelligence
Technische Universität Berlin
February 13, 2014
Abstract
Scalability has been an essential factor for any computational algorithm when considering its performance. In this Big Data era, gathering large amounts of data has become easy, but analyzing Big Data with the existing Machine Learning (ML) algorithms is often infeasible and causes them to perform poorly. This is due to the fact that the computational logic of these algorithms was originally designed in a sequential way. MapReduce [1] has become the solution for handling billions of data records efficiently. In this report we discuss the basic building block for the computation behind ML algorithms, two different attempts to parallelize machine learning algorithms using MapReduce, and the overhead involved in parallelizing ML algorithms.

1 Introduction

The significance of Machine Learning (ML) algorithms is widely known, and their application in various domains brings benefits to business as well as to the research community. Traditional ML algorithms were built on the assumption that the data fits in memory. On the other hand, the current distributed infrastructure of Information Systems (IS) allows the computerized society to easily access and generate data in almost every action of day-to-day life. This perpetual increase of data degrades the performance of ML algorithms that had been proven to produce fast and prominent results on smaller datasets, which in turn becomes the cause of the “curse of modularity” [9].
With the advent of the MapReduce programming model, voluminous data is handled efficiently in parallel, as it follows a divide-and-conquer methodology for execution. “Learning can become limited by computation time and not by data volume with help of MapReduce and large clusters of machines” [8], which implies that ML algorithms have to be redesigned in order to be executed on a parallel architecture.
Thus, parallelizing ML algorithms using the MapReduce model results in increased speed of computation, and earlier works on this topic have demonstrated such performance gains. This report presents a gentle background study on the use of Linear Algebra in ML in section 2, followed by an overview of a novel approach for parallelizing the Stochastic Gradient Descent algorithm for Matrix Factorization [2] in section 3, and a brief summary of declarative ML, an attempt to provide a declarative way of executing some ML algorithms and linear algebra primitives on Hadoop using a system called SystemML [3], in section 4.

2 Computational Engine for Machine Learning

Mathematics and computer science are like the two tracks of a train: they always go together to ensure a good journey for real-world users. Linear algebra has a prominent role in ML. Transforming a problem space into linear functions is one of the elementary approaches used in predictive algorithms, and matrices are used as a means of representing linear functions. In other words, the interaction between two entities of a system can be represented in two-dimensional form known as a matrix. The elements inside the matrix represent the magnitude of those interactions between two finite sets of objects, also known as dyadic data [4]. Analyzing a system using matrix techniques allows one to predict the effect of individual interactions on the overall system. Some of the eminent applications in ML based on linear algebra are listed below; a small numerical illustration follows the list.
• Singular Value Decomposition (SVD) is one of the most famous methods, with applications in image compression, determining oscillations or damage in structures such as bridges during the design phase, and many more.
• Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are used as a feature extraction step before classification.
• Eigenvalues and eigenvectors have proven results in the PageRank algorithm.
• Analyses based on dyads, such as topic modeling, keyword search and recommender systems, are based on the Non-Negative Matrix Factorization technique [6].
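As a small, self-contained illustration of this matrix view (an independent sketch, not taken from the cited papers; the interaction values are made up), a truncated SVD yields a low-rank approximation of a dyadic interaction matrix:

```python
import numpy as np

# A tiny user-by-item interaction matrix (dyadic data); the values are illustrative.
A = np.array([[5.0, 4.0, 0.0, 1.0],
              [4.0, 5.0, 1.0, 0.0],
              [1.0, 0.0, 5.0, 4.0],
              [0.0, 1.0, 4.0, 5.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                    # keep only the two largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(A_k, 2))                  # rank-2 approximation of the interactions
```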

3 Large Scale Matrix Factorization with DSGD

This section gives an overview of the Distributed Stochastic Gradient Descent algorithm, together with a brief review of optimizing Matrix Factorization using Stochastic Gradient Descent and a quick introduction to the functional use of Matrix Factorization and Stochastic Gradient Descent.

3.1 Matrix Factorization

Matrix Factorization is mainly used to extract interaction structure from dyadic data
[6]. The interaction structure includes the following [4]
• Co-occurrence
• Strength of preference or the association
• Word clustering, word sense disambiguation and thesaurus construction in text-based information retrieval
• Modeling of preference and consumption behavior
• The dyad in computer vision applications represents the feature observed at a
particular image location.

3.2 Stochastic Gradient Descent (SGD)

Gradient descent has fruitful applications in optimization problems. It is predominantly used to minimize the cost function of ML algorithms such as linear regression, where the weight (parameter) vector is determined by minimizing the average of the sum of squared errors between the predictions and the actual values in the training set [7].
One main drawback of gradient descent is that it requires the entire training data set for computing the average squared error in each step of updating the parameter vector, and it repeats this process until the parameter vector converges. This slows down the algorithm. It is therefore also termed Batch Gradient Descent.
In contrast, Stochastic Gradient Descent takes a single, randomly chosen training example at a time and updates the parameter vector with respect to that example in each step, repeating the process until convergence. This eliminates the need to look at the entire data set in each step; the algorithm scans the training set once per pass and repeats such passes as needed. A minimal sketch contrasting the two update rules is shown below.
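The following minimal sketch (illustrative only; the learning rate, epoch count and variable names are assumptions, not taken from [7]) contrasts the batch and stochastic update rules for linear regression with squared error:

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.01, epochs=100):
    """One update per pass: the gradient averages over ALL training examples."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)    # requires a full scan of the training set
        w -= lr * grad
    return w

def stochastic_gradient_descent(X, y, lr=0.01, epochs=100, seed=0):
    """One update per example: each step uses a single randomly chosen example."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):    # random order over the training set
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of the squared error at one point
            w -= lr * grad
    return w
```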

3.3 Stochastic Gradient Descent for Matrix Factorization

Matrix Factorization helps to reconstruct the original matrix from a partially observed matrix using an approximation technique. For example, in the Netflix recommendation problem [5], the rows represent users and the columns represent movies. The matrix is partially filled with the ratings users have given to movies. By considering the existing rating values, Matrix Factorization tries to find the missing values. In its simplest form, this is done by associating each user and each movie with some numbers (factors) such that the product of these factors is as close as possible to the original rating.
The discrepancy between the original input matrix and the product of the factors is the cost function, and we try to reduce this cost function to obtain the most appropriate factors. One way to do this is to employ the Stochastic Gradient Descent algorithm, which usually produces strong performance in sequential execution. Since the SGD approximation would otherwise end up with noisy values, the cost function here includes regularization and other information along with the prediction error. SGD tries to minimize the sum of all losses over the entire matrix. SGD works as follows [2] (a minimal sketch follows the list):
• Step 1: Take a random entry from the training set
• Step 2: Evaluate the loss function
• Step 3: Update the parameter space
• Step 4: Repeat Steps 1 to 3 for all the entries in the matrix
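The sketch below applies these four steps to a set of observed (user, movie, rating) triples. It is a simplification under assumed choices of rank, learning rate and regularization constant; it follows the general regularized squared-loss formulation of [2] but is not their implementation.

```python
import numpy as np

def sgd_matrix_factorization(ratings, n_users, n_movies, rank=10,
                             lr=0.01, reg=0.05, epochs=20, seed=0):
    """ratings: list of (user, movie, rating) triples, i.e. the observed entries."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((n_users, rank))    # user factors
    H = 0.1 * rng.standard_normal((n_movies, rank))   # movie factors
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):     # Step 1: pick a random entry
            u, m, r = ratings[idx]
            err = r - W[u] @ H[m]                     # Step 2: evaluate the local loss
            grad_w = err * H[m] - reg * W[u]          # Step 3: regularized updates
            grad_h = err * W[u] - reg * H[m]
            W[u] += lr * grad_w
            H[m] += lr * grad_h
    return W, H                                       # Step 4: repeated for all entries
```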
We cannot run this algorithm in parallel using MapReduce naively. The reason is the following: each mapper runs SGD on a subset of the large matrix. It reads the current row and column factors of that subset, evaluates the local loss function and updates the parameters (i.e. the rows and columns) of the corresponding matrix subset. If SGD runs in parallel, the algorithm could be executing at the same time on another subset of the matrix which is dependent (the same column but a different row). This leads the second mapper to read values that are being updated by the first mapper at the same time. This conflict is what prevents the algorithm from running on a parallel architecture as-is.

As described by Gemulla [2], not all the subsets of the matrix are dependent. In most cases the subsets are completely independent of each other, so it is possible to run SGD by locking the rows and columns of a subset. This idea forms the basis for parallelized SGD.

3.4 Distributed SGD for Matrix Factorization (DSGD)

DSGD utilizes the concept of independent rows and columns. Suppose we have d nodes in the cluster; we split the input matrix (the training set of known ratings) into d × d smaller matrices and distribute these blocks over the d nodes such that each node holds the blocks of an entire block-row, as shown in Figure 1.

Figure 1: Example Stratum of 3 Cluster nodes
The interchangeable set of sub-matrices, called a stratum, basically represents a partition of the underlying matrix dataset. In the paper [2], the stratification is performed by permutation, so that d nodes have d! possible independent block combinations. For example, 3 nodes have 6 possible strata, and these 6 strata form a single sequence of strata. The DSGD algorithm works as follows, assuming there are d nodes available, Z is the training-set input matrix, and W and H are the parameter factors of the input matrix:
• Step 1: Divide the input matrix Z into d × d blocks and distribute them over the cluster. The parameters W and H are equally distributed into d blocks over the rows and columns, respectively, so that W has d × 1 and H has 1 × d block dimensions. Compute the strata sequence for the input blocks using permutations. For each stratum in the sequence, do Steps 2 and 3.
• Step 2: Select a stratum whose blocks are independent, for example the blocks along the diagonal (the red boxes in Figure 1), from the sequence of strata (all possible combinations of strata).
• Step 3: Run SGD on the selected blocks in parallel to find the local minimum of the loss function. Sum up the local losses computed at each block and update the corresponding factor matrices W and H.
This is how DSGD runs the SGD algorithm in a distributed manner within a stratum. DSGD outperforms the ALS (Alternating Least Squares) method for matrix factorization [2], since DSGD avoids averaging over loss functions when executed in parallel, which makes the algorithm simpler and more versatile. A minimal sketch of the stratum-scheduling idea is given below.
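The Python sketch below illustrates only the stratum-scheduling idea described above. It is not the implementation from [2]: the helper `sgd_on_block` is hypothetical, and enumerating all d! strata per pass follows the simplified description in this report rather than the exact schedule of the original paper.

```python
from itertools import permutations

def strata(d):
    """Each stratum assigns node i the block in column perm[i]; blocks within one
    stratum share no rows and no columns, so their SGD updates cannot conflict."""
    return [list(enumerate(perm)) for perm in permutations(range(d))]

def dsgd_epoch(Z_blocks, W_blocks, H_blocks, sgd_on_block):
    """One pass over the sequence of strata. Within a stratum, the per-block SGD
    calls are independent and would run on different nodes in parallel; here they
    run sequentially for illustration. sgd_on_block updates W_blocks[i] and
    H_blocks[j] in place and returns the local loss of block (i, j)."""
    d = len(W_blocks)
    total_loss = 0.0
    for stratum in strata(d):
        for i, j in stratum:                 # independent blocks of this stratum
            total_loss += sgd_on_block(Z_blocks[i][j], W_blocks[i], H_blocks[j])
    return total_loss
```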

4 Declarative Machine Learning: SystemML

The overhead in parallelizing ML algorithms can be easily understood from the simple SGD algorithm discussed in the previous section. This makes a very clear argument that researchers have to carefully analyze each sequentially powerful ML algorithm in order to make it parallel and executable in the MapReduce programming model. The cost of implementing these algorithms as MapReduce jobs is high, and for better performance the same algorithm sometimes has to be hand-tuned, so there is little room for systematic optimization of the MapReduce jobs. For example, in the matrix multiplication problem, the order of execution of the multiplications has a large performance impact [3]. Researchers from the IBM Almaden and Watson research centers have proposed a new approach for handling the parallelization of ML algorithms that also takes optimization into account, called SystemML.
SystemML is analogous to HiveQL, developed by Facebook for executing data-warehouse queries on large clusters, where the queries are converted into MapReduce jobs that are executed on Hadoop by the HiveQL engine. Similarly, SystemML provides a declarative platform for expressing ML algorithms and linear algebra primitives and converts this abstract representation into executable MapReduce jobs on Hadoop.

4.1 Application areas of SystemML

In SystemML, ML algorithms are expressed in a high-level language called DML (Declarative Machine learning Language), which is comparable to R. DML supports operations such as matrix transpose, matrix multiplication, and iterative algorithms using “for” and “while” constructs, and so on. This lets the user focus on writing scripts that answer what to compute rather than how to express the computation. SystemML is highly scalable and tunes performance efficiently. It is used in different fields such as predictive modeling, recommender systems, and search analysis.

4.2 System Architecture of SystemML

SystemML takes a DML script as input, passes it through different components [3], and produces a parsed representation of the initial script. It supports built-in data types for representing matrices and scalars. The first step in SystemML is identifying the statement blocks, based on the constructs that break the sequential flow of the DML program; a rough sketch of this splitting is given below. For each statement block, SystemML then performs the HOP and LOP analyses described in the following subsections.
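As a purely illustrative sketch of this first step (the splitting rule and the DML-like script lines are assumptions, not SystemML's actual parser), consecutive simple statements can be grouped until a control-flow construct breaks the sequential flow:

```python
def split_statement_blocks(script_lines):
    """Group consecutive simple statements into blocks; control-flow constructs
    ('for', 'while', 'if') break the sequential flow and form their own blocks."""
    blocks, current = [], []
    for line in script_lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith(("for", "while", "if")):
            if current:
                blocks.append(current)
                current = []
            blocks.append([line])          # the construct itself marks a block boundary
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks

script = ["A = read('in.mtx')", "B = t(A) %*% A",
          "for (i in 1:10) { ... }", "write(B, 'out.mtx')"]
print(split_statement_blocks(script))      # three blocks, split at the 'for' construct
```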

4.3 High level Operator (HOP)

The HOP component consumes and produces the following:
Input: Parsed statement blocks
Action: The computation in each statement block instantiates one HOP Dag (Directed Acyclic Graph). A HOP Dag represents the basic operations on matrices and scalars, such as an operation or a transformation.
Optimizations: Algebraic rewrites, selection of the physical representation for intermediate matrices, and cost-based optimizations
Output: High-level execution plan (HOP Dags) representing the dataflow

4.4 Low level Operator (LOP)

LOP component analysis follows HOP analysis, and the corresponding input and output are as follows:
Input: High-level execution plan (HOP Dags)
Action: HOP Dags are converted into low-level physical plans (LOP Dags) that can be executed as MapReduce jobs. HOP Dags are parsed from bottom to top, and each HOP Dag is converted into one or more LOP Dags. The input and output format of each LOP is key-value pairs. Since a single computation leads to multiple LOPs, SystemML tries to combine these LOPs into a single MapReduce job. This is implemented using a novel algorithm named piggybacking, which reduces the number of scans performed on the input data during the execution of MR jobs. This is described in section 4.6.
Output: Low-level execution plan (LOP Dags)

4.5 Runtime

The runtime makes sure that the input matrices are represented as key-value pairs by disregarding the cells without a value, which reduces the size of the input matrix representation, since the matrices are inherently sparse. SystemML collects local sparsity information by applying a blocking operation to the input matrix: the input matrix is divided into smaller matrices called blocks, and each block is represented by a block id and its cell values, along with a parameter indicating whether the block is dense or sparse. The block size has a major impact on the number of key-value pairs generated by the runtime [3]. A small sketch of this blocked representation is given below.
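The sketch below shows one way such a blocked key-value representation could look; the block size, the sparsity threshold and the payload layout are assumptions for illustration, not SystemML's actual format.

```python
import numpy as np

def to_blocks(matrix, block_size):
    """Convert a matrix into (block_id, payload) key-value pairs. Each payload
    records whether the block is dense or sparse; sparse blocks keep only the
    coordinates and values of their non-zero cells."""
    pairs = []
    n_rows, n_cols = matrix.shape
    for r in range(0, n_rows, block_size):
        for c in range(0, n_cols, block_size):
            block = matrix[r:r + block_size, c:c + block_size]
            if np.count_nonzero(block) < block.size / 2:   # assumed sparsity threshold
                cells = {(int(i), int(j)): float(block[i, j])
                         for i, j in zip(*np.nonzero(block))}
                payload = ("sparse", cells)
            else:
                payload = ("dense", block.copy())
            pairs.append(((r // block_size, c // block_size), payload))
    return pairs
```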
The Generic MapReduce job (G-MR) is the main execution engine in SystemML; it is instantiated by the piggybacking algorithm (multiple LOPs inside a single MR job). The Control Module coordinates the execution of the MapReduce jobs and is involved in computations such as arithmetic operations, predicate evaluations and so on. Multiple optimizations are performed in the runtime component, decided dynamically based on data characteristics.

4.6 Piggybacking

This algorithm packages multiple LOPs into a single MapReduce job by considering the execution location of each LOP at runtime. The execution location identifies whether a LOP operation can be executed in Map or Reduce alone, or whether it requires both Map and Reduce for complete execution. Figure 2 shows the list of different LOP operations and their corresponding execution locations. For example, the group LOP has to be executed in both the Map and the Reduce phase, so it is marked as MapAndReduce.
We consider the example in Figure 3 to lay out the logic behind the piggybacking algorithm. The left part of the diagram represents the LOP Dag for the multiplication of matrix W with its transpose. LOP Dags are parsed in a bottom-up fashion. The algorithm starts by sorting the LOP operations in topological order; the result of this sort is shown in the center of the diagram. The algorithm works iteratively, creating a new MR job at the beginning of each iteration. The order of assigning each LOP to the MR job is as follows: it first assigns the LOPs that require only a Map phase (indicated by the Map or Reduce location in Figure 2), then the LOPs that need both Map and Reduce phases, and finally the LOPs that require only a Reduce phase. The algorithm makes sure that another descendant LOP with execution location MapAndReduce is not assigned to the same job.

Figure 2: Execution locations of LOP from [3]

Figure 3: Example Piggybacking
In our example, since the Data W and Transform LOPs span only a Map or Reduce operation, they are assigned to the Map phase of the first MR job. mmcj is the first LOP that spans both Map and Reduce phases, so it is assigned to both phases of the first MR job. Since the first MR job already has a LOP with location MapAndReduce, the Group LOP, which has the same execution location, cannot be assigned to the first MR job. Hence the iteration ends, and the next iteration starts by instantiating a second MR job. Finally, the Group and Aggregation operations are assigned to this second MR job, which completes the piggybacking algorithm in this example. A minimal sketch of this greedy packing is shown below.
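The sketch below captures the greedy packing described in this walkthrough. The LOP names, location labels and dependency map are written out by hand for this example; this is not the SystemML implementation.

```python
def piggyback(lops, deps):
    """Greedily pack topologically sorted LOPs into MapReduce jobs: a job may hold
    many Map-only LOPs but at most one LOP with location 'map_and_reduce', and a
    LOP is assigned only once all of its predecessors have been assigned."""
    jobs, done = [], set()
    remaining = list(lops)
    while remaining:
        job, has_mr = [], False
        for name, loc in list(remaining):
            assigned_names = done | {n for n, _ in job}
            if not deps.get(name, set()) <= assigned_names:
                continue                     # some predecessor is still unassigned
            if loc == "map_and_reduce":
                if has_mr:
                    continue                 # MapAndReduce slot of this job is taken
                has_mr = True
            job.append((name, loc))
            remaining.remove((name, loc))
        if not job:
            raise ValueError("unsatisfiable LOP dependencies")
        done |= {n for n, _ in job}
        jobs.append(job)
    return jobs

lops = [("data W", "map_or_reduce"), ("transform", "map_or_reduce"),
        ("mmcj", "map_and_reduce"), ("group", "map_and_reduce"),
        ("aggregate", "reduce")]
deps = {"transform": {"data W"}, "mmcj": {"data W", "transform"},
        "group": {"mmcj"}, "aggregate": {"group"}}
print(piggyback(lops, deps))                 # two jobs, matching the walkthrough above
```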

5 Conclusion

In this report we have seen the requirements for and the importance of research work on the parallelization of ML algorithms, and the role of Linear Algebra, a branch of mathematics, in ML algorithms. The difficulty of parallelizing ML algorithms was illustrated through the novel approach employed by the DSGD algorithm, an effort to parallelize SGD on large clusters. We also discussed SystemML, which provides users in different fields with an easier, declarative platform for executing ML algorithms.
Even though SystemML is concise and provides a user-friendly platform for executing limited forms of ML algorithms and some linear algebra primitives such as matrix multiplication, arithmetic operations and matrix factorization, DML does not support the more complex features of the object-oriented paradigm. It also does not support data structures such as arrays and lists, which are frequently used in most ML algorithms, whereas R provides a comprehensive set of flexible constructs for statistical and ML algorithms. On the other hand, Apache Mahout also provides a complete set of Hadoop-based ML algorithm packages, but it still needs to be hand-tuned for different data sets and is more complex from the user's perspective.

References
[1] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on
large clusters. Communications of the ACM, 51(1):107–113, 2008.
[2] Rainer Gemulla, Erik Nijkamp, Peter J Haas, and Yannis Sismanis. Large-scale
matrix factorization with distributed stochastic gradient descent. In Proceedings
of the 17th ACM SIGKDD international conference on Knowledge discovery and
data mining, pages 69–77. ACM, 2011.
[3] Amol Ghoting, Rajasekar Krishnamurthy, Edwin Pednault, Berthold Reinwald, Vikas Sindhwani, Shirish Tatikonda, Yuanyuan Tian, and Shivakumar
Vaithyanathan. Systemml: Declarative machine learning on mapreduce. In Data
Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 231–
242. IEEE, 2011.
[4] Thomas Hofmann, Jan Puzicha, and Michael I Jordan. Learning from dyadic data.
Advances in neural information processing systems, pages 466–472, 1999.
[5] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques
for recommender systems. Computer, 42(8):30–37, 2009.
[6] Chao Liu, Hung-chih Yang, Jinliang Fan, Li-Wei He, and Yi-Min Wang. Distributed nonnegative matrix factorization for web-scale dyadic data analysis on
mapreduce. In Proceedings of the 19th international conference on World wide
web, pages 681–690. ACM, 2010.
[7] Andrew Ng. CS229 lecture notes. CS229 Lecture Notes, 1(1):1–3, 2000.
[8] Vijay Narayanan and Milind Bhandarkar. Modeling with Hadoop. Tutorial at KDD 2011, 2011.
[9] Charles Parker. Unexpected challenges in large scale machine learning. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, pages 1–6. ACM, 2012.

8

Weitere ähnliche Inhalte

Was ist angesagt?

Image segmentation by modified map ml estimations
Image segmentation by modified map ml estimationsImage segmentation by modified map ml estimations
Image segmentation by modified map ml estimationsijesajournal
 
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model SelectionAdapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model SelectionIJECEIAES
 
facility layout paper
 facility layout paper facility layout paper
facility layout paperSaurabh Tiwary
 
Architecture neural network deep optimizing based on self organizing feature ...
Architecture neural network deep optimizing based on self organizing feature ...Architecture neural network deep optimizing based on self organizing feature ...
Architecture neural network deep optimizing based on self organizing feature ...journalBEEI
 
IRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification AlgorithmsIRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification AlgorithmsIRJET Journal
 
Q UANTUM C LUSTERING -B ASED F EATURE SUBSET S ELECTION FOR MAMMOGRAPHIC I...
Q UANTUM  C LUSTERING -B ASED  F EATURE SUBSET  S ELECTION FOR MAMMOGRAPHIC I...Q UANTUM  C LUSTERING -B ASED  F EATURE SUBSET  S ELECTION FOR MAMMOGRAPHIC I...
Q UANTUM C LUSTERING -B ASED F EATURE SUBSET S ELECTION FOR MAMMOGRAPHIC I...ijcsit
 
Probabilistic model based image segmentation
Probabilistic model based image segmentationProbabilistic model based image segmentation
Probabilistic model based image segmentationijma
 
Detection of leaf diseases and classification using digital image processing
Detection of leaf diseases and classification using digital image processingDetection of leaf diseases and classification using digital image processing
Detection of leaf diseases and classification using digital image processingNaeem Shehzad
 
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...IJPEDS-IAES
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...cscpconf
 
Fault detection based on novel fuzzy modelling
Fault detection based on novel fuzzy modelling Fault detection based on novel fuzzy modelling
Fault detection based on novel fuzzy modelling csijjournal
 
llorma_jmlr copy
llorma_jmlr copyllorma_jmlr copy
llorma_jmlr copyGuy Lebanon
 

Was ist angesagt? (18)

N41049093
N41049093N41049093
N41049093
 
Image segmentation by modified map ml estimations
Image segmentation by modified map ml estimationsImage segmentation by modified map ml estimations
Image segmentation by modified map ml estimations
 
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model SelectionAdapted Branch-and-Bound Algorithm Using SVM With Model Selection
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
 
facility layout paper
 facility layout paper facility layout paper
facility layout paper
 
Architecture neural network deep optimizing based on self organizing feature ...
Architecture neural network deep optimizing based on self organizing feature ...Architecture neural network deep optimizing based on self organizing feature ...
Architecture neural network deep optimizing based on self organizing feature ...
 
IRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification AlgorithmsIRJET- Performance Evaluation of Various Classification Algorithms
IRJET- Performance Evaluation of Various Classification Algorithms
 
Q UANTUM C LUSTERING -B ASED F EATURE SUBSET S ELECTION FOR MAMMOGRAPHIC I...
Q UANTUM  C LUSTERING -B ASED  F EATURE SUBSET  S ELECTION FOR MAMMOGRAPHIC I...Q UANTUM  C LUSTERING -B ASED  F EATURE SUBSET  S ELECTION FOR MAMMOGRAPHIC I...
Q UANTUM C LUSTERING -B ASED F EATURE SUBSET S ELECTION FOR MAMMOGRAPHIC I...
 
D010332630
D010332630D010332630
D010332630
 
Probabilistic model based image segmentation
Probabilistic model based image segmentationProbabilistic model based image segmentation
Probabilistic model based image segmentation
 
Detection of leaf diseases and classification using digital image processing
Detection of leaf diseases and classification using digital image processingDetection of leaf diseases and classification using digital image processing
Detection of leaf diseases and classification using digital image processing
 
Data reduction
Data reductionData reduction
Data reduction
 
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
 
Fault detection based on novel fuzzy modelling
Fault detection based on novel fuzzy modelling Fault detection based on novel fuzzy modelling
Fault detection based on novel fuzzy modelling
 
llorma_jmlr copy
llorma_jmlr copyllorma_jmlr copy
llorma_jmlr copy
 
ssc_icml13
ssc_icml13ssc_icml13
ssc_icml13
 
lcr
lcrlcr
lcr
 
Plant Layout Algorithm
Plant Layout AlgorithmPlant Layout Algorithm
Plant Layout Algorithm
 

Ähnlich wie Parallel Machine Learning

mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning Pranya Prabhakar
 
A Novel Methodology to Implement Optimization Algorithms in Machine Learning
A Novel Methodology to Implement Optimization Algorithms in Machine LearningA Novel Methodology to Implement Optimization Algorithms in Machine Learning
A Novel Methodology to Implement Optimization Algorithms in Machine LearningVenkata Karthik Gullapalli
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduceVarad Meru
 
Exploring optimizations for dynamic PageRank algorithm based on GPU : V4
Exploring optimizations for dynamic PageRank algorithm based on GPU : V4Exploring optimizations for dynamic PageRank algorithm based on GPU : V4
Exploring optimizations for dynamic PageRank algorithm based on GPU : V4Subhajit Sahu
 
Implementing Merge Sort
Implementing Merge SortImplementing Merge Sort
Implementing Merge Sortsmita gupta
 
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...csandit
 
Big data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesBig data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesFarzad Nozarian
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspectiveপল্লব রায়
 
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMGRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMIJCSEA Journal
 
Comparative study of optimization algorithms on convolutional network for aut...
Comparative study of optimization algorithms on convolutional network for aut...Comparative study of optimization algorithms on convolutional network for aut...
Comparative study of optimization algorithms on convolutional network for aut...IJECEIAES
 
Operation's research models
Operation's research modelsOperation's research models
Operation's research modelsAbhinav Kp
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
 
DECISION TREE CLUSTERING: A COLUMNSTORES TUPLE RECONSTRUCTION
DECISION TREE CLUSTERING: A COLUMNSTORES TUPLE RECONSTRUCTIONDECISION TREE CLUSTERING: A COLUMNSTORES TUPLE RECONSTRUCTION
DECISION TREE CLUSTERING: A COLUMNSTORES TUPLE RECONSTRUCTIONcscpconf
 
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...IJCI JOURNAL
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataAlexander Decker
 
PAGE: A Partition Aware Engine for Parallel Graph Computation
PAGE: A Partition Aware Engine for Parallel Graph ComputationPAGE: A Partition Aware Engine for Parallel Graph Computation
PAGE: A Partition Aware Engine for Parallel Graph Computation1crore projects
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
 
Decision Tree Clustering : A Columnstores Tuple Reconstruction
Decision Tree Clustering : A Columnstores Tuple ReconstructionDecision Tree Clustering : A Columnstores Tuple Reconstruction
Decision Tree Clustering : A Columnstores Tuple Reconstructioncsandit
 

Ähnlich wie Parallel Machine Learning (20)

mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning
 
A Novel Methodology to Implement Optimization Algorithms in Machine Learning
A Novel Methodology to Implement Optimization Algorithms in Machine LearningA Novel Methodology to Implement Optimization Algorithms in Machine Learning
A Novel Methodology to Implement Optimization Algorithms in Machine Learning
 
Data clustering using map reduce
Data clustering using map reduceData clustering using map reduce
Data clustering using map reduce
 
Exploring optimizations for dynamic PageRank algorithm based on GPU : V4
Exploring optimizations for dynamic PageRank algorithm based on GPU : V4Exploring optimizations for dynamic PageRank algorithm based on GPU : V4
Exploring optimizations for dynamic PageRank algorithm based on GPU : V4
 
Implementing Merge Sort
Implementing Merge SortImplementing Merge Sort
Implementing Merge Sort
 
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
 
Big data Clustering Algorithms And Strategies
Big data Clustering Algorithms And StrategiesBig data Clustering Algorithms And Strategies
Big data Clustering Algorithms And Strategies
 
Optimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data PerspectiveOptimal Chain Matrix Multiplication Big Data Perspective
Optimal Chain Matrix Multiplication Big Data Perspective
 
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMGRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
 
Comparative study of optimization algorithms on convolutional network for aut...
Comparative study of optimization algorithms on convolutional network for aut...Comparative study of optimization algorithms on convolutional network for aut...
Comparative study of optimization algorithms on convolutional network for aut...
 
Operation's research models
Operation's research modelsOperation's research models
Operation's research models
 
Aggreagate awareness
Aggreagate awarenessAggreagate awareness
Aggreagate awareness
 
Experimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithmsExperimental study of Data clustering using k- Means and modified algorithms
Experimental study of Data clustering using k- Means and modified algorithms
 
DECISION TREE CLUSTERING: A COLUMNSTORES TUPLE RECONSTRUCTION
DECISION TREE CLUSTERING: A COLUMNSTORES TUPLE RECONSTRUCTIONDECISION TREE CLUSTERING: A COLUMNSTORES TUPLE RECONSTRUCTION
DECISION TREE CLUSTERING: A COLUMNSTORES TUPLE RECONSTRUCTION
 
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming data
 
PAGE: A Partition Aware Engine for Parallel Graph Computation
PAGE: A Partition Aware Engine for Parallel Graph ComputationPAGE: A Partition Aware Engine for Parallel Graph Computation
PAGE: A Partition Aware Engine for Parallel Graph Computation
 
Extended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithmExtended pso algorithm for improvement problems k means clustering algorithm
Extended pso algorithm for improvement problems k means clustering algorithm
 
Y34147151
Y34147151Y34147151
Y34147151
 
Decision Tree Clustering : A Columnstores Tuple Reconstruction
Decision Tree Clustering : A Columnstores Tuple ReconstructionDecision Tree Clustering : A Columnstores Tuple Reconstruction
Decision Tree Clustering : A Columnstores Tuple Reconstruction
 

Kürzlich hochgeladen

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Kürzlich hochgeladen (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Parallel Machine Learning

  • 1. Parallel Machine Learning Janani Chakkaradhari Information Technology for Business Intelligence Technische Universit¨ t Berlin a February 13, 2014 Abstract Scalability has been an essential factor for any kind of computational algorithm while considering its performance. In this Big Data era, gathering of large amounts of data is becoming easy. Data analysis on Big Data is not feasible using the existing Machine Learning (ML) algorithms and it perceives them to perform poorly. This is due to the fact that the computational logic for these algorithms is previously designed in sequential way. MapReduce [1] becomes the solution for handling billions of data efficiently. In this report we discuss the basic building block for the computation behind ML algorithms, two different attempts to parallelize machine learning algorithms using MapReduce and a brief description on the overhead in parallelization of ML algorithms. 1 Introduction The significance of Machine Learning algorithms are widely known and its acquaintance in various applications brings in much more benefits in business as well as in research community. In traditional ML algorithms, the computational methods were built by thinking the data fits in memory. On the other hand, the current distributed infrastructure of Information Systems (IS) facilitates the computerized society to easily access and also generate data in almost every action involved in their day to-day life. This perpetual increase of data leads to degrade in performance of ML algorithms which had been proved to produce fast and prominent results with smaller datasets which in turn becomes the cause for “curse of modularity” [9]. With the advent of MapReduce programming model, data voluminous is handled efficiently in parallel as it follows divide and conquer methodology for execution. “Learning can become limited by computation time and not by data volume with help of MapReduce and large clusters of machines” [8] and this imposes the fact that ML algorithms has to be re-modified in order to be executed in parallel architecture. Thus parallelization of ML algorithms using MapReduce model would results in increase in speed of computation. Earlier works on this topic had been proved to produce increased performance. This report presents a gentle background study on the exploitation of Linear Algebra in ML in section 2, followed by an overview of one of the novel approach for parallelization of Stochastic Gradient Descent algorithm for Matrix Factorization [2] in section 3, and a brief summary on declarative ML which is an attempt to provide a declarative way of executing some of the ML algorithms and linear algebra primitives on Hadoop using a system called SystemML [3] in section 4. 1
  • 2. 2 Computational Engine for Machine Learning Mathematics and computer science are like the tracks of a train, they always go together to make sure a good journey for real world users. Linear algebra has prominent role in ML. Transforming problem space into linear functions is one of the elementary approaches used in predictive algorithms. Matrices are used as means of representing linear functions. In other words, the interaction between two entities of a system can be represented in two dimensional form known as matrix. The elements inside the matrix represents the magnitude of those interactions between two finite set of objects also known as dyadic data [4]. Analysis of the system using matrix technique allows one to predict the effect of individual interactions on the overall system. Some of the eminent applications in ML based on linear algebra are listed below, • Singular Value Decomposition (SVD) is one of famous method for its applications in image compression, determining oscillations or damages in structures like bridge during the design phase and many more. • Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are used as a feature extraction step before classification. • Eigen value and Eigen vectors has its proven results in PageRank algorithm. • Analysis based on dyads such as topic modeling, keyword search and recommender systems are based on Non Negative Matrix Factorization technique [6]. 3 Large Scale Matrix Factorization with DSGD In this section, an overview of Distributed Stochastic Gradient Descent algorithm is described with a brief review on optimization of Matrix Factorization using Stochastic Gradient Descent and a quick introduction to functional usage of Matrix Factorization and Stochastic Gradient Descent. 3.1 Matrix Factorization Matrix Factorization is mainly used to extract interaction structure from dyadic data [6]. The interaction structure includes the following [4] • Co-occurrence • Strength of preference or the association • Word clustering, word sense disambiguation and thesaurus construction in text based information retrieval • Modeling of preference and consumption behavior • The dyad in computer vision applications represents the feature observed at a particular image location. 2
  • 3. 3.2 Stochastic Gradient Descent (SGD) Gradient descent has fruitful applications in optimization problems. It predominantly helps in minimizing the cost function of ML algorithms such as linear regression where the weight vector or the parameter vector is determined by minimizing the average of sum of square errors between the predictions minus the actual values in the training set [7]. One main drawback of gradient descent is that it requires all the training data set for computing the average square error in each step of updating parameter vector and repeats this process until the parameter vector converges. This slows down the speed of algorithm. It is also termed as Batch Gradient Descent. In contrast, Stochastic Gradient Descent takes single training data at a time randomly and updates the parameter vector with respect to that training data in each step and repeats the process until it converges. So this eliminates the need to look at the entire data set in each step and scans the entire training set for repetition of the algorithm. 3.3 Stochastic Gradient Descent for Matrix Factorization Matrix Factorization helps to reconstruct the original matrix from the partially observed matrix using some approximation technique. For example in the Netflix matrix problem of recommendation [5], the rows represent the user and columns represents the movie. The matrix is partially filled with user ratings given to the movies. By considering the existing rating values, Matrix Factorization tries to find the missing values. In simplest form, this can be done by associating each user and each movie some numbers (factors) such that the product of these two numbers would be close as possible as the original rating. The discrepancies between the original input matrix and product of the factors here is the cost function. We would try to reduce this cost function to get the most appropriate factors. One way to do this, is by employing Stochastic Gradient Descent algorithm and SGD usually produces greater performance results in sequential execution. Since SGD approximation would end up with noisy values the cost function in here includes regularization and other informations along with prediction error. SGD tries to minimize sum of all losses in the entire matrix. SGD works as follows [2], • Step 1: Takes a random entry from the training set • Step 2: Evaluate loss function • Step 3: Update parameter spaces • Step 4: Repeat Step 1 to 3 for all the entries in the matrix We can not run this algorithm in parallel using MapReduce. The reason is the following, each mapper runs SGD on the subsets of large matrix. It reads current row and current column of the subset, evaluates local loss function and updates the parameters (i.e. the rows and columns) of the corresponding matrix subset. As we considered SGD runs in parallel, it could be possible for the algorithm to be executed on another subset of the matrix which is dependent (the same column but different row). This deliberately leads the second mapper to read the values that are updated by the first mapper at the same time. So this makes the algorithm not to run in parallel architecture. 3
  • 4. As described by Gemulla [2], not all the subsets are dependents in the matrix. In Most of the cases the subsets are completely independent to each other so that it could be possible to run SGD by locking the rows and columns of that subset. This idea forms the basis for parallelized SGD. 3.4 Distributed SGD for Matrix Factorization (DSGD) DSGD utilizes the concept of independent rows and columns. Suppose if we have d number of nodes in the cluster, we split the input matrix (the training set of known ratings) into d ¢ d smaller matrices and distribute the smaller matrix into the d blocks such that the each node has the blocks of entire row as shown in the Figure 1. Figure 1: Example Stratum of 3 Cluster nodes The interchangeable sub matrices is called stratum basically represents a partition of the underlying matrix dataset. In the paper [2], the stratification is performed by permutation such that d nodes has the possible independent block combinationsd!. For example 3 nodes have 6 possible stratums and this 6 stratums forms a single sequence of stratra. The DSGD algorithm works as follows, Assuming there are d nodes available, Z is training set input matrix, W and H are the parameter factors of the input matrix. • Step 1: Divide the input matrix to Z into dd and distribute it over the clusters. H and W parameters are equally distributed on d blocks on rows and columns such that W with d ¢ 1 and H with 1 ¢ d dimensions. Compute the strata sequence for the input blocks using permutations. For each stratum in the strata, do step 2 and step 3 • Step 2: Select a stratum that are independent, for example the blocks along the diagonal the red boxes as shown in the figure from the sequence of strata (all possible combinations of stratum). • Step 3: Run SGD on the selected blocks in parallel to find the local minimum for loss function. Sum up the results of local losses computed at each block and update the corresponding factor matrices W and H This is how DSGD runs SGD algorithm in a distributed manner within a stratum. DSGD outperforms ALS (Alternating Least Squares) method for matrix factorization [2]. Since DSGD avoid averaging over loss functions when executed in parallel which makes the algorithm simpler and versatile 4
  • 5. 4 Declarative Machine Learning: SystemML The overhead in parallelizing ML algorithms can be easily understood by simple SGD algorithm as we discussed in previous section. This makes a very clear argument that the researchers have to carefully analyze each sequentially powerful ML algorithm to make it parallel and to be executed in MapReduce programming model. The cost of implementing as MapReduce jobs is high and also for better performance sometimes the same algorithm has to be hand tuned. Hence there is no space for the discussion of optimization in MapReduce jobs. For example in case of matrix multiplication problem, the order execution of multiplication has higher performance impact [3]. Researchers from IBM Almaden and Watson research center has proposed a new approach for handling parallelization of ML algorithms which also considers optimization into account and it is called SystemML. SystemML is analogous to HiveQL developed by Facebook for executing data warehouse queries on large clusters where the queries are converted to MapReduce jobs which will be executed on Hadoop by the HiveQL engine. Similarly SystemML provides a declarative platform for expressing ML algorithms and linear algebra primitives and converts the abstract representation into executable MapReduce jobs on Hadoop. 4.1 Application areas of SystemML In SystemML, ML algorithms are expressed in High Level Language called Declarative Machine Learning (DML) which is comparable to R. DML supports operations such as transpose of a matrix, matrix multiplication, iterative algorithms using “for” and “while” constructs and soon. So this makes user to focus on writing scripts that answers to what constructs to use for computation rather than how to express computation. SystemML is highly scalable and efficiently tunes the performance. It is used in different fields such as predictive modeling, recommender systems, and search analysis. 4.2 System Architecture of SystemML SystemML takes the DML script as input and passes through the different components [3] and results in parsed representation of the initial script. It supports built in data types for representing matrices and scalars. The first step in SystemML is Identifying the statement blocks based on the constructs that breaks the sequential flow of DML program. For each statement block it does the following, 4.3 High level Operator (HOP) HOP component analysis consumes and results in the following input and output. Input: Parsed statement blocks Action: The computation in each statement block instantiates one HOP Dag (Directed Acyclic Graph). HOP Dag represents the basic operations on Matrices and scalar such as an operation or transformation. Optimizations: Algebraic rewrites, selection of physical representation for intermediate matrices and cost based optimizations Output: High level execution plan (HOP Dags) representing dataflow 5
4.4 Low level Operator (LOP)

The LOP component follows the HOP component, with the corresponding input and output:

Input: High-level execution plan (HOP Dags)
Action: HOP Dags are converted into low-level physical plans (LOP Dags) that can be executed as MapReduce jobs. HOP Dags are parsed from bottom to top, and each HOP Dag is converted into one or more LOPs. The input and output format of each LOP is key-value pairs. Since a single computation can lead to multiple LOPs, SystemML tries to combine these LOPs into a single MapReduce job. This is implemented using a novel algorithm named piggybacking, which reduces the number of scans performed on the input data during the execution of MR jobs; it is described in Section 4.6.
Output: Low-level execution plan (LOP Dags)

4.5 Runtime

The runtime ensures that the input matrices are represented as key-value pairs while disregarding cells without a value; since the matrices are inherently sparse, this reduces the size of the input matrix representation. SystemML collects local sparsity information by applying a blocking operation to the input matrix: the matrix is divided into smaller matrices called blocks, and each block is represented by a block id and its cell values, together with a parameter indicating whether the block is dense or sparse. The block size has a major impact on the number of key-value pairs generated by the runtime [3]. A minimal sketch of this blocked representation follows this section.

The Generic MapReduce job (GM-R), instantiated by the piggybacking algorithm (multiple LOPs inside a single MR job), is the main execution engine in SystemML. A control module coordinates the execution of the MapReduce jobs and is involved in computations such as arithmetic operations and predicate evaluations. Multiple optimizations are performed in the runtime component and are decided dynamically based on data characteristics.
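The following is a minimal Python sketch of the blocked key-value representation described above: the matrix is cut into fixed-size blocks, empty blocks are dropped, and each remaining block is keyed by its block index together with a dense/sparse flag. The block size and the density threshold of 0.3 are illustrative assumptions, not SystemML's actual defaults.

# Sketch of emitting (block_id, block) key-value pairs with a per-block
# dense/sparse decision; empty blocks are never emitted.
import numpy as np

def to_blocks(M, block_size):
    """Yield ((block_row, block_col), (format, payload)) for non-empty blocks of M."""
    n, m = M.shape
    for bi in range(0, n, block_size):
        for bj in range(0, m, block_size):
            block = M[bi:bi + block_size, bj:bj + block_size]
            nnz = np.count_nonzero(block)
            if nnz == 0:
                continue                          # cells without values are dropped
            dense = nnz / block.size > 0.3        # illustrative density threshold
            key = (bi // block_size, bj // block_size)
            if dense:
                payload = block                   # store the full block
            else:
                rows, cols = np.nonzero(block)    # store only (i, j, value) triples
                payload = list(zip(rows.tolist(), cols.tolist(),
                                   block[rows, cols].tolist()))
            yield key, ("dense" if dense else "sparse", payload)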
4.6 Piggybacking

The piggybacking algorithm packages multiple LOPs into a single MapReduce job by considering the execution location of each LOP at runtime. The execution location identifies whether a LOP can be executed in the Map phase alone, in the Reduce phase alone, or whether it requires both the Map and Reduce phases. Figure 2 lists the different LOP operations and their corresponding execution locations. For example, the group LOP has to be executed in both the Map and Reduce phases and is therefore marked as MapAndReduce.

Figure 2: Execution locations of LOPs, from [3]

We consider the example in Figure 3 to lay out the logic behind the piggybacking algorithm. The left part of the diagram shows the LOP Dag for the multiplication of a matrix W with its transpose. LOP Dags are parsed in bottom-up fashion. The algorithm starts by sorting the LOPs in topological order; the result of the sort is shown in the center of the diagram. It then works iteratively, creating a new MR job at the beginning of each iteration. Within an iteration, LOPs are assigned to the MR job in the following order: first the LOPs that require only a Map phase (indicated by the Map or Reduce location in Figure 2), then the LOPs that need both Map and Reduce phases, and finally the LOPs that require only a Reduce phase. The algorithm also ensures that another descendant with execution location MapAndReduce is not assigned to the same job.

Figure 3: Example of piggybacking

In our example, since the Data W and Transform LOPs span only a Map or Reduce operation, they are assigned to the Map phase of the first MR job. mmcj is the first LOP that spans both the Map and Reduce phases, so it is assigned to both phases of the first MR job. Since the first MR job already contains a LOP with location MapAndReduce, the Group LOP, which has the same execution location, cannot be assigned to the first MR job. Hence the iteration ends, and the next iteration starts by instantiating a second MR job. Finally, the Group and Aggregation operations are assigned to this second MR job, which completes the piggybacking algorithm for this example.
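A simplified Python sketch of this packing idea is shown below: the topologically sorted LOPs are walked once, and each LOP joins the current MR job, except that a LOP needing both Map and Reduce phases cannot join a job that already contains one, in which case the job is closed and a new one is opened. This omits the phase-by-phase assignment order and the full descendant checks of the published algorithm [3]; the LOP names and locations follow the t(W) %*% W example above.

# Greedy packing of topologically sorted LOPs into MR jobs, keyed on whether
# a LOP needs a shuffle (MapAndReduce). A job can hold at most one such LOP.
def piggyback(sorted_lops):
    """sorted_lops: list of (name, location) in topological order,
    location in {"MapOrReduce", "MapAndReduce", "Reduce"}."""
    jobs, current, has_shuffle = [], [], False
    for name, location in sorted_lops:
        needs_shuffle = location == "MapAndReduce"
        if needs_shuffle and has_shuffle:
            # A second MapAndReduce LOP cannot share this job: close it, open a new one.
            jobs.append(current)
            current, has_shuffle = [], False
        current.append(name)
        has_shuffle = has_shuffle or needs_shuffle
    if current:
        jobs.append(current)
    return jobs

lops = [("data W", "MapOrReduce"), ("transform", "MapOrReduce"),
        ("mmcj", "MapAndReduce"), ("group", "MapAndReduce"),
        ("aggregate", "Reduce")]
print(piggyback(lops))
# -> [['data W', 'transform', 'mmcj'], ['group', 'aggregate']]

Running the sketch on the example reproduces the two MR jobs described above: Data W, Transform and mmcj share the first job, while Group and Aggregation end up in the second.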
5 Conclusion

In this report we have seen the requirements for, and the importance of, research on the parallelization of ML algorithms, as well as the role that linear algebra plays in them. The difficulty of parallelizing ML algorithms was illustrated through the novel approach employed by the DSGD algorithm, an effort to parallelize SGD for large-scale data on clusters of machines. We also discussed SystemML, which offers users from different fields an easier, declarative platform for executing ML algorithms.

Even though SystemML is concise and provides a user-friendly platform for executing limited forms of ML algorithms and some linear algebra primitives such as matrix multiplication, arithmetic operations, and matrix factorization, DML does not support the more complex features of the object-oriented paradigm. It also does not support data structures such as arrays and lists, which are frequently used in many ML algorithms and are available in R, a language that provides a comprehensive set of flexible constructs for statistical and ML algorithms. On the other hand, Apache Mahout also provides a fairly complete set of Hadoop-based ML algorithms, but it still needs to be hand-tuned for different datasets and is more complex from the user's perspective.

References

[1] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.

[2] Rainer Gemulla, Erik Nijkamp, Peter J. Haas, and Yannis Sismanis. Large-scale matrix factorization with distributed stochastic gradient descent. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 69–77. ACM, 2011.

[3] Amol Ghoting, Rajasekar Krishnamurthy, Edwin Pednault, Berthold Reinwald, Vikas Sindhwani, Shirish Tatikonda, Yuanyuan Tian, and Shivakumar Vaithyanathan. SystemML: Declarative machine learning on MapReduce. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 231–242. IEEE, 2011.

[4] Thomas Hofmann, Jan Puzicha, and Michael I. Jordan. Learning from dyadic data. Advances in Neural Information Processing Systems, pages 466–472, 1999.

[5] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.

[6] Chao Liu, Hung-chih Yang, Jinliang Fan, Li-Wei He, and Yi-Min Wang. Distributed nonnegative matrix factorization for web-scale dyadic data analysis on MapReduce. In Proceedings of the 19th International Conference on World Wide Web, pages 681–690. ACM, 2010.

[7] Andrew Ng. CS229 lecture notes. CS229 Lecture Notes, 1(1):1–3, 2000.

[8] Vijay Narayanan and Milind Bhandarkar. Modeling with Hadoop. Tutorial at KDD 2011.

[9] Charles Parker. Unexpected challenges in large scale machine learning. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, pages 1–6. ACM, 2012.