Linear Regression – Ordinary Least Squares Distributed Calculation Example
Author: Marjan Sterjev
Linear regression is one of the most essential machine learning algorithms. It is an approach for
modeling the relationship between a scalar dependent variable y and one or more explanatory
variables X: x1, x2, x3, ..., xn. The model is also known as a trend line. If we can explain that relationship
with a simple linear equation of the form y = bn*xn + ... + b2*x2 + b1*x1 + b0, then we can predict the
value of y based on the X values substituted into that equation.
For example, consider that we have the following pairs of numbers (x, y):

x    y
0    3
1   16
2   24
3   37
4   44
5   56
Based on the provided example pairs (x, y), our task is to find a linear equation y = b1*x + b0 that
matches the above pairs as closely as possible:
b1 * 0 + b0 ~ 3
b1 * 1 + b0 ~ 16
b1 * 2 + b0 ~ 24
b1 * 3 + b0 ~ 37
b1 * 4 + b0 ~ 44
b1 * 5 + b0 ~ 56
The solution for the coefficients b1 and b0 should minimize the overall squared error between the
values predicted by the linear equation and the real ones.
Let's define the matrices X, B and Y:
X        B        Y
0 1      b1        3
1 1      b0       16
2 1               24
3 1               37
4 1               44
5 1               56
The matrix form of the conditions above is:
X * B ~ Y
The Ordinary Least Squares (https://en.wikipedia.org/wiki/Ordinary_least_squares) closed-form
solution for B is:
B = (X^T * X)^-1 * X^T * Y
In R, the linear regression model coefficients can be calculated as:
> X <- matrix(c(0,1,1,1,2,1,3,1,4,1,5,1),ncol=2, byrow=TRUE)
> Y <- matrix(c(3,16,24,37,44,56), ncol=1, byrow=TRUE)
> solve(t(X)%*%X, t(X)%*%Y)
[,1]
[1,] 10.342857
[2,] 4.142857
The linear regression coefficients are:
b1=10.34
b0= 4.14
Based on the linear regression model we can predict the value y for a previously unseen x
variable. For example, if x = 7 the predicted y value will be:
10.34*7 + 4.14 = 76.52
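As a quick cross-check of the closed form, the same normal equations can be solved in a few lines of plain Python; this is an illustrative sketch, not part of the original material:

```python
# Fit y = b1*x + b0 to the example pairs by solving (X^T*X)*B = X^T*Y,
# where the second column of X is all ones (the intercept column).
xs = [0, 1, 2, 3, 4, 5]
ys = [3, 16, 24, 37, 44, 56]

sxx = sum(x * x for x in xs)              # entry (1,1) of X^T*X
sx = sum(xs)                              # entries (1,2) and (2,1)
n = len(xs)                               # entry (2,2)
sxy = sum(x * y for x, y in zip(xs, ys))  # first entry of X^T*Y
sy = sum(ys)                              # second entry of X^T*Y

# Invert the 2x2 matrix [[sxx, sx], [sx, n]] explicitly.
det = sxx * n - sx * sx
b1 = (n * sxy - sx * sy) / det
b0 = (sxx * sy - sx * sxy) / det
print(round(b1, 2), round(b0, 2))  # 10.34 4.14
print(round(b1 * 7 + b0, 2))       # prediction at x = 7
```

Note that with the unrounded coefficients the prediction at x = 7 is about 76.54; the value 76.52 above comes from plugging in the rounded coefficients 10.34 and 4.14.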
A problem arises if the number of pairs (x, y) is very large, several billion for example. The matrices
X and Y will then have several billion rows too. Calculating the matrix products X^T*X and X^T*Y will be
time- and memory-consuming: a single worker process would have to store the matrices X and Y in
memory and execute billions of multiplications and additions.
The natural question is whether we can divide the job among several processes that join their efforts
and calculate X^T*X and X^T*Y in a distributed fashion.
Let us split the above input pairs (x,y) into 3 chunks that will be processed by 3 different processes
(the mappers):
X1       Y1
0 1       3
1 1      16

X2       Y2
2 1      24
3 1      37

X3       Y3
4 1      44
5 1      56
For each chunk the mapper will produce the partial matrix products Xi^T*Xi and Xi^T*Yi (i = 1, 2, 3).

Map Input                      Map Output
X1^T      X1       Y1          X1^T*X1     X1^T*Y1
0 1       0 1       3           1  1        16
1 1       1 1      16           1  2        19

X2^T      X2       Y2          X2^T*X2     X2^T*Y2
2 3       2 1      24          13  5       159
1 1       3 1      37           5  2        61

X3^T      X3       Y3          X3^T*X3     X3^T*Y3
4 5       4 1      44          41  9       456
1 1       5 1      56           9  2       100
Note that each partial multiplication is executed with small matrices, so it is fast.
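A mapper's work can be sketched in Python; `partial_products` is an illustrative helper, shown here reproducing the partial products for chunk 2 (X2, Y2):

```python
# One mapper's work: turn a chunk of (x, y) pairs into the partial products
# Xi^T*Xi (2x2) and Xi^T*Yi (length 2), accumulating row by row.
def partial_products(chunk):
    """chunk: list of (x, y) pairs -> (Xi^T*Xi, Xi^T*Yi)."""
    xtx = [[0, 0], [0, 0]]
    xty = [0, 0]
    for x, y in chunk:
        row = (x, 1)  # one row of the design matrix: [x, 1]
        for i in range(2):
            for j in range(2):
                xtx[i][j] += row[i] * row[j]
            xty[i] += row[i] * y
    return xtx, xty

xtx2, xty2 = partial_products([(2, 24), (3, 37)])
print(xtx2)  # [[13, 5], [5, 2]]
print(xty2)  # [159, 61]
```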
All partial matrix products are then collected by another process (the reducer) that sums the
partial matrices and reconstructs the same result as if the complete matrix cross products had been
computed by a single process.
Reduce Output R1                       Reduce Output R2
X^T*X = X1^T*X1 + X2^T*X2 + X3^T*X3    X^T*Y = X1^T*Y1 + X2^T*Y2 + X3^T*Y3

55 15                                  631
15  6                                  180
Once we have the reconstructed matrices X^T*X and X^T*Y, the solution is as simple as:
(X^T*X) * B = X^T*Y
B = (X^T*X)^-1 * X^T*Y = [10.34, 4.14]
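The reduce step can be sketched in Python by summing the three partial products from the map-output tables and then solving the resulting 2x2 system; the partials are hard-coded here for illustration:

```python
# Reducer side: sum the partial products from the three mappers, then solve
# the 2x2 normal equations (X^T*X)*B = X^T*Y via the explicit inverse.
partial_xtx = [[[1, 1], [1, 2]], [[13, 5], [5, 2]], [[41, 9], [9, 2]]]
partial_xty = [[16, 19], [159, 61], [456, 100]]

xtx = [[sum(p[i][j] for p in partial_xtx) for j in range(2)] for i in range(2)]
xty = [sum(p[i] for p in partial_xty) for i in range(2)]
print(xtx, xty)  # [[55, 15], [15, 6]] [631, 180]

(a, b), (c, d) = xtx
det = a * d - b * c
b1 = (d * xty[0] - b * xty[1]) / det
b0 = (a * xty[1] - c * xty[0]) / det
print(round(b1, 2), round(b0, 2))  # 10.34 4.14
```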
The approach described above is an example of Map-Reduce based linear regression model training
that can be easily implemented on top of Apache Hadoop. The pairs of numbers can be stored in
files (one pair per line). Once the model calculation starts, Hadoop's file-splitting mechanism will
automatically delegate units of work to several map processes. The distribution of the partial results
to the reducer is also handled automatically by Hadoop. What is left to the developer is to provide a
few lines of mapper/reducer code that parse the input lines into (small) matrices and execute the
cross products and additions against those matrices.
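With Hadoop Streaming, for instance, the mapper and reducer can be ordinary scripts that read stdin and write tab-separated key/value lines. The sketch below assumes a hypothetical "x y" pair-per-line input format and illustrative key names; neither is prescribed by Hadoop:

```python
# Hadoop Streaming style sketch: the mapper accumulates the entries of
# Xi^T*Xi and Xi^T*Yi for its input split and emits them as "key<TAB>value";
# Hadoop groups the records by key and the reducer sums the values.
import io

def map_lines(lines, out):
    # The design-matrix row for a pair (x, y) is [x, 1], so the distinct
    # entries of Xi^T*Xi are sum(x*x), sum(x) and the count n.
    sxx = sx = n = sxy = sy = 0.0
    for line in lines:
        x, y = map(float, line.split())
        sxx += x * x; sx += x; n += 1
        sxy += x * y; sy += y
    for key, val in [("xtx_00", sxx), ("xtx_01", sx), ("xtx_11", n),
                     ("xty_0", sxy), ("xty_1", sy)]:
        out.write(f"{key}\t{val}\n")

def reduce_lines(lines, out):
    totals = {}
    for line in lines:
        key, val = line.split("\t")
        totals[key] = totals.get(key, 0.0) + float(val)
    for key in sorted(totals):
        out.write(f"{key}\t{totals[key]}\n")

if __name__ == "__main__":
    # In a real job each script would run as map_lines(sys.stdin, sys.stdout)
    # and reduce_lines(sys.stdin, sys.stdout). Simulated here with two splits:
    out1, out2 = io.StringIO(), io.StringIO()
    map_lines(["0 3", "1 16", "2 24"], out1)
    map_lines(["3 37", "4 44", "5 56"], out2)
    merged = sorted(out1.getvalue().splitlines() + out2.getvalue().splitlines())
    result = io.StringIO()
    reduce_lines(merged, result)
    # Totals: xtx_00 = 55, xtx_01 = 15, xtx_11 = 6, xty_0 = 631, xty_1 = 180
    print(result.getvalue())
```

The reducer here just sums values per key, which is exactly the matrix addition from the tables above expressed entry by entry.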