Linear Regression –
Ordinary Least Squares Distributed Calculation Example
Author: Marjan Sterjev
Linear regression is one of the most essential machine learning algorithms. It is an approach for
modeling the relationship between a scalar dependent variable y and one or more explanatory
variables X: x1, x2, x3, ..., xn. The model is also known as a trend line. If we can explain that
relationship with a simple linear equation of the form y = bn*xn + ... + b2*x2 + b1*x1 + b0, then we
can predict the value of y from the X values substituted into that equation.
For example, consider the following pairs of numbers (x, y):
0 3
1 16
2 24
3 37
4 44
5 56
Based on these example pairs (x, y), our task is to find a linear equation y = b1*x1 + b0 that
matches the pairs as closely as possible:
b1 * 0 + b0 ~ 3
b1 * 1 + b0 ~ 16
b1 * 2 + b0 ~ 24
b1 * 3 + b0 ~ 37
b1 * 4 + b0 ~ 44
b1 * 5 + b0 ~ 56
The solution for the coefficients b1 and b0 must minimize the sum of squared errors between the
values predicted by the linear equation and the actual ones.
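To make the objective concrete, here is a minimal R sketch that computes the sum of squared errors for a candidate line. The helper name sse and the two candidate lines are illustrative assumptions, not part of the original example.

```r
# Example pairs (x, y) from the text
x <- c(0, 1, 2, 3, 4, 5)
y <- c(3, 16, 24, 37, 44, 56)

# Sum of squared errors for a candidate line y = b1*x + b0
# (sse is a hypothetical helper name)
sse <- function(b1, b0) sum((b1 * x + b0 - y)^2)

sse(10, 5)   # hand-picked guess, gives 12
sse(11, 3)   # another guess, gives 19
```

The smaller value for the first guess means it fits the pairs better. Least squares looks for the pair (b1, b0) that makes this quantity as small as possible.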
Let's define the matrices X, B and Y:
  X        B       Y
 0  1      b1      3
 1  1      b0     16
 2  1             24
 3  1             37
 4  1             44
 5  1             56
The matrix form of the conditions above is:
X * B ~ Y
The Ordinary Least Squares (https://en.wikipedia.org/wiki/Ordinary_least_squares) closed form
solution for B is:
B = (X^T * X)^-1 * X^T * Y
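One way to see where this formula comes from is the standard derivation below. It is sketched under the assumption that X^T * X is invertible. E(B) is a symbol introduced here for the squared error and does not appear in the original text.

```latex
\begin{aligned}
E(B) &= \lVert XB - Y \rVert^2 = (XB - Y)^T (XB - Y) \\
\nabla_B E(B) &= 2 X^T (XB - Y) = 0 \\
X^T X \, B &= X^T Y \\
B &= (X^T X)^{-1} X^T Y
\end{aligned}
```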
In R, the linear regression model coefficients can be calculated as:
> X <- matrix(c(0,1,1,1,2,1,3,1,4,1,5,1),ncol=2, byrow=TRUE)
> Y <- matrix(c(3,16,24,37,44,56), ncol=1, byrow=TRUE)
> solve(t(X)%*%X, t(X)%*%Y)
          [,1]
[1,] 10.342857
[2,]  4.142857
The linear regression coefficients (rounded to two decimals) are:
b1 = 10.34
b0 = 4.14
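As a cross-check, the same coefficients can be obtained with R's built-in lm function. This is a sketch of an alternative route, not the method used in the rest of this example. The variable names x, y and fit are assumptions.

```r
# Same example data, as plain vectors
x <- c(0, 1, 2, 3, 4, 5)
y <- c(3, 16, 24, 37, 44, 56)

# Fit y = b1*x + b0 with the built-in linear model function
fit <- lm(y ~ x)

# coef() lists the intercept (b0) first, then the slope (b1)
coef(fit)
```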
Based on the linear regression model, we can predict the value of y for a previously unseen value
of x. For example, if x = 7, the predicted y value will be:
10.34*7 + 4.14 = 76.52
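The prediction step can be sketched in R as follows. The helper name predict_y is hypothetical. Because this sketch uses the unrounded coefficients, it yields about 76.54 rather than the 76.52 obtained from the two-decimal values.

```r
X <- matrix(c(0,1,1,1,2,1,3,1,4,1,5,1), ncol=2, byrow=TRUE)
Y <- matrix(c(3,16,24,37,44,56), ncol=1, byrow=TRUE)

# Solve the normal equations; rows of B are b1 (slope) and b0 (intercept)
B <- solve(t(X) %*% X, t(X) %*% Y)

# Hypothetical helper: evaluate the fitted line at a new x
predict_y <- function(x) B[1, 1] * x + B[2, 1]

predict_y(7)
```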
A problem arises if the number of pairs (x, y) is very large, several billion for example. The
matrices X and Y will then have several billion rows too. Calculating the matrix products X^T * X
and X^T * Y will consume a lot of time and memory: a single worker process has to hold the matrices
X and Y in memory and execute billions of multiplications and additions.
The natural question is whether we can divide the job among several processes that join their
efforts and calculate X^T * X and X^T * Y in a distributed fashion.
Let us split the above input pairs (x,y) into 3 chunks that will be processed by 3 different processes
(the mappers):
  X1      Y1
 0  1      3
 1  1     16

  X2      Y2
 2  1     24
 3  1     37

  X3      Y3
 4  1     44
 5  1     56
For each chunk, the mapper will produce the partial matrix products Xi^T * Xi and Xi^T * Yi
(i = 1, 2, 3).
Map Input                       Map Output
X1^T     X1      Y1             X1^T*X1     X1^T*Y1
0  1     0  1     3              1   1         16
1  1     1  1    16              1   2         19

X2^T     X2      Y2             X2^T*X2     X2^T*Y2
2  3     2  1    24             13   5        159
1  1     3  1    37              5   2         61

X3^T     X3      Y3             X3^T*X3     X3^T*Y3
4  5     4  1    44             41   9        456
1  1     5  1    56              9   2        100
Note that each partial multiplication involves only small matrices, so it is fast.
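The map step can be sketched in R as a function that receives one chunk and returns its two partial products. The function name map_chunk and the list field names XtX and XtY are assumptions made for illustration.

```r
# Hypothetical mapper: compute the partial products for one chunk (Xi, Yi)
map_chunk <- function(Xi, Yi) {
  list(
    XtX = t(Xi) %*% Xi,   # 2x2 partial product Xi^T * Xi
    XtY = t(Xi) %*% Yi    # 2x1 partial product Xi^T * Yi
  )
}

# Second chunk from the text
X2 <- matrix(c(2, 1, 3, 1), ncol=2, byrow=TRUE)
Y2 <- matrix(c(24, 37), ncol=1)

p2 <- map_chunk(X2, Y2)
p2$XtX   # rows: 13 5 and 5 2
p2$XtY   # rows: 159 and 61
```

The size of the output depends only on the number of columns of X, not on the number of rows in the chunk, so each mapper emits a few numbers no matter how much input it reads.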
All partial matrix products are then collected by another process (the reducer), which sums the
partial matrices and reconstructs the same result as if the complete matrix cross products had been
computed by a single process.
R1 = Reduce Output: X^T*X = X1^T*X1 + X2^T*X2 + X3^T*X3
 55  15
 15   6

R2 = Reduce Output: X^T*Y = X1^T*Y1 + X2^T*Y2 + X3^T*Y3
 631
 180
Once we have the reconstructed matrices X^T*X and X^T*Y, the solution is as simple as:
(X^T * X) * B = X^T * Y
B = (X^T * X)^-1 * X^T * Y = [10.34, 4.14]
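The reduce step and the final solve can be sketched in R as follows. The partial products are copied from the map output shown earlier. The list layout and the names partials, XtX and XtY are assumptions.

```r
# Partial products emitted by the three mappers (values from the text)
partials <- list(
  list(XtX = matrix(c(1, 1, 1, 2),   ncol=2, byrow=TRUE), XtY = matrix(c(16, 19),   ncol=1)),
  list(XtX = matrix(c(13, 5, 5, 2),  ncol=2, byrow=TRUE), XtY = matrix(c(159, 61),  ncol=1)),
  list(XtX = matrix(c(41, 9, 9, 2),  ncol=2, byrow=TRUE), XtY = matrix(c(456, 100), ncol=1))
)

# Hypothetical reducer: element-wise sum of the partial matrices
XtX <- Reduce(`+`, lapply(partials, function(p) p$XtX))
XtY <- Reduce(`+`, lapply(partials, function(p) p$XtY))

# Solve (X^T * X) * B = X^T * Y for B
B <- solve(XtX, XtY)
B
```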
The approach described above is an example of MapReduce-based linear regression model training
that can be implemented on top of Apache Hadoop. The pairs of numbers can be stored in files (one
pair per line). Once the model calculation starts, Hadoop's file splitting mechanism automatically
delegates units of work to several map processes. The distribution of the partial results to the
reducer is also handled automatically by Hadoop. What is left to the developer is to provide
several lines of mapper/reducer code that parse the input lines into (small) matrices and execute
cross products and additions against those matrices.
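The whole flow, starting from text lines, can be emulated locally in R as a rough sketch. It does not use Hadoop itself. The whitespace-separated line format, the two-lines-per-chunk split and the function name map_lines are all assumptions.

```r
# Input as it might appear in a file: one "x y" pair per line
lines <- c("0 3", "1 16", "2 24", "3 37", "4 44", "5 56")

# Hypothetical mapper: parse a chunk of lines into small matrices
# and emit the partial cross products
map_lines <- function(chunk) {
  vals <- matrix(as.numeric(unlist(strsplit(chunk, " +"))), ncol=2, byrow=TRUE)
  Xi <- cbind(vals[, 1], 1)        # append the column of ones for b0
  Yi <- vals[, 2, drop=FALSE]      # keep Yi as a one-column matrix
  list(XtX = t(Xi) %*% Xi, XtY = t(Xi) %*% Yi)
}

# Emulate file splitting: three chunks of two lines each
chunks <- split(lines, rep(1:3, each=2))
partials <- lapply(chunks, map_lines)

# Reducer: sum the partial matrices, then solve for B
XtX <- Reduce(`+`, lapply(partials, `[[`, "XtX"))
XtY <- Reduce(`+`, lapply(partials, `[[`, "XtY"))
B <- solve(XtX, XtY)
B
```

Because matrix addition is associative and commutative, the result does not depend on how the lines are grouped into chunks or on the order in which the partial results reach the reducer.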