Statistics Lab
Rodolfo Metulini
IMT Institute for Advanced Studies, Lucca, Italy

Lesson 4 - The Linear Regression Model: Theory and Application - 21.01.2014
Introduction

In the past practicals we analyzed one variable at a time.
It is often useful, however, to analyze two or more variables together.
The question we want to answer concerns the relations among variables: which causal effects determine changes in a variable, and whether a certain phenomenon is endogenous or exogenous.
In symbols, the idea can be represented as follows:
$y = f(x_1, x_2, \ldots)$
Y is the response, which is a function of (that is, it depends on) one or more variables.
Objectives

All in all, the regression model is the instrument used to:
measure the strength of the relations between two or more variables (Y / X), and measure the causal direction ($X \longrightarrow Y$ or vice versa?)
forecast the value of the variable Y in response to changes in the other variables $X_1, X_2, \ldots$ (called explanatories), or for cases that are not included in the sample.
Simple linear regression model
The regression model is stochastic, not deterministic.
Given two sets of values (two variables) from a random sample of size n: $x = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}$; $y = \{y_1, y_2, \ldots, y_i, \ldots, y_n\}$:
Deterministic formula:
$y_i = \beta_0 + \beta_1 x_i, \quad \forall i = 1, \ldots, n$
Stochastic formula:
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \forall i = 1, \ldots, n$
where $\epsilon_i$ is the stochastic component.
$\beta_1$ defines the slope of the relation between X and Y (see the graph in chart 1).
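The difference between the deterministic and the stochastic formula can be illustrated with a small simulation. This is only a sketch: the function name `simulate`, the parameter values and the x values are made up for illustration, and the stochastic component is assumed to be Gaussian with standard deviation `sigma`.

```python
import random

def simulate(beta0, beta1, xs, sigma, seed=0):
    """Return the deterministic and the stochastic version of y for each x."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Deterministic part: y_i = beta0 + beta1 * x_i
    deterministic = [beta0 + beta1 * x for x in xs]
    # Stochastic part: add epsilon_i drawn from N(0, sigma^2) (an assumption)
    stochastic = [d + rng.gauss(0.0, sigma) for d in deterministic]
    return deterministic, stochastic

# Hypothetical values chosen only for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
det, sto = simulate(beta0=1.0, beta1=2.0, xs=xs, sigma=0.5)

for x, d, s in zip(xs, det, sto):
    print(f"x={x:.1f}  deterministic y={d:.2f}  stochastic y={s:.2f}")
```

With `sigma=0.0` the two lists coincide, which is the deterministic case.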
Simple linear regression model - 2

We need to find $\hat{\beta} = \{\hat{\beta}_0, \hat{\beta}_1\}$ as estimators of $\beta_0$ and $\beta_1$.
After $\hat{\beta}$ is estimated, we can draw the estimated regression line, which corresponds to the estimated regression model, as follows:
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
Here, $\hat{\epsilon}_i = y_i - \hat{y}_i$,
where $\hat{y}_i$ is the i-th element of the estimated Y vector, and $y_i$ is the i-th element of the real Y vector (see the graph in chart 2).
Steps in the Analysis

1. Study the relations (scatterplot, correlations) between two or more variables.
2. Estimate the parameters of the model, $\hat{\beta} = \{\hat{\beta}_0, \hat{\beta}_1\}$.
3. Run hypothesis tests on the estimated $\hat{\beta}_1$ to verify the causal effects between X and Y.
4. Check the robustness of the model.
5. Use the model to analyze the causal effect and/or to do forecasting.
Why linear?

It is simple to estimate, to analyze and to interpret.
It is likely to fit most empirical cases in which the relation between two phenomena is (at least approximately) linear.
Many established methods exist to transform variables in order to obtain a linear relationship (log transformation, normalization, etc.).
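As an illustration of the log transformation, a power relation $y = a x^b$ becomes linear after taking logs: $\log y = \log a + b \log x$. The sketch below uses made-up data generated from $y = 3x^2$ and a small helper `ols_fit` (a hypothetical name) that applies the standard closed-form least squares estimators for a simple regression.

```python
import math

def ols_fit(xs, ys):
    """Closed-form least squares fit of y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data following the non-linear relation y = 3 * x^2
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 2 for x in xs]

# After the log-log transformation the relation is linear
log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]
b0, b1 = ols_fit(log_x, log_y)

a_hat = math.exp(b0)  # back-transform the intercept to recover a
print(f"estimated exponent b = {b1:.3f}, estimated scale a = {a_hat:.3f}")
```

Because the made-up data contain no noise, the fit recovers the exponent and the scale up to rounding error; with real data the recovered values would only be estimates.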
Model Hypotheses

For the estimation and the use of the model to be correct, certain hypotheses must hold:
$E(\epsilon_i) = 0, \ \forall i \longrightarrow E(y_i) = \beta_0 + \beta_1 x_i$
Homoscedasticity: $V(\epsilon_i) = \sigma_i^2 = \sigma^2, \ \forall i$
Null covariance: $Cov(\epsilon_i, \epsilon_j) = 0, \ \forall i \neq j$
Null covariance between residuals and explanatories: $Cov(x_i, \epsilon_i) = 0, \ \forall i$, since X is deterministic (known)
Normality assumption: $\epsilon_i \sim N(0, \sigma^2)$
Model Hypotheses - 2

From the hypotheses above, it follows that:
$V(y_i) = \sigma^2, \ \forall i$. Y is stochastic only through the $\epsilon$ component.
$Cov(y_i, y_j) = 0, \ \forall i \neq j$, since the residuals are uncorrelated.
$y_i \sim N[(\beta_0 + \beta_1 x_i), \sigma^2]$, since the residuals are also normally distributed.
Ordinary Least Squares (OLS) Estimation

OLS is the estimation method used to estimate the vector $\beta$.
The idea is to minimize the size of the residuals.
Since $e_i = y_i - \hat{y}_i$, we are interested in minimizing the component $y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$.
N.B. $\epsilon_i = y_i - \beta_0 - \beta_1 x_i$, while $e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$
The method consists in minimizing the sum of the squared differences:
$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \min$,
which is equivalent to solving the following system of two equations, obtained by differentiation.
Ordinary Least Squares (OLS) Estimation - 2

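$\frac{\partial}{\partial \hat{\beta}_0} \sum_{i=1}^{n} e_i^2 = 0 \quad (1)$
$\frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{n} e_i^2 = 0 \quad (2)$

After some algebra, we end up with these estimators for the vector $\beta$:

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \quad (3)$
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad (4)$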
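The two estimators can be computed directly from their closed-form expressions. The sketch below is a minimal illustration: the function name `ols_fit` and the five data points are made up, and no statistical library is assumed.

```python
def ols_fit(xs, ys):
    """Return (b0_hat, b1_hat) from the closed-form OLS estimators."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator of the slope: sum of (y_i - y_bar) * (x_i - x_bar)
    sxy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    # Denominator of the slope: sum of (x_i - x_bar)^2
    ssx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = sxy / ssx
    # Intercept: y_bar - b1_hat * x_bar
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat

# Made-up data, used only to show the arithmetic
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b0_hat, b1_hat = ols_fit(xs, ys)
print(f"b0_hat = {b0_hat:.2f}, b1_hat = {b1_hat:.2f}")
```

For these points the means are 3 and 4, the numerator is 6 and the denominator is 10, so the slope is 0.6 and the intercept is 4 - 0.6 * 3 = 2.2.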
OLS estimators

The OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are stochastic (they have a distribution over the sample space of all the possible estimates obtained from different samples).
$\hat{\beta}_1$ measures the estimated variation in Y determined by a unit variation in X ($\partial Y / \partial X$).
The OLS estimators are unbiased ($E(\hat{\beta}_1) = \beta_1$), and they are BLUE (best linear unbiased estimators: unbiased and with the lowest variance among linear unbiased estimators).
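The stochastic nature of the slope estimator can be checked with a small Monte Carlo experiment: many samples are drawn from the same model and the slope is re-estimated each time. This is a sketch under stated assumptions: the true parameters, the x values, the number of replications and the seed are all made up, and the errors are assumed to be Gaussian and homoscedastic.

```python
import random
import statistics

def slope(xs, ys):
    """Closed-form OLS slope estimate."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    ssx = sum((x - x_bar) ** 2 for x in xs)
    return sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys)) / ssx

def sampling_distribution(beta0, beta1, sigma, xs, replications, seed=42):
    """Re-estimate the slope on many simulated samples with fixed x."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        estimates.append(slope(xs, ys))
    return estimates

# Hypothetical setting: true slope 2.0, error standard deviation 1.0
xs = [float(i) for i in range(1, 21)]
estimates = sampling_distribution(1.0, 2.0, 1.0, xs, replications=2000)

mean_slope = statistics.fmean(estimates)
var_slope = statistics.variance(estimates)
ssx = sum((x - statistics.fmean(xs)) ** 2 for x in xs)

print(f"mean of the slope estimates: {mean_slope:.4f} (true slope 2.0)")
print(f"variance of the estimates:   {var_slope:.6f}")
print(f"sigma^2 / SSx:               {1.0 / ssx:.6f}")
```

The average of the estimates should be close to the true slope (unbiasedness), and their variance should be close to $\sigma^2 / SSx$.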
Linear dependency index ($R^2$)
The $R^2$ index is the most widely used index to measure the linear fit of the model.
$R^2$ lies in the interval [0, 1], where values near 1 mean that the explanatories are useful in describing the changes in Y.
Let us define SQT = SQR + SQE, or
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
The $R^2$ is defined as $R^2 = \frac{SQR}{SQT}$ or $1 - \frac{SQE}{SQT}$.
Or, equivalently:
$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
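The decomposition SQT = SQR + SQE and the resulting index can be verified numerically. The sketch below uses made-up data and a hypothetical helper `r_squared`; it fits the line with the closed-form OLS estimators and then computes the three sums of squares.

```python
def r_squared(xs, ys):
    """Return (R2, SQT, SQR, SQE) for a simple OLS fit of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys)) / ssx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in xs]
    sqt = sum((y - y_bar) ** 2 for y in ys)               # total
    sqr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained by the regression
    sqe = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # residual
    return sqr / sqt, sqt, sqr, sqe

# Made-up data, used only to show the arithmetic
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r2, sqt, sqr, sqe = r_squared(xs, ys)
print(f"SQT = {sqt:.2f}, SQR = {sqr:.2f}, SQE = {sqe:.2f}, R2 = {r2:.2f}")
```

For these points SQT = 6, SQR = 3.6 and SQE = 2.4, so both definitions give $R^2 = 0.6$.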
Hypothesis testing on $\beta_1$
The estimated slope parameter $\hat{\beta}_1$ is stochastic. It is distributed as a Gaussian:
$\hat{\beta}_1 \sim N[\beta_1, \sigma^2 / SSx]$, where $SSx = \sum_{i=1}^{n} (x_i - \bar{x})^2$.
We can use the hypothesis testing approach to investigate the causal relation between Y and X:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$,
where the alternative hypothesis corresponds to a causal relation.
The test statistic is:
$z = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\sigma^2 / SSx}} \sim N(0, 1)$.
When $\sigma^2$ is unknown, we estimate it from the residuals as $s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2}$, and we use a t-test with $n - 2$ degrees of freedom.
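The test statistic for the slope can be sketched as follows for the usual case in which $\sigma^2$ is unknown. The assumptions are stated loudly: $\sigma^2$ is estimated by $s^2 = \sum e_i^2 / (n - 2)$, which gives a t statistic with $n - 2$ degrees of freedom in simple regression; the helper name `slope_t_statistic` and the data are made up. The resulting value has to be compared with a critical value of the t distribution, which is not computed here.

```python
import math

def slope_t_statistic(xs, ys, beta1_null=0.0):
    """Return (t, degrees_of_freedom) for H0: beta1 = beta1_null."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys)) / ssx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # Assumption: sigma^2 is estimated with n - 2 in the denominator
    s2 = sum(e * e for e in residuals) / (n - 2)
    se_b1 = math.sqrt(s2 / ssx)  # estimated standard error of the slope
    return (b1 - beta1_null) / se_b1, n - 2

# Made-up data, used only to show the arithmetic
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
t_stat, dof = slope_t_statistic(xs, ys)
print(f"t = {t_stat:.4f} with {dof} degrees of freedom")
```

Here the slope is 0.6, $s^2 = 2.4 / 3 = 0.8$ and $SSx = 10$, so the standard error is $\sqrt{0.08}$ and the statistic is about 2.12.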
Forecasting within the regression model

The question we want to answer is the following: what is the expected value of Y (say $y_{n+1}$) for a certain observation that is not in the sample?
Suppose we have, for that observation, the value of the variable X (say $x_{n+1}$).
We use the estimated $\hat{\beta}$ to determine:
$\hat{y}_{n+1} = \hat{\beta}_0 + \hat{\beta}_1 x_{n+1}$
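A forecast for a new observation only requires the two estimated coefficients and the new x value. The sketch below is illustrative: `ols_fit`, `forecast`, the data and the new value `x_new = 6.0` are all made up.

```python
def ols_fit(xs, ys):
    """Closed-form OLS estimates (b0_hat, b1_hat)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys)) / ssx
    return y_bar - b1 * x_bar, b1

def forecast(b0_hat, b1_hat, x_new):
    """Point forecast of y for an observation outside the sample."""
    return b0_hat + b1_hat * x_new

# Made-up sample and a hypothetical new observation
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b0_hat, b1_hat = ols_fit(xs, ys)
x_new = 6.0
y_new_hat = forecast(b0_hat, b1_hat, x_new)
print(f"forecast for x = {x_new}: {y_new_hat:.2f}")
```

With the estimates 2.2 and 0.6 obtained from this sample, the forecast at x = 6 is 2.2 + 0.6 * 6 = 5.8. This is a point forecast only; it says nothing about its uncertainty.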
Model Checking
Several methods are used to test the robustness of the model, most of them based on the stochastic part of the model: the estimated residuals.
Graphical checks: plot of residuals versus fitted values
Q-Q plot for normality
Shapiro-Wilk test for normality
Durbin-Watson test for serial correlation
Breusch-Pagan test for heteroscedasticity
Moreover, the leverage is used to evaluate the importance of each observation in determining the estimated coefficients $\hat{\beta}$.
The stepwise procedure is used to choose between different model specifications.
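For a simple regression with an intercept, the leverage of observation i has the closed form $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$. The sketch below computes it for made-up x values; the function name `leverages` is hypothetical.

```python
def leverages(xs):
    """Leverage h_i of each observation in a simple regression with intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    # Observations far from the mean of x have larger leverage
    return [1.0 / n + (x - x_bar) ** 2 / ssx for x in xs]

# Made-up x values
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
h = leverages(xs)
for x, hi in zip(xs, h):
    print(f"x = {x:.1f}  leverage = {hi:.2f}")
```

The leverages depend only on X, and they sum to 2, the number of estimated coefficients. Here the extreme points x = 1 and x = 5 have the largest leverage (0.6), the central point the smallest (0.2).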
Model Checking using estimated residuals - Linearity
An example of departure from the linearity assumption: we can
draw a curve (not a horizontal line) to interpolate the points

Figure: residuals (Y) versus estimated (X) values
Model Checking using estimated residuals - Homoscedasticity
An example of departure from the homoscedasticity assumption (the spread of the estimated residuals increases as the predicted values increase)
Model Checking using estimated residuals - Normality
An example of departure from the normality assumption: the
qq-points do not follow the qq-line

Figure: residuals (Y) versus estimated (X) values
Model Checking using estimated residuals - Serial correlation
An example of departure from the assumption of no serial correlation: the residual at i depends on the value at i − 1
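The Durbin-Watson statistic summarizes this kind of dependence: $DW = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$. Values near 2 suggest no first-order serial correlation, values toward 0 suggest positive correlation and values toward 4 negative correlation. The sketch below computes the statistic only (not the test decision) on made-up residual sequences; the function name `durbin_watson` is chosen here for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of ordered residuals."""
    # Sum of squared differences between consecutive residuals
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    # Sum of squared residuals
    den = sum(e * e for e in residuals)
    return num / den

# Made-up residual sequences
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips at every step
persistent = [1.0, 1.0, -1.0, -1.0]   # neighbours tend to share the sign

print(durbin_watson(alternating))  # 12 / 4 = 3.0, toward 4: negative correlation
print(durbin_watson(persistent))   # 4 / 4 = 1.0, toward 0: positive correlation
```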
Homework

1. Using the cement data (n = 13), determine the $\beta_0$ and $\beta_1$ coefficients of the model $y = \beta_0 + \beta_1 x_1$ manually, using the OLS formulas on page 11.
2. Using the cement data, estimate the $R^2$ index of the model $y = \beta_0 + \beta_1 x_1$, using the formula on page 13.
Charts - 1

Figure: Slope coefficient in the linear model
Charts - 2

Figure: Fitted (line) versus real (points) values

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Correspondence analysis final
Correspondence analysis finalCorrespondence analysis final
Correspondence analysis final
 
Linear regression theory
Linear regression theoryLinear regression theory
Linear regression theory
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
Presentation on regression analysis
Presentation on regression analysisPresentation on regression analysis
Presentation on regression analysis
 
Regression
RegressionRegression
Regression
 
Regression
RegressionRegression
Regression
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Simple Linier Regression
Simple Linier RegressionSimple Linier Regression
Simple Linier Regression
 
Ols by hiron
Ols by hironOls by hiron
Ols by hiron
 
Regression
RegressionRegression
Regression
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
Simple lin regress_inference
Simple lin regress_inferenceSimple lin regress_inference
Simple lin regress_inference
 
Presentation On Regression
Presentation On RegressionPresentation On Regression
Presentation On Regression
 
SPSS
SPSSSPSS
SPSS
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Regression analysis.
Regression analysis.Regression analysis.
Regression analysis.
 

Andere mochten auch

Careers in botany
Careers in botanyCareers in botany
Careers in botanyentranzz123
 
Application of Regression Analysis: Model Building and Validation
Application of Regression Analysis: Model Building and Validation  Application of Regression Analysis: Model Building and Validation
Application of Regression Analysis: Model Building and Validation Kalaivanan Murthy
 
Support vector regression and its application in trading
Support vector regression and its application in tradingSupport vector regression and its application in trading
Support vector regression and its application in tradingAashay Harlalka
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSScsula its training
 
Multiple Regression Analysis
Multiple Regression AnalysisMultiple Regression Analysis
Multiple Regression AnalysisMinha Hwang
 
Role of Statistics in Scientific Research
Role of Statistics in Scientific ResearchRole of Statistics in Scientific Research
Role of Statistics in Scientific ResearchVaruna Harshana
 

Andere mochten auch (9)

Careers in botany
Careers in botanyCareers in botany
Careers in botany
 
Science
ScienceScience
Science
 
Tutorialgroups
TutorialgroupsTutorialgroups
Tutorialgroups
 
Application of Regression Analysis: Model Building and Validation
Application of Regression Analysis: Model Building and Validation  Application of Regression Analysis: Model Building and Validation
Application of Regression Analysis: Model Building and Validation
 
Support vector regression and its application in trading
Support vector regression and its application in tradingSupport vector regression and its application in trading
Support vector regression and its application in trading
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSS
 
Multiple Regression Analysis
Multiple Regression AnalysisMultiple Regression Analysis
Multiple Regression Analysis
 
Chap12 simple regression
Chap12 simple regressionChap12 simple regression
Chap12 simple regression
 
Role of Statistics in Scientific Research
Role of Statistics in Scientific ResearchRole of Statistics in Scientific Research
Role of Statistics in Scientific Research
 

Ähnlich wie Linear Regression Model Explained

Ähnlich wie Linear Regression Model Explained (20)

Talk 4
Talk 4Talk 4
Talk 4
 
REGRESSION ANALYSIS THEORY EXPLAINED HERE
REGRESSION ANALYSIS THEORY EXPLAINED HEREREGRESSION ANALYSIS THEORY EXPLAINED HERE
REGRESSION ANALYSIS THEORY EXPLAINED HERE
 
ML-UNIT-IV complete notes download here
ML-UNIT-IV  complete notes download hereML-UNIT-IV  complete notes download here
ML-UNIT-IV complete notes download here
 
Chapter 14 Part I
Chapter 14 Part IChapter 14 Part I
Chapter 14 Part I
 
Reg
RegReg
Reg
 
Multivariate reg analysis
Multivariate reg analysisMultivariate reg analysis
Multivariate reg analysis
 
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysis
 
Chapter 14 Part Ii
Chapter 14 Part IiChapter 14 Part Ii
Chapter 14 Part Ii
 
Regression analysis ppt
Regression analysis pptRegression analysis ppt
Regression analysis ppt
 
Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help
 
regression analysis .ppt
regression analysis .pptregression analysis .ppt
regression analysis .ppt
 
Simple Linear Regression.pptx
Simple Linear Regression.pptxSimple Linear Regression.pptx
Simple Linear Regression.pptx
 
Regression
RegressionRegression
Regression
 
Simple egression.pptx
Simple egression.pptxSimple egression.pptx
Simple egression.pptx
 
Stat 1163 -correlation and regression
Stat 1163 -correlation and regressionStat 1163 -correlation and regression
Stat 1163 -correlation and regression
 
Corr And Regress
Corr And RegressCorr And Regress
Corr And Regress
 
SimpleLinearRegressionAnalysisWithExamples.ppt
SimpleLinearRegressionAnalysisWithExamples.pptSimpleLinearRegressionAnalysisWithExamples.ppt
SimpleLinearRegressionAnalysisWithExamples.ppt
 
Linear regression.ppt
Linear regression.pptLinear regression.ppt
Linear regression.ppt
 
lecture13.ppt
lecture13.pptlecture13.ppt
lecture13.ppt
 
lecture13.ppt
lecture13.pptlecture13.ppt
lecture13.ppt
 

Mehr von University of Salerno

Modelling traffic flows with gravity models and mobile phone large data
Modelling traffic flows with gravity models and mobile phone large dataModelling traffic flows with gravity models and mobile phone large data
Modelling traffic flows with gravity models and mobile phone large dataUniversity of Salerno
 
Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2University of Salerno
 
A strategy for the matching of mobile phone signals with census data
A strategy for the matching of mobile phone signals with census dataA strategy for the matching of mobile phone signals with census data
A strategy for the matching of mobile phone signals with census dataUniversity of Salerno
 
Detecting and classifying moments in basketball matches using sensor tracked ...
Detecting and classifying moments in basketball matches using sensor tracked ...Detecting and classifying moments in basketball matches using sensor tracked ...
Detecting and classifying moments in basketball matches using sensor tracked ...University of Salerno
 
BASKETBALL SPATIAL PERFORMANCE INDICATORS
BASKETBALL SPATIAL PERFORMANCE INDICATORSBASKETBALL SPATIAL PERFORMANCE INDICATORS
BASKETBALL SPATIAL PERFORMANCE INDICATORSUniversity of Salerno
 
Human activity spatio-temporal indicators using mobile phone data
Human activity spatio-temporal indicators using mobile phone dataHuman activity spatio-temporal indicators using mobile phone data
Human activity spatio-temporal indicators using mobile phone dataUniversity of Salerno
 
Players Movements and Team Performance
Players Movements and Team PerformancePlayers Movements and Team Performance
Players Movements and Team PerformanceUniversity of Salerno
 
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...University of Salerno
 
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...University of Salerno
 
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...University of Salerno
 
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...University of Salerno
 
The Worldwide Network of Virtual Water with Kriskogram
The Worldwide Network of Virtual Water with KriskogramThe Worldwide Network of Virtual Water with Kriskogram
The Worldwide Network of Virtual Water with KriskogramUniversity of Salerno
 

Mehr von University of Salerno (20)

Modelling traffic flows with gravity models and mobile phone large data
Modelling traffic flows with gravity models and mobile phone large dataModelling traffic flows with gravity models and mobile phone large data
Modelling traffic flows with gravity models and mobile phone large data
 
Regression models for panel data
Regression models for panel dataRegression models for panel data
Regression models for panel data
 
Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2
 
A strategy for the matching of mobile phone signals with census data
A strategy for the matching of mobile phone signals with census dataA strategy for the matching of mobile phone signals with census data
A strategy for the matching of mobile phone signals with census data
 
Detecting and classifying moments in basketball matches using sensor tracked ...
Detecting and classifying moments in basketball matches using sensor tracked ...Detecting and classifying moments in basketball matches using sensor tracked ...
Detecting and classifying moments in basketball matches using sensor tracked ...
 
BASKETBALL SPATIAL PERFORMANCE INDICATORS
BASKETBALL SPATIAL PERFORMANCE INDICATORSBASKETBALL SPATIAL PERFORMANCE INDICATORS
BASKETBALL SPATIAL PERFORMANCE INDICATORS
 
Human activity spatio-temporal indicators using mobile phone data
Human activity spatio-temporal indicators using mobile phone dataHuman activity spatio-temporal indicators using mobile phone data
Human activity spatio-temporal indicators using mobile phone data
 
Poster venezia
Poster veneziaPoster venezia
Poster venezia
 
Metulini280818 iasi
Metulini280818 iasiMetulini280818 iasi
Metulini280818 iasi
 
Players Movements and Team Performance
Players Movements and Team PerformancePlayers Movements and Team Performance
Players Movements and Team Performance
 
Big Data Analytics for Smart Cities
Big Data Analytics for Smart CitiesBig Data Analytics for Smart Cities
Big Data Analytics for Smart Cities
 
Meeting progetto ode_sm_rm
Meeting progetto ode_sm_rmMeeting progetto ode_sm_rm
Meeting progetto ode_sm_rm
 
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...
 
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...
 
Metulini1503
Metulini1503Metulini1503
Metulini1503
 
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...
 
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
 
The Global Virtual Water Network
The Global Virtual Water NetworkThe Global Virtual Water Network
The Global Virtual Water Network
 
The Worldwide Network of Virtual Water with Kriskogram
The Worldwide Network of Virtual Water with KriskogramThe Worldwide Network of Virtual Water with Kriskogram
The Worldwide Network of Virtual Water with Kriskogram
 
Ad b 1702_metu_v2
Ad b 1702_metu_v2Ad b 1702_metu_v2
Ad b 1702_metu_v2
 

Kürzlich hochgeladen

Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 

Kürzlich hochgeladen (20)

INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 

Linear Regression Model Explained

  • 1. Statistics Lab Rodolfo Metulini IMT Institute for Advanced Studies, Lucca, Italy Lesson 4 - The linear Regression Model: Theory and Application - 21.01.2014
  • 2. Introduction In the past praticals we analyzed one variable. For certain reasons, it is even usefull to analyze two or more variables together. The question we want to asnwer regards what are the relations, the causal effects determining changes in a variable. Analyze if a certain phenomenon is endogenous or exogenous. In symbols, the idea can be represent as follow: y = f (x1 , x2 , ...) Y is the response, which is a function (it depends on) one or more variables.
  • 3. Objectives All in all, the regression model is the instrument used to: measure the entity of the relations between two or more variables: Y / X , and to measure the causal direction ( X −→ viceversa? ) Y or forecast the value of the variable Y in response to some changes in the others X1 , X2 , ... (called explanatories), or for some cases that are not considered in the sample.
  • 4. Simple linear regression model The regression model is stochastic, not deterministic. Giving two sets of values (two variables) from a random sample of length n: x = {x1 , x2 , ..., xi , ..xn }; y = {y1 , y2 , ..., yi , ..yn }: Deterministic formula: yi = β0 + β1 xi , ∀i = 1, .., n Stochastic formula: yi = β0 + β1 xi + where i i ∀i = 1, .., n is the stochastic component. β1 define the slope in the relations between X and Y (See graph in chart 1)
  • 5. Simple linear regression model - 2 ˆ ˆ ˆ We need to find β = {β0 , β1 } as estimators of β0 and β1 . After β is estimated, we can draw the estimated regression line, which corresponds to the estimated regression model, as follow: ˆ ˆ yi = β0 + β1 xi ˆ Here, ˆi = yi − yi . ˆ Where yi is the i-element of the estimated Y vector, and yi is the ˆ i-elements of the real Y vector. (see graph in chart 2)
  • 6. Steps in the Analysis 1. Study the relations (scatterplot, correlations) between two or more variables. ˆ ˆ ˆ 2. Estimation of the parameters of the model β = {β0 , β1 }. ˆ 3. Hypotesis tests on the estimated β1 to verify the casual effects between X and Y 4. Robustness check of the model. 5. Use the model to analyze the causal effect and/or to do forecasting.
  • 7. Why linear? It is simple to estimate, to analyze and to interpret. It is likely to fit most empirical cases, in which the relation between two phenomena is linear. There are many established methods to transform variables in order to obtain a linear relationship (log transformation, normalization, etc.).
  • 8. Model Hypothesis For the estimation and the use of the model to be correct, certain hypotheses must hold: Zero mean: $E(\varepsilon_i) = 0, \ \forall i \ \Rightarrow \ E(y_i) = \beta_0 + \beta_1 x_i$. Homoscedasticity: $V(\varepsilon_i) = \sigma_i^2 = \sigma^2, \ \forall i$. Null covariance: $Cov(\varepsilon_i, \varepsilon_j) = 0, \ \forall i \neq j$. Null covariance between residuals and explanatory variables: $Cov(x_i, \varepsilon_i) = 0, \ \forall i$, since X is deterministic (known). Normality: $\varepsilon_i \sim N(0, \sigma^2)$.
  • 9. Model Hypothesis - 2 From the hypotheses above, it follows that: $V(y_i) = \sigma^2, \ \forall i$, since Y is stochastic only through the $\varepsilon$ component. $Cov(y_i, y_j) = 0, \ \forall i \neq j$, since the residuals are uncorrelated. $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, since the residuals are also normally distributed.
  • 10. Ordinary Least Squares (OLS) Estimation OLS is the method used to estimate the vector $\beta$. The idea is to minimize the size of the residuals. Since $e_i = y_i - \hat{y}_i$, we are interested in minimizing the component $y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$. N.B. $\varepsilon_i = y_i - \beta_0 - \beta_1 x_i$, while $e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$. The method consists in minimizing the sum of the squared differences, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 \to \min$, which amounts to solving a system of two equations obtained by setting the partial derivatives to zero.
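  • 11. Ordinary Least Squares (OLS) Estimation - 2 $\frac{\partial}{\partial \hat{\beta}_0} \sum_{i=1}^{n} e_i^2 = 0$ (1), $\frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{n} e_i^2 = 0$ (2). After some algebra, we end up with these estimators for the vector $\beta$: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ (3), $\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ (4).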
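As a rough illustration of formulas (3) and (4), the closed-form estimators can be computed directly from the sample means. This is only a sketch in plain Python: the function name `ols_fit` and the five data points are made up for the example and are not taken from the course material.

```python
def ols_fit(x, y):
    """Return (b0, b1), the OLS intercept and slope from formulas (3) and (4)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator of (4): sum of cross-products of the deviations from the means
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    # Denominator of (4): sum of squared deviations of x
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    if ssx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / ssx              # formula (4)
    b0 = y_bar - b1 * x_bar     # formula (3)
    return b0, b1


# Hypothetical toy sample, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_fit(x, y)
print("intercept:", round(b0, 4), "slope:", round(b1, 4))
```

The slope is computed first because the intercept in (3) depends on it.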
  • 12. OLS estimators The OLS $\hat{\beta}_0$ and $\hat{\beta}_1$ are stochastic estimators (they have a distribution over the sample space of all the possible estimates obtained from different samples). $\hat{\beta}_1$ measures the estimated variation in Y determined by a unit variation in X ($\partial Y / \partial X$). The OLS estimators are unbiased ($E(\hat{\beta}_1) = \beta_1$), and they are BLUE (Best Linear Unbiased Estimators: unbiased and with the lowest variance among linear unbiased estimators).
  • 13. Linear dependency index ($R^2$) The $R^2$ index is the most widely used index to measure the linear fit of the model. $R^2$ is confined to the interval $[0, 1]$, where values near 1 mean the explanatory variables are useful to describe the changes in Y (it is the correlation coefficient, not $R^2$, that ranges over $[-1, 1]$). Let us define $SQT = SQR + SQE$, or $\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. The $R^2$ is defined as $R^2 = \frac{SQR}{SQT}$ or $1 - \frac{SQE}{SQT}$. Or, equivalently: $R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$.
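The decomposition SQT = SQR + SQE and the two equivalent forms of the index can be checked numerically. The sketch below uses plain Python; the helper `fit_line` and the toy data are assumptions made for the example, not part of the slides.

```python
def fit_line(x, y):
    """Closed-form OLS fit for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / ssx
    return y_bar - b1 * x_bar, b1


# Hypothetical toy sample, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]                      # fitted values

sqt = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
sqr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by the regression
sqe = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual sum of squares

r2 = sqr / sqt
r2_alt = 1 - sqe / sqt    # equivalent form of the same index
print("R^2:", round(r2, 4), "check:", round(r2_alt, 4))
```

The two forms agree only when the model includes an intercept and is fitted by OLS, which is the case here.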
  • 14. Hypothesis testing on $\beta_1$ The estimated slope parameter $\hat{\beta}_1$ is stochastic. It is distributed as a Gaussian: $\hat{\beta}_1 \sim N(\beta_1, \sigma^2 / SS_x)$, where $SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$. We can use the hypothesis testing approach to investigate the causal relation between Y and X: $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$, where the alternative hypothesis means a causal relation. The test statistic is $z = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\sigma^2 / SS_x}} \sim N(0, 1)$. When $\sigma^2$ is unknown, we estimate it as $\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2}$ and we use a t-test with $n - 2$ degrees of freedom.
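A minimal sketch of the test statistic under H0: β1 = 0 follows. It assumes the error variance is estimated from the residuals with the usual n − 2 degrees of freedom for simple regression. The function name `slope_t_statistic` and the data are hypothetical; the critical value 3.182 is the standard two-sided 5% value of the t distribution with 3 degrees of freedom, taken from tables rather than computed.

```python
import math


def slope_t_statistic(x, y):
    """Return (t, df) for testing H0: beta1 = 0 in y = b0 + b1 * x."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    # Residual sum of squares and the estimate of sigma^2
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)
    se_b1 = math.sqrt(sigma2_hat / ssx)   # estimated standard error of the slope
    if se_b1 == 0:
        raise ValueError("perfect fit: the t statistic is undefined")
    return b1 / se_b1, n - 2


# Hypothetical toy sample, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
t_stat, df = slope_t_statistic(x, y)

T_CRIT_5PCT_DF3 = 3.182   # two-sided 5% critical value for df = 3 (from t tables)
reject_h0 = abs(t_stat) > T_CRIT_5PCT_DF3
print("t =", round(t_stat, 2), "df =", df, "reject H0:", reject_h0)
```

A p-value would need the t distribution's CDF, which the Python standard library does not provide, so the sketch compares against a tabulated critical value instead.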
  • 15. Forecasting within the regression model The question we want to answer is the following: what is the expected value of Y (say $y_{n+1}$) for a certain observation that is not in the sample? Suppose we have, for that observation, the value of the variable X (say $x_{n+1}$). We make use of the estimated $\hat{\beta}$ to determine: $\hat{y}_{n+1} = \hat{\beta}_0 + \hat{\beta}_1 x_{n+1}$.
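The forecast is simply the fitted line evaluated at the new x value. A short sketch follows, with hypothetical names (`fit_line`, `predict`) and made-up data; the new observation x = 6 is an arbitrary choice for the example.

```python
def fit_line(x, y):
    """Closed-form OLS fit for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / ssx
    return y_bar - b1 * x_bar, b1


def predict(b0, b1, x_new):
    """Point forecast of Y for an observation with explanatory value x_new."""
    return b0 + b1 * x_new


# Hypothetical toy sample, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(x, y)

x_new = 6.0                       # value of X for the out-of-sample observation
y_new = predict(b0, b1, x_new)
print("forecast at x =", x_new, "is", round(y_new, 4))
```

This gives a point forecast only; it says nothing about the uncertainty around the predicted value.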
  • 16. Model Checking Several methods are used to test the robustness of the model, most of them based on the stochastic part of the model: the estimated residuals. Graphical checks: plot of residuals versus fitted values; qq-plot for normality. Tests: Shapiro-Wilk test for normality; Durbin-Watson test for serial correlation; Breusch-Pagan test for heteroscedasticity. Moreover, the leverage is used to evaluate the importance of each observation in determining the estimated coefficients $\hat{\beta}$. The stepwise procedure is used to choose between different model specifications.
  • 17. Model Checking using estimated residuals - Linearity An example of departure from the linearity assumption: we can draw a curve (not a horizontal line) to interpolate the points Figure: residuals (Y) versus estimated (X) values
  • 18. Model Checking using estimated residuals - Homoscedasticity An example of departure from the homoscedasticity assumption (the spread of the estimated residuals increases as the predicted values increase)
  • 19. Model Checking using estimated residuals - Normality An example of departure from the normality assumption: the qq-points do not follow the qq-line Figure: residuals (Y) versus estimated (X) values
  • 20. Model Checking using estimated residuals - Serial correlation An example of departure from the assumption of no serial correlation: the residual at i depends on the value at i − 1
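Two of the checks named in the model-checking slides can be computed by hand from the residuals. The sketch below uses the textbook Durbin-Watson statistic (sum of squared successive residual differences over the residual sum of squares) and the simple-regression leverage h_i = 1/n + (x_i − x̄)² / SSx. Both formulas are standard but do not appear in the slides, and the data and function name are made up for the example.

```python
def residual_checks(x, y):
    """Return (dw, leverage): Durbin-Watson statistic and leverages h_i."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]   # estimated residuals

    # Durbin-Watson: values near 2 suggest no first-order serial correlation,
    # values near 0 positive correlation, values near 4 negative correlation.
    dw = sum((e[t] - e[t - 1]) ** 2 for t in range(1, n)) / sum(et ** 2 for et in e)

    # Leverage of each observation in a simple regression with intercept
    leverage = [1 / n + (xi - x_bar) ** 2 / ssx for xi in x]
    return dw, leverage


# Hypothetical toy sample, assumed to be ordered in time for the DW statistic
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
dw, leverage = residual_checks(x, y)
print("Durbin-Watson:", round(dw, 3))
print("leverages:", [round(h, 3) for h in leverage])
```

The leverages sum to the number of estimated coefficients (two here), which is a quick sanity check on the computation.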
  • 21. Homework 1. Using the cement data (n = 13), determine the $\beta_0$ and $\beta_1$ coefficients manually, using the OLS formulas on page 11, for the model $y = \beta_0 + \beta_1 x_1$. 2. Using the cement data, estimate the $R^2$ index of the model $y = \beta_0 + \beta_1 x_1$, using the formula on page 13.
  • 22. Charts - 1 Figure: Slope coefficient in the linear model
  • 23. Charts - 2 Figure: Fitted (line) versus real (points) values