Statistics Lab
Rodolfo Metulini
IMT Institute for Advanced Studies, Lucca, Italy
Lesson 4 - The linear Regression Model: Theory and
Application - 23.01.2015
Introduction
In the past lessons we analyzed one variable at a time.
It is often useful to analyze two or more variables together.
The questions we want to answer are:
What are the relations and the causal effects between two or
more variables?
What determines the changes in a variable?
How can we forecast or predict a variable for unknown n or t?
In symbols, the idea can be represented as follows:
$Y = f(X_1, X_2, \ldots)$
Y is the response, which is a function of (it depends on) one or more
variables.
Objectives
All in all, the regression model is the instrument used to:
measure the strength of the relations between two or more
variables: Y / X,
and to assess the causal direction ($X \rightarrow Y$ or
vice versa?)
forecast the value of the variable Y in response to some
changes in the others $X_1, X_2, \ldots$ (called explanatories),
or for some cases that are not considered in the sample.
Simple linear regression model
The regression model is a stochastic model, which differs from a
deterministic one.
Given two sets of values (two variables) from a random sample of
size n: $x = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}$; $y = \{y_1, y_2, \ldots, y_i, \ldots, y_n\}$:
Deterministic formula:
$y_i = \beta_1 + \beta_2 x_i, \quad \forall i = 1, \ldots, n$
Stochastic formula:
$y_i = \beta_1 + \beta_2 x_i + \epsilon_i, \quad \forall i = 1, \ldots, n$
where $\epsilon_i$ is the stochastic component.
$\beta_2$ defines the slope of the relation between X and Y (see graph in
chart 1)
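The difference between the two formulas can be made concrete with a small simulation. The sketch below is only illustrative: the parameter values, the sample of x values and the function name `simulate` are assumptions, not part of the lesson, and the error term is drawn from a normal distribution with mean zero.

```python
import random

def simulate(beta1, beta2, xs, sigma, seed=0):
    """Return (deterministic, stochastic) y values for the given x values."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    # Deterministic model: every point lies exactly on the line.
    y_det = [beta1 + beta2 * x for x in xs]
    # Stochastic model: add an independent N(0, sigma^2) error to each point.
    y_sto = [y + rng.gauss(0.0, sigma) for y in y_det]
    return y_det, y_sto

# Hypothetical values chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_det, y_sto = simulate(beta1=1.0, beta2=2.0, xs=xs, sigma=0.5)
```

With `sigma=0.0` the two vectors coincide, which is exactly the deterministic case.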
Simple linear regression model - 2
We need to find $\hat{\beta} = \{\hat{\beta}_1, \hat{\beta}_2\}$ as estimators of $\beta_1$ and $\beta_2$.
After $\beta$ is estimated, we can draw the estimated regression line,
which corresponds to the estimated regression model, as
follows:
$\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$
Here, $\hat{\epsilon}_i = y_i - \hat{y}_i$,
where $\hat{y}_i$ is the i-th element of the estimated y vector, and $y_i$ is the
i-th element of the real y vector. (see graph in chart 2)
Empirical Steps in the Regression Analysis
1. Study of the relations (scatter plot, correlations) between two
or more variables.
2. Estimation of the parameters of the model $\hat{\beta} = \{\hat{\beta}_1, \hat{\beta}_2\}$.
3. Hypothesis tests on the estimated $\hat{\beta}_2$ to verify the causal
effects of X on Y.
4. Robustness checks on the estimated model.
5. Use of the model to analyse the causal effect and/or to do
forecasts/predictions.
Why linear?
It is simple to estimate, to analyse and to interpret.
It likely fits most empirical cases, in which the
relation between two phenomena is linear (NOT REALLY
SURE OF IT!) [1].
[1] The real, complex world is not linear in its relations: logit, probit, mixed
models and generalized additive models (GAM) are only some examples of the
more advanced, non-linear models you will study in econometrics classes.
Model Hypotheses
For the OLS estimation of the model to be unbiased, certain
hypotheses must hold:
$E(\epsilon_i) = 0, \ \forall i \ \longrightarrow \ E(y_i) = \beta_1 + \beta_2 x_i$
Homoscedasticity: $V(\epsilon_i) = \sigma_i^2 = \sigma^2, \ \forall i$
Null covariance: $Cov(\epsilon_i, \epsilon_j) = 0, \ \forall i \neq j$
Null covariance between residuals and explanatories:
$Cov(x_i, \epsilon_i) = 0, \ \forall i$
Normal assumption: $\epsilon_i \sim N(0, \sigma^2)$
Model Hypotheses - 2
From the hypotheses above, it follows that:
$V(y_i) = \sigma^2, \ \forall i$. Y is stochastic only through the $\epsilon$ component.
$Cov(y_i, y_j) = 0, \ \forall i \neq j$, since the residuals are uncorrelated.
$y_i \sim N[(\beta_1 + \beta_2 x_i), \sigma^2]$, since the residuals are also normally
distributed.
Ordinary Least Squares (OLS) Estimation
OLS is the estimation method used to estimate the vector $\beta$.
The method comes from the idea of minimizing the values of the
residuals.
Since $e_i$ (that is, $\hat{\epsilon}_i$) $= y_i - \hat{y}_i$, we are interested in minimizing the component
$e_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i$.
N.B. $\epsilon_i = y_i - \beta_1 - \beta_2 x_i$, while $e_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i$
The method consists in minimizing the sum of the squared
differences:
$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \min$,
which is equivalent to solving the following two-equation system, derived
by taking derivatives.
Ordinary Least Squares (OLS) Estimation - 2
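$\frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{n} e_i^2 = 0 \quad (1)$
$\frac{\partial}{\partial \hat{\beta}_2} \sum_{i=1}^{n} e_i^2 = 0 \quad (2)$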
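The intermediate steps are not shown on the slide. A possible sketch, using $e_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i$ and differentiating with respect to the two estimates, is:

```latex
\begin{align*}
\frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{n} e_i^2
  &= -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0
  \;\Longrightarrow\; \hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}, \\
\frac{\partial}{\partial \hat{\beta}_2} \sum_{i=1}^{n} e_i^2
  &= -2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0 .
\end{align*}
% Substitute \hat{\beta}_1 from the first equation into the second:
\begin{align*}
\sum_{i=1}^{n} x_i (y_i - \bar{y}) &= \hat{\beta}_2 \sum_{i=1}^{n} x_i (x_i - \bar{x}) .
\end{align*}
% Because \sum (y_i - \bar{y}) = 0 and \sum (x_i - \bar{x}) = 0,
% x_i can be replaced by (x_i - \bar{x}) on both sides:
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) &= \hat{\beta}_2 \sum_{i=1}^{n} (x_i - \bar{x})^2 .
\end{align*}
```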
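After some maths, we end up with these estimators for the vector
$\hat{\beta}$:
$\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x} \quad (3)$
$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad (4)$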
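Formulas (3) and (4) translate almost directly into code. The following is a minimal sketch in plain Python; the function name `ols_fit` and the five data points are made up for illustration and are not taken from the lesson.

```python
def ols_fit(xs, ys):
    """Closed-form OLS estimates (beta1_hat, beta2_hat) for y = beta1 + beta2 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    if ss_x == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta2_hat = s_xy / ss_x                # formula (4)
    beta1_hat = y_bar - beta2_hat * x_bar  # formula (3)
    return beta1_hat, beta2_hat

# Hypothetical data, used only to exercise the formulas.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, b2 = ols_fit(xs, ys)
```

Points that lie exactly on a line, for example `ols_fit([0, 1, 2], [1, 3, 5])`, return that line's intercept and slope.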
OLS estimators
The OLS $\hat{\beta}_1$ and $\hat{\beta}_2$ are stochastic estimators (they follow a
distribution: they belong to the sample space of all the
possible estimates obtained from different samples)
$\hat{\beta}_2$ measures the estimated variation in Y determined by a
unit variation in X ($\partial Y / \partial X$)
The OLS estimators are both unbiased ($E(\hat{\beta}_1) = \beta_1$ and
$E(\hat{\beta}_2) = \beta_2$),
and they are BLUE (Best Linear Unbiased Estimators: unbiased and with the lowest variance
among linear unbiased estimators; furthermore, they are constructed on the full sample).
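The idea that the slope estimator has its own distribution, centred on the true value, can be illustrated by repeated sampling. This is a rough Monte Carlo sketch under assumed values (true slope 2, normal errors with standard deviation 1, a fixed design of 20 points); all names are hypothetical.

```python
import random
import statistics

def slope(xs, ys):
    """OLS slope estimate for a simple linear regression."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x

def simulate_slopes(beta1, beta2, sigma, xs, reps, seed=0):
    """Draw `reps` samples from the stochastic model and estimate the slope in each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        ys = [beta1 + beta2 * x + rng.gauss(0.0, sigma) for x in xs]
        estimates.append(slope(xs, ys))
    return estimates

xs = [float(i) for i in range(1, 21)]  # fixed design, n = 20
slopes = simulate_slopes(beta1=1.0, beta2=2.0, sigma=1.0, xs=xs, reps=2000)
mean_slope = statistics.fmean(slopes)  # should be close to the true slope
sd_slope = statistics.stdev(slopes)    # should be close to sigma / sqrt(SS_x)
```

The average of the estimates approximates $E(\hat{\beta}_2)$, and their standard deviation approximates $\sqrt{\sigma^2 / SS_x}$.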
Linear dependency index ($R^2$)
The $R^2$ index is the most used measure to evaluate the linear
fit of the model.
$R^2$ is bounded in [0, 1], where values near 1
mean that the explanatories properly describe the changes in
Y (the model is well defined).
How $R^2$ is constructed:
SQT = SQR + SQE, or
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, or
total variation = model variation + residual variation
The $R^2$ is defined as SQR / SQT or 1 − SQE / SQT. Or, equivalently:
$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
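The decomposition can be checked numerically. The sketch below fits the line with the closed-form OLS estimators and then computes the three sums of squares; the function name `r_squared` and the data are illustrative assumptions, and y is assumed to have some variation so that SQT is not zero.

```python
def r_squared(xs, ys):
    """Return SQT, SQR, SQE and R2 for a simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
    b1 = y_bar - b2 * x_bar
    fitted = [b1 + b2 * x for x in xs]
    sqt = sum((y - y_bar) ** 2 for y in ys)              # total variation
    sqr = sum((f - y_bar) ** 2 for f in fitted)          # model variation
    sqe = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual variation
    return {"SQT": sqt, "SQR": sqr, "SQE": sqe, "R2": sqr / sqt}

# Hypothetical data, used only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
out = r_squared(xs, ys)
```

Up to floating-point rounding, `out["SQT"]` equals `out["SQR"] + out["SQE"]`, and `out["R2"]` equals `1 - out["SQE"] / out["SQT"]`.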
Hypothesis testing on $\beta_2$
The hypothesis test for the slope parameter is very similar to the
tests for the mean parameter. The estimated slope parameter $\hat{\beta}_2$ is
stochastic. It is normally distributed when the sample is
large:
$\hat{\beta}_2 \sim N[\beta_2, \sigma^2 / SS_x]$
We can use the hypothesis testing approach to investigate
the causal relation between Y and X:
$H_0: \beta_2 = 0; \quad H_1: \beta_2 \neq 0$,
where the alternative hypothesis means a causal relation. The test
is:
$z = \frac{\hat{\beta}_2 - \beta_2}{\sqrt{\sigma^2 / SS_x}} \sim N(0, 1)$.
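Here $SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$ is computed from the sample. Since $\sigma^2$ is, generally, unknown, we estimate it as
$\hat{\sigma}^2 = \sum_{i=1}^{n} e_i^2 / (n - 2)$, and we use a t-test with n − 2 degrees of
freedom (in case n is small).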
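As a rough sketch of the test, the code below computes the statistic for $H_0: \beta_2 = 0$. It assumes the usual residual-variance estimator (sum of squared residuals divided by n − 2) in place of the unknown $\sigma^2$, and it reports only the large-sample normal p-value, because the Python standard library has no t distribution. The function name and data are hypothetical.

```python
import math
from statistics import NormalDist

def slope_test(xs, ys):
    """Return (b2, se_b2, statistic, two-sided normal p-value) for H0: beta2 = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
    b1 = y_bar - b2 * x_bar
    sse = sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)             # assumed estimator of sigma^2
    se_b2 = math.sqrt(s2 / ss_x)   # estimated standard error of the slope
    stat = b2 / se_b2              # undefined if the fit is perfect (se_b2 == 0)
    p_norm = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))  # normal approximation
    return b2, se_b2, stat, p_norm

# Hypothetical data, used only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b2, se_b2, stat, p_norm = slope_test(xs, ys)
```

For a small sample such as this one, the statistic should be compared with a t critical value rather than with the normal approximation.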
Prediction within the regression model
The question we want to answer is the following: what is the
expected value of y (say $y_{n+1}$) for a certain observation that is not
in the sample?
Suppose we have, for that observation, the value of the variable X
(say $x_{n+1}$).
We use the estimated $\hat{\beta}$ to estimate $\hat{y}_{n+1}$ as:
$\hat{y}_{n+1} = \hat{\beta}_1 + \hat{\beta}_2 x_{n+1}$
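In code, the prediction is a single line once the coefficients are available. The estimates and the new x value below are hypothetical numbers chosen for illustration, not results from the lesson.

```python
def predict(beta1_hat, beta2_hat, x_new):
    """Point prediction for an observation that is not in the sample."""
    return beta1_hat + beta2_hat * x_new

# Hypothetical estimates and a hypothetical new observation.
y_new = predict(beta1_hat=0.05, beta2_hat=1.99, x_new=6.0)
```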
Model Checking
Several methods are used to test the robustness of the model,
most of them based on the stochastic part of the model (the
estimated residuals).
Graphical (by eye) checks: plot the residuals versus the
fitted values (residual hypotheses)
qq-plot and Shapiro-Wilk test for normality
Durbin-Watson test for residual correlation
Breusch-Pagan test for residual heteroscedasticity.
Moreover, the leverage is used to evaluate the contribution of each
observation in determining the estimated coefficients $\hat{\beta}$.
The stepwise procedure is used to choose between different model
specifications, in other words, to remove the explanatories which
are not significant.
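For the simple regression model with an intercept, the leverage of observation i has the closed form $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$. The sketch below applies it to made-up x values in which the last point lies far from the others; the function name is hypothetical.

```python
def leverages(xs):
    """Leverage h_i of each observation in a simple regression with intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    return [1.0 / n + (x - x_bar) ** 2 / ss_x for x in xs]

# Hypothetical x values: the last one is far from the rest.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
h = leverages(xs)
```

The leverages sum to 2, the number of estimated coefficients, and the isolated point receives by far the largest value.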
Model Checking using estimated residuals - Linearity
An example of departure from the linearity assumption. In this
case we can draw a curve (not a horizontal line) to interpolate the
points.
Figure: residuals (Y) versus estimated (X) values
Model Checking using estimated residuals - Homoscedasticity
An example of departure from the homoscedasticity assumption. In
this picture the estimated residuals increase as the predicted
values increase.
Figure: residuals (Y) versus estimated (X) values
Model Checking using estimated residuals - Normality
An example of departure from the normality assumption. Here the
qq-points do not lie within the qq-line bounds.
Figure: residuals (Y) versus estimated (X) values
Model Checking using estimated residuals - Serial correlation
An example of departure from the assumption of no serial
correlation of residuals: the residual at i depends on the value at
i − 1.
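The Durbin-Watson statistic summarises this kind of dependence as the sum of squared differences between consecutive residuals divided by the sum of squared residuals. A minimal sketch follows; the residual vectors are artificial examples, not output from a fitted model.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in their observed order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Artificial residual patterns, for illustration only.
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # sign flips at every step
dw_runs = durbin_watson([1.0, 1.0, -1.0, -1.0])         # neighbours tend to agree
```

Values close to 2 suggest no first-order serial correlation, values towards 0 suggest positive correlation, and values towards 4 suggest negative correlation.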
Homework
1. Using cement data (n = 13), determine the $\beta_1$ and $\beta_2$
coefficients manually, using the OLS formulas on page 11, for
the model $y = \beta_1 + \beta_2 x_1$
2. Using cement data, estimate the $R^2$ index of the model
$y = \beta_1 + \beta_2 x_1$, using the formula on page 13.
Charts - 1
Figure: Slope coefficient in the linear model
Charts - 2
Figure: Fitted (line) versus real (points) values

Talk 4

  • 1. Statistics Lab Rodolfo Metulini IMT Institute for Advanced Studies, Lucca, Italy Lesson 4 - The linear Regression Model: Theory and Application - 23.01.2015
  • 2. Introduction In the past lessons we analyzed one variable. For some reasons, it is even useful to analyze two or more variables together. The question we want to answer regards: What are the relations and the causal effects between two or more variables, analyze the determinants of changes in a variable, forecast or predict a variable for unknown n or t. In symbols, the idea can be represented as follows: Y = f (Y1, Y2, ...) Y is the response, which is a function (it depends on) one or more variables.
  • 3. Objectives All in all, the regression model is the instrument used to: measure the entity of the relations between two or more variables: Y / X, and to measure the causal direction ( X −→ Y or viceversa?) forecast the value of the variable Y in response to some changes in the others X1, X2, ... (called explanatories), or for some cases that are not considered in the sample.
  • 4. Simple linear regression model. The regression model is a stochastic model, which differs from a deterministic one. Given two sets of values (two variables) from a random sample of size n, $x = \{x_1, x_2, \ldots, x_i, \ldots, x_n\}$ and $y = \{y_1, y_2, \ldots, y_i, \ldots, y_n\}$: Deterministic formula: $y_i = \beta_1 + \beta_2 x_i, \ \forall i = 1, \ldots, n$. Stochastic formula: $y_i = \beta_1 + \beta_2 x_i + \epsilon_i, \ \forall i = 1, \ldots, n$, where $\epsilon_i$ is the stochastic component. $\beta_2$ defines the slope of the relation between X and Y (see graph in chart 1).
  • 5. Simple linear regression model - 2. We need to find $\hat{\beta} = \{\hat{\beta}_1, \hat{\beta}_2\}$ as estimators of $\beta_1$ and $\beta_2$. After $\beta$ is estimated, we can draw the estimated regression line, which corresponds to the estimated regression model: $\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$. Here, $\hat{\epsilon}_i = y_i - \hat{y}_i$, where $\hat{y}_i$ is the i-th element of the estimated y vector and $y_i$ is the i-th element of the real y vector (see graph in chart 2).
  • 6. Empirical Steps in the Regression Analysis. 1. Study of the relations (scatter plot, correlations) between two or more variables. 2. Estimation of the parameters of the model, $\hat{\beta} = \{\hat{\beta}_1, \hat{\beta}_2\}$. 3. Hypothesis tests on the estimated $\hat{\beta}_2$ to verify the causal effect of X on Y. 4. Robustness checks on the estimated model. 5. Use of the model to analyse the causal effect and/or to make forecasts/predictions.
  • 7. Why linear? It is simple to estimate, to analyse and to interpret. It likely fits most empirical cases, in which the relation between two phenomena is linear (NOT REALLY SURE OF IT!) [1]. [1] The real, complex world is not linear in its relations: logit, probit, mixed models and generalized additive models (GAM) are only some examples of more advanced, non-linear models you will study in econometrics classes.
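  • 8. Model Hypotheses. For the OLS estimation of the model to be unbiased, and for the usual inference on it to be valid, certain hypotheses must hold. Zero mean: $E(\epsilon_i) = 0, \ \forall i$, which implies $E(y_i) = \beta_1 + \beta_2 x_i$. Homoscedasticity: $V(\epsilon_i) = \sigma_i^2 = \sigma^2, \ \forall i$. Null covariance: $Cov(\epsilon_i, \epsilon_j) = 0, \ \forall i \neq j$. Null covariance between residuals and explanatory variables: $Cov(x_i, \epsilon_i) = 0, \ \forall i$. Normality assumption: $\epsilon_i \sim N(0, \sigma^2)$.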
  • 9. Model Hypotheses - 2. From the hypotheses above, it follows that: $V(y_i) = \sigma^2, \ \forall i$, since Y is stochastic only through the $\epsilon$ component; $Cov(y_i, y_j) = 0, \ \forall i \neq j$, since the residuals are uncorrelated; $y_i \sim N(\beta_1 + \beta_2 x_i, \sigma^2)$, since the residuals are also normally distributed.
  • 10. Ordinary Least Squares (OLS) Estimation. OLS is the estimation method used to estimate the vector $\beta$. The method comes from the idea of making the residuals as small as possible. Since $e_i$ (that is, $\hat{\epsilon}_i$) equals $y_i - \hat{y}_i$, we are interested in minimizing the component $e_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i$. N.B. $\epsilon_i = y_i - \beta_1 - \beta_2 x_i$, while $e_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i$. The method consists in minimizing the sum of the squared differences, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \min$, which is equivalent to solving the following two-equation system, obtained by setting the derivatives to zero.
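  • 11. Ordinary Least Squares (OLS) Estimation - 2. $\frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{n} e_i^2 = 0$ (1); $\frac{\partial}{\partial \hat{\beta}_2} \sum_{i=1}^{n} e_i^2 = 0$ (2). After some maths, we end up with these estimators for the vector $\hat{\beta}$: $\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}$ (3); $\hat{\beta}_2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ (4).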
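As a rough illustration of equations (3) and (4), the sketch below computes the two estimates in Python using only the standard library. The function name `ols_fit` and the five data points are made up for this example; they are not the course data.

```python
def ols_fit(x, y):
    """Return (beta1_hat, beta2_hat) for the model y = beta1 + beta2 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of equation (4)
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    if ss_x == 0:
        raise ValueError("x must not be constant")
    beta2_hat = s_xy / ss_x
    # Equation (3)
    beta1_hat = y_bar - beta2_hat * x_bar
    return beta1_hat, beta2_hat


# Hypothetical toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta1_hat, beta2_hat = ols_fit(x, y)
print(f"intercept = {beta1_hat:.4f}, slope = {beta2_hat:.4f}")
```

For this toy data the slope comes out at about 1.99 and the intercept at about 0.05, which can be checked by hand from the two formulas.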
  • 12. OLS estimators. The OLS $\hat{\beta}_1$ and $\hat{\beta}_2$ are stochastic estimators (they have a distribution: they belong to the sample space of all the possible estimates obtained from different samples). $\hat{\beta}_2$ measures the estimated change in Y produced by a one-unit change in X ($\partial Y / \partial X$). The OLS estimators are both unbiased ($E(\hat{\beta}_1) = \beta_1$ and $E(\hat{\beta}_2) = \beta_2$), and they are BLUE (Best Linear Unbiased Estimators: unbiased and with the lowest variance among linear unbiased estimators; furthermore, they are constructed on the full sample).
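The idea that $\hat{\beta}_2$ is stochastic and unbiased can be made concrete with a small simulation. The sketch below is only an illustration: the true parameter values, the sample size, the number of replications and the seed are arbitrary choices, and the errors are drawn as independent normal variables with constant variance.

```python
import random


def ols_slope(x, y):
    """OLS slope: sum((y - y_bar)(x - x_bar)) / sum((x - x_bar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / ss_x


def simulate_slopes(beta1, beta2, sigma, x, n_rep, seed):
    """Draw n_rep samples from y = beta1 + beta2*x + eps and refit each time."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    slopes = []
    for _ in range(n_rep):
        y = [beta1 + beta2 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


# Arbitrary illustrative settings (assumptions, not from the slides)
x = [float(i) for i in range(30)]
slopes = simulate_slopes(beta1=1.0, beta2=2.0, sigma=1.0, x=x, n_rep=2000, seed=42)

mean_slope = sum(slopes) / len(slopes)
var_slope = sum((b - mean_slope) ** 2 for b in slopes) / (len(slopes) - 1)
x_bar = sum(x) / len(x)
ss_x = sum((xi - x_bar) ** 2 for xi in x)

print(f"mean of estimated slopes: {mean_slope:.4f} (true value 2.0)")
print(f"variance of estimated slopes: {var_slope:.6f} (sigma^2 / SSx = {1.0 / ss_x:.6f})")
```

The average of the simulated slopes should sit close to the true value, and their variance close to $\sigma^2 / SS_x$.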
  • 13. Linear dependency index ($R^2$). The $R^2$ index is the most used measure of the linear fit of the model. $R^2$ lies in the interval [0, 1], where values near 1 mean that the explanatory variables describe the changes in Y well (the model is well specified). How $R^2$ is constructed: $SQT = SQR + SQE$, or $\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, or total variation = model variation + residual variation. $R^2$ is defined as $SQR / SQT$ or, equivalently, $1 - SQE / SQT$: $R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$.
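The decomposition SQT = SQR + SQE can be checked numerically. The following Python sketch fits the line with the OLS formulas and then computes the three sums of squares; the function name `variance_decomposition` and the data are hypothetical, chosen only for illustration.

```python
def variance_decomposition(x, y):
    """Return (SQT, SQR, SQE, R2) for the OLS fit of y = beta1 + beta2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta2_hat = (sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
                 / sum((xi - x_bar) ** 2 for xi in x))
    beta1_hat = y_bar - beta2_hat * x_bar
    y_hat = [beta1_hat + beta2_hat * xi for xi in x]

    sqt = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    sqr = sum((yh - y_bar) ** 2 for yh in y_hat)            # model variation
    sqe = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual variation
    return sqt, sqr, sqe, sqr / sqt


# Hypothetical toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sqt, sqr, sqe, r2 = variance_decomposition(x, y)
print(f"SQT = {sqt:.4f}, SQR = {sqr:.4f}, SQE = {sqe:.4f}, R2 = {r2:.4f}")
```

Both definitions, $SQR/SQT$ and $1 - SQE/SQT$, give the same value here because the fit includes an intercept.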
  • 14. Hypothesis testing on $\beta_2$. The hypothesis test for the slope parameter is very similar to the tests for the mean parameter. The estimated slope parameter $\hat{\beta}_2$ is stochastic. When the sample is large, it is distributed as a normal variable: $\hat{\beta}_2 \sim N(\beta_2, \sigma^2 / SS_x)$, where $SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$. We can use the hypothesis testing approach to investigate the causal relation between Y and X: $H_0: \beta_2 = 0$; $H_1: \beta_2 \neq 0$, where the alternative hypothesis means a causal relation. The test is: $z = \frac{\hat{\beta}_2 - \beta_2}{\sqrt{\sigma^2 / SS_x}} \sim N(0, 1)$. Since $\sigma^2$ is, generally, unknown, we estimate it from the residuals as $\hat{\sigma}^2 = \sum_{i=1}^{n} e_i^2 / (n - 2)$, and we use a t-test with $n - 2$ degrees of freedom (in case n is small).
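A minimal sketch of the test statistic for the slope follows. It assumes the error variance is estimated by the residual sum of squares divided by n - 2, which is the usual textbook convention for a model with two estimated coefficients; the function name `slope_t_statistic` and the data are hypothetical. The p-value is left out because the standard library has no t distribution; the statistic is meant to be compared with a tabulated critical value.

```python
import math


def slope_t_statistic(x, y, beta2_null=0.0):
    """Return (beta2_hat, standard error, t statistic, degrees of freedom)."""
    n = len(x)
    if n < 3:
        raise ValueError("at least 3 observations are needed")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    beta2_hat = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / ss_x
    beta1_hat = y_bar - beta2_hat * x_bar

    # Residual sum of squares and estimated error variance (n - 2 in the denominator)
    sse = sum((yi - beta1_hat - beta2_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)

    se_beta2 = math.sqrt(sigma2_hat / ss_x)
    t_stat = (beta2_hat - beta2_null) / se_beta2
    return beta2_hat, se_beta2, t_stat, n - 2


# Hypothetical toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta2_hat, se_beta2, t_stat, df = slope_t_statistic(x, y)
print(f"slope = {beta2_hat:.4f}, se = {se_beta2:.4f}, t = {t_stat:.2f}, df = {df}")
```

With 3 degrees of freedom the two-sided 5% critical value of the t distribution is about 3.182, so a statistic far above that leads to rejecting $H_0: \beta_2 = 0$ for this toy sample.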
  • 15. Prediction within the regression model. The question we want to answer is the following: what is the expected value of y (say $y_{n+1}$) for an observation that is not in the sample? Suppose we have, for that observation, the value of the variable X (say $x_{n+1}$). We make use of the estimated $\hat{\beta}$ to estimate $\hat{y}_{n+1}$ as: $\hat{y}_{n+1} = \hat{\beta}_1 + \hat{\beta}_2 x_{n+1}$.
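In code, the prediction step is a single line once the coefficients are estimated. The sketch below is illustrative only: the helper names, the toy sample and the new value `x_new` are assumptions made for the example.

```python
def ols_fit(x, y):
    """Return (beta1_hat, beta2_hat) for the model y = beta1 + beta2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta2_hat = (sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
                 / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - beta2_hat * x_bar, beta2_hat


def predict(beta1_hat, beta2_hat, x_new):
    """Point prediction for an observation outside the sample."""
    return beta1_hat + beta2_hat * x_new


# Hypothetical toy sample and a new x value that is not in the sample
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
x_new = 6.0

beta1_hat, beta2_hat = ols_fit(x, y)
y_new_hat = predict(beta1_hat, beta2_hat, x_new)
print(f"predicted y at x = {x_new}: {y_new_hat:.2f}")
```

This gives a point prediction only; it says nothing about the uncertainty around $\hat{y}_{n+1}$, which grows as $x_{n+1}$ moves away from the observed range of X.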
  • 16. Model Checking. Several methods are used to test the robustness of the model, most of them based on the stochastic part of the model (the estimated residuals). Graphical (by eye) checks: plot of the residuals versus the fitted values (residual hypotheses); qq-plot and Shapiro-Wilk test for normality; Durbin-Watson test for residual correlation; Breusch-Pagan test for residual heteroscedasticity. Moreover, the leverage is used to evaluate the contribution of each observation in determining the estimated coefficients $\hat{\beta}$. The stepwise procedure is used to choose between different model specifications, in other words, to remove the explanatory variables which are not significant.
  • 17. Model Checking using estimated residuals - Linearity. An example of departure from the linearity assumption. In this case we can draw a curve (not a horizontal line) to interpolate the points. Figure: residuals (Y) versus estimated (X) values
  • 18. Model Checking using estimated residuals - Homoscedasticity. An example of departure from the homoscedasticity assumption. In this picture the spread of the estimated residuals increases as the predicted values increase. Figure: residuals (Y) versus estimated (X) values
  • 19. Model Checking using estimated residuals - Normality. An example of departure from the normality assumption. Here the qq-points do not lie within the bounds of the qq-line. Figure: residuals (Y) versus estimated (X) values
  • 20. Model Checking using estimated residuals - Serial correlation. An example of departure from the assumption of no serial correlation of the residuals: the residual at i depends on the value at i − 1.
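Serial correlation of the residuals can also be summarised with the Durbin-Watson statistic named among the model checks. The sketch below uses the standard definition, $DW = \sum_{i=2}^{n} (e_i - e_{i-1})^2 / \sum_{i=1}^{n} e_i^2$, which is not given in the slides; the two residual sequences are invented to show the two extremes.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    if len(residuals) < 2:
        raise ValueError("at least 2 residuals are needed")
    denom = sum(e ** 2 for e in residuals)
    if denom == 0:
        raise ValueError("residuals must not all be zero")
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    return num / denom


# Invented residual sequences, for illustration only
trending = [-2.0, -1.0, 0.0, 1.0, 2.0]   # each residual is close to the previous one
alternating = [1.0, -1.0, 1.0, -1.0]     # each residual flips the sign of the previous one

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(f"trending residuals:    DW = {dw_trending:.2f}")
print(f"alternating residuals: DW = {dw_alternating:.2f}")
```

The statistic lies between 0 and 4. Values near 2 are consistent with no first-order serial correlation, values well below 2 point to positive correlation (as in the trending sequence) and values well above 2 to negative correlation (as in the alternating one).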
  • 21. Homework. 1. Using the cement data (n = 13), determine manually the coefficients $\hat{\beta}_1$ and $\hat{\beta}_2$ of the model $y = \beta_1 + \beta_2 x_1$, using the OLS formulas on slide 11. 2. Using the cement data, compute the $R^2$ index of the model $y = \beta_1 + \beta_2 x_1$, using the formula on slide 13.
  • 22. Charts - 1. Figure: Slope coefficient in the linear model
  • 23. Charts - 2. Figure: Fitted (line) versus real (points) values