Econometric in application

Econometrics

ASSIGNMENT
ON
‘Multiple Regression Analysis’

Prepared For:-
Dr. Md. Kamal Uddin
Professor
Department of International Business
University of Dhaka

Prepared By:-
Hazera Akter
Roll No: 01
8th Semester, BBA 1st Batch
Department of International Business
University of Dhaka

Date of Submission
7th April, 2012

Assignment Topic
‘Multiple Regression Analysis with Tests of
Heteroskedasticity, Autocorrelation and Multicollinearity’

Table of Contents

Topics

Analysis Summary

Data Set

ANALYSIS SUMMARY


In multiple regression analysis, we study the relationship between an explained
variable and a number of explanatory variables. In this assignment, the current
salary structure is analyzed together with the effects of some influential factors
in setting salary. The purposes of this analysis include:

Cause analysis: Learn more about the relationship between several independent
variables and a dependent variable.

Impact analysis: Assess the impact of a change in an independent variable on the
value of the dependent variable.

Time series analysis: Predict values of a time series, using either previous values
of just that one series, or values from other series as well.

In the detailed analysis of the multiple regression, the interpretation includes:

• Considering the R2 value (0.491), we can infer that the model is not strong
for overall estimation: it explains about 49% of the variation in salary.

• The model for salary estimation for employees of the Coca-Cola company includes
highly collinear variables, most notably Age and Work Experience.

• But this model is still useful, since it shows a very low
Heteroskedasticity and Autocorrelation problem.

So, these overall analysis results would help the management of the Coca-Cola
company to set or estimate salary in a revised decision round.

Data Set


A multinational corporation, “The Coca-Cola Company”, would like to study
the salary structure of its employees in its Bangladesh subsidiary venture by
predicting salary from some influential factors such as Gender, Age and Education
Level of the employees. A sample of current salary data for 30 employees is
randomly drawn to perform a regression analysis. The data set is exhibited
below.

ID   Current       Gender   Job         Age     Education   Work         Minority
     Salary (Tk)            Seniority           Level       Experience   Class
 1   16080         0        81          28.50   16           0.25        0
 2   41400         0        73          40.33   16          12.50        1
 3   21960         1        83          31.08   15           4.08        0
 4   19200         0        93          31.17   16           1.83        1
 5   28350         0        83          41.92   19          13.00        0
 6   27250         1        80          29.50   18           2.42        0
 7   16080         0        79          28.00   15           3.17        0
 8   14100         0        67          28.75   15           0.50        0
 9   12420         1        96          27.42   15           1.17        1
10   12300         1        77          52.92   12          26.42        0
11   15720         0        84          33.50   15           6.00        1
12    8880         1        88          54.33   12          27.00        0
13   22000         0        93          32.33   17           2.67        0
14   22800         0        98          41.17   15          12.00        0
15   19020         1        64          31.92   19           2.25        1
16   12300         1        94          46.25   12          20.00        0
17   22200         1        81          30.75   19           5.17        0
18   10380         1        72          32.67   15           6.92        1
19    8520         0        70          58.50   15          31.00        0
20   27500         0        89          34.17   17           3.17        0
21   11460         1        79          46.58   15          21.75        1
22   20500         0        83          35.17   16           5.75        0
23   27700         0        85          43.25   20          11.17        1
24   28000         1        65          28.00   16           1.58        1
25   22000         1        65          39.75   19          10.75        0
26   27250         0        78          30.08   19           2.92        0
27   27000         0        83          30.17   17           0.75        1
28    9000         1        70          44.50   12          18.00        0
29   31300         0        91          30.17   18           3.92        1
30   11760         0        70          26.83   15           1.25        0

In this data set,

Dependent Variable, Y= Current Salary


Independent Variables,

X1= Sex of Employee

X2= Job Seniority

X3= Age of Employee

X4= Education Level

X5= Work Experience

X6= Minority Classification

Type of Scales Used Here

The attributes of the measurement objects in this analysis are measured on different
types of scales:

Nominal Scale: X1= Sex of Employee, where Male = 0 and Female = 1

X6= Minority Classification, where White = 0 and Nonwhite = 1

Ratio Scale: X2= Job Seniority (years in Coca-Cola only)

X3= Age of Employee (years)

X4= Education Level (scores)

X5= Work Experience (years, overall job life)

All of the ratio-scale variables have numeric values and an absolute zero.

So, in this multivariate data set we have to perform a multiple regression
analysis to predict the possible current salary of an employee.

NOTE: All the analysis has been performed with the “SPSS” software. For
ease of presentation, the variables are discussed with their detailed
names/meanings.

MULTIPLE REGRESSION ANALYSIS RESULTS


Variables Entered/Removed

Model   Variables Entered                          Variables Removed   Method
1       MINORITY CLASSIFICATION, JOB SENIORITY,    .                   Enter
        AGE OF EMPLOYEE, SEX OF EMPLOYEE,
        EDUCATIONAL LEVEL, WORK EXPERIENCE (a)

a. All requested variables entered.

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .701 (a)   .491       .358                6458.883

a. Predictors: (Constant), MINORITY CLASSIFICATION, JOB
SENIORITY, AGE OF EMPLOYEE, SEX OF EMPLOYEE,
EDUCATIONAL LEVEL, WORK EXPERIENCE

b. Dependent Variable: CURRENT SALARY

ANOVA (b)

Model           Sum of Squares   df   Mean Square   F       Sig.
1  Regression   9.246E8           6   1.541E8       3.694   .010 (a)
   Residual     9.595E8          23   4.172E7
   Total        1.884E9          29
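The figures in the Model Summary and ANOVA tables are tied together by simple arithmetic. As a quick check, the following minimal Python sketch (the variable names are ours and are not part of the SPSS output) recomputes R2, adjusted R2, the F ratio and the standard error of the estimate from the reported sums of squares:

```python
import math

# Values copied from the SPSS ANOVA table above.
ss_regression = 9.246e8   # regression sum of squares
ss_residual = 9.595e8     # residual sum of squares
n = 30                    # number of employees in the sample
k = 6                     # number of explanatory variables

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

ms_regression = ss_regression / k            # mean square, df = k
ms_residual = ss_residual / (n - k - 1)      # mean square, df = n - k - 1
f_statistic = ms_regression / ms_residual
std_error_estimate = math.sqrt(ms_residual)

print(round(r_squared, 3), round(adj_r_squared, 3))
print(round(f_statistic, 3), round(std_error_estimate, 1))
```

Because the sums of squares are printed by SPSS to four significant digits only, the recomputed values agree with the tables up to rounding.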


Coefficients (a)

                             Unstandardized Coefficients   Standardized Coefficients
Model                        B             Std. Error      Beta                        t        Sig.
1  (Constant)                -25969.540    23234.542                                   -1.118   .275
   SEX OF EMPLOYEE            -2126.081     2778.333       -.133                        -.765   .452
   JOB SENIORITY                 82.398      130.286        .100                         .632   .533
   AGE OF EMPLOYEE              263.053      829.669        .286                         .317   .754
   EDUCATIONAL LEVEL           2026.429      707.189        .564                        2.865   .009
   WORK EXPERIENCE             -298.406      870.804       -.329                        -.343   .735
   MINORITY CLASSIFICATION     1846.496     2528.644        .112                         .730   .473

a. Dependent Variable: CURRENT SALARY
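Each t value in the Coefficients table is the coefficient divided by its standard error. A minimal Python sketch of that check is given below; the critical value 2.069 (two-sided 5% level, 30 − 6 − 1 = 23 degrees of freedom) is taken from a standard t table and is our addition, not SPSS output:

```python
# (B, Std. Error) pairs copied from the SPSS Coefficients table above.
coefficients = {
    "(Constant)": (-25969.540, 23234.542),
    "SEX OF EMPLOYEE": (-2126.081, 2778.333),
    "JOB SENIORITY": (82.398, 130.286),
    "AGE OF EMPLOYEE": (263.053, 829.669),
    "EDUCATIONAL LEVEL": (2026.429, 707.189),
    "WORK EXPERIENCE": (-298.406, 870.804),
    "MINORITY CLASSIFICATION": (1846.496, 2528.644),
}

# Two-sided 5% critical value of Student's t with 23 degrees of freedom
# (assumption: read from a standard t table).
T_CRITICAL = 2.069

# t ratio = coefficient / standard error
t_ratios = {name: b / se for name, (b, se) in coefficients.items()}

# Keep the terms whose |t| exceeds the critical value.
significant = [name for name, t in t_ratios.items() if abs(t) > T_CRITICAL]

for name, t in t_ratios.items():
    print(f"{name:25s} t = {t:7.3f}")
print("Significant at 5%:", significant)
```

On these figures only EDUCATIONAL LEVEL passes the 5% threshold, which matches the Sig. column.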

Thus, the estimated multiple regression equation is

Y = −25969.540 − 2126.081 X1 + 82.398 X2 + 263.053 X3 + 2026.429 X4
− 298.406 X5 + 1846.496 X6 + Ui

(regression of Y on X), where R2 = 0.491 and Ui = error term.
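To see how the estimated equation is used, the short Python sketch below plugs the values of employee ID 1 from the data set into it. The function and dictionary names are hypothetical and only serve the illustration:

```python
# Estimated coefficients from the SPSS output above.
INTERCEPT = -25969.540
SLOPES = {
    "sex": -2126.081,             # X1
    "job_seniority": 82.398,      # X2
    "age": 263.053,               # X3
    "education": 2026.429,        # X4
    "work_experience": -298.406,  # X5
    "minority": 1846.496,         # X6
}


def predict_salary(employee):
    """Return the fitted salary (Tk) for one employee record."""
    return INTERCEPT + sum(SLOPES[name] * employee[name] for name in SLOPES)


# Employee ID 1 from the data set (actual current salary: Tk 16080).
employee_1 = {
    "sex": 0,
    "job_seniority": 81,
    "age": 28.50,
    "education": 16,
    "work_experience": 0.25,
    "minority": 0,
}

predicted = predict_salary(employee_1)
residual = 16080 - predicted   # actual minus fitted

print(round(predicted, 2), round(residual, 2))
```

For this employee the fitted salary is about Tk 20550, so the residual is about Tk −4470.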

Commentary on the Resulting Model
This equation suggests that Education Level is far more important than all other
independent variables; it is the only coefficient that is statistically
significant at the 5% level (t = 2.865, Sig. = .009). The equation says that one
more score on education background, holding all other independent variables
constant, results in an increase in salary of TK. 2026. That is, if we consider
persons with the same level in all other variables, the one with one more
score of education can be expected to have a higher salary by TK. 2026.

After Education Level, Minority Classification is weighted highly in setting the
salary structure. If we consider people with the same level in all other
independent variables (held constant), a Nonwhite employee (coded 1) can be
expected to have a higher salary than a White employee (coded 0) by TK. 1846,
although this coefficient is not statistically significant (Sig. = .473).

The equation also says that one more year of job seniority, holding all other
independent variables constant, results in an increase in salary of TK. 82. That
is, among persons with the same level in all other variables, the
one with one more year on the job at the Coca-Cola company can be
expected to have a higher salary by TK. 82.

The equation also shows that one more year of age, holding all other
independent variables constant, results in an increase in salary of TK. 263. That
is, among persons with the same level in all other variables, the
one who is one year older can be expected to have a higher salary by
TK. 263. This suggests that the age of an employee is more influential than the
years on the job at the company.

If we consider people with the same level in all other independent variables
(held constant), a female employee (coded 1) can be expected to have a lower
salary than a male employee (coded 0) by TK. 2126, which points to a
discriminatory salary structure. Of course, all these numbers are subject to
uncertainty, and the high Sig. value (.452) suggests that we should
drop the variable X1 completely.

Similarly, if we consider two people with the same education level, holding
all other independent variables constant, the one with one more year of
experience can be expected to have a lower salary by TK. 298. Again,
all these numbers are subject to uncertainty, and the high Sig. value (.735)
suggests that we should drop the variable X5 completely.

Interpretation of the constant term:

The constant is the salary one would get with zero values on all the explanatory
variables, that is, with only the minimum quality needed to be recruited by the
company. But a negative salary (TK. −25969.54) is not possible. So, what would
be the salary if a person just joined the firm?

In conclusion, we have to state that the sample is not fully representative
of all people working in the company. We cannot extrapolate the results
too far outside this sample range, and we cannot use the equation to predict
what a new entrant would earn. So we infer that this
regression equation should also not be used for making other generalized
decisions about any salary structure.

Simple Regressions for the Negatively Influencing Factors

Variables Entered/Removed (b)

Model   Variables Entered     Variables Removed   Method
1       SEX OF EMPLOYEE (a)   .                   Enter

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .343 (a)   .118       .086                7705.174

a. Predictors: (Constant), SEX OF EMPLOYEE

Coefficients (a)

                     Unstandardized Coefficients   Standardized Coefficients
Model                B            Std. Error       Beta                        t        Sig.
1  (Constant)        22191.765    1868.779                                     11.875   .000
   SEX OF EMPLOYEE   -5486.380    2838.880         -.343                       -1.933   .063

The simple regression of Current Salary on Sex of Employee still shows a
negative influence when the influence of all other variables is left out. But the
initial salary (α) is positive here.
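With a single 0/1 regressor, the least-squares intercept is the mean salary of the group coded 0 (male) and the slope is the difference between the two group means. The minimal Python sketch below reproduces the SPSS coefficients from the Current Salary and Gender columns of the data set (the variable names are ours):

```python
# Current Salary (Tk) and Gender (0 = male, 1 = female), IDs 1 to 30.
salary = [16080, 41400, 21960, 19200, 28350, 27250, 16080, 14100, 12420, 12300,
          15720, 8880, 22000, 22800, 19020, 12300, 22200, 10380, 8520, 27500,
          11460, 20500, 27700, 28000, 22000, 27250, 27000, 9000, 31300, 11760]
sex = [0, 0, 1, 0, 0, 1, 0, 0, 1, 1,
       0, 1, 0, 0, 1, 1, 1, 1, 0, 0,
       1, 0, 0, 1, 1, 0, 0, 1, 0, 0]

n = len(salary)
mean_x = sum(sex) / n
mean_y = sum(salary) / n

# Ordinary least squares for one regressor: slope = Sxy / Sxx.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sex, salary))
s_xx = sum((x - mean_x) ** 2 for x in sex)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Group means for comparison.
male = [y for x, y in zip(sex, salary) if x == 0]
female = [y for x, y in zip(sex, salary) if x == 1]
male_mean = sum(male) / len(male)
female_mean = sum(female) / len(female)

print(round(intercept, 3), round(slope, 3))
print(round(male_mean, 3), round(female_mean - male_mean, 3))
```

So the coefficient of −5486.380 is simply the gap between the average female and the average male salary in the sample, before any other factor is controlled for.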


Now,

Variables Entered/Removed (b)

Model   Variables Entered     Variables Removed   Method
1       WORK EXPERIENCE (a)   .                   Enter

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .391 (a)   .153       .123                7549.967

a. Predictors: (Constant), WORK EXPERIENCE

Coefficients (a)

                     Unstandardized Coefficients   Standardized Coefficients
Model                B            Std. Error       Beta                        t        Sig.
1  (Constant)        22884.178    1940.377                                     11.794   .000
   WORK EXPERIENCE    -355.087     157.964         -.391                       -2.248   .033

Again, the simple regression of Current Salary on Work Experience still shows a
negative influence when the influence of all other variables is left out. But the
initial salary (α) is also positive here.
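In a simple regression the slope equals the correlation coefficient multiplied by the ratio of the two standard deviations, and the intercept follows from the two means. As a rough cross-check, the Python sketch below rebuilds the Work Experience regression from the descriptive statistics and the zero-order correlation reported in the SPSS output of this assignment (the variable names are ours; small differences are due to rounding of r):

```python
# Summary statistics copied from the SPSS output of this assignment.
r_xy = -0.391                     # correlation: salary vs. work experience
mean_salary, sd_salary = 19814.33, 8060.314
mean_experience, sd_experience = 8.6453, 8.87542

# Simple-regression identities.
slope = r_xy * sd_salary / sd_experience
intercept = mean_salary - slope * mean_experience
r_squared = r_xy ** 2

print(round(slope, 2), round(intercept, 2), round(r_squared, 3))
```

The results agree with the SPSS values (−355.087, 22884.178 and .153) up to rounding.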

In the multiple regression, which allows for the effects of the other variables,
Sex of Employee and Work Experience still carry negative coefficients (−2126.081
and −298.406), just as in the simple regressions (−5486.380 and −355.087). So the
omission of the other variables makes the initial salary (α) positive, but leaves
the direction of these two effects unchanged.


HETEROSKEDASTICITY IN MULTIPLE REGRESSION

In multiple regression, one of the assumptions we have made until now is that the
errors have a common variance. This is known as the homoskedasticity
assumption. If the errors do not have a constant variance, we say they are
heteroskedastic.
Analyzing our data set through SPSS, we get,

Descriptive Statistics

Mean Std. Deviation N

CURRENT SALARY 19814.33 8060.314 30

SEX OF EMPLOYEE .43 .504 30

JOB SENIORITY 80.47 9.748 30

AGE OF EMPLOYEE 36.3227 8.76549 30

EDUCATIONAL LEVEL 16.00 2.244 30

WORK EXPERIENCE 8.6453 8.87542 30

MINORITY .37 .490 30
CLASSIFICATION

Residuals Statisticsa

Minimum Maximum Mean Std. Deviation N

Predicted Value 10342.00 29286.66 19814.33 5313.421 30

Residual -8926.251 21585.666 .000 6061.042 30

Std. Predicted Value -1.783 1.783 .000 1.000 30

Std. Residual -1.447 3.499 .000 .983 30



Here, the residual plot is trumpet-shaped, so the residuals do not have constant variance.

This histogram of the residuals is drawn for the dependent variable alone, leaving
out the independent variables for ease of reading the error variance. The graph
shows that the distribution is not totally normal; there are some disturbances in
this data set. So a Heteroskedasticity problem is present here, but it is low.

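Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .701 (a)   .491       .358                6458.883

The White and Glejser tests measure the Heteroskedasticity problem through an
auxiliary regression of the squared (White) or absolute (Glejser) residuals on the
explanatory variables. The R2 of 0.491 shown above belongs to the salary regression
itself and not to such an auxiliary regression, so the rule applied here
(R2 < 0.50) is only an informal indication. On that basis we do not reject the
hypothesis of Homoskedasticity.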
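Tests of this family are usually run on an auxiliary regression of the residuals. The sketch below is a simplified, single-regressor illustration in Python: it regresses the squared residuals on the explanatory variable and reports n × R2 of that auxiliary regression (the LM form of the Breusch-Pagan test, a close relative of the White test). The data are hypothetical and are NOT the salary data of this assignment; the function names are ours.

```python
def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    r_squared = 0.0 if s_yy == 0 else s_xy ** 2 / (s_xx * s_yy)
    return a, b, r_squared


def auxiliary_lm_statistic(x, y):
    """n * R2 from regressing the squared residuals of y-on-x on x."""
    a, b, _ = simple_ols(x, y)
    squared_residuals = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    _, _, aux_r_squared = simple_ols(x, squared_residuals)
    return len(x) * aux_r_squared


# Hypothetical data whose scatter widens as x grows.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.8, 6.5, 7.2, 11.0, 10.1, 15.9, 13.0, 21.5, 16.2]

# 5% critical value of chi-square with 1 degree of freedom.
CHI2_CRITICAL = 3.841

lm_statistic = auxiliary_lm_statistic(x, y)
reject_homoskedasticity = lm_statistic > CHI2_CRITICAL
print(round(lm_statistic, 3), reject_homoskedasticity)
```

With six explanatory variables, as in this assignment, the auxiliary regression would include all of them and the chi-square degrees of freedom would change accordingly.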


In this Normal P-P Plot, the points lie very close to the straight line expected
under normality. So here, too, we find a very low Heteroskedasticity problem.


Again, plotting the Standardized Residual against the Standardized Predicted Value,
we find very little Heteroskedasticity problem, since the plot shows no particular trend.

Although we have a very low Heteroskedasticity problem, we can solve what remains
by

“Possible correction => log transformation of the dependent variable”

The R2 of this log-linear form is not comparable with the R2 of the linear form,
since the variance of the dependent variable is different.


AUTOCORRELATION IN MULTIPLE REGRESSION

In multiple regression analysis, correlation between the error terms is called
Autocorrelation. For detecting an Autocorrelation problem, the Durbin-Watson test
is the simplest and most commonly used. Here ϕ denotes the correlation between
successive error terms, and the hypothesis tested is that of no Autocorrelation
in the data set (ϕ = 0).

Model Summaryb

Model Durbin-Watson

1 2.168a



Coefficientsa

Correlations

Model Zero-order Partial Part

1 SEX OF EMPLOYEE -.343 -.158 -.114

JOB SENIORITY .094 .131 .094

AGE OF EMPLOYEE -.313 .066 .047

EDUCATIONAL LEVEL .659 .513 .426

WORK EXPERIENCE -.391 -.071 -.051

MINORITY .224 .151 .109
CLASSIFICATION



Residuals Statisticsa

Minimum Maximum Mean Std. Deviation N

Predicted Value 8323.94 31453.22 19814.33 5646.471 30

Residual -7812.773 20206.270 .000 5752.046 30

Std. Predicted Value -2.035 2.061 .000 1.000 30

Std. Residual -1.210 3.128 .000 .891 30



Correlations

Pearson Correlation

                          EDUCATIONAL   WORK         CURRENT   SEX OF     JOB         AGE OF     MINORITY
                          LEVEL         EXPERIENCE   SALARY    EMPLOYEE   SENIORITY   EMPLOYEE   CLASSIFICATION
CURRENT SALARY             .659         -.391        1.000     -.343       .094       -.313       .224
SEX OF EMPLOYEE           -.274          .271        -.343     1.000      -.225        .183       .033
JOB SENIORITY             -.085         -.035         .094     -.225      1.000        .003       .000
AGE OF EMPLOYEE           -.411          .979        -.313      .183       .003       1.000      -.196
EDUCATIONAL LEVEL         1.000         -.497         .659     -.274      -.085       -.411       .188
WORK EXPERIENCE           -.497         1.000        -.391      .271      -.035        .979      -.200
MINORITY CLASSIFICATION    .188         -.200         .224      .033       .000       -.196      1.000

Sig. (1-tailed)

                          EDUCATIONAL   WORK         CURRENT   SEX OF     JOB         AGE OF     MINORITY
                          LEVEL         EXPERIENCE   SALARY    EMPLOYEE   SENIORITY   EMPLOYEE   CLASSIFICATION
CURRENT SALARY            .000          .016         .         .032       .311        .046       .117
SEX OF EMPLOYEE           .071          .074         .032      .          .116        .166       .432
JOB SENIORITY             .327          .428         .311      .116       .           .494       .498
AGE OF EMPLOYEE           .012          .000         .046      .166       .494        .          .150
EDUCATIONAL LEVEL         .             .003         .000      .071       .327        .012       .160
WORK EXPERIENCE           .003          .            .016      .074       .428        .000       .144
MINORITY CLASSIFICATION   .160          .144         .117      .432       .498        .150       .


Here the D-W statistic is 2.168, which is very near to 2. We know that a D-W
statistic of 2 indicates zero correlation (ϕ = 0) between the error terms. So in our
data set, there is a very low Autocorrelation problem.
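The D-W statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals, and it is approximately equal to 2(1 − ϕ). A minimal Python sketch of both relations follows; the function names are ours and the four-point residual series is a hypothetical toy example, not the residuals of this model:

```python
def durbin_watson(residuals):
    """d = sum (e_t - e_{t-1})^2 / sum e_t^2 for residuals in their given order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def implied_autocorrelation(d):
    """Approximate first-order autocorrelation from d, using d ~ 2 * (1 - phi)."""
    return 1 - d / 2


# Toy residuals that alternate in sign (strong negative autocorrelation).
toy_d = durbin_watson([1.0, -1.0, 1.0, -1.0])

# The D-W statistic reported by SPSS for the salary model.
phi_hat = implied_autocorrelation(2.168)

print(toy_d, round(phi_hat, 3))
```

For d = 2.168 the implied ϕ is about −0.084, which is close to zero and in line with the reading above.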

To check the Autocorrelation problem further, we can apply the LM Test, BKW Test, etc.

MULTICOLLINEARITY IN MULTIPLE REGRESSION
One important problem in the application of multiple regression analysis involves
the possible collinearity of the explanatory variables. This condition refers to
situations in which some of the explanatory variables are highly correlated with
each other.
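A first look at collinearity is the pairwise correlation between explanatory variables. As an illustration, the Python sketch below computes the Pearson correlation between the Age and Work Experience columns of the data set; the helper function is ours and is not an SPSS procedure:

```python
import math

# Age and Work Experience (years) for employee IDs 1 to 30, from the data set.
age = [28.50, 40.33, 31.08, 31.17, 41.92, 29.50, 28.00, 28.75, 27.42, 52.92,
       33.50, 54.33, 32.33, 41.17, 31.92, 46.25, 30.75, 32.67, 58.50, 34.17,
       46.58, 35.17, 43.25, 28.00, 39.75, 30.08, 30.17, 44.50, 30.17, 26.83]
experience = [0.25, 12.50, 4.08, 1.83, 13.00, 2.42, 3.17, 0.50, 1.17, 26.42,
              6.00, 27.00, 2.67, 12.00, 2.25, 20.00, 5.17, 6.92, 31.00, 3.17,
              21.75, 5.75, 11.17, 1.58, 10.75, 2.92, 0.75, 18.00, 3.92, 1.25]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r_age_experience = pearson(age, experience)
print(round(r_age_experience, 3))
```

A correlation this close to 1 means the two variables carry almost the same information, which is why their individual coefficients are estimated so imprecisely.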

One method of measuring multicollinearity uses the Variance Inflation
Factor (VIF) for each explanatory variable. The VIF values obtained through SPSS
are shown below,

Coefficientsa

Collinearity Statistics

Model Tolerance VIF

1 SEX OF EMPLOYEE .734 1.362

JOB SENIORITY .939 1.065

AGE OF EMPLOYEE .033 29.964

WORK EXPERIENCE .032 31.372

MINORITY CLASSIFICATION .950 1.053

a. Dependent Variable: EDUCATIONAL LEVEL


Coefficientsa


Model Tolerance VIF

1 SEX OF EMPLOYEE .848 1.179

JOB SENIORITY .924 1.082


MINORITY .937 1.068
CLASSIFICATION

EDUCATIONAL LEVEL .756 1.322

a. Dependent Variable: WORK EXPERIENCE


Coefficientsa


Model Tolerance VIF

1 JOB SENIORITY .918 1.089


MINORITY .947 1.056
CLASSIFICATION



a. Dependent Variable: SEX OF EMPLOYEE


Coefficientsa


Model Tolerance VIF

1 AGE OF EMPLOYEE .028 35.540

MINORITY .938 1.066
CLASSIFICATION



SEX OF EMPLOYEE .755 1.324

a. Dependent Variable: JOB SENIORITY


Coefficientsa


Model Tolerance VIF

1 MINORITY .938 1.066
CLASSIFICATION




a. Dependent Variable: AGE OF EMPLOYEE


Coefficientsa


Model Tolerance VIF

1 EDUCATIONAL LEVEL .610 1.641




a. Dependent Variable: MINORITY CLASSIFICATION


The tolerance for a variable is (1 - R-squared) for the regression of that variable
on all the other independents, ignoring the dependent. When tolerance is close
to 0 there is high multicollinearity of that variable with other independents and
the coefficients will be unstable.

VIF is the variance inflation factor, which is simply the reciprocal of tolerance.
Therefore, when VIF is high there is high multicollinearity and instability of the
coefficients.


As a rule of thumb, if tolerance is less than .20, a problem with multicollinearity is
indicated.
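The relation between tolerance and VIF, and the 0.20 rule of thumb, can be checked in a few lines. The Python sketch below uses the tolerances from the first collinearity table above (dependent variable: EDUCATIONAL LEVEL); the variable names are ours. The last lines show the smallest VIF that the pairwise correlation of .979 between Age and Work Experience can produce on its own:

```python
# Tolerances copied from the first Collinearity Statistics table above.
tolerance = {
    "SEX OF EMPLOYEE": 0.734,
    "JOB SENIORITY": 0.939,
    "AGE OF EMPLOYEE": 0.033,
    "WORK EXPERIENCE": 0.032,
    "MINORITY CLASSIFICATION": 0.950,
}

# VIF is the reciprocal of tolerance.
vif = {name: 1 / tol for name, tol in tolerance.items()}

# Rule of thumb: tolerance below 0.20 signals a multicollinearity problem.
flagged = [name for name, tol in tolerance.items() if tol < 0.20]

# Lower bound on the VIF implied by one pairwise correlation r:
# R2 >= r^2, so tolerance <= 1 - r^2 and VIF >= 1 / (1 - r^2).
r_age_experience = 0.979
vif_lower_bound = 1 / (1 - r_age_experience ** 2)

for name in tolerance:
    print(f"{name:25s} tolerance = {tolerance[name]:.3f}  VIF = {vif[name]:.3f}")
print("Flagged:", flagged, " VIF lower bound:", round(vif_lower_bound, 2))
```

The reciprocals differ slightly from the SPSS VIF column for the two flagged variables because the printed tolerances are rounded to three decimals.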

From the above graph and the VIF results, we can interpret that there is very high
collinearity among the independent variables, in particular between Age of
Employee and Work Experience.

We can solve this problem through,

• Ridge Regression

• Principal Component Regression

• Dropping one of the most collinear variables

• Using Ratios or First Differences

• Using Extraneous Estimates

• Getting more data
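To illustrate why a remedy such as ridge regression helps, the Python sketch below works in standardized (correlation) form with only two predictors, Age and Work Experience, using the correlations reported in this assignment. This two-variable model and the ridge constant of 0.1 are assumptions made for the illustration; they are not part of the SPSS analysis, and the function name is ours.

```python
def standardized_betas(r12, r1y, r2y, ridge=0.0):
    """Solve (R + ridge * I) b = r for two standardized predictors.

    R is the 2x2 correlation matrix of the predictors and r holds their
    correlations with the dependent variable; ridge = 0 gives ordinary
    least squares.
    """
    d = 1.0 + ridge
    det = d * d - r12 * r12
    b1 = (d * r1y - r12 * r2y) / det
    b2 = (d * r2y - r12 * r1y) / det
    return b1, b2


# Correlations from the SPSS correlation matrix of this assignment.
R_AGE_EXPERIENCE = 0.979   # Age vs. Work Experience
R_AGE_SALARY = -0.313      # Age vs. Current Salary
R_EXPERIENCE_SALARY = -0.391

ols = standardized_betas(R_AGE_EXPERIENCE, R_AGE_SALARY, R_EXPERIENCE_SALARY)
ridge = standardized_betas(
    R_AGE_EXPERIENCE, R_AGE_SALARY, R_EXPERIENCE_SALARY, ridge=0.1  # hypothetical
)

print("OLS betas:  ", [round(b, 3) for b in ols])
print("Ridge betas:", [round(b, 3) for b in ridge])
```

Under ordinary least squares the two collinear predictors receive large coefficients of opposite sign; the small ridge constant shrinks both sharply, which is the stabilizing effect the remedy is meant to provide.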

Concluding Comments:
By analyzing the multiple regression and considering the R2 value (0.491), we can
infer that the model is not strong for overall estimation.

Again, we have found that the model for salary estimation for employees of the
Coca-Cola company includes highly collinear variables, most notably Age and Work
Experience. But this model is still useful, since it shows a very low
Heteroskedasticity and Autocorrelation problem.
