Econometrics




        ASSIGNMENT
                  ON
‘Multiple Regression Analysis’


          Prepared For:-
           Dr. Md. Kamal Uddin
                Professor
   Department of International Business
           University of Dhaka



           Prepared By:-
              Hazera Akter
               Roll No: 01
       8th Semester, BBA 1st Batch
   Department Of International Business
           University of Dhaka



       Date of Submission
            7th April, 2012
Assignment Topic
       ‘Multiple Regression Analysis with Test of
Heteroskedasticity, Autocorrelation and Multicollinearity’




                           Table of Contents



        Topics

        Analysis Summary

        Data Set




                       ANALYSIS SUMMARY

In multiple regression analysis, we study the relationship between an explained
variable and a number of explanatory variables. In this assignment, the current
salary structure is analyzed in terms of several factors that influence how
salary is set. The purposes of this analysis include:

Cause analysis: Learn more about the relationship between several independent
variables and a dependent variable.

Impact analysis: Assess the impact of changing an independent variable on the
value of the dependent variable.

Time series analysis: Predict values of a time series, using either previous values
of just that one series, or values from other series as well.

The detailed multiple regression analysis leads to the following interpretation:

• Considering the R2 value (0.491), we can infer that this model is not
  strong for overall estimation.

• The model for salary estimation for employees of the Coca-Cola Company
  includes highly collinear variables, chiefly Age of Employee and Work Experience.

• However, the model is still useful because it shows very little
  heteroskedasticity and autocorrelation.

Overall, these results can help the management of the Coca-Cola Company
set or estimate salaries in a revised round of decisions.




                                    Data Set

A multinational corporation, “The Coca-Cola Company”, would like to study the
salary structure of its employees in its Bangladesh subsidiary venture by
predicting salary from influential factors such as the gender, age and education
level of the employees. The current salary data of a random sample of 30
employees is used to perform a regression analysis. The data set is shown
below.

In this Data set,

Dependent Variable, Y= Current Salary
ID    Current       Gender  Job         Age     Education   Work         Minority
      Salary (Tk)           Seniority           Level       Experience   Class
1     16080         0       81          28.50   16          0.25         0
2     41400         0       73          40.33   16          12.50        1
3     21960         1       83          31.08   15          4.08         0
4     19200         0       93          31.17   16          1.83         1
5     28350         0       83          41.92   19          13.00        0
6     27250         1       80          29.50   18          2.42         0
7     16080         0       79          28.00   15          3.17         0
8     14100         0       67          28.75   15          0.50         0
9     12420         1       96          27.42   15          1.17         1
10    12300         1       77          52.92   12          26.42        0
11    15720         0       84          33.50   15          6.00         1
12    8880          1       88          54.33   12          27.00        0
13    22000         0       93          32.33   17          2.67         0
14    22800         0       98          41.17   15          12.00        0
15    19020         1       64          31.92   19          2.25         1
16    12300         1       94          46.25   12          20.00        0
17    22200         1       81          30.75   19          5.17         0
18    10380         1       72          32.67   15          6.92         1
19    8520          0       70          58.50   15          31.00        0
20    27500         0       89          34.17   17          3.17         0
21    11460         1       79          46.58   15          21.75        1
22    20500         0       83          35.17   16          5.75         0
23    27700         0       85          43.25   20          11.17        1
24    28000         1       65          28.00   16          1.58         1
25    22000         1       65          39.75   19          10.75        0
26    27250         0       78          30.08   19          2.92         0
27    27000         0       83          30.17   17          0.75         1
28    9000          1       70          44.50   12          18.00        0
29    31300         0       91          30.17   18          3.92         1
30    11760         0       70          26.83   15          1.25         0

Independent Variables,

X1= Sex of Employee

X2= Job Seniority

X3= Age of Employee

X4= Education Level

X5= Work Experience

X6= Minority Classification

Type of Scales Used Here

The attributes in this analysis are measured on different types of scales:

Nominal Scale: X1= Sex of Employee “Where Male = 0 and Female = 1”

                X6= Minority Classification “Where White = 0 and Nonwhite = 1”

Ratio Scale: X2= Job Seniority (Years in Coca-Cola only)

             X3= Age of Employee (Years)

             X4= Education Level (Scores)

             X5= Work Experience (Years, overall job life)

All of these variables have numeric values and can take an absolute zero.

So, in this multivariate data set we perform a multiple regression
analysis to predict the current salary of an employee.

NOTE: All the analysis has been performed with the “SPSS” software. For
ease of presentation, the variables are referred to by their descriptive
names.

           MULTIPLE REGRESSION ANALYSIS RESULTS




Variables Entered/Removed

                        Model   Variables Entered                      Variables Removed   Method

                        1       MINORITY CLASSIFICATION,               .                   Enter
                                JOB SENIORITY, AGE OF EMPLOYEE,
                                SEX OF EMPLOYEE, EDUCATIONAL LEVEL,
                                WORK EXPERIENCEa

                        a. All requested variables entered.




                                                Model Summaryb

                                                                Adjusted R         Std. Error of the
                Model            R            R Square            Square                Estimate

                1                    .701a          .491                    .358           6458.883

                a. Predictors: (Constant), MINORITY CLASSIFICATION, JOB
                SENIORITY, AGE OF EMPLOYEE, SEX OF EMPLOYEE,
                EDUCATIONAL LEVEL, WORK EXPERIENCE

                b. Dependent Variable: CURRENT SALARY




                                                 ANOVAb

Model                       Sum of Squares          df           Mean Square              F            Sig.

1       Regression                   9.246E8               6            1.541E8            3.694          .010a

        Residual                     9.595E8               23           4.172E7

        Total                        1.884E9               29
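
The figures in the Model Summary and ANOVA tables can be cross-checked from the sums of squares alone. The following is a minimal Python sketch, not part of the SPSS output (the function and key names are illustrative), that recomputes R Square, Adjusted R Square, F and the standard error of the estimate from the rounded ANOVA values, assuming n = 30 observations and k = 6 predictors.

```python
import math


def anova_summary(ss_regression, ss_residual, n, k):
    """Recompute fit statistics from ANOVA sums of squares.

    n is the number of observations and k the number of predictors
    (the constant is not counted in k).
    """
    ss_total = ss_regression + ss_residual
    df_regression = k
    df_residual = n - k - 1
    r2 = ss_regression / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_residual
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "r2": r2,
        "adj_r2": adj_r2,
        "f": ms_regression / ms_residual,
        "std_error": math.sqrt(ms_residual),
    }


# Rounded sums of squares as printed in the ANOVA table.
stats = anova_summary(9.246e8, 9.595e8, n=30, k=6)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Because the inputs are rounded to four significant digits, the recomputed values agree with the SPSS figures only up to rounding.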




Coefficientsa

                                                                      Standardized
                                   Unstandardized Coefficients        Coefficients

Model                                     B            Std. Error        Beta           t               Sig.

1       (Constant)                      -25969.540       23234.542                      -1.118             .275

        SEX OF EMPLOYEE                  -2126.081        2778.333              -.133       -.765          .452

        JOB SENIORITY                         82.398        130.286             .100         .632          .533

        AGE OF EMPLOYEE                   263.053           829.669             .286         .317          .754

        EDUCATIONAL LEVEL                2026.429           707.189             .564        2.865          .009

        WORK EXPERIENCE                   -298.406          870.804             -.329       -.343          .735

        MINORITY                         1846.496         2528.644              .112         .730          .473
        CLASSIFICATION

a. Dependent Variable: CURRENT SALARY
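
Each t value in the Coefficients table is the unstandardized coefficient B divided by its standard error. The short Python sketch below (names are illustrative) recomputes the t column from the B and Std. Error columns. With 23 residual degrees of freedom the two-sided 5% critical value is about 2.07, so only Educational Level is significant at that level.

```python
# (B, Std. Error) pairs copied from the Coefficients table.
coefficients = {
    "(Constant)": (-25969.540, 23234.542),
    "SEX OF EMPLOYEE": (-2126.081, 2778.333),
    "JOB SENIORITY": (82.398, 130.286),
    "AGE OF EMPLOYEE": (263.053, 829.669),
    "EDUCATIONAL LEVEL": (2026.429, 707.189),
    "WORK EXPERIENCE": (-298.406, 870.804),
    "MINORITY CLASSIFICATION": (1846.496, 2528.644),
}

CRITICAL_T = 2.07  # approximate two-sided 5% value for 23 degrees of freedom


def t_statistic(b, std_error):
    """t ratio for the null hypothesis that the coefficient is zero."""
    return b / std_error


t_values = {name: t_statistic(b, se) for name, (b, se) in coefficients.items()}
for name, t in t_values.items():
    flag = "significant" if abs(t) > CRITICAL_T else "not significant"
    print(f"{name:25s} t = {t:7.3f}  ({flag})")
```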




Thus, the estimated model of the multiple regression equation is,


          Y = −25969.540 − 2126.081 X1 + 82.398 X2 + 263.053 X3 + 2026.429 X4
              − 298.406 X5 + 1846.496 X6 + Ui

          (Regression of Y on X)   R2 = 0.491   Ui = Errors
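
As an illustration of how the estimated equation is used, the following Python sketch computes the fitted salary and the residual for employee ID 1 in the data set. The dictionary keys and the function name are illustrative choices, not SPSS names.

```python
# Estimated coefficients from the regression equation.
INTERCEPT = -25969.540
COEFFS = {
    "sex": -2126.081,             # X1: 0 = male, 1 = female
    "job_seniority": 82.398,      # X2
    "age": 263.053,               # X3
    "education": 2026.429,        # X4
    "work_experience": -298.406,  # X5
    "minority": 1846.496,         # X6: 0 = white, 1 = nonwhite
}


def predict_salary(employee):
    """Fitted salary (Tk) for a dict of explanatory-variable values."""
    return INTERCEPT + sum(COEFFS[name] * employee[name] for name in COEFFS)


# Employee ID 1 from the data set; the actual salary is Tk 16080.
employee_1 = {"sex": 0, "job_seniority": 81, "age": 28.50,
              "education": 16, "work_experience": 0.25, "minority": 0}
fitted = predict_salary(employee_1)
residual = 16080 - fitted
print(f"fitted = {fitted:.2f}, residual = {residual:.2f}")
```

The fitted value is well above the actual salary for this employee, which is consistent with the moderate R2 of the model.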




                     Commentary on the Resulting Model
This equation suggests that Education Level is far more important than all other
independent variables: it has the largest standardized coefficient (Beta = .564)
and is the only coefficient significant at the 5% level (Sig. = .009). The
equation says that one more score on education level, holding all other
independent variables constant, results in an increase in salary of Tk. 2026.
That is, if we consider persons at the same level on all other variables, the
one with one more score of education can be expected to have a salary that is
Tk. 2026 higher.
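
The relative importance of the variables is easier to compare on the standardized (Beta) scale, where each coefficient is B multiplied by the standard deviation of its variable and divided by the standard deviation of salary. The Python sketch below recomputes the Beta column this way, using the standard deviations reported in the Descriptive Statistics table of this report; the names are illustrative.

```python
SD_SALARY = 8060.314  # Std. Deviation of CURRENT SALARY

# (unstandardized B, Std. Deviation of the explanatory variable)
variables = {
    "SEX OF EMPLOYEE": (-2126.081, 0.504),
    "JOB SENIORITY": (82.398, 9.748),
    "AGE OF EMPLOYEE": (263.053, 8.76549),
    "EDUCATIONAL LEVEL": (2026.429, 2.244),
    "WORK EXPERIENCE": (-298.406, 8.87542),
    "MINORITY CLASSIFICATION": (1846.496, 0.490),
}


def standardized_beta(b, sd_x, sd_y=SD_SALARY):
    """Convert an unstandardized coefficient to a standardized one."""
    return b * sd_x / sd_y


betas = {name: standardized_beta(b, sd) for name, (b, sd) in variables.items()}

# List the variables from the largest to the smallest absolute Beta.
for name, beta in sorted(betas.items(), key=lambda item: -abs(item[1])):
    print(f"{name:25s} Beta = {beta:6.3f}")
```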

After Education Level, Minority Classification also enters the salary
structure with a positive effect. If we consider people at the same level on all
other independent variables (held constant), a nonwhite employee (coded 1) can
be expected to have a salary Tk. 1846 higher than a white employee (coded 0).
This coefficient is not statistically significant (Sig. = .473).

The equation also says that one more year of job seniority, holding all other
independent variables constant, results in an increase in salary of Tk. 82. That
is, among persons at the same level on all other variables, the one with one
more year on the job at the Coca-Cola Company can be expected to have a salary
Tk. 82 higher.

The equation also shows that one more year of age, holding all other
independent variables constant, results in an increase in salary of Tk. 263. That
is, among persons at the same level on all other variables, the one who is one
year older can be expected to have a salary Tk. 263 higher. On these point
estimates the age of an employee is more influential than the years on the job at
the company, although neither coefficient is statistically significant
(Sig. = .754 and .533).

If we consider people at the same level on all other independent variables
(held constant), a female employee (coded 1) can be expected to have a salary
Tk. 2126 lower than a male employee (coded 0), which points to a
discriminatory salary structure. Of course, all these numbers are subject to
uncertainty: the coefficient is not statistically significant (Sig. = .452),
which suggests that the variable X1 could be dropped completely.

Similarly, if we consider two people with the same education level, holding
all other independent variables constant, the one with one more year of
experience can be expected to have a salary Tk. 298 lower. Again, the number is
subject to uncertainty: the coefficient is not statistically significant
(Sig. = .735), which suggests that the variable X5 could be dropped completely.

Interpretation of the constant term:

Taken literally, the constant is the salary one would get with a value of zero
on every explanatory variable, that is, with only the minimum quality needed to
be recruited by the company. But a negative salary is not possible. So, what
would be the salary of a person who has just joined the firm?

In conclusion, we have to state that the sample is not fully representative
of all people working in the company: it contains no employee near zero on all
variables (the lowest education level in the sample is 12 and the lowest age is
26.83). We cannot extrapolate the results too far outside this sample range, and
we cannot use the equation to predict what a new entrant would earn. For the
same reason, this regression equation should not be used for making other
generalized decisions about any salary structure.

Simple Regressions for the Negatively Influencing Factors


              Variables Entered/Removedb

             Variables              Variables
Model         Entered               Removed            Method

1        SEX OF EMPLOYEEa                         . Enter

a. All requested variables entered.

b. Dependent Variable: CURRENT SALARY


                                 Model Summary

                                            Adjusted R         Std. Error of the
Model         R               R Square        Square               Estimate

1                 .343a            .118                .086            7705.174

a. Predictors: (Constant), SEX OF EMPLOYEE




                                                       Coefficientsa

                                                                              Standardized
                                          Unstandardized Coefficients          Coefficients

Model                                         B               Std. Error           Beta           t        Sig.

1        (Constant)                         22191.765            1868.779                         11.875      .000

         SEX OF EMPLOYEE                     -5486.380           2838.880                 -.343   -1.933      .063

a. Dependent Variable: CURRENT SALARY



The simple regression of Current Salary on Sex of Employee still shows a
negative influence when the influence of all other variables is left out. But
the initial salary (α) is positive here.
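
With a single 0/1 explanatory variable, the constant of a simple regression is the mean salary of the group coded 0 and the slope is the difference between the two group means. The Python sketch below fits the simple regression by least squares on the Current Salary and Gender columns of the data set; the function name `simple_ols` is an illustrative choice.

```python
# Current Salary (Tk) and Gender (0 = male, 1 = female) for IDs 1-30,
# copied from the data set.
salary = [16080, 41400, 21960, 19200, 28350, 27250, 16080, 14100, 12420, 12300,
          15720, 8880, 22000, 22800, 19020, 12300, 22200, 10380, 8520, 27500,
          11460, 20500, 27700, 28000, 22000, 27250, 27000, 9000, 31300, 11760]
sex = [0, 0, 1, 0, 0, 1, 0, 0, 1, 1,
       0, 1, 0, 0, 1, 1, 1, 1, 0, 0,
       1, 0, 0, 1, 1, 0, 0, 1, 0, 0]


def simple_ols(x, y):
    """Least-squares intercept and slope for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


alpha, beta = simple_ols(sex, salary)
print(f"alpha = {alpha:.3f} (mean salary, male employees)")
print(f"beta  = {beta:.3f} (female mean minus male mean)")
print(f"mean salary, female employees = {alpha + beta:.3f}")
```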



Now,

              Variables Entered/Removedb

             Variables           Variables
Model         Entered            Removed             Method

1        WORK EXPERIENCEa                        . Enter

a. All requested variables entered.

b. Dependent Variable: CURRENT SALARY




                              Model Summary

                                        Adjusted R          Std. Error of the
Model         R           R Square        Square               Estimate

1                 .391a         .153                 .123           7549.967

a. Predictors: (Constant), WORK EXPERIENCE




                                                     Coefficientsa

                                                                            Standardized
                                       Unstandardized Coefficients              Coefficients

Model                                        B              Std. Error             Beta           t        Sig.

1        (Constant)                      22884.178             1940.377                           11.794        .000

         WORK EXPERIENCE                     -355.087            157.964                  -.391   -2.248        .033

a. Dependent Variable: CURRENT SALARY



Again, the simple regression of Current Salary on Work Experience still shows a
negative influence when the influence of all other variables is left out. But
the initial salary (α) is also positive here.
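
In a simple regression the slope equals the correlation coefficient multiplied by the ratio of the two standard deviations, and the constant follows from the two means. The Python sketch below (illustrative names) recovers the Work Experience slope and constant from the rounded correlation (−.391) and the means and standard deviations reported in the Descriptive Statistics table of this report.

```python
def slope_from_correlation(r, sd_x, sd_y):
    """Simple-regression slope: r * (sd of y) / (sd of x)."""
    return r * sd_y / sd_x


def intercept_from_means(mean_y, mean_x, slope):
    """The fitted line passes through the point of means."""
    return mean_y - slope * mean_x


# Rounded values as reported by SPSS for salary and work experience.
slope = slope_from_correlation(-0.391, sd_x=8.87542, sd_y=8060.314)
alpha = intercept_from_means(19814.33, 8.6453, slope)
print(f"slope = {slope:.2f}, constant = {alpha:.2f}")
```

The small differences from the SPSS coefficients come from the correlation being rounded to three decimals.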


After allowing for the effects of the other variables, the multiple regression
equation likewise shows a lower salary for female employees and for employees
with more work experience, as in the simple regressions, although the effects
are smaller (−2126 against −5486 for sex, −298 against −355 for work
experience). So omitting the other variables makes the initial salary (α)
positive but leaves the direction of these two effects unchanged.


HETEROSKEDASTICITY IN MULTIPLE REGRESSION

In multiple regression, one of the assumptions made so far is that the
errors have a common variance. This is known as the homoskedasticity
assumption. If the errors do not have a constant variance, we say they are
heteroskedastic.
Analyzing our data set through SPSS, we get,

                       Descriptive Statistics

                              Mean          Std. Deviation             N

CURRENT SALARY                19814.33            8060.314                   30

SEX OF EMPLOYEE                      .43                 .504                30

JOB SENIORITY                     80.47                 9.748                30

AGE OF EMPLOYEE                 36.3227               8.76549                30

EDUCATIONAL LEVEL                 16.00                 2.244                30

WORK EXPERIENCE                  8.6453               8.87542                30

MINORITY                             .37                 .490                30
CLASSIFICATION




                                  Residuals Statisticsa

                       Minimum       Maximum             Mean              Std. Deviation    N

Predicted Value          10342.00          29286.66      19814.33                 5313.421       30

Residual                 -8926.251     21585.666                .000              6061.042       30

Std. Predicted Value        -1.783            1.783             .000                 1.000       30

Std. Residual               -1.447            3.499             .000                  .983       30

a. Dependent Variable: CURRENT SALARY




As a guide: if the residuals plot is trumpet-shaped, the residuals do not have constant variance.




This histogram plots the residuals of the dependent variable, setting the
independent variables aside, to give an easy view of the error variance. The
graph shows that the distribution is not perfectly normal; there are some
disturbances in this data set. So a heteroskedasticity problem is present here,
but it is small.


                                        Model Summaryb

                                                  Adjusted R       Std. Error of the
              Model       R           R Square     Square             Estimate

              1               .701a        .491             .358           6458.883

              a. Predictors: (Constant), MINORITY CLASSIFICATION, JOB
              SENIORITY, AGE OF EMPLOYEE, SEX OF EMPLOYEE,
              EDUCATIONAL LEVEL, WORK EXPERIENCE

              b. Dependent Variable: CURRENT SALARY

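The White and Glejser tests measure heteroskedasticity through the R2 of an
auxiliary regression of the squared (White) or absolute (Glejser) residuals on
the explanatory variables. The R2 shown above (0.491) comes from the salary
regression itself, not from such an auxiliary regression, so it cannot settle the
question on its own. Taken together with the plots, we do not reject the
hypothesis of homoskedasticity here.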
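
A Glejser-style check can be sketched as a regression of the absolute residuals on an explanatory variable. The Python sketch below handles the simplest case of one explanatory variable and uses made-up residuals, not the SPSS residuals of this report, which are not listed here. The function name and the returned n·R2 statistic are illustrative; the full White test would also include the squares and cross-products of every explanatory variable.

```python
def auxiliary_regression(residuals, z, transform=abs):
    """Regress transform(residual) on a single variable z.

    transform=abs gives a Glejser-style regression; passing
    transform=lambda e: e * e gives a White-style regression on one
    regressor. Returns (slope, r_squared, n * r_squared).
    """
    n = len(residuals)
    u = [transform(e) for e in residuals]
    mean_z = sum(z) / n
    mean_u = sum(u) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    szu = sum((zi - mean_z) * (ui - mean_u) for zi, ui in zip(z, u))
    suu = sum((ui - mean_u) ** 2 for ui in u)
    slope = szu / szz
    r_squared = (szu * szu) / (szz * suu)
    return slope, r_squared, n * r_squared


# Made-up residuals whose spread grows with z (clear heteroskedasticity).
z = [1, 2, 3, 4, 5, 6]
residuals = [0.5, -1.0, 1.5, -2.0, 2.5, -3.0]
slope, r2, lm = auxiliary_regression(residuals, z)
print(f"slope = {slope:.3f}, auxiliary R2 = {r2:.3f}, n*R2 = {lm:.3f}")
```

A slope near zero and a small auxiliary R2 would support homoskedasticity; a large n·R2 relative to the chi-square critical value would point to heteroskedasticity.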



In this Normal P-P Plot, the points lie very close to the least-squares line, so
the residuals are very nearly normal. So here too we get a very small
heteroskedasticity problem.




Again, plotting the standardized residuals against the standardized predicted
values, we find a very small heteroskedasticity problem, since the plot shows no
particular trend.

Although we have a very small heteroskedasticity problem, we can address what
remains by

          “Possible correction => log transformation of variable weight”

The R2 of this log-linear form is not comparable with the R2 of the linear
form, since the variance of the dependent variable is different.




AUTOCORRELATION IN MULTIPLE REGRESSION


In multiple regression analysis, correlation between the error terms is called
autocorrelation. For detecting an autocorrelation problem, the Durbin-Watson
test is the simplest and most commonly used. Here ϕ denotes the correlation
between successive error terms, and the hypothesis tested is ϕ = 0, that is, no
autocorrelation in the data set.


                     Model Summaryb

Model                     Durbin-Watson

1                          2.168a

a. Predictors: (Constant), MINORITY CLASSIFICATION, JOB
SENIORITY, AGE OF EMPLOYEE, SEX OF EMPLOYEE,
EDUCATIONAL LEVEL, WORK EXPERIENCE

b. Dependent Variable: CURRENT SALARY




                             Coefficientsa

                                                  Correlations

Model                                Zero-order      Partial      Part

1       SEX OF EMPLOYEE                      -.343        -.158      -.114

        JOB SENIORITY                        .094         .131       .094

        AGE OF EMPLOYEE                      -.313        .066       .047

        EDUCATIONAL LEVEL                    .659         .513       .426

        WORK EXPERIENCE                      -.391        -.071      -.051

        MINORITY                             .224         .151       .109
        CLASSIFICATION

a. Dependent Variable: CURRENT SALARY




Residuals Statisticsa

                       Minimum      Maximum        Mean         Std. Deviation    N

Predicted Value          8323.94      31453.22     19814.33           5646.471        30

Residual               -7812.773     20206.270           .000         5752.046        30

Std. Predicted Value      -2.035          2.061          .000             1.000       30

Std. Residual             -1.210          3.128          .000              .891       30

a. Dependent Variable: CURRENT SALARY




Correlations

                                              EDUCATIONAL  WORK         CURRENT      SEX OF       JOB          AGE OF       MINORITY
                                              LEVEL        EXPERIENCE   SALARY       EMPLOYEE     SENIORITY    EMPLOYEE     CLASSIFICATION

Pearson Correlation  CURRENT SALARY            .659        -.391        1.000        -.343         .094        -.313         .224
                     SEX OF EMPLOYEE          -.274         .271        -.343        1.000        -.225         .183         .033
                     JOB SENIORITY            -.085        -.035         .094        -.225        1.000         .003         .000
                     AGE OF EMPLOYEE          -.411         .979        -.313         .183         .003        1.000        -.196
                     EDUCATIONAL LEVEL        1.000        -.497         .659        -.274        -.085        -.411         .188
                     WORK EXPERIENCE          -.497        1.000        -.391         .271        -.035         .979        -.200
                     MINORITY CLASSIFICATION   .188        -.200         .224         .033         .000        -.196        1.000

Sig. (1-tailed)      CURRENT SALARY            .000         .016         .            .032         .311         .046         .117
                     SEX OF EMPLOYEE           .071         .074         .032         .            .116         .166         .432
                     JOB SENIORITY             .327         .428         .311         .116         .            .494         .498
                     AGE OF EMPLOYEE           .012         .000         .046         .166         .494         .            .150
                     EDUCATIONAL LEVEL         .            .003         .000         .071         .327         .012         .160
                     WORK EXPERIENCE           .003         .            .016         .074         .428         .000         .144
                     MINORITY CLASSIFICATION   .160         .144         .117         .432         .498         .150         .

Here the D-W statistic is 2.168, which is very near 2. We know that a D-W
statistic of 2 indicates zero correlation (ϕ = 0) between the error terms. So in
our data set, there is a very small autocorrelation problem.
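
The D-W statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals, and it is approximately 2(1 − ϕ). The Python sketch below (illustrative function names, made-up residuals for the first call) shows the calculation and the value of ϕ implied by the reported statistic of 2.168. For cross-section data such as this sample, the statistic depends on the order in which the observations are listed.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def implied_autocorrelation(d):
    """Approximate first-order autocorrelation from d = 2 * (1 - phi)."""
    return 1 - d / 2


# Made-up residuals that alternate in sign (strong negative autocorrelation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0

# Value of phi implied by the D-W statistic reported by SPSS.
phi = implied_autocorrelation(2.168)
print(f"implied phi = {phi:.3f}")
```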

To examine the autocorrelation problem further, we can apply other tests such as the LM test.



         MULTICOLLINEARITY IN MULTIPLE REGRESSION
One important problem in the application of multiple regression analysis involves
the possible collinearity of the explanatory variables. This condition refers to
situations in which some of the explanatory variables are highly correlated with
each other.

One method of measuring multicollinearity uses the Variance Inflation
Factor (VIF) for each explanatory variable. The VIF values obtained through
SPSS are shown below,


                                 Coefficientsa

                                                       Collinearity Statistics

Model                                            Tolerance                VIF

1        SEX OF EMPLOYEE                                 .734                     1.362

         JOB SENIORITY                                   .939                     1.065

         AGE OF EMPLOYEE                                 .033                    29.964

         WORK EXPERIENCE                                 .032                    31.372

         MINORITY CLASSIFICATION                         .950                     1.053

a. Dependent Variable: EDUCATIONAL LEVEL




Coefficientsa

                                         Collinearity Statistics

Model                                   Tolerance         VIF

1       SEX OF EMPLOYEE                        .848          1.179

        JOB SENIORITY                          .924          1.082

        AGE OF EMPLOYEE                        .810          1.235

        MINORITY                               .937          1.068
        CLASSIFICATION

        EDUCATIONAL LEVEL                      .756          1.322

a. Dependent Variable: WORK EXPERIENCE




Coefficientsa

                                         Collinearity Statistics

Model                                   Tolerance         VIF

1       JOB SENIORITY                          .918          1.089

        AGE OF EMPLOYEE                        .031        32.365

        MINORITY                               .947          1.056
        CLASSIFICATION

        EDUCATIONAL LEVEL                      .572          1.749

        WORK EXPERIENCE                        .028        35.927

a. Dependent Variable: SEX OF EMPLOYEE




Coefficientsa

                                        Collinearity Statistics

Model                                  Tolerance         VIF

1       AGE OF EMPLOYEE                       .028        35.540

        MINORITY                              .938          1.066
        CLASSIFICATION

        EDUCATIONAL LEVEL                     .602          1.662

        WORK EXPERIENCE                       .025        40.063

        SEX OF EMPLOYEE                       .755          1.324

a. Dependent Variable: JOB SENIORITY




Coefficientsa

                                       Collinearity Statistics

Model                                 Tolerance         VIF

1       MINORITY                             .938          1.066
        CLASSIFICATION

        EDUCATIONAL LEVEL                    .721          1.388

        WORK EXPERIENCE                      .718          1.392

        SEX OF EMPLOYEE                      .890          1.124

a. Dependent Variable: AGE OF EMPLOYEE




Coefficientsa

                                     Collinearity Statistics

Model                            Tolerance            VIF

1       EDUCATIONAL LEVEL                  .610          1.641

        WORK EXPERIENCE                    .025        40.044

        SEX OF EMPLOYEE                    .763          1.311

        AGE OF EMPLOYEE                    .028        35.539

a. Dependent Variable: MINORITY CLASSIFICATION




The tolerance for a variable is (1 - R-squared) for the regression of that variable
on all the other independents, ignoring the dependent. When tolerance is close
to 0 there is high multicollinearity of that variable with other independents and
the coefficients will be unstable.

VIF is the variance inflation factor, which is simply the reciprocal of tolerance.
Therefore, when VIF is high there is high multicollinearity and instability of the
coefficients.
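
The two definitions above can be written as a short Python sketch (illustrative function names). It also shows that the pairwise correlation of .979 between Age of Employee and Work Experience is by itself enough to push the VIF of either variable above 24, because the R-squared from regressing one variable on all the others cannot be smaller than the squared pairwise correlation.

```python
def tolerance_from_r2(r_squared):
    """Tolerance = 1 - R-squared of a variable regressed on the others."""
    return 1.0 - r_squared


def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of tolerance."""
    return 1.0 / tolerance


# Tolerance .734 (SEX OF EMPLOYEE in the first collinearity table).
print(round(vif_from_tolerance(0.734), 3))

# Lower bound on the VIF implied by the Age / Work Experience correlation.
r_age_experience = 0.979
min_vif = vif_from_tolerance(tolerance_from_r2(r_age_experience ** 2))
print(f"VIF is at least {min_vif:.2f}")

# A tolerance of .20 corresponds to a VIF of 5.
print(vif_from_tolerance(0.20))
```

Recomputing a VIF from a tolerance that SPSS has rounded to three decimals gives only an approximate match for very small tolerances, for example 1/.033 is about 30.3 against the reported 29.964.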




As a rule of thumb, if tolerance is less than .20, a problem with multicollinearity is
indicated.

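From the results above, the VIF values show very high collinearity among some of
the independent variables: Age of Employee and Work Experience have VIF values
of about 30 to 40 whenever both appear together, while every other VIF is below 2.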

We can solve this problem through,

   • Ridge Regression

   • Principal component regression

   • Dropping the most influential variables

   • Using Ratios or First Differences

   • Using Extraneous Estimates

   • Getting more data

Concluding Comments :
By analyzing the multiple regression and considering the R2 value (0.491), we
can infer that this model is not strong for overall estimation.

Again, we have found that the model for salary estimation for employees of the
Coca-Cola Company includes highly collinear variables, chiefly Age of Employee
and Work Experience. But this model is still useful because it shows very little
heteroskedasticity and autocorrelation.




                                                                                    25
26
26

