Aleksey Narko
II year Management
Econometrics Final Project
I use the World Bank data set on the wealth of nations and study in particular the dependence between the population and the total wealth of a country (nation).
Source: http://data.worldbank.org/data-catalog/wealth-of-nations
2011 WSB-NLU
Professor: Jacek Leskow
Step 1: ANOVA analysis
A) ANOVA
The factor (independent variable) is the income group of the country (low, lower middle, upper middle, high).
Hypotheses of one-way ANOVA analysis:
H0: all μ_i are the same
H1: not all μ_i are the same
The p-value is less than 5%, so we reject the null hypothesis H0 at the 5% level.
Conclusion: not all the means are the same; they differ between at least some of the income groups.
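The F statistic behind a one-way ANOVA of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name one_way_anova_f and the toy samples are hypothetical, they are not the World Bank data, and the p-value would still have to come from an F table or from statistical software.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy samples (assumption: one list per income group).
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

A large F relative to the critical value of the F distribution with (df_between, df_within) degrees of freedom leads to rejecting H0.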
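B) Checking homoskedasticity
Hartley statistic
H = max{s_1^2, ..., s_k^2} / min{s_1^2, ..., s_k^2}
where s_i^2 is the sample variance of group i. In my case H = 28775,7, which is far above 1 (the value under equal variances) and points to very unequal variances.
Bartlett test
My p-value is much less than 5%, so the hypothesis of equal variances is rejected. After these two tests I can draw the conclusion that I have a very serious problem with heteroskedasticity.
ANOVA assumes equal variances, so its results are not reliable here and I do not use ANOVA further.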
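Both statistics used in this check can be computed by hand. The sketch below is a minimal illustration, not the spreadsheet calculation used in the project: the function names and the toy samples are hypothetical, and the Bartlett function returns only the test statistic (the p-value requires the chi-square distribution with k-1 degrees of freedom).

```python
import math
from statistics import variance


def hartley_h(groups):
    """Hartley statistic: largest sample variance over smallest sample variance."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


def bartlett_statistic(groups):
    """Bartlett test statistic for equality of k group variances."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]
    n_total = sum(sizes)
    # Pooled variance, weighted by the degrees of freedom of each group
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for n, s2 in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction


# Hypothetical toy samples with sample variances 1, 4 and 100.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [10.0, 20.0, 30.0]]
h_value = hartley_h(toy_groups)
b_value = bartlett_statistic(toy_groups)
print(h_value, b_value)
```

For the toy samples H is 100, and the Bartlett statistic is above 5,99, the 5% critical value of the chi-square distribution with 2 degrees of freedom, so equal variances would be rejected there as well.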
Step 2: Regression
A) Regression analysis
• R2 = 7,1%, which is very low: population explains only about 7% of the variation in total wealth between the countries.
• The standard error of the regression is 1,94E+13 (in units of total wealth), which is quite big compared with the total wealth predicted for the least populous countries (close to the intercept, 3,34E+12).
• Slope = 36787: an increase of the population by one person is associated with 36787 dollars more in Total Wealth.
• Y = 3,34E+12 + 36787 * X, where Y is total wealth, X is population and 3,34E+12 is the intercept.
• T stat = slope / standard error of the slope = 3,28
• P value = 0,0013 < 5%: we reject the hypothesis that the slope is zero, so the relation between population and total wealth is significant.
• 95% confidence interval for the slope: [14618; 58956]. The first number is the smallest and the second the largest increase in total wealth we can expect from an increase of the population by one unit (one person).
• F test:
  H0: R2 = 0
  Ha: R2 > 0
  We reject H0: the model is significant.
• After sorting the studentized residuals I got two King Kongs (outliers): observations 71 and 135, whose studentized residuals are greater than 3.
• After eliminating these King Kongs we get R2 = 12,2%. The result improved by a little more than 5 percentage points (the previous R2 was 7,1%), which means that we need to make another round of elimination.
Regression Statistics
Multiple R          0,349702
R Square            0,122292
Adjusted R Square   0,115932
Standard Error      6,7E+12
Observations        140
• After sorting the studentized residuals I got 4 King Kongs, observations 69, 47, 133 and 51, whose studentized residuals are bigger than 3:
Observation   Studentized residual
69            3,889923198
47            4,856760879
133           5,488564929
51            6,213019657
• After eliminating these King Kongs we get R2 = 22,1%. The result improved by more than 5 percentage points (the previous R2 was 12,2%), which means that we need to make another round of elimination.
Regression Statistics
Multiple R          0,47065427
R Square            0,221515442
Adjusted R Square   0,215705856
Standard Error      4,46825E+12
Observations        136
• After sorting the studentized residuals I got 3 King Kongs, observations 24, 110 and 129, whose studentized residuals are bigger than 3:
Observation   Studentized residual
24            3,448344982
110           3,480936339
129           8,403463852
• After eliminating these King Kongs we get R2 = 23,8%. The result improved by less than 5 percentage points (the previous R2 was 22,1%), which means that this time the elimination did not improve the model much and we can stop.
Regression Statistics
Multiple R          0,488095929
R Square            0,238237636
Adjusted R Square   0,232422656
Standard Error      4,31091E+12
Observations        133
Significance F      2,53083E-09
• Our P value (Significance F) is equal to 2,53E-09, which is far below 5%, so the model is significant. R2 is nevertheless still low after the King Kongs were removed, which means that the poor fit of the model is not due to the presence of outliers.
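The elimination procedure used in this section (fit, sort the studentized residuals, drop those above 3, refit, stop when R2 gains less than 5 percentage points) can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the data are synthetic, and the residuals are internally studentized, which may differ from the variant produced by the software used for the project.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = a + b*x; return intercept, slope, R^2 and studentized residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    s = math.sqrt(sse / (n - 2))  # standard error of the regression
    # Leverage of each point, then internally studentized residuals
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    studentized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    return intercept, slope, r2, studentized


def eliminate_outliers(x, y, cutoff=3.0, min_gain=0.05):
    """Drop points with |studentized residual| > cutoff until R^2 gains < min_gain."""
    _, _, r2, stud = fit_simple_ols(x, y)
    history = [r2]
    while True:
        keep = [i for i, r in enumerate(stud) if abs(r) <= cutoff]
        if len(keep) == len(x):
            break  # no outliers left
        x = [x[i] for i in keep]
        y = [y[i] for i in keep]
        _, _, new_r2, stud = fit_simple_ols(x, y)
        history.append(new_r2)
        if new_r2 - r2 < min_gain:
            break  # the last round did not improve the fit enough
        r2 = new_r2
    return x, y, history


# Synthetic data (assumption): a straight line with small noise and one outlier.
x_toy = [float(v) for v in range(1, 21)]
y_toy = [2.0 * v + (0.5 if int(v) % 2 == 0 else -0.5) for v in x_toy]
y_toy[9] += 50.0  # the artificial outlier at x = 10
x_clean, y_clean, history = eliminate_outliers(x_toy, y_toy)
print(len(x_clean), history)
```

The list history holds the R2 of each round, which is the quantity compared between rounds in the text.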
B) Checking Normality
Here we don't have any problems: our sample size is much bigger than 30, so by the central limit theorem the estimates can be treated as approximately normal.
C) Checking homoskedasticity
In our case we have heteroskedasticity: the spread of the residuals is not constant but grows across the observations.
Fact:
The total wealth study produces a simple regression model with heteroskedastic outliers. This is the most likely reason for the poor fit of the model.
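A growing spread of residuals can also be put into a number. The sketch below is a rough split-sample comparison in the spirit of the Goldfeld-Quandt test, not the formal test and not the method used in this project; the function name and the toy residuals are hypothetical.

```python
def residual_spread_ratio(x, residuals):
    """Mean squared residual in the upper half of x divided by the lower half."""
    # Order the residuals by the value of the explanatory variable
    ordered = [e for _, e in sorted(zip(x, residuals))]
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    ms_lower = sum(e ** 2 for e in lower) / len(lower)
    ms_upper = sum(e ** 2 for e in upper) / len(upper)
    return ms_upper / ms_lower


# Toy residuals (assumption) whose size grows in proportion to x.
x_toy = [float(v) for v in range(1, 11)]
res_toy = [0.1 * v * (1 if int(v) % 2 == 0 else -1) for v in x_toy]
ratio = residual_spread_ratio(x_toy, res_toy)
print(ratio)
```

A ratio close to 1 is what constant variance would give; a ratio far above 1 means that the residuals spread out as the explanatory variable grows.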
D) Checking autocorrelation
lag   ACF           PACF          1.96/sqrt(n)   -1.96/sqrt(n)
1     0,009747857   0,009747857   0,169953554    -0,169953554
2     0,085140888   0,085244009   0,169953554    -0,169953554
My ACF and PACF values are between the upper bound U = 1.96/sqrt(n) and the lower bound L = -1.96/sqrt(n), which means that I don't have autocorrelation.
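The bounds in this check can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, n = 133 is taken from the last regression table, and the acf function shows the usual sample autocorrelation formula, which may differ in detail from the spreadsheet calculation used here.

```python
import math


def acf(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    m = sum(series) / n
    denom = sum((v - m) ** 2 for v in series)
    num = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return num / denom


def white_noise_bound(n):
    """Approximate 95% bound for the autocorrelation of an uncorrelated series."""
    return 1.96 / math.sqrt(n)


bound = white_noise_bound(133)  # n = 133 observations after the last elimination
# The four ACF and PACF values reported in the table (lags 1 and 2).
reported = [0.009747857, 0.085140888, 0.009747857, 0.085244009]
no_autocorrelation = all(abs(r) < bound for r in reported)
print(bound, no_autocorrelation)
```

In practice acf would be applied to the residuals of the final regression, and each value compared with the bound in the same way.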
Step 3: Final Interpretation
All these results tell me that, even with the problems of heteroskedasticity and the King Kongs, my model is significant. There is no problem with autocorrelation, so I conclude that there is a statistically significant relation between the Population and the Total Wealth of a country, although with R2 = 23,8% population explains only a small part of the differences in total wealth.