Testing of Hypothesis and Goodness of Fit
This document discusses hypothesis testing and goodness of fit. It defines hypothesis testing as a procedure to determine if sample data agrees with a hypothesized population characteristic. The key steps are stating the null and alternative hypotheses, selecting a significance level, determining the test distribution, defining rejection regions, performing the statistical test, and drawing a conclusion. Common hypothesis tests discussed include the Student's t-test and chi-square test of goodness of fit.
1. Testing of Hypothesis and Goodness of Fit
Dr. Anil V Dusane
Sir Parashurambhau College, Pune, India
anildusane@gmail.com
www.careerguru.co.com
2. Hypothesis
• A hypothesis is a speculation about an observed phenomenon. If the speculation is translated into a statement about some condition of one or more populations, that statement is referred to as a statistical hypothesis.
Examples of hypotheses:
1. A particular feed will increase the weight of buffaloes.
2. The use of a new soil fumigant will drastically reduce nematode problems.
3. The average life of a product produced by company A is longer than that of company B.
3. Definition of hypothesis test
• A hypothesis test is a procedure used to decide whether the characteristic measured in a sample agrees reasonably well with the hypothesis.
• In other words, a hypothesis test is a procedure, governed by certain rules, that leads to a decision to accept or reject the hypothesis on the basis of sample values.
4. Steps involved in testing of hypothesis
The following steps are involved in testing a hypothesis:
1. Stating the null and alternative hypotheses.
2. Selecting the level of significance.
3. Determining the test distribution.
4. Defining the rejection (or critical) regions.
5. Performing the statistical test.
6. Drawing a statistical conclusion.
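The six steps above can be sketched in Python for a two-tailed, one-sample test of a mean. This is a minimal sketch and not part of the original slides: it assumes the population standard deviation is known, so that the normal distribution applies. The function name `one_sample_z_test` and the sample weights are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-tailed one-sample z-test, laid out as the six steps."""
    # Step 1: H0: mu == mu0 versus H1: mu != mu0 (mu0 is supplied by the caller).
    # Step 2: the level of significance is alpha.
    # Step 3: test distribution is the standard normal (sigma assumed known).
    # Step 4: rejection region is |z| > z_crit.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 5: perform the test (sample mean, standard error, test ratio).
    se = sigma / sqrt(len(sample))
    z = (mean(sample) - mu0) / se
    # Step 6: draw the conclusion.
    reject = abs(z) > z_crit
    return z, z_crit, reject


# Hypothetical weight gains (kg) of nine animals on a new feed; H0: mean gain is 50.
sample = [52, 55, 49, 58, 54, 56, 53, 57, 51]
z, z_crit, reject = one_sample_z_test(sample, mu0=50, sigma=3)
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

With these illustrative numbers the test ratio (about 3.89) lies beyond the critical value (about 1.96), so H0 would be rejected at the 5% level.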
5. Stating of the null hypothesis
• The hypothesis that is actually tested for acceptance or rejection is termed the null hypothesis.
• According to Fisher, the null hypothesis is a hypothesis that is tested for possible rejection under the assumption that it is true.
• In general, the null hypothesis is the hypothesis that the researcher wishes to disprove, since disproving it provides stronger support for the researcher's claim.
• It is denoted by H0. The hypothesis whose wrongful rejection is more harmful is treated as the null hypothesis.
6. Alternative hypothesis
• This hypothesis gives an alternative to the first (null) hypothesis.
• It is accepted only when the null hypothesis is rejected.
• It is the complement of the null hypothesis, i.e. the hypothesis against which the null hypothesis is tested.
• It is denoted by H1.
7. Selection of a level of significance
• This is the second step in hypothesis testing: selecting the level of significance, α (the corresponding confidence level is 1 − α).
• A test based on the observed data decides whether the postulated hypothesis is to be accepted or not, and this decision involves a certain amount of risk.
• This amount of risk is termed the level of significance.
• The amount of evidence required to accept that an event is unlikely to have arisen by chance is known as the significance level or critical p-value.
8. Level of significance
• The level of significance establishes a criterion for rejection or acceptance of the null hypothesis.
• It is denoted by α and is conventionally chosen as 0.05 or 0.01: α = 0.01 (1%) is used for high precision and α = 0.05 (5%) for moderate precision.
• In other words, the probability of rejecting H0 when it is true is referred to as the level of significance.
• It is the probability of committing a type I error (rejecting the null hypothesis when it is true).
• It is desirable to select a small value of α. However, for a fixed sample size, a smaller α increases the beta risk, i.e. the probability of incurring a type II error (accepting a false null hypothesis).
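The trade-off between α and the beta risk can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than anything given in the slides: a one-tailed z-test with known standard deviation, a single specific alternative mean `mu1`, and hypothetical values throughout.

```python
from math import sqrt
from statistics import NormalDist


def beta_risk(alpha, mu0, mu1, sigma, n):
    """Type II error probability for a one-tailed z-test of H0: mu = mu0
    against the specific alternative mu = mu1 > mu0 (sigma assumed known)."""
    se = sigma / sqrt(n)
    # H0 is rejected when the sample mean exceeds this critical value (x-scale).
    crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Beta is the chance that the sample mean falls below it although mu = mu1.
    return NormalDist(mu1, se).cdf(crit)


# Hypothetical setting: mu0 = 50, true mean 52, sigma = 4, n = 16.
beta_at_05 = beta_risk(0.05, mu0=50, mu1=52, sigma=4, n=16)
beta_at_01 = beta_risk(0.01, mu0=50, mu1=52, sigma=4, n=16)
print(f"alpha = 0.05 -> beta = {beta_at_05:.3f}")
print(f"alpha = 0.01 -> beta = {beta_at_01:.3f}")
```

In this setting, tightening α from 0.05 to 0.01 raises beta, which is the effect described in the last bullet.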
9. Types of Errors
• Since the decision to accept or reject H0 (the null hypothesis) is based on sampling, it is subject to two kinds of error.
• Type I error: rejecting the null hypothesis H0 when it is true.
• Type II error: accepting the null hypothesis H0 when it is false.
• A common misconception is that a statistically significant result is always of practical significance, or demonstrates a large effect in the population. Given a sufficiently large sample, extremely small and non-notable differences can be found to be statistically significant.
• One of the more common problems in significance testing is the tendency for multiple comparisons to yield spurious significant differences even where the null hypothesis is true.
• "Insignificant" does not mean unimportant. Some critics propose that the scientific community should abandon significance testing altogether, as it can cause false hypotheses to be accepted and true hypotheses to be rejected.
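Two of the cautions above can be put into numbers. This is a rough sketch with hypothetical values: the first function assumes the multiple tests are independent, and the second assumes a known standard deviation.

```python
from math import sqrt


def familywise_error_rate(alpha, m):
    """Probability of at least one type I error across m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


def z_for_difference(diff, sigma, n):
    """z statistic for an observed mean difference `diff` with known sigma."""
    return diff / (sigma / sqrt(n))


# Twenty comparisons at alpha = 0.05: a spurious "significant" result is likely.
fwer_20 = familywise_error_rate(0.05, 20)

# The same tiny difference (0.1 units, sigma = 10) at two sample sizes.
z_small_n = z_for_difference(0.1, sigma=10, n=100)
z_large_n = z_for_difference(0.1, sigma=10, n=1_000_000)

print(f"family-wise error rate for 20 tests: {fwer_20:.3f}")
print(f"z at n = 100: {z_small_n:.2f}; z at n = 1,000,000: {z_large_n:.2f}")
```

The difference of 0.1 units is the same in both cases; only the sample size moves it from far inside the acceptance region to far outside it.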
10. Determine the test distribution
• The test statistic is the value used to determine whether the null hypothesis should be rejected or accepted.
• The choice of the appropriate statistical test depends on the appropriate sampling distribution, i.e. the normal distribution or another distribution such as the t-distribution.
• If the sample size is less than 30 (n < 30) and the population standard deviation is unknown, the t-distribution should be used.
11. Defining the rejection or critical regions
• The critical value is the dividing point between the acceptance and rejection regions. It can be expressed in standard units (z-scale) or in the actual units of measurement (x-scale).
• The figure shows the critical regions of a test statistic when the sampling distribution is assumed to be normal.
• If the value of the test statistic falls in the acceptance region, the null hypothesis is accepted.
• If the value falls in the rejection region (0.025 + 0.025 = 0.050, i.e. the two tails at α = 0.05), the null hypothesis is rejected.
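The critical values on both scales can be computed directly. The following is a minimal sketch assuming a two-tailed test, a normal sampling distribution and a known standard deviation; `critical_values` and the numbers passed to it are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def critical_values(mu0, sigma, n, alpha=0.05):
    """Two-tailed critical values on the z-scale and on the x-scale."""
    # Each tail holds alpha / 2 of the probability (0.025 + 0.025 for alpha = 0.05).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    z_bounds = (-z_crit, z_crit)
    x_bounds = (mu0 - z_crit * se, mu0 + z_crit * se)
    return z_bounds, x_bounds


# Hypothetical example: H0 mean 100, sigma 15, sample of 25 (standard error 3).
z_bounds, x_bounds = critical_values(mu0=100, sigma=15, n=25)
print(f"z-scale: {z_bounds[0]:.2f} to {z_bounds[1]:.2f}")
print(f"x-scale: {x_bounds[0]:.2f} to {x_bounds[1]:.2f}")
```

A sample mean outside the x-scale bounds corresponds exactly to a z value outside the z-scale bounds, so either scale gives the same decision.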
12. Performing statistical test
• To test a hypothesis about the population mean (μ), first calculate the sample mean. From the sample, estimate the standard error (S.E.) and form the test ratio in standard units.
• The acceptance or rejection boundaries (critical values) can then be expressed in the actual units of measurement (x-scale).
13. Drawing a statistical conclusion
• Having performed the statistical test, it is now possible to determine whether the sample information agrees reasonably well with the hypothesis and to make inferences about the population.
• The acceptance or rejection of the null hypothesis is the conclusion of the test.
14. Student's 't' test
• To test the agreement between hypothesis and observation, two biometrical tests are commonly performed: the chi-square (χ²) test and Student's t-test.
• The t-test was devised by W. S. Gosset (1876-1937), who published under the pseudonym "Student".
• It revolutionized the statistics of small samples.
• A t-test is any statistical hypothesis test in which the test statistic follows a Student's t-distribution if the null hypothesis is true.
• It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known.
• When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a Student's t-distribution.
15. Importance of Student's 't' test
1. When the population standard deviation has to be estimated from a small sample, the standardized sample mean does not follow the normal distribution, so a biologist who has to interpret and draw decisions from very small samples often has to use the t-test. For example, the number of animals with some rare disease available for examination may be quite small, and in such cases doctors can use the t-test.
2. The t-test is one of the most common statistical procedures in the medical literature.
3. It is used to find out whether the difference between two samples is significant or merely due to chance fluctuation.
4. It can test whether the mean difference between two related populations, treatments or genotypes is significant.
16. Importance of Student's 't' test
5. When the sample size is large, the t-distribution closely approximates the normal distribution.
6. Like the normal distribution, the t-distribution is symmetric (mean, mode and median coincide). Unlike the normal distribution, its shape varies with the size of the sample, i.e. with the degrees of freedom. The t-distribution is flatter than the z-distribution, with heavier tails.
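Points 5 and 6 can be seen by comparing critical values. The sketch below is only an illustration: the t values are two-tailed 5% critical values copied from a standard t table (the Python standard library has no t-distribution), while the z value is computed.

```python
from statistics import NormalDist

# Two-tailed 5% critical values of t for selected degrees of freedom,
# taken from a standard t table (hard-coded, not computed here).
T_CRIT_05 = {1: 12.706, 5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}

# The corresponding critical value of the standard normal (z) distribution.
z_crit = NormalDist().inv_cdf(0.975)

# How far each t critical value lies beyond the z critical value.
excess = {df: round(t - z_crit, 3) for df, t in T_CRIT_05.items()}

for df, t in T_CRIT_05.items():
    print(f"df = {df:>3}: t = {t:.3f}, excess over z = {excess[df]:.3f}")
```

The excess shrinks as the degrees of freedom grow: the heavier tails of the t-distribution matter most for small samples and become negligible for large ones.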
17. Unpaired t test
• This test is applied to unpaired data: independent observations made on individuals of two separate groups, or samples drawn from two different populations.
• Steps to be followed for calculation of 't':
1. Calculate the means of the two samples and the difference between them.
2. Calculate the standard error of the difference between the two means.
3. Calculate t as the ratio of the observed difference of means to its standard error.
4. Compare the calculated t value with the t table to determine significance at the appropriate degrees of freedom.
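The four steps above can be sketched as follows. The slides do not say how the standard error is formed, so this sketch assumes the classical pooled-variance form (equal population variances) with n1 + n2 − 2 degrees of freedom. The data are hypothetical, and the critical value is the two-tailed 5% entry for 8 degrees of freedom from a standard t table.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled variance (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    # Step 1: sample means and their difference.
    diff = mean(a) - mean(b)
    # Step 2: standard error of the difference, from the pooled variance.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Step 3: t is the ratio of the difference to its standard error.
    return diff / se_diff, n1 + n2 - 2


# Hypothetical yields from two independent groups of plants.
group_a = [12, 14, 15, 13, 16]
group_b = [10, 11, 9, 12, 8]
t, df = unpaired_t(group_a, group_b)

# Step 4: compare with the table value (two-tailed, alpha = 0.05, df = 8).
T_CRIT_05_DF8 = 2.306
significant = abs(t) > T_CRIT_05_DF8
print(f"t = {t:.3f}, df = {df}, significant at 5%: {significant}")
```

For these illustrative data t = 4.0 with 8 degrees of freedom, which exceeds the table value, so the difference between the group means would be judged significant.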
18. Paired t test
• It is applied when each individual gives a pair of observations. Here paired observations from a single sample are compared. Such observations arise in the biological sciences, for example:
a) to study the effect of a fertilizer, pesticide or drug on a plant;
b) to compare the effects of two different fertilizers or drugs;
c) to compare the results of two techniques or the accuracy of two different instruments.
• Steps to be followed for calculation of 't':
1. Calculate the difference within each pair of observations.
2. Calculate the mean difference.
3. Calculate the standard deviation of the differences (SD) and, from it, the standard error of the mean difference; t is the ratio of the mean difference to this standard error.
4. Compare the calculated value of t with the t table to determine significance at the appropriate degrees of freedom (n − 1, where n is the number of pairs).
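The paired calculation can be sketched in the same way. The before and after measurements are hypothetical, and the critical value is the two-tailed 5% entry for 4 degrees of freedom from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic for two measurements taken on the same individuals."""
    # Step 1: difference within each pair.
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    # Step 2: mean difference.
    mean_diff = mean(diffs)
    # Step 3: SD of the differences, standard error of the mean difference, and t.
    se = stdev(diffs) / sqrt(n)
    return mean_diff / se, n - 1


# Hypothetical plant heights before and after applying a fertilizer.
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]
t, df = paired_t(before, after)

# Step 4: compare with the table value (two-tailed, alpha = 0.05, df = 4).
T_CRIT_05_DF4 = 2.776
significant = abs(t) > T_CRIT_05_DF4
print(f"t = {t:.3f}, df = {df}, significant at 5%: {significant}")
```

Because each individual serves as its own control, only the five differences enter the calculation, and the degrees of freedom are the number of pairs minus one.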
19. Chi-square test (χ²) of goodness of fit
• Chi-square is denoted χ².
• χ² = Σ (O − E)² / E, where O = observed value and E = expected value.
• Chi-square is used to determine whether the deviations of the observed data from the expected data are small enough to accept the fitted curve or large enough to reject it.
• Each curve is tested at the 5% or 1% level of significance. Whenever the observed value of χ² exceeds the table value (χ² 0.05 or χ² 0.01), the difference between the observed and expected data is considered significant (highly significant at the 1% level) and the fit is considered poor.
20. Chi-square test (χ²)
• In other words, if the observed χ² is less than or equal to the table value of chi-square, we accept the fit.
• Chi-square makes it possible to test whether the observed frequencies in a distribution differ significantly from the frequencies expected under some hypothesis.
• Chi-square is also used to test how closely an actual distribution approximates a particular theoretical distribution.
• For example, whether data follow a normal distribution can be tested with the chi-square test.
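The goodness-of-fit calculation χ² = Σ (O − E)² / E can be sketched as below. The example is not from the slides: it uses the classic dihybrid pea counts reported by Mendel, tested against a 9:3:3:1 ratio. The critical value is the 5% entry for 3 degrees of freedom (four classes minus one) from a standard chi-square table.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed counts in four phenotype classes and the hypothesized 9:3:3:1 ratio.
observed = [315, 101, 108, 32]
ratio = [9, 3, 3, 1]

# Expected counts: the total split according to the hypothesized ratio.
total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]

chi2 = chi_square(observed, expected)

# Table value of chi-square at the 5% level for df = 3.
CHI2_CRIT_05_DF3 = 7.815
fit_accepted = chi2 <= CHI2_CRIT_05_DF3
print(f"chi-square = {chi2:.3f}, fit accepted at 5%: {fit_accepted}")
```

Here χ² is about 0.470, well below the table value, so the fit to the 9:3:3:1 ratio is accepted.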
21. Application of chi-square test
Chi-square has three major applications in statistics:
1. Testing goodness of fit.
2. Testing for independence.
3. Testing homogeneity.
• Chi-square has applications in the field of bioinformatics. It is applied to compare the distribution of certain properties of genes (such as genomic content or mutation rate) across different categories, e.g. disease genes, essential genes, or genes on a certain chromosome.
• It is also used for solving modern cryptographic problems.
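The second application, testing for independence, uses the same statistic with expected counts derived from the row and column totals of a contingency table. The sketch below is a plain version without continuity correction. The 2 x 2 table of gene counts is entirely hypothetical, and the critical value is the 5% entry for 1 degree of freedom from a standard chi-square table.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the row and column classifications are independent.
            exp = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - exp) ** 2 / exp
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical counts: rows = disease / non-disease genes,
# columns = essential / non-essential.
table = [[30, 10],
         [20, 40]]
chi2, df = chi_square_independence(table)

# Table value of chi-square at the 5% level for df = 1.
CHI2_CRIT_05_DF1 = 3.841
dependent = chi2 > CHI2_CRIT_05_DF1
print(f"chi-square = {chi2:.3f}, df = {df}, association at 5%: {dependent}")
```

A χ² above the table value leads to rejecting the hypothesis that the two classifications are independent.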