This document discusses the t-test, which is used to test hypotheses about population means using small sample sizes. It provides background on how the t-test was developed by William Gosset under the pseudonym "Student" for Guinness brewery research. The main applications of the t-test are explained: comparing a sample to a population mean, comparing means of two independent samples, and comparing means of paired samples. Examples are provided to demonstrate how to perform t-tests to compare sample means to hypothesized values and determine if differences are statistically significant. The document also notes that t-tests can be conducted using statistical software like SPSS.
1. Lecture 7
Test of Hypothesis for Small Sample Size
Dr. Ashish. C. Patel
Assistant Professor,
Dept. of Animal Genetics & Breeding,
Veterinary College, Anand
STAT-531
Data Analysis using Statistical Packages
2. Test of Hypothesis for Small Sample Size (“t-test”)
• This test is used for small samples (generally n < 30).
• The assumptions of the t-test are:
1. The population is normal
2. Sample has been selected randomly
3. Sample size is small
4. Population standard deviation is not known.
3. • The t-statistic was introduced in 1908 by William
Sealy Gosset, a chemist working for the Guinness
brewery in Dublin, Ireland.
• Gosset had been hired due to Claude Guinness’s
policy of recruiting the best graduates from Oxford
and Cambridge to apply biochemistry and statistics
to Guinness’s industrial processes.
• Gosset planned the t-test as a cheap way to
monitor the quality of stout.
• The t-test work was submitted to and accepted by
Biometrika, the journal that Karl Pearson had
co-founded and for which he served as
Editor-in-Chief.
4. • The company allowed Gosset to publish his
mathematical work, but only if he used a
pseudonym (he chose “Student”).
• Gosset left Guinness on study-leave during the
first two terms of the 1906-1907 academic year
to study in Professor Karl Pearson’s Biometric
Laboratory at University College London.
• Gosset’s work on the t-test was published in
Biometrika in 1908.
• Although the name "Student" refers to William
Gosset, it was through the work of Ronald Fisher
that the distribution became well known as
"Student's distribution" and the test as
"Student's t-test".
5. The main applications of the t-test are:
• To compare a sample mean with the population mean
(“Student’s” t-test)
• Comparison of two means from two independent
samples (Fisher’s t-test)
• Testing the significance of a mean difference in the case of
paired observations (Paired t-test)
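6. 1. To compare a sample mean with the population mean
(“Student’s” t-test)
• The Student t-distribution is used for testing hypotheses
about the population mean for a small sample (say n <
30) drawn from a normal population.
• The test statistic is a t random variable with (n – 1)
degrees of freedom:
$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$
where $\bar{X}$ is the sample mean, $\mu_0$ the hypothesized
population mean, s the sample standard deviation and n the
sample size.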
7. EXERCISE 1: The data are lactation milk yields of
10 cows. Is the arithmetic mean of the sample,
3800 kg, significantly different from 4000 kg? The
sample standard deviation is 500 kg.
The hypothesized mean is μ0 = 4000 kg and the hypotheses
are as follows:
H0: μ = 4000 kg, Ha: μ ≠ 4000 kg
• The sample mean is $\bar{X}$ = 3800 kg.
• The sample standard deviation is s = 500 kg.
• The standard error is: $s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{500}{\sqrt{10}} = 158.11$ kg
• The calculated value of the t-statistic is:
$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} = \frac{3800 - 4000}{158.11} = -1.26$
8. • For the α = 0.05 level of significance and
(n – 1) = 9 degrees of freedom, the two-sided critical
value is –tα/2 = –t0.025 = –2.262.
• Since the calculated t = –1.26 is not more extreme
than the critical value –2.262, H0 is not rejected
at the α = 0.05 level of significance.
• The sample mean is not significantly different from
4000 kg.
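The steps of Exercise 1 can be checked with a short Python sketch. The function name `one_sample_t` and the variable names are illustrative only, and the critical value is copied from the slide rather than computed, since the standard library has no t-distribution table.

```python
import math


def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Return (t, standard error, degrees of freedom) for a one-sample t-test."""
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - hypothesized_mean) / standard_error
    return t, standard_error, n - 1


# Summary values from Exercise 1 (milk yield in kg)
t_value, std_error, deg_freedom = one_sample_t(3800, 4000, 500, 10)

# Two-sided critical value for alpha = 0.05 and 9 d.f., taken from the slide
T_CRITICAL = 2.262
reject_h0 = abs(t_value) > T_CRITICAL

print(f"SE = {std_error:.2f}, t = {t_value:.3f}, d.f. = {deg_freedom}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

With these inputs the sketch gives a standard error of about 158.11 and t of about –1.26, so H0 is not rejected.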
9. 2. Comparison of two means from two
independent samples when variances are
homogeneous (Fisher’s t-test)
• In experimental work it is often necessary to test
whether two samples differ significantly in their
means, or whether they may be regarded as
belonging to the same population.
• Suppose we have two samples, $X_{11}, X_{12}, \ldots, X_{1n_1}$ and
$X_{21}, X_{22}, \ldots, X_{2n_2}$. The following statistics are
calculated to test the significance of the
difference between their means.
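10. • $\bar{X}_1 = \frac{\sum_i X_{1i}}{n_1}$, $\bar{X}_2 = \frac{\sum_i X_{2i}}{n_2}$
• $s_1^2 = \frac{\sum_i (X_{1i} - \bar{X}_1)^2}{n_1 - 1}$, $s_2^2 = \frac{\sum_i (X_{2i} - \bar{X}_2)^2}{n_2 - 1}$
• $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
• Where $s_p^2$ is the pooled variance, also known as the
combined variance, and $s_1^2$, $s_2^2$ are the variances of
samples 1 and 2 respectively.
• The t-statistic is then given by:
• $t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
• Where D.F. = $n_1 + n_2 - 2$.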
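As a rough illustration of how these statistics are obtained from raw observations, the following Python sketch computes the two means, the two sample variances, the pooled variance and t. The function name `pooled_t` is hypothetical, and the two small data sets are made up purely for demonstration; they do not come from the lecture.

```python
import math
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Return (t, pooled variance, degrees of freedom) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = mean(sample1), mean(sample2)
    # statistics.variance uses the (n - 1) divisor, i.e. the sample variance
    var1, var2 = variance(sample1), variance(sample2)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, pooled_var, n1 + n2 - 2


# Made-up illustrative observations (not data from the lecture)
group_1 = [5.1, 5.6, 5.4, 5.9, 5.5]
group_2 = [6.5, 6.9, 7.1, 6.6, 6.9]

t_raw, pooled_raw, df_raw = pooled_t(group_1, group_2)
print(f"pooled variance = {pooled_raw:.4f}, t = {t_raw:.3f}, d.f. = {df_raw}")
```

The calculated t would then be compared with the tabulated critical value for $n_1 + n_2 - 2$ degrees of freedom.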
11. • EXERCISE 2: Two groups of 18 and 20 cows were fed two
different rations, A and B respectively, to determine
which of the two rations yields more milk in a
lactation. At the end of the experiment the following
sample means and variances (in thousand kg) were
calculated:
                          Ration A   Ration B
Mean ($\bar{X}$)          5.50       6.80
Sample variance ($s^2$)   0.206      0.379
Size (n)                  18         20
Find out whether there is any difference in the effect of
the two rations.
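12. • The hypotheses for a two-sided test are:
• H0: μ1 – μ2 = 0
• H1: μ1 – μ2 ≠ 0
• To estimate the pooled variance,
• $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{(17)(0.206) + (19)(0.379)}{18 + 20 - 2}$
= 0.297
• The calculated value of the t statistic is:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{5.50 - 6.80}{\sqrt{0.297 \left( \frac{1}{18} + \frac{1}{20} \right)}}$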
13. • So the calculated t-value is –7.342. With
n1 + n2 – 2 = 36 degrees of freedom, the critical value is
–tα/2 = –t0.025 = –2.03.
• Since the calculated value of t = –7.342 is more
extreme than the critical value –t0.025 = –2.03, the
null hypothesis is rejected at the 0.05 level of
significance, which indicates that cows fed ration B
give more milk than cows fed ration A.
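Exercise 2 supplies only summary statistics, so the calculation can be sketched in Python directly from the means, variances and group sizes. The function name `pooled_t_from_summary` is illustrative, and the critical value is copied from the slide rather than computed.

```python
import math


def pooled_t_from_summary(mean1, var1, n1, mean2, var2, n2):
    """Return (t, pooled variance, degrees of freedom) from summary statistics."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se_difference = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se_difference
    return t, pooled_var, n1 + n2 - 2


# Ration A vs ration B, values from Exercise 2 (thousand kg)
t_ab, pooled_ab, df_ab = pooled_t_from_summary(5.50, 0.206, 18, 6.80, 0.379, 20)

# Two-sided critical value for alpha = 0.05 and 36 d.f., taken from the slide
T_CRITICAL_AB = 2.03
reject_h0_ab = abs(t_ab) > T_CRITICAL_AB

print(f"pooled variance = {pooled_ab:.3f}, t = {t_ab:.2f}, d.f. = {df_ab}")
print("Reject H0" if reject_h0_ab else "Do not reject H0")
```

The sketch gives a pooled variance of about 0.297 and t of about –7.34. The third decimal of t can differ slightly from the hand calculation because the pooled variance is not rounded before it is used.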
14. 3. To compare the means of paired samples
or dependent samples (Paired t-test)
• Under some circumstances two samples are not
independent of each other.
• A typical example is taking measurements on the same
animal before and after applying a treatment.
• The effect of the treatment can be thought of as the
average difference between the two measurements.
• The value of the second measurement is related to or
depends on the value of the first measurement.
• In that case the difference between measurements
before and after the treatment for each animal is
calculated and the mean of those differences is tested to
determine if it is significantly different from zero.
15. • Let $d_i$ denote the difference for animal i. The
test statistic for dependent samples is:
• $t = \frac{\bar{d}}{s_d / \sqrt{n}}$
• where $\bar{d}$ and $s_d$ are the mean and standard
deviation of the differences, and n is the number
of animals. The testing procedure and definition of
critical values are as before, except that the degrees of
freedom are (n – 1). For this test to be valid the
distribution of observations must be approximately
normal.
16. EXERCISE 5. The effect of a treatment is tested
on milk production of dairy cows. The cows
were in the same parity and stage of lactation.
The milk yields were measured before (1) and
after (2) administration of the treatment:
• Test whether the treatment has any effect.
17. The hypotheses for a two-sided test in the case of a paired test are:
H0: μD = 0, H1: μD ≠ 0
n = 9, so $\bar{d} = \frac{\sum d_i}{n} = \frac{28}{9} = 3.11$
Measurement      Cow 1  Cow 2  Cow 3  Cow 4  Cow 5  Cow 6  Cow 7  Cow 8  Cow 9  Total
1                   27     45     38     20     22     50     40     33     18
2                   31     54     43     28     21     49     41     34     20
Difference (d)       4      9      5      8     -1     -1      1      1      2     28
d²                  16     81     25     64      1      1      1      1      4    194
$s_d = \sqrt{\frac{\sum d_i^2 - (\sum d_i)^2 / n}{n - 1}} = \sqrt{\frac{194 - (28)^2 / 9}{8}} = 3.655$
Now, the calculated t-statistic is $t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{3.11}{3.655 / \sqrt{9}} = 2.553$
18. • The two-sided critical value for (n – 1) = 8 degrees of
freedom is tα/2 = t0.025 = 2.306. Since the calculated
value t = 2.553 is more extreme than the
critical value 2.306, the null hypothesis is
rejected at the α = 0.05 level of significance.
The treatment thus influences milk yield.
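The paired calculation can be reproduced from the nine before/after milk yields with a short Python sketch. The variable names are illustrative, and the critical value is copied from the slide rather than computed.

```python
import math
from statistics import mean, stdev

# Milk yields of the nine cows before (1) and after (2) the treatment
before = [27, 45, 38, 20, 22, 50, 40, 33, 18]
after = [31, 54, 43, 28, 21, 49, 41, 34, 20]

# Difference for each cow: after minus before
differences = [a - b for a, b in zip(after, before)]

n_pairs = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)  # sample standard deviation, (n - 1) divisor
t_paired = d_bar / (s_d / math.sqrt(n_pairs))

# Two-sided critical value for alpha = 0.05 and 8 d.f., taken from the slide
T_CRITICAL_PAIRED = 2.306
reject_h0_paired = abs(t_paired) > T_CRITICAL_PAIRED

print(f"mean difference = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t_paired:.3f}")
print("Reject H0" if reject_h0_paired else "Do not reject H0")
```

Run on these data, the sketch gives a mean difference of about 3.11, a standard deviation of the differences of about 3.655 and t of about 2.553, which exceeds 2.306.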