Differences Among Four Word Lists in Noisy Background
1. STAT 7010 Auburn University
Feb 21, 2011
Equivalency Test of Word Lists in Noisy Background
Abstract
A randomized complete block design (RCBD) was used to test the equality of four standard
audiology word lists in the presence of background noise. Twenty-four participants took
the hearing tests, in which four word-list tapes were played to each participant in random order.
Because differences in personal traits could otherwise obscure differences among the lists, each
participant was treated as a block in this study. ANOVA, multiple comparisons, and diagnostic
tests were conducted. The results indicated a significant difference among the four word lists.
Specifically, words in List 1 were easier to recognize in the presence of background
noise than their counterparts in Lists 3 and 4. Model diagnostics supported the normality and
additivity assumptions, with a minor concern about equal variance. In conclusion, the four
standard word lists were not as suitable for use in a noisy background as in a quiet environment.
Introduction
Hearing aids assist hearing by amplifying the volume of speech (DASL, 1996).
However, background noise is amplified as well. This project examined whether
different word lists, which are standard audiology tools for assessing hearing in quiet
conditions, are equally difficult in the presence of background noise. Fifty English
words, calibrated to be equally difficult to perceive in the absence of noise, were used in this
study.
A randomized complete block design (RCBD) was used, with individual
differences as the block variable. Twenty-four participants with normal hearing were invited to
the experiment. Background noise was present during the experiment. They were guided to listen
to four lists of standard audiology tapes at low volume. The order of the tapes was randomized.
Participants repeated each word so that correct recognition could be checked.
The number of correctly repeated words was recorded as the dependent variable.
Data
The data set was obtained from an online resource (DASL, 1996). It contains three
variables and 96 observations. The 24 subjects served as the block variable, the four
word lists as the independent variable, and each participant's hearing score on each list
as the dependent variable. The question of interest was whether mean hearing scores
differed among the four lists.
Methods
An ANOVA was conducted to test for differences among the four word lists in
background noise. If significant differences were found, as predicted, pairwise comparisons
were applied to identify which pairs of list means differed. Additionally, the
normality, constant-variance, and additivity assumptions were tested to ensure the validity of the
design.
The statistical model for the RCBD was:
$y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$, $i = 1, 2, 3, 4$, $j = 1, \ldots, 24$
where $\mu$ is the overall mean hearing score, $\tau_i$ is the effect of treatment (list) $i$,
$\beta_j$ is the effect of block $j$ (the identification number assigned to each participant),
and $\varepsilon_{ij}$ is the random error. The null
hypothesis of the ANOVA was $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$, and the alternative hypothesis was
$H_A: \mu_i \neq \mu_j$ for at least one pair $i \neq j$. Pairwise comparisons of means (Tukey, Bonferroni, and
Scheffé procedures) were conducted to identify differences in hearing scores between
each pair of word lists.
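The sums of squares behind the RCBD F-tests can be sketched in a few lines of Python. This is a minimal illustration, not the SAS procedure used for the analysis: the function name `rcbd_anova` and the 3 x 3 `toy` table are hypothetical and are not the hearing data.

```python
def rcbd_anova(scores):
    """Additive two-way ANOVA for an RCBD with one observation per cell.

    scores[j][i] is the response of block j under treatment i.
    """
    b = len(scores)       # number of blocks
    a = len(scores[0])    # number of treatments
    grand = sum(sum(row) for row in scores) / (a * b)
    trt_means = [sum(row[i] for row in scores) / b for i in range(a)]
    blk_means = [sum(row) / a for row in scores]

    # Partition the total sum of squares into treatment, block, and error parts.
    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    ss_blk = a * sum((m - grand) ** 2 for m in blk_means)
    ss_tot = sum((y - grand) ** 2 for row in scores for y in row)
    ss_err = ss_tot - ss_trt - ss_blk

    df_trt, df_blk = a - 1, b - 1
    df_err = df_trt * df_blk
    ms_err = ss_err / df_err
    return {
        "ss": {"treatment": ss_trt, "block": ss_blk, "error": ss_err},
        "df": {"treatment": df_trt, "block": df_blk, "error": df_err},
        "F_treatment": (ss_trt / df_trt) / ms_err,
        "F_block": (ss_blk / df_blk) / ms_err,
    }


# Hypothetical toy data: 3 blocks (rows) by 3 treatments (columns).
toy = [[4, 6, 8], [5, 9, 10], [6, 9, 15]]
result = rcbd_anova(toy)
print(result["ss"], result["df"])
print(result["F_treatment"], result["F_block"])
```

For the hearing data the same function would take a 24 x 4 table (participants by lists), and the F statistic for lists would be compared with an F distribution on 3 and 69 degrees of freedom.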
Additionally, we assumed that the error term $\varepsilon_{ij}$ follows a normal distribution
$N(0, \sigma^2)$ and that $\sigma^2$ is equal across all treatments (i.e., the four lists). Thus, the
hypotheses for the constant-variance assumption were
$H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2$ and $H_A: \sigma_i^2 \neq \sigma_j^2$ for at least one pair $i \neq j$.
The Shapiro-Wilk, Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling tests were
used to test the normality assumption, while
Levene's test was used to check the constant-variance assumption.
Since the RCBD assumes that block and treatment effects are additive, Tukey's
1 df test was also included in our diagnostic tests. The interaction model, in contrast to our
initial model, was:
$y_{ij} = \mu + \tau_i + \beta_j + \gamma \tau_i \beta_j + \varepsilon_{ij}$, $i = 1, 2, 3, 4$, $j = 1, \ldots, 24$
where $\gamma$ is the interaction coefficient. Thus $H_0: \gamma = 0$ and $H_A: \gamma \neq 0$ were the null and
alternative hypotheses, respectively.
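As a rough sketch of how the 1 df test statistic is formed, the code below computes the nonadditivity sum of squares from the estimated treatment and block effects and compares it with the remaining error. The function name `tukey_nonadditivity` and the `toy` table are hypothetical; this is the textbook form of the test, assumed to correspond to what the SAS analysis reports.

```python
def tukey_nonadditivity(scores):
    """Tukey's 1 df test for nonadditivity; scores[j][i] = block j, treatment i."""
    b = len(scores)       # number of blocks
    a = len(scores[0])    # number of treatments
    grand = sum(sum(row) for row in scores) / (a * b)
    trt_eff = [sum(row[i] for row in scores) / b - grand for i in range(a)]
    blk_eff = [sum(row) / a - grand for row in scores]

    # Error sum of squares of the additive model.
    ss_err = sum(
        (scores[j][i] - grand - trt_eff[i] - blk_eff[j]) ** 2
        for j in range(b) for i in range(a)
    )

    # Single-df sum of squares for the tau_i * beta_j product term.
    cross = sum(
        scores[j][i] * trt_eff[i] * blk_eff[j]
        for j in range(b) for i in range(a)
    )
    denom = sum(t ** 2 for t in trt_eff) * sum(e ** 2 for e in blk_eff)
    if denom == 0:
        raise ValueError("all treatment or all block means are equal")
    ss_nonadd = cross ** 2 / denom

    ss_rem = ss_err - ss_nonadd           # error left after removing nonadditivity
    df_rem = (a - 1) * (b - 1) - 1
    f_stat = ss_nonadd / (ss_rem / df_rem)
    return ss_nonadd, ss_rem, df_rem, f_stat


# Hypothetical toy data: 3 blocks (rows) by 3 treatments (columns).
toy = [[4, 6, 8], [5, 9, 10], [6, 9, 15]]
ss_nonadd, ss_rem, df_rem, f_stat = tukey_nonadditivity(toy)
print(ss_nonadd, ss_rem, df_rem, f_stat)
```

With 4 lists and 24 participants the remaining error has (4 - 1)(24 - 1) - 1 = 68 degrees of freedom, so the statistic is referred to an F distribution on 1 and 68 degrees of freedom.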
Results
Principal Analysis
There were significant differences among the four word lists (F(3, 69) = 8.45, p < .0001;
see Appendix 1). Multiple comparisons were therefore used
to check for differences between pairs of lists. All three pairwise comparison procedures
(Tukey, Bonferroni, and Scheffé) showed the same pattern: Lists 2, 3, and 4 were not
significantly different from each other, while List 1 was not significantly different from List 2 but
was significantly different from Lists 3 and 4 (Table 1). List 1 was the easiest word list
(mean = 32.750).
Table 1 Comparison of Means
List    N     Mean      Grouping*
1       24    32.750    a
2       24    29.667    ab
4       24    25.583    b
3       24    25.250    b
*Means with the same letter are not significantly different from each other.
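The pairwise differences behind Table 1 can be put on a common scale by dividing each by the standard error of a difference between two list means, sqrt(2 MSE / n). The sketch below uses the means from Table 1 and the error mean square from Appendix 1. It deliberately stops short of the critical values, which differ among the Tukey, Bonferroni, and Scheffé procedures and are not reproduced here.

```python
from itertools import combinations
from math import sqrt

means = {1: 32.750, 2: 29.667, 3: 25.250, 4: 25.583}  # list means from Table 1
mse = 36.3266   # error mean square from the ANOVA table in Appendix 1
n = 24          # observations (participants) per list

# Standard error of the difference between two list means.
se_diff = sqrt(2 * mse / n)

# Ratio of each pairwise mean difference to its standard error.
ratios = {
    (i, j): (means[i] - means[j]) / se_diff
    for i, j in combinations(sorted(means), 2)
}

for (i, j), ratio in ratios.items():
    print(f"List {i} vs List {j}: diff = {means[i] - means[j]:7.3f}, ratio = {ratio:6.2f}")
```

The largest ratios belong to List 1 versus Lists 3 and 4, and the smallest to List 3 versus List 4, which matches the letter groupings in Table 1.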
Diagnostic Analysis
Tukey's test of additivity (1 df nonadditivity test, Table 2) supported the assumption that
the RCBD model was additive, giving no evidence of an interaction between
lists and participants (F(1, 68) = 0.11, p = .7461).
Table 2 Tukey's 1 df Nonadditivity Test
Source      df    SS           MS          F-statistic    P-value
Lists       3     920.4583     306.8194    8.34           <.0001
Subjects    23    3231.6250    140.5054    3.82           <.0001
Psquare     1     3.8888       3.8888      0.11           0.7461
Error       68    2502.6529    36.8037
Total       95    6658.6250                               <.0001
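The entries of Table 2 can be cross-checked with simple arithmetic: each mean square is SS/df, each F statistic is a mean square divided by the error mean square, and the nonadditivity and error sums of squares together should equal the error sum of squares of the additive model in Appendix 1. The sketch below uses only the df and SS values reported in the table.

```python
# (df, SS) pairs copied from Table 2.
table2 = {
    "Lists": (3, 920.4583),
    "Subjects": (23, 3231.6250),
    "Psquare": (1, 3.8888),
    "Error": (68, 2502.6529),
}

# Mean squares and F statistics against the error mean square.
ms = {source: ss / df for source, (df, ss) in table2.items()}
f_stats = {source: ms[source] / ms["Error"] for source in ("Lists", "Subjects", "Psquare")}

# Totals: 96 observations give 95 total degrees of freedom.
total_df = sum(df for df, _ in table2.values())
total_ss = sum(ss for _, ss in table2.values())

# Nonadditivity SS plus remaining error SS recovers the additive-model error SS.
additive_error_ss = table2["Psquare"][1] + table2["Error"][1]

for source, f in f_stats.items():
    print(f"{source}: MS = {ms[source]:.4f}, F = {f:.2f}")
print(total_df, round(total_ss, 4), round(additive_error_ss, 4))
```

Rounded to two decimals, the F statistics agree with the 8.34, 3.82, and 0.11 reported in the table.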
The Brown-Forsythe test for homogeneity of variance showed no evidence of variance
heterogeneity among the hearing scores of the four word lists (F = 0.57, p = 0.63). Plots of residuals
further supported the homogeneity of variance assumption. However, the spread of residuals seemed
to differ from subject to subject (F = 1.88, p = 0.228). Several points in Figure 1 (L) and (R)
stood apart from the rest of the data as potential outliers. Thus, there might be some concern about
equal variance. In addition, the normality assumption was supported, since all four normality
tests agreed (Shapiro-Wilk test, p = .7811; Kolmogorov-Smirnov test, p > .1500;
Cramér-von Mises test, p > .2500; Anderson-Darling test, p > .2500; for the residual plot,
see Appendix 2).
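For readers unfamiliar with the Brown-Forsythe test, a minimal sketch follows: it is a one-way ANOVA F statistic computed on absolute deviations from each group's median. The function name `brown_forsythe` and the two small groups are hypothetical. The code treats the groups as independent samples and may differ in detail from the SAS option used in this analysis.

```python
from statistics import median


def brown_forsythe(groups):
    """Brown-Forsythe F statistic: one-way ANOVA on |y - group median|."""
    # Absolute deviations from each group's median.
    z = [[abs(y - median(g)) for y in g] for g in groups]
    k = len(z)
    n = sum(len(g) for g in z)
    grand = sum(sum(g) for g in z) / n
    group_means = [sum(g) / len(g) for g in z]

    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(z, group_means) for v in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data: the second group is twice as spread out as the first.
f_stat, df1, df2 = brown_forsythe([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]])
print(f_stat, df1, df2)
```

A large F relative to the F distribution on (k - 1, n - k) degrees of freedom would point to unequal spread among the groups.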
Figure 1 Residual Plots: (L) Residual vs. Predicted; (R) Residual vs. Block; (B) Residual vs. Treatment
Discussion and Summary
Blocking was effective in this study: the mean square for blocks was large relative to
the error mean square (MSblock = 140.5054, MSerror = 36.3266, MSlist = 306.8194). Since List 1
differed significantly from Lists 3 and 4, the standard tools for
evaluating hearing in a quiet environment may not extend to conditions in which
background noise is present. To assess hearing in a noisy environment, it might
be more suitable to use Lists 2, 3, and 4, since they did not differ significantly from each other.
Furthermore, since the homogeneity assumption was not well supported in our analysis, it would
be necessary to transform the data and rerun the principal analysis. Further studies may consider
more specific factors as block variables, such as age, gender, and habit of using headsets.
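The benefit of blocking noted at the start of this section can be quantified with the relative efficiency of the RCBD compared with a completely randomized design. The sketch below uses one common textbook estimate, which is an assumption here and was not part of the original analysis, together with the mean squares reported above.

```python
a = 4                  # treatments (word lists)
b = 24                 # blocks (participants)
ms_block = 140.5054    # mean square for subjects
ms_error = 36.3266     # error mean square of the RCBD

# Estimated error variance had the same units been run without blocking.
var_crd = ((b - 1) * ms_block + b * (a - 1) * ms_error) / (a * b - 1)

# Relative efficiency of the RCBD versus a completely randomized design.
relative_efficiency = var_crd / ms_error
print(round(relative_efficiency, 2))
```

A value near 1.7 would suggest that a completely randomized design needs roughly 70% more observations per list to reach the same precision.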
References
DASL. (1996). The Data and Story Library: Hearing. Retrieved from
http://lib.stat.cmu.edu/DASL/Datafiles/Hearing.html
Loven, F. (1981). A study of the interlist equivalency of the CID W-22 word list
presented in quiet and in noise. Unpublished master's thesis, University of Iowa.
Appendices
1 - ANOVA table of principal test
Source             df    SS             MS          F-statistic    P-value
List               3     920.458333     306.8194    8.45           <.0001
Subject            23    3231.625000    140.5054    3.87           <.0001
Error              69    2506.541667    36.3266
Corrected Total    95    6658.625000                               <.0001
2 – Plot of normality test
3 – SAS code
data hearing;
input subject list score;
cards;
1 1 28
2 1 24
3 1 32
4 1 30
5 1 34
6 1 30
7 1 36
8 1 32
9 1 48
10 1 32
11 1 32
12 1 38
13 1 32
14 1 40
15 1 28
16 1 48
17 1 34
18 1 28
19 1 40
20 1 18
21 1 20