Factors affecting test scores and test evaluation in class
Topic 7: Factors affecting test scores &
simple test evaluation for class teacher
(Bachman, 1990, Chapters 6 & 7)
(Madsen, 1983, Chapter 9)
Hoa Nguyen
Factors affecting test scores
(test bias)
[Diagram: test scores are shaped by communicative language ability, test method facets, personal attributes, and random factors]
• Test method facets, such as test format/response format, input
format, and length of the test
– The same skills and sub-skills tested through different test formats or
response formats may favour one group of test takers over another.
• Test content
– Cultural background
• Cultural features embedded in test content may bias one group but not
the other.
– Background knowledge
• E.g. an IELTS listening section 3 about the process of doing an
assignment in Western education, a mini-lecture about birds in
Tasmania, or a TOEFL iBT lecture about Pluto.
• In ESP testing, it is necessary to distinguish between language proficiency
and background knowledge, and the test design should define specific
language ability as part of the language ability to be tested. Scores from an
ESP reading test should not be interpreted as general reading ability.
Part 1: Factors affecting test scores
• Cognitive characteristics
– Field independence
• Field independence is “the extent to which a person perceives part of a
field as discrete from the surrounding field as a whole, rather than embedded,
or… the extent to which a person perceives analytically” (p. 275).
• Field-independent test takers are likely to perform better than
field-dependent ones, especially on discrete-point tests.
– Ambiguity tolerance
• Ambiguity tolerance is “a person’s ability to function rationally and calmly
in a situation in which interpretation of all stimuli is not clear” (p. 227).
• Individuals with high ambiguity tolerance might perform better than those
with low ambiguity tolerance.
– E.g. in a cloze test, the text becomes clearer only towards the end; in a
dictation, some words cannot be recognized until the second or final reading.
• This evidence is less clear with the multiple-choice response format, though
research shows a significant (but low) correlation between scores on a
measure of ambiguity tolerance and multiple-choice measures of English proficiency.
• Random factor: the testing environment
• Native language background, ethnicity, sex,
and age
• These characteristics of test takers are not
facets of test methods, so they cannot be
considered possible sources of
measurement error.
• They can only provide information about
how language learning varies with age,
ethnicity, sex, and other individual
characteristics.
Part 2: A practical approach to test
evaluation for the class teacher
• Preparing an item analysis
– score all the test takers
– rank them from the highest to the lowest and
divide them equally into three groups: high –
middle – low
– record students’ responses, from the high group
to the low group; if the test is multiple choice,
circle the correct option.
Item 1      High group   Low group
A           /            ///
(B)         /////        //
C           //           /
D           /            //
X (blank)   /            //
1. Difficulty level

difficulty = (High correct + Low correct) / total number in sample
           = (Hc + Lc) / (H + L), i.e. (Hc + Lc) / N

E.g. in this case, (5 + 2) : 20 = 7/20 = 35%
Note: ≥ 90% is easy; ≤ 30% is difficult.

2. Discrimination level

discrimination = (High correct − Low correct) / total number in sample
               = (Hc − Lc) / (H + L), i.e. (Hc − Lc) / N

E.g. in this case, (5 − 2) : 20 = 3/20 = 15%; ≥ 15% is acceptable,
and 10% to 15% is questionable.
Note: ≤ 10% discrimination is not acceptable.
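Both formulas can be checked with a short Python sketch, using the item 1 figures above (Hc = 5, Lc = 2, N = 20):

```python
# Item difficulty and discrimination as defined above:
#   difficulty     = (Hc + Lc) / N
#   discrimination = (Hc - Lc) / N
# where Hc and Lc are the numbers answering correctly in the high and
# low groups, and N is the total number of test takers in both groups.

def difficulty(hc, lc, n):
    return (hc + lc) / n

def discrimination(hc, lc, n):
    return (hc - lc) / n

# Item 1 from the tally table: Hc = 5, Lc = 2, N = 20
print(f"difficulty     = {difficulty(5, 2, 20):.0%}")      # 35%
print(f"discrimination = {discrimination(5, 2, 20):.0%}")  # 15%
```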
• Distractor evaluation
• Weak distractors: poor discrimination, or not
chosen by test takers.
• Only one or two distractors attract attention.
Reasons:
– the material has been revised thoroughly in class and
everyone has mastered it: the answer is obvious
– no other choices seem likely to be the answer
– obviously impossible distractor(s)
• If many items are left blank near the end of the
test, the test should be shortened or more time
should be allowed for it.
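As a rough sketch, a distractor can be flagged as weak when few test takers choose it, or when it attracts more high-group than low-group students. The threshold and the helper name here are illustrative assumptions, not rules from Madsen:

```python
# Flag weak distractors in a multiple-choice item (illustrative thresholds).
# counts maps each option to (high-group choices, low-group choices);
# the correct option is skipped, since only distractors are evaluated.

def weak_distractors(counts, correct, min_chosen=2):
    weak = []
    for option, (high, low) in counts.items():
        if option == correct:
            continue
        total = high + low
        # Weak if rarely chosen, or chosen more by strong than weak students.
        if total < min_chosen or high > low:
            weak.append(option)
    return weak

# Item 1 from the tally table: (high, low) choices per option, B correct.
item1 = {"A": (1, 3), "B": (5, 2), "C": (2, 1), "D": (1, 2)}
print(weak_distractors(item1, correct="B"))  # ['C']
```

Here C is flagged because more high-group than low-group students chose it, which suggests the distractor may be misleading stronger test takers.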
1. Practice exercises
Item 1 (D) Item 2 (A) Item 3 (B) Item 4 (C)
High Low High Low High Low High Low
A // // // //// //
B / // //// / //
C //// // / // //// //// //// //
D //// / //// / / /// //
X / / / //
Questions:
1. Calculate the level of difficulty for each of the four items
Item 1 = Item 2 = Item 3 = Item 4 =
Which of these items are too difficult, and which are too easy?
2. Calculate the discrimination of each item.
Which item has the poorest discrimination? ………………………….
Which item has unsatisfactory discrimination? ………………………….
Which item(s) are borderline? ………………………….
3. Look at the distractors on the four items.
In which are they the most effective? ………………………….
In which are they the least effective? ………………………….
4. Is there any item with negative discrimination? If so, which one?
………………………………………………………………………
5. Which item did the fewest students leave blank? ………………………….
Which item did the most students leave blank? ………………………….
Question 1
• 1 = 50% 2 = 39% 3 = 33% 4 = 89%
• none = too difficult none = too easy
Question 2
• 1 = 5%; 2 = negative (no discrimination); 3 = 22%; 4 = 11%
• 2 = poorest discrimination and it is unsatisfactory
• no other unsatisfactory or borderline items
Question 3
• 3 = most effective 4 = least effective
Question 4
• Yes; 2 = negative
Question 5
• Fewest left blank = 4 Most left blank = 2 and 3