This document provides an overview of key concepts in educational measurement and evaluation, including definitions of basic terms such as test, measurement, assessment, and evaluation. It discusses the differences among these terms and provides guidelines for writing instructional objectives. Bloom's Taxonomy of educational objectives is also summarized; it categorizes objectives into three domains: cognitive, affective, and psychomotor. The cognitive domain includes levels such as knowledge, comprehension, and evaluation. The affective domain focuses on feelings and attitudes.
1. EDUCATIONAL
MEASUREMENT AND
EVALUATION
A Teaching Note For Regular
Undergraduate Students
12/8/2020
CREATED BY:
Dr. Mengistu Debele Gerbi (Ph.D)
Ambo University
2020
Prepared by Mengistu Debele
2. UNIT 1: BASIC CONCEPTS AND PRINCIPLES IN
EDUCATIONAL MEASUREMENT AND EVALUATION
1.1 Meaning and Definitions of Basic Terms
Dear students, can you define the following terms?
- TEST
- TESTING
- MEASUREMENT
- ASSESSMENT
- EVALUATION
3. TEST
According to Gronlund (1981), test refers to the
presentation of a standardized set of questions to be
answered by pupils. It is also an instrument for
measuring samples of a person’s behavior by posing a set
of questions in a uniform manner.
A test is a method used to determine students' ability to complete certain tasks or
demonstrate mastery of a skill or knowledge of content.
A test is also a method of measuring a person's quality, ability, knowledge, or
performance in a given domain.
4. General Classification of Tests
1. Intelligence test
2. Personality test
3. Aptitude test
4. Achievement test
5. Prognostic test
6. Performance test
7. Diagnostic test
8. Preference test
9. Accomplishment test
10. Scale test
11. Criterion-referenced test
12. Speed test
13. Power test
14. Objective test
15. Subjective test
16. Teacher-made test
17. Formative test
18. Summative test
19. Placement test
20. Standardized test
21. Norm-referenced test
5. TESTING
Testing is the process of administering a test to pupils.
It is a technique of obtaining information needed for evaluation purposes.
Tests, quizzes, and other measuring instruments are devices used to obtain
such information.
The test is the most commonly used method of making measurements in
education.
6. MEASUREMENT
Students! Are test and measurement the same? Of course, they are not.
Measurement refers to giving or assigning a numerical value to certain attributes
or behaviors.
It is a systematic process of obtaining a quantified degree of an attribute.
It is the assignment of numbers (quantities) and uses a variety of instruments:
tests, rating scales.
It is the process of obtaining a numerical description of the degree to which an
individual possesses an attribute; it quantifies how much a learner has learned.
E.g., Ujulu scored 8/10 in an M&E test.
Measurement takes place after a test is given and a score is obtained. Thus, tests
are instruments of measurement. We use tests, projects, homework, laboratory
work, quizzes, and assignments as instruments of measurement.
Measurement can also refer to both the score obtained and the process used.
7. ASSESSMENT
Assessment is a process of gathering and organizing data in order
to monitor the progress of students’ learning.
According to Airasian (1997) and Cruikshank et al. (1999),
assessment refers to the process of collecting (through paper-and-
pencil tests, observation, or self-reported questionnaires),
interpreting, and synthesizing information to aid in decision
making. Thus, it is the basis for decision making (evaluation).
It is the process of collecting, recording, scoring, describing, and
interpreting information about learning.
The goal of assessment is to make improvement.
It deals with the nature of the learner (what s/he learned, how s/he
learned).
8. CON’D
• Assessment is a systematic basis for making
inferences about the learning and development
of students: the process of defining, selecting,
designing, collecting, analyzing, interpreting,
and using information to increase students'
learning and development.
9. EVALUATION
Students! Are assessment and evaluation the same? Surely they are
not. According to Gage and Berliner (1998), evaluation refers to the
process of using information to judge the goodness, worth, or quality
of students' achievement, teaching programs, and educational
programs. It is concerned with making judgments on the worth or
value of a performance, answering the question "how good,
adequate, or desirable?" It is also the process of obtaining, analyzing,
and interpreting information to determine the extent to which
students achieve instructional objectives.
It occurs after assessment has been done, because the teacher is then
in a position to make an informed judgment.
10. CON’D
Hence, evaluation deals with determining the value and
effectiveness of something, often a program. From this
standpoint, an evaluation encompasses an assessment
initiative as the source for making judgments about
program quality.
Evaluation is the process of making a judgment about a
product, a response, or a performance based on
predetermined criteria.
Thus, evaluation is simply the quantitative and/or
qualitative description of a behavior plus a value
judgment. Dear students! Which of the two is more
comprehensive: assessment or evaluation?
11. 1.2. ROLES AND PRINCIPLES OF EVALUATION
Dear students! Can you mention major uses of
evaluation results in improving the teaching-
learning process? Great, according to Gronlund
and Linn (1995), evaluation results are used in:
Programmed instruction
Curriculum development
Marking and reporting
12. Functions of Evaluation
1. Evaluation assesses or makes appraisals of:
- Educational objectives, programs, curricula,
instructional materials, and facilities
- The teacher
- The learner
- Public relations of the school
- Achievement scores of learners
2. Evaluation supports research.
13. CON’D
• In the view of Gronlund (1981), evaluation results are
used for making decisions related to instruction,
guidance and counseling, administration, and
research.
• Well and systematically organized evaluation results
help teachers to improve teaching by judging the
adequacy and appropriateness of instruction with
reference to objectives, contents, teaching methods,
and assessment techniques.
14. 1.3. Principles of evaluation
• Evaluation is effective when it is based on sound
principles. According to Gronlund and Linn (1995),
the major principles of evaluation are:
1. Evaluation should give priority to determining
what is to be measured.
2. The types of test item formats/evaluation techniques
to be used should be determined by the specific
learning outcomes to be measured.
3. Comprehensive evaluation requires a variety of
techniques.
4. A proper use of evaluation techniques requires an
awareness of their limitations and strengths.
15. CON’D
5. Test items should be based on a representative sample
of the subject content and the specified learning
outcomes to be measured.
6. Test items should be of the appropriate level of
difficulty.
7. Test items should be constructed so that extraneous
factors do not prevent students from responding.
8. Test items should be constructed so that students
obtain the correct answer only if they have attained the
desired learning outcomes.
9. Test items should be constructed so that they contribute
to the improvement of the teaching-learning process.
16. To summarize the principles of evaluation
• Evaluation should be:
1. Based on clearly stated objectives
2. Comprehensive
3. Cooperative
4. Used judiciously
5. A continuous and integral part of the teaching-
learning process
17. Unit 2
THE ROLE OF OBJECTIVES IN
EDUCATIONAL MEASUREMENT AND
EVALUATION
18. 2.1. Definition of important terms
Dear students! Can you define outcome,
aims/goals, objectives, learning outcomes?
• According to Gronlund and Linn(1995),
outcome is what occurs as a result of an
educational experience.
• Goals are broad statements of educational intent
that give the overall purpose and desired
outcomes (Ellington & Shirley, 1997).
19. CON’D
• Objectives are sets of more detailed statements that
specify the means by which the different aims of the
course are to be achieved, relating to the activities it
involves and the content it covers (Cruikshank, 1999).
• Learning outcomes are even more detailed statements
that specify the various things that students will be
able to do after successful completion of the learning
process.
20. 2.2. Importance of stating objectives
• Properly stated instructional objectives serve as
guides for both teaching and testing. The
following are functions of objectives: they
identify the intended student outcomes, help the
teacher to plan instruction, provide criteria for
evaluating student outcomes, help in selecting
appropriate instruction, provide a public record
of intent, and help the teacher to guide and
monitor students' learning.
21. 2.3. Guideline for writing instructional objectives
1. Objectives should be stated in terms of student
performance/ behavior
2. Objectives should be stated in terms of the learning
outcomes and not in terms of the learning process.
3. Objectives should be sufficiently free from the
influence of course content
4. Statement of the objectives should be an
amalgamation of subject-matter and desired behavior.
22. CON’D
5. Avoid the use of more than one type of learning
outcome in each objective.
6. Begin each specific objective with an action verb
which indicates what the students have to do or
demonstrate.
7. Write a sufficient number of specific objectives (5-8)
for each general objective so as to adequately
describe the student behavior required for achieving
the general objective.
23. 2.4. Methods of stating instructional objectives
• According to Gronlund (1981), there are two
levels of stating instructional objectives.
1. Stating general objectives as intended
learning outcomes
2. Listing a sample of specific objectives
corresponding to and representative of each
general objective
24. 2.5. Taxonomy of Educational objectives
• Taxonomy means 'a set of classification
principles', or 'structure’.
• Domain means 'category’ or distinct area.
The best-known description of learning
domains was developed by Benjamin Bloom
and is known as Bloom's Taxonomy.
25. The three domains of Bloom’s taxonomy of
educational objectives
26. Cognitive domain
• RATIONAL LEARNING: THINKING
Emphasizes knowledge, use of the mind,
and intellectual abilities.
Objectives in this domain are often referred
to as instructional or behavioral objectives
and begin with action verbs.
27. CON’D
• Behaviors are thought to be cumulative,
going from simple to
more complex
mental behaviors.
28. CON’D
• Bloom and Krathwohl (1956)
divided the cognitive domain into
six distinct levels, each level
building on those below and
representing a progressively
higher level of cognition:
1. Knowledge
2. Comprehension
3. Application
4. Analysis
5. Synthesis
6. Evaluation
29. 1. Knowledge
Remembering previously learned
material/information.
• Emphasizes facts, information, and specifics.
• Involves remembering material in a form very
close to how it was originally presented.
• Depends upon memorizing or identifying facts
without going beyond them.
30. 2. Comprehension
Defined as the ability to understand the
meaning of material.
• Understanding and interpreting information.
• Grasping the meaning and intent of the
material.
• Deals with content and involves ability to
understand what is being communicated.
31. 3. Application
Applying procedures/systems/rules in specific
situations.
• It is the ability to use learned material in new
situations.
• Using what is remembered and comprehended.
• Applies learning to real-life, new, and/or
concrete situations.
• It is the ability to use knowledge and learned
material in meaningful ways.
32. 4. Analysis
Reasoning
• Breaking a system/material down into its
constituent elements or parts so as to see its
organizational structure, and determining the
relationships of these parts to each other and to
the whole.
• It is the ability to break material down into
specific parts so that the overall organizational
structure may be comprehended.
33. 5. Synthesis
Creating
• It is the ability to put parts together to form a
whole.
• Putting together diverse skills, abilities, and
knowledge to accomplish a particular new task
or form a new whole.
• Organizing ideas into new patterns and putting
materials together in a structure which was not
there before.
34. 6. Evaluation
Evaluating
• It is the ability to judge the worth of material
for a given purpose.
• Making judgment/critical comparisons on the
basis of agreed criteria.
• Judging the values of ideas, methods,
materials, procedures, and solutions by
developing and/or using appropriate criteria.
36. Affective domain
• EMOTIONAL LEARNING: FEELING
Concerned with feelings, attitudes,
appreciations, interests, values, and
adjustments.
Bloom and Krathwohl (1964) divided the affective
domain into five distinct hierarchical levels.
37. CON’D
• The affective domain (Krathwohl, Bloom &
Masia, 1973) includes the manner in which we
deal with things emotionally, such as feelings,
values, appreciation, enthusiasm, motives, and
attitudes.
• Since the affective domain is concerned with a
student's attitudes, personal beliefs, and values,
measuring educational objectives in this
domain is not easy.
38. Bloom and Krathwohl's (1964) levels
of the affective domain
• The five major categories of the affective domain,
listed from simplest to most complex, are as follows:
Receiving: developing an awareness of something.
Responding: showing active interest in something.
Valuing: taking up a positive attitudinal position.
Organization: making adjustments to one's value system.
Characterization: integrating one's attitudes into a
total, all-embracing philosophy.
39. Educational objectives and State of mind
Educational objectives and corresponding states of mind:
• Receiving: willingness to pay attention
• Responding: reacts voluntarily or complies
• Valuing: acceptance
• Organization: rearrangement of the value system
• Characterization: incorporates values into life
1. Receiving: the lowest level; the student passively pays
attention. Without this level, no learning can occur.
40. CON’D
• It is the ability of the student to be attentive to
particular stimuli.
• It includes awareness (giving the learner the
opportunity to be merely conscious of
something), willingness to receive (tolerating a
new stimulus rather than avoiding it), and
controlled or selected attention (differentiating
a given stimulus from competing and
distracting stimuli).
41. 2. Responding
• The student is an active participant.
• Showing new behaviors beyond mere attending; it
implies actively attending to the phenomenon. Three
sub-levels:
1. Acquiescence in responding: obedience or compliance
are words that describe this behavior.
2. Willingness to respond: capacity for voluntary activity.
3. Satisfaction in response: a feeling of satisfaction and
enjoyment.
42. 3.Valuing
• The worth the student attaches to some entity.
• It implies perceiving the phenomenon as having worth,
showing consistency in behaviors related to it, and
showing some definite involvement or commitment.
• It is behavior motivated by the individual's commitment
to the underlying value guiding it.
• It can be described as acceptance of a value, preference
for a value, and commitment.
43. 4. Organization
• Bringing things together into a whole.
• Organization is defined as conceptualization of a value
and integration of a new value into one's general set of
values, relating it to other priorities.
• Here, value refers to an individual's lifestyle that has been
built on his/her value system and that controls his/her
behavior. Two sub-levels:
1. Conceptualization of the value: permits the individual to see
how the value relates to the existing values he/she holds.
2. Organization of the value system: bringing a complex of
values together into a new, harmonious system.
44. 5. Characterization by value
• Acting consistently with the new value; the
person is known by the value. It is the level at
which a person integrates his/her beliefs, ideas,
and attitudes into a total, all-embracing life
philosophy.
45. Psychomotor domain
• PHYSICAL LEARNING: DOING
Emphasizes speed, accuracy, dexterity, and
physical skills.
• This domain includes objectives related to
muscular or motor skills, manipulation of
materials and objects, and neuromuscular
coordination.
• Taxonomies of this domain were developed by
Harrow (1972) and Simpson (1972).
46. Harrow’s levels of Psychomotor domain
Reflex movements
Basic-fundamental movements
Perceptual abilities
Physical abilities
Skilled movements and
Non-discursive communication.
47. Dave's levels of the psychomotor domain
Imitation: observing and patterning behavior after
someone else.
Manipulation: being able to perform certain actions
by following instructions and practicing.
Precision: refining; becoming more exact, with few
apparent errors.
Articulation: coordinating a series of actions,
achieving harmony and internal consistency.
Naturalization: high-level performance becomes
natural, without needing to think much about it.
48. Simpson’s hierarchical taxonomy
Perception: the process of becoming aware of
objects, qualities, etc. by way of the senses. This
is basic in the situation-interpretation-action
chain leading to motor activity.
Set: readiness for a particular kind of action or
experience; may be mental, physical, or
emotional.
Guided response: an overt behavioral act under the
guidance of an instructor, or following a model or
a set of criteria.
49. Mechanism:
the learned response becomes habitual; the learner
has achieved a certain confidence and proficiency
of performance.
Complex overt response: performance of a
motor act considered complex because of the
movement pattern required.
Adaptation: altering motor activities to meet the
demands of problematic situations.
50. Origination:
creating new motor acts or ways of manipulating
materials out of the skills, attributes, and
understandings developed in the psychomotor
area.
51. UNIT 3
Planning Classroom Tests
• Development of good questions or items must
follow a number of principles.
• The development of valid, reliable and usable
questions involves proper planning.
• The validity, reliability, and usability of such
tests depend on the care with which the tests
are planned and prepared.
52. 3.1. Some pitfalls in teacher-made tests
• Most teacher-made tests are not appropriate to the
different levels of learning outcomes.
• Many of the test exercises fail to measure what
they are supposed to measure; in other words,
most teacher-made tests are not valid.
• Some classroom tests do not comprehensively
cover the topics taught (lacking content
validity).
• Most tests prepared by teachers lack clarity in
their wording.
• Most teacher-made tests fail item analysis.
53. 3.2. Considerations in Planning Classroom Tests
Guide in planning a classroom test
• Determine the purpose of the test.
• Describe the instructional objectives and content to be measured.
• Determine the relative emphasis to be given to each learning
outcome.
• Select the most appropriate item formats (essay or objective).
• Develop the test blueprint to guide the test construction.
• Prepare test items that are relevant to the learning outcomes
specified in the test plan.
• Decide on the pattern of scoring and the interpretation of results.
• Decide on the length and duration of the test, and
• Assemble the items into a test, prepare directions, and administer
the test.
54. 3.3. Steps in planning classroom tests
• Before constructing a test, test constructors
should ask themselves the following
questions:
• What should I measure?
• What knowledge, skills, and attitudes do I
want to measure?
• Would I test for factual knowledge or the
application of this factual knowledge?
55. The answers to these questions mainly depend on:
Instructional objectives
Intended learning outcomes
The nature of the subject-matter imparted, and
The emphasis given to the content.
As suggested by Gronlund (1981), the construction of
good tests requires adequate and extensive planning
so that instructional objectives, teaching strategies,
textual materials, and evaluation procedures are all
inter-related in some meaningful fashion.
56. 3.4. Planning stage of test development
The planning stage of test development has
the following major steps
1. Determining the purpose of testing
2. Developing the test specifications (table of specification, TOS)
3. Selecting appropriate item types
4. Preparing/writing relevant test items
57. 3.5. Table of specification /TOS/
• The TOS is a test blueprint.
• A table of specification is a two-dimensional table that
specifies the levels of objectives in relation to the
content of the course.
• A well-planned TOS enhances the content validity of
the test for which it is planned.
• The two dimensions (content and objectives) are put
together in a table by listing the objectives across the
top of the table (horizontally) and the content down the
table (vertically) to provide the complete framework for
the development of the test items.
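As an illustration of how a TOS allocates items across the two dimensions described above, here is a minimal sketch in Python; the topics, levels, percentage weights, and the 40-item total are all hypothetical, not taken from any particular course.

```python
# Hypothetical relative emphasis (in percent) for three content topics
# and three objective levels, for a hypothetical 40-item test.
topics = {"Measurement": 25, "Assessment": 25, "Evaluation": 50}
levels = {"Knowledge": 40, "Comprehension": 35, "Application": 25}
total_items = 40

# Each cell gets topic weight x level weight x total items
# (weights are percentages, hence the division by 100 x 100),
# rounded to the nearest whole item.
tos = {
    topic: {level: round(tw * lw * total_items / 10000)
            for level, lw in levels.items()}
    for topic, tw in topics.items()
}

# Print the table: objectives across the top, content down the side.
print("Topic        " + "  ".join(f"{lvl:>13s}" for lvl in levels))
for topic, row in tos.items():
    print(f"{topic:12s} " + "  ".join(f"{row[lvl]:13d}" for lvl in levels))
```

With these weights, "Evaluation" (50% of the content) receives half of the 40 items, split across the three levels in proportion to their weights.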
58. 3.6. Item writing
Guide for item writing
• Follow your TOS when you are writing the test items.
• Generate more items than specified in the table of
specification.
• Use unambiguous language so that the demands of the item
would be clearly understood.
• Endeavor to generate the items at the appropriate levels of
difficulty as specified in the table of specification.
• Give enough time to allow an average student to complete the
task.
• Build in a good scoring guide at the point of writing the test
items.
• Have the test exercises examined and critiqued by one or more
colleagues.
• Review the items and select the best according to the laid-
down table of specification/test blueprint.
59. Unit 4
Construction of Classroom Tests
• Based upon the type of item format used,
teacher-made classroom tests are divided into
two types: objective versus subjective tests.
• Objective tests are tests with definite answers.
• One and only one correct answer is available
for a given item on an objective test.
• Thus, they have high scorer reliability.
60. 4.1. Writing Objective Test Items
• Objective tests are of two types:
• 1. Selection-type items (the student is required to select
an answer): multiple choice, true-false, matching.
• Selection-type items are a fixed-response type.
• 2. Supply-type items (the student is required to supply
an answer): completion, short answer.
Supply-type items are a free-response type.
61. CON’D
• To obtain the correct answer, students must
demonstrate the specific knowledge, understanding,
or skill. They are not free to redefine the problem or
to organize and present the answer in their own
words.
• This type of method contributes to scoring that is
quick, easy, and accurate.
• Negative side: inappropriate for measuring the ability
to formulate problems and choose an approach to
solving them or the ability to select, organize, and
integrate ideas.
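The quick, easy, and accurate scoring described above can be sketched as a simple comparison against a fixed answer key; the key and the student's responses below are hypothetical.

```python
# Hypothetical answer key for a five-item selection-type test.
answer_key = {1: "B", 2: "A", 3: "D", 4: "C", 5: "B"}

def score(responses, key):
    """Count the items where the response matches the key exactly."""
    return sum(1 for item, correct in key.items()
               if responses.get(item) == correct)

# A hypothetical student's responses: items 1, 3, and 4 match the key.
student = {1: "B", 2: "C", 3: "D", 4: "C", 5: "A"}
print(score(student, answer_key))  # 3 correct out of 5
```

Because each item has exactly one keyed answer, two scorers (or a machine) applying this comparison will always produce the same score, which is what gives selection-type items their high scorer reliability.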
62. Types of Objective Tests
Objective test items
• Supply test items: short answer, completion
• Selection test items: arrangement, true-false, matching, multiple choice
63. CON’D
• There are five main formats of objective test items:
– Short answer
• What is the capital of Ethiopia?
– Completion
• In the equation 2X+5=9, X=____
– Matching
• Color of the sky A) Brown
• Color of the dirt B) Blue
• Color of the trees C) Green
– True-False or Alternative Response
• T F An atom is the smallest particle of matter.
• Yes No Acid turns litmus paper red.
– Multiple Choice
• In the equation 2X+5=9, 2X means
A)2 plus X
B)2 multiplied by X
C)2 minus X
64. 4.1.1.Supply Item Type
• Supply or free-response objective tests require
the testee to give very brief answers to the
questions. These answers may be a word, a
short phrase, a number, or a symbol.
• If test items consist of direct questions, they
require short answers (short-answer type).
• If test items consist of incomplete statements,
they require responses that must be supplied by
the examinee (completion type).
65. Examples
1. What is the largest lake in Ethiopia?
2. Who is the first woman athlete in Africa to win a
gold medal in the Olympics?
3. The largest lake in Ethiopia is ____.
4. The name of the first woman athlete in Africa to win
a gold medal in the Olympics is ____.
Questions 1 & 2 are direct questions ending with a
question mark, so they are of the short-answer type.
Questions 3 & 4 are incomplete sentences, so they are of
the completion type.
66. Uses of supply item types
• They are suitable for measuring a wide variety
of relatively simple learning outcomes, such as
recall of memorized information and problem-
solving outcomes measured in mathematics
and the sciences.
• They are used when they are most effective for
measuring a specific learning outcome, such as
computational learning outcomes in
mathematics and the sciences.
67. Advantages of supply test items
• Reduce problems of guessing: they minimize
guessing because the examinees must supply
the answer, either by thinking and recalling the
information or by making the necessary
computations to solve the problem presented.
• Easy to construct.
• Effective in measuring students' spelling
ability.
• Encourage intensive study.
68. Weaknesses of supply item types
• Do not measure complex learning outcomes.
• Scoring is relatively difficult.
• Excessive use can encourage rote memorization and
poor study habits.
• Limited to questions that can be answered by a
word, phrase, symbol, or number.
• Sometimes it is difficult to phrase the question
as an incomplete statement.
69. Suggestions for constructing supply item types
• There must be only one correct answer.
• The wording must be clear and specific to avoid
ambiguous responses.
• Avoid too many blank spaces in the same sentence.
• Do not take statements directly from textbooks.
• A direct question is generally more desirable than an
incomplete statement.
• Do not use the articles 'the, a, an' immediately preceding
a response blank.
• The lengths of the blanks should be approximately equal.
70. Tips for writing completion items
• Avoid statements that are so indefinite that they
may be answered by several terms.
• Be sure that the language used in the question
is precise and accurate.
• Word the statement so that the blank is near
the end of the sentence rather than near the
beginning.
• If the problem requires a numerical answer,
indicate the units in which it is to be expressed.
71. 4.1.2. Selection Item Types
True-false/alternative-response items
• Consist of a declarative statement to which
students are asked to respond true or false, right
or wrong, correct or incorrect, yes or no, agree
or disagree.
• Used to measure simple learning outcomes.
72. Suggestions for Writing True-False items
• The desired method of marking true or false
should be clearly explained.
• Construct statements that are definitely true or
definitely false.
• Use relatively short statements.
• Keep true and false statements at
approximately the same length.
• Be sure that there are approximately equal
numbers of true and false items.
• Avoid using double-negative statements.
73. Avoid the following when writing true-false items:
• Verbal clues and complex statements.
• Broad general statements that are usually not true or
false without further qualification.
• Terms denoting indefinite degree (e.g., large, long
time, regularly).
• Absolute terms like never, only, always, etc.
• Placing items in a systematic order (TTFFTT,
TFTFTF, TTFTTFTT, etc.).
• Taking statements directly from the text.
• Trivial statements.
• Statements that are partly true and partly false.
74. Writing matching items
• A matching exercise has two columns. The question or
problem column is called the premises; the answer
column is called the responses.
• A short, homogeneous list of premises is written on
the left-hand side under column A, whereas the
responses are written on the right-hand side under
column B.
• Examinees are required to make some sort of
association between each premise and a response.
75. Advantages of matching exercise
• Measure knowledge of terms, definitions, dates, and
events that involve simple association.
• Well suited for the 'who, what, when, and where'
types of learning.
• Can be scored easily and objectively, and are amenable
to machine scoring.
• It is like a "fun game" for teaching young children.
• Measure a large amount of related factual material.
• Many questions can be asked in a limited
amount of testing time.
76. Limitations of matching exercise
• Restricted to measuring factual material based
on rote learning.
• If care is not taken, it can encourage serial
memorization rather than association.
• It is difficult to get homogeneous materials.
• It is time consuming for the students.
77. Suggestions for Writing Matching items
• Keep both the list of descriptions and the list of options
fairly short and homogeneous.
• Put both descriptions and options on the same page.
• Make sure that all the options are plausible distracters.
• Arrange the answers in some systematic fashion.
• Use longer phrases or statements as premises, and
shorter phrases, words, or symbols as responses.
• Each description in the list should be numbered, and the
list of options should be identified by letters.
• Include more options than descriptions.
• In the directions, specify clearly the basis for matching
and whether the options can be used once, more than
once, or not at all.
78. Multiple choice items
• The multiple-choice item consists of two parts: (1)
the stem which contains the problem; and (2) a list of
suggested answers (responses or options).
• The incorrect responses are often called decoys, foils
or distracters (distractors).
• The correct response is called the key.
• The stem may be stated as a direct question or an
incomplete statement. From the list of responses
provided, the student selects the one that is correct (or
best).
79. Advantages of multiple choice items
• It is adaptable (versatile) to most learning
outcomes.
• It has greater reliability per item than do true-false
items.
• It affords excellent content sampling, which generally
leads to more content-valid score interpretations.
• It is less prone to ambiguity than the short-answer item.
• It can be scored quickly and accurately by machines,
clerks, teacher aides, and even students themselves.
80. Deficiencies or limitations of multiple-choice items
• It is very difficult to construct; it is not easy for
teachers to write plausible-sounding distracters.
• There is a tendency for teachers to write multiple-
choice items demanding only factual recall.
• Test-wise students perform better on multiple-choice
items than do non-test-wise students, so the format
favors test-wiseness.
81. Suggestions for Writing Multiple Choice Items
• The stem of the item should clearly formulate a problem.
• Keep the response options as short as possible.
• Be sure that distracters are plausible.
• Include from three to five options.
• It is not necessary to provide additional distracters for an item
simply to maintain the same number of distracters for each item.
• Be sure that there is one and only one correct or clearly best answer.
• To increase the difficulty of items, increase the similarity of content
among the options.
• Use the option "none of the above" sparingly. Do not use this option
when asking for the best answer.
• Avoid using "all of the above."
• The stem and the options must be written on the same page.
• Ensure that the correct responses form an essentially random pattern
and appear in each of the possible response positions about the same
percentage of the time.
82. Guide for Preparing the Objective Test
• Begin writing items far enough in advance of testing.
• Match items to intended outcomes at the proper difficulty level.
• The wording of the item should be clear and as explicit as possible.
• Avoid setting interrelated items (be sure that each item is independent of all
other items).
• Items should be designed to test important and not trivial facts or
knowledge.
• Write each item to elicit discriminately the extent of the examinee's possession
of only the desired behavior.
• Ensure that there is one and only one correct or best answer to each item.
• Avoid unintentionally giving away the answer through providing irrelevant
clues.
• Use language appropriate to the level of the examinees.
• Items in an achievement test should be constructed to elicit specific course
content and not measure general intelligence.
• Have an independent reviewer look over your test items.
83. 4.2.Subjective /Essay/Tests
• Essay test items should be used primarily for the
measurement of those learning outcomes that cannot
be measured by objective test items.
• Based on the amount of freedom given to the
student to organize his/her ideas and write the answer,
essay questions are subdivided into two major types:
extended response and restricted response.
84. Extended-response essay type
• In the extended-response type of essay question,
virtually no bounds are placed on the student as to the
points the student will discuss and the type of
organization the student will use. The student has
complete freedom in giving the response.
• E.g. Describe what you think should be included in a
school testing program. Illustrate your answer with
specific tests, giving reasons for your test selection.
Your essay should be about 300-400 words or more.
85. CON’D
• Extended-Response
– Permits students to decide which facts they think are most
pertinent, to select their own method of organization, and
to write as much as seems necessary for a comprehensive
answer.
– Tends to reveal the ability to evaluate ideas, to relate them
coherently, and to express them succinctly.
– Although they are valuable for measuring complex skills and
understanding of concepts and principles, they have three
weaknesses:
a) Inefficient for measuring knowledge of factual
material.
b) Scoring criteria are not as apparent to the student.
c) Scoring is difficult and unreliable because responses
vary in the array of factual material included,
organization, legibility, and conciseness.
86. Restricted-response essay type
• In the restricted-response essay question, the student is more limited in the
form and scope of his answer because he is told specifically the form that
his answer is to take.
E.g. Write how plants make food. Your answer should be about one-half
page long.
– These questions minimize some of the weaknesses of the extended-response
type, for three reasons:
• It is easier to measure knowledge of factual material.
• Scoring criteria are clearer to the student.
• The difficulty of scoring is reduced.
– On the negative side, they are less effective as a measure of the ability to
select, organize, and integrate ideas.
– In addition, if the restrictions become too tight, the questions reduce to
nothing more than an objective-type test.
87. Guide for Preparing the Essay Test
• Restriction of the use of essay questions to only
those learning outcomes that cannot be
satisfactorily measured by objective items.
• Formulation of questions that call forth the
behavior specified in the learning outcomes.
Essay questions should be designed to elicit only
the skill which the item was intended to measure.
• Phrase each question to clearly indicate the
examinee's task.
• Indication of approximate time limit for each
question.
• Avoidance of the use of optional questions.
88. 4.3.Performance Assessment
• A performance test is another teacher-made test
(technique) that tries to establish what a person can do.
• It permits the student to organize and construct the
answer.
• Other types of performance may require the student
to use equipment, generate hypotheses, make
observations, construct a model, or perform for an
audience.
• Most performance assessments do not have a
single right or best response; there may be a variety of
acceptable responses.
89. CON’D
• Performance assessment tasks are needed to measure
a student's ability to engage in hands-on activities,
such as conducting an experiment, designing and
conducting a survey, or writing an essay.
• Performance tests generally fall into one of
three categories:
(1) Tests under simulated conditions.
(2) Work sample tests.
(3) Recognition tests.
90. 4.4. Authentic Assessment
Authentic assessment is the type of
assessment aimed at evaluating
students' abilities in 'real-world' contexts.
In authentic assessment, students are
asked to demonstrate practical skills and
concepts they have learned.
91. CON’D
• Authentic assessment does not encourage rote
learning and passive test-taking.
• It focuses on students' analytical skills, ability
to integrate what they learn, creativity, ability
to work collaboratively, and written and oral
expression skills.
• It is also called performance assessment.
92. Tools to employ authentic assessment
Observation and Observation tools
- Checklist: a checklist enables the observer to note (albeit very
quickly and very effectively) only whether or not a behavior
occurred. It does not permit the observer to rate the quality of,
degree to which, or frequency of occurrence of a particular
behavior. It is a list of items you need to verify.
- Anecdotal records: anecdotal records are the least structured
observational tool. They depict actual behavior in natural
situations: short stories that describe students' behaviors and
actual performance.
- Rating scales: rating scales can be used for single observations
or over long periods of time. They are a set of categories
designed to elicit information about a quantitative or a
qualitative attribute.
93. A. Running records
• A running record is a tool that helps teachers
identify patterns or sequences in student
practical work, reading behaviors, laboratory
experiment procedures, drawing procedures, and
so on.
94. B. Project Work Assessment
• Project work assessment can be conducted in
two ways.
1. Process assessment
2. Product assessment
95. C. Portfolio assessment
• A portfolio is a collection of a student's work
specifically selected to tell a particular story
about the student's accomplishments.
• Portfolio assessment is the systematic collection of
student work measured against predetermined scoring
criteria.
96. D. Self-Assessment
• Self-assessment is the process of looking at
oneself in order to assess aspects that are
important to one's identity. It is one of the
motives that drive self-evaluation, along with
self-verification and self-enhancement.
97. E. Reflection
• Reflection is an active process of witnessing
one’s own experience in order to take a closer
look at it, sometimes to direct attention to it
briefly, but often to explore it in greater depth.
98. F. Rubric
• A rubric is a scoring scale used to assess student
performance along a task-specific set of
criteria.
• A rubric comprises two components:
criteria and levels of performance.
Each rubric has at least two criteria and at least
two levels of performance.
• In short, a rubric is a set of criteria prepared for scoring.
99. Types of rubric
1. Analytic rubric: it articulates levels of
performance for each criterion so the teacher
can assess student performance on each
criterion.
2. Holistic rubric: it does not list separate
levels of performance for each criterion.
Instead, a holistic rubric assigns a level of
performance by assessing performance across
multiple criteria as a whole.
100. Unit 5: Assembling, Administering and Scoring
Classroom Tests
5.1. Assembling the Test Items
Assembling is the process of arranging items on the test so that they are
easy to read.
• Plan the layout of the test so that it is convenient for
recording answers and for scoring the items on
separate answer sheets.
• Group items of the same format together and arrange them in logical order
(true-false, matching items, completion/short answer items,
multiple choice, essay), together with relevant directions
on what the testees need to do.
• Group items dealing with the same content together within
item types.
101. CON’D
• Arrange the test items in progressive order of
difficulty, starting from simple and moving to complex questions.
• Ensure that one item does not provide clues to the
answer of another item or items in the same or
another section of the test.
• For multiple-choice items, ensure that the correct responses
form an essentially random pattern and appear in each of the
possible response positions about the same percentage of
the time.
102. 5.2. Administrations of classroom tests
Test administration refers to the procedure of
actually presenting the learning task that the
examinees are required to perform in order to
ascertain the degree of learning that has taken
place during the teaching-learning process.
- The validity and reliability of test scores can be
greatly reduced when a test is poorly
administered.
103. Consider the following during test administration
• Ensure that the physical and psychological
environment is conducive for testees.
• Avoid malpractice by both testees and invigilators.
• Avoid unnecessary threats from test administrators.
• Stick to the instructions regarding the conduct of the
test and avoid giving hints to testees who ask about
particular items, but make corrections or
clarifications for the testees whenever necessary.
• Keep interruptions during the test to a minimum.
104. Ensuring quality in test administration
• Collect the question papers from the custodian in time
to start the test at the stipulated time.
• Ensure compliance with the stipulated seating arrangements
to prevent collusion between or among the testees.
• Ensure orderly and proper distribution of question papers
to the testees.
• Do not talk unnecessarily before the test. Testees' time
should not be wasted at the beginning of the test with
unnecessary remarks, instructions, or threats that may
create test anxiety.
• Remind the testees of the need to avoid malpractice
before they start, and make it clear that cheating will be
penalized.
105. Credibility and Civility in Test Administration
• Credibility deals with the value the eventual
recipients and users of the results of assessment
place on the result with respect to the grades
obtained, certificates issued or the issuing
institution.
• Civility deals with whether the persons being assessed
are in conditions that let them give their best, without
hindrances and encumbrances, in the attributes
being assessed, and whether the exercise is seen as
integral to or external to the learning process.
106. Points needing consideration in test administration
1. Instruction: both for test administrators and
testees
2. Duration of the test
3. Venue and sitting arrangement
4. Other necessary conditions
107. 5.3. Scoring the Test
• Marking schemes (key) should be prepared
alongside the construction of the test items in
order to score the test objectively.
Scoring the Essay Test
In the essay test the examiner is an active part
of the measurement instrument. Therefore, the
variability within and between examiners affects
the resulting score of the examinee.
108. Methods of scoring essay questions
There are two common methods of scoring essay questions.
These are:
1. The Point or Analytic Method: in this method each answer is
compared with an already prepared ideal marking scheme (scoring
key), and marks are assigned according to the adequacy of the
answer.
2. The Global/Holistic Rating Method: In this method the
examiner first sorts the responses into categories of varying quality
based on his general or global impression on reading the response.
The standard of quality helps to establish a relative scale, which
forms the basis for ranking responses from those of the poorest
quality to those of the highest quality.
Usually between five and ten categories are used with the rating
method, with each pile representing a degree of quality
that determines the credit to be assigned.
109. Scoring Objective Test
• Objective test can be scored by various methods with
ease.
1. Manual Scoring: in this method answers to tests are
scored by direct comparison of the examinee's answers
with the marking key.
2. Stencil Scoring: in this method answers to tests are
scored by laying the stencil over each answer sheet
and the number of answer checks appearing through
the holes is counted.
3. Machine Scoring: in this method answers to tests are
scored by machine with computers and other possible
scoring devices using certified answer key prepared
for the test items.
110. Unit 6: Interpreting, Describing, and Analyzing Test Scores
6.1. Methods of interpreting classroom test scores
Norm-referenced interpretation: a norm-referenced
test score interpretation tells us how an
individual student's score compares with the scores
of the other students in the group who have taken the
same test.
111. 6.2 Judging The Quality of a Classroom Test
Item Analysis
Item analysis is the process of “testing the item” to
ascertain specifically whether the item is functioning
properly in measuring what the entire test is
measuring.
Purpose and Uses of Item Analysis
- To discriminate between high and low achievers in a
norm-referenced test (discrimination power).
- To determine whether items have the desirable qualities of a
measuring instrument (difficulty level).
112. The Process of Item Analysis for Norm-Referenced
Classroom Tests
• In Norm-referenced test, special emphasis is
placed on item difficulty and item discriminating
power.
• The process of item analysis begins after the test
has been administered (or trial tested), scored and
recorded.
• The process of item analysis is carried out
using two contrasting groups composed of
the upper and lower 25%, 27%, or 33% of the
testees to whom the items were administered.
113. Computing Item Difficulty
• Item difficulty index is denoted by P.
• The difficulty index (P) for each item is
obtained with the formula:

P = R / T × 100

where R = number of testees who got the item right and
T = total number of testees responding to the item.

The item difficulty indicates the percentage of testees
who got the item right in the two groups used for the
analysis. For example, if R/T = 0.7, then P = 0.7 × 100 = 70%.
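The difficulty calculation above can be sketched in Python (the function name `difficulty_index` is illustrative, not from the text):

```python
def difficulty_index(num_right, num_total):
    """Item difficulty P = (R / T) x 100: the percentage of testees
    in the analysis groups who answered the item correctly."""
    return num_right / num_total * 100

# If 0.7 of the testees got the item right, P = 70%.
print(difficulty_index(21, 30))  # → 70.0
```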
114. Where C* is the correct answer:
1) Calculate the item difficulty index (P).
2) Find the good plausible distracter.
3) Calculate the item discrimination power.

Alternatives    A   B   C*  D   E   Omits
Upper group     0   0   15  0   0   0
Lower group     4   2   8   1   0   0
115. Solution
P = (Upper right + Lower right) / (Total students in analysis) × 100
P = (15 + 8) / 30 × 100
  = 23/30 × 100
  = 0.766 × 100
  = 76.66%
Alternative A is a good plausible distracter because
it attracts more students from the lower group than
from the upper group.
116. Compute the item-discrimination index
Item discrimination power is denoted by D.

D = (RU - RL) / (T/2)

where RU = number right in the upper group,
RL = number right in the lower group, and
T = total students in the upper and lower groups combined.

D = (15 - 8) / (30/2) = 7/15 = 0.47
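The discrimination formula can be checked against the worked example in a few lines of Python (the helper name is illustrative):

```python
def discrimination_index(upper_right, lower_right, total):
    """Item discrimination D = (RU - RL) / (T / 2), where T counts
    the testees in the upper and lower groups combined."""
    return (upper_right - lower_right) / (total / 2)

# Worked example from the slide: RU = 15, RL = 8, T = 30.
print(round(discrimination_index(15, 8, 30), 2))  # → 0.47
```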
117. Interpretation
• Item discrimination values range from -1.00 to +1.00.
• The higher the discriminating index, the better is an item in
differentiating between high and low achievers.
Item discriminating power is a:
• Positive value when a larger proportion of those in the high
scoring group get the item right compared to those in the low
scoring group. If D has a positive value, the item has positive
discrimination.
• Negative value when more testees in the lower group than in the
upper group get the item right. If D has a negative value, the
item has negative discrimination.
• Zero (0) when an equal number of testees in both groups
get the item right; and
• One (+1.00) when all testees in the upper group get the item right
and all the testees in the lower group get it wrong.
118. Compute the Sensitivity to Instructional Effects
Sensitivity to instructional effects is denoted by S.
S requires two test administrations, i.e. a pretest and
a posttest.

S = (RA - RB) / T

where RA = number of pupils who got the item right after instruction,
RB = number of pupils who answered the item correctly before
instruction, and
T = total number of pupils who attempted the item both times.
119. Following is an example of the behavior of four
items before (B, pretest) and after (A, posttest) instruction:
B = Pretest; A = Posttest, + means item correctly answered.
- means item incorrectly answered.
Pupil     Item 1   Item 2   Item 3   Item 4
          B   A    B   A    B   A    B   A
Abebe     -   +    -   -    +   -    -   +
Balcha    -   +    -   -    +   -    -   +
Chaltu    -   +    -   -    +   -    +   +
Dawit     -   +    -   -    +   -    +   +
Eliyas    -   +    -   -    +   -    -   -
120. Analysis from the above table
Item 1. This is what would be expected in an "ideal" criterion-
referenced test (CRT). No one answered the item correctly
before instruction; everybody did after. This suggests that the
instruction was effective.
Item 2. We would not want these results in a CRT. No pupil
answered the item correctly either before or after
instruction.
Item 3. We would not want these results in any test, be it CRT
or NRT. Why should pupils who answered correctly before
instruction answer incorrectly after instruction?
Item 4. This is how we would expect good test items to
behave. We would expect some pupils to know the correct
answer before instruction but a larger number to answer the
item correctly after "effective" instruction.
121. Calculate the sensitivity to instructional effects for item
1, 2 , 3 and 4 in above example
Solution
S = (RA - RB) / T
1. S = (5 - 0)/5 = 1.00
2. S = (0 - 0)/5 = 0.00
3. S = (0 - 5)/5 = -1.00
4. S = (4 - 2)/5 = 0.40
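The four solutions above can be reproduced with a short Python sketch (the `(RA, RB)` pairs are read off the pretest/posttest table; the function name is illustrative):

```python
def sensitivity_index(right_after, right_before, total):
    """Sensitivity to instructional effects: S = (RA - RB) / T."""
    return (right_after - right_before) / total

# (RA, RB) pairs for items 1-4 from the table, with T = 5 pupils.
items = [(5, 0), (0, 0), (0, 5), (4, 2)]
print([sensitivity_index(ra, rb, 5) for ra, rb in items])  # → [1.0, 0.0, -1.0, 0.4]
```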
122. Unit 7: Describing Educational Data
• 7.1. Units of measurement
• Data differ in terms of what properties of the
real number series (order, distance, or origin)
we can attribute to the scores. The most
refined classification of scores is the one
suggested by Stevens (1946), who classified
measurement scales as nominal, ordinal,
interval, and ratio scales.
123. Nominal scale
• A nominal scale is the simplest scale of
measurement. Nominal scale is about naming.
E.g. Room 1, Room 2.
• It involves the assignment of different
numerals to categories that are qualitatively
different.
• For example, for purposes of storing data on
computer cards, we might use the symbol 0 to
represent a female and the symbol 1 to
represent a male.
124. CON’D
• If measurement is defined as "the assignment
of numerals to objects or events according to
rules", then nominal data indicate
measurement. If, on the other hand,
measurement implies a quantitative difference,
then nominal data do not indicate
measurement.
125. Ordinal scale
• An ordinal scale has the order property of a real
number series and gives an indication of rank order.
Thus, magnitude is indicated, if only in a very gross
fashion. Rankings in a music contest or in an athletic
event would be examples of ordinal data. Ordinal
scale is about ranking in order. E.g. Grade 1, Grade 2,
Grade 3.
• It helps us to know who is best, second best, third
best, and so on, in order to select the top pupils for
some task.
126. Interval scale
• With interval data we can interpret the
distances between scores; the scale provides
information about the differences
between scores. An interval scale is about
differences and equal distances between attributes
under measurement. There is no absolute zero
on an interval scale. E.g. the Fahrenheit/Celsius
temperature scales.
127. Ratio scale
• If one measures with a ratio scale, the ratio of
the scores has meaning. Thus, a person who is
86" tall is twice as tall as a person who is 43". We
can make this statement because a
measurement of 0 (zero) actually indicates no
height; that is, there is a meaningful zero
point. Thus, a ratio scale has an absolute or
true zero.
128. 7.2. Measures of central tendency
• Measures of central tendency give some
idea of the average or typical score in the
distribution.
• Central tendency describes the center
location of distribution. There are three
commonly used measures of central
tendency. These are the mode, the
median and the mean.
129. The mode
• The mode is defined as the score value that occurs
most often; it is the most frequently
occurring value. A set of scores may have one,
two, more than two, or no mode (unimodal,
bimodal, multimodal, or no mode).
E.g. in the set 1, 2, 3, 4, 5 there is no mode; in the set
6, 7, 7, 7, 8, 8, 9 the mode is 7 (unimodal); in the
set 4, 5, 5, 6, 6, 7, 8 the modes are 5 and
6 (bimodal); in the set 4, 4, 5, 5, 6, 6, 7, 8 the modes
are 4, 5, and 6 (multimodal).
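The unimodal, bimodal, and multimodal examples above can be checked with Python's standard-library `statistics.multimode`:

```python
from statistics import multimode

# multimode returns every most-frequent value, covering the unimodal,
# bimodal, and multimodal cases above. Note that for a set with no
# repeated value it returns all values rather than reporting "no mode".
print(multimode([6, 7, 7, 7, 8, 8, 9]))     # → [7]
print(multimode([4, 5, 5, 6, 6, 7, 8]))     # → [5, 6]
print(multimode([4, 4, 5, 5, 6, 6, 7, 8]))  # → [4, 5, 6]
```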
130. The median
• The median (Mdn) is the point below which 50
percent of the scores lie. An approximation to that
point is obtained from ordered data by simply finding
the score in the middle of the distribution.
• It is the 50th percentile of the distribution,
having half of the cases above it and half of the
cases below it.
131. CON’D
• When the number of scores in the ordered set is odd, the
median is the middle score: the (N+1)/2-th case.
• E.g. 3, 4, 4, 7, 7, 7, 8, 8, 9: N = 9, (9+1)/2 = 5th case.
The 5th case is the median, i.e. 7.
• When the number of scores in the ordered set is even, the
median is the average of the N/2-th and (N/2 + 1)-th cases.
E.g. 3, 4, 4, 7, 7, 7, 8, 8, 9, 10: N = 10, 10/2 = 5th case.
The 5th case is 7, and the 6th case is also 7.
Thus, Mdn = (7 + 7)/2 = 7.
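The odd-N and even-N rules can be sketched as a small Python function (a minimal version of what `statistics.median` already provides):

```python
def median(scores):
    """Middle score of the ordered set when N is odd; the average of
    the two middle scores when N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 4, 4, 7, 7, 7, 8, 8, 9]))      # → 7
print(median([3, 4, 4, 7, 7, 7, 8, 8, 9, 10]))  # → 7.0
```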
132. The mean (x)
• The mean (x̄) is the arithmetic average of
a set of scores. It is calculated by adding
all the scores in the distribution and
dividing by the total number of scores (N).
The formula is

x̄ = Σx / N

where x = a raw score, Σx = the sum of all scores,
and N = the number of scores.
133. 7.3. Measures of variability
Measures of variability are used to know how the scores are
spread or dispersed. The common measures of
variability are the range (R), the variance (σ² = population
variance, S² = sample variance), the standard deviation
(S), and percentiles.
• The range (R) of a set of measurements is defined as
the difference between the largest and the smallest
measurements of the set.
• Occasionally the inclusive range (high score - low score + 1) is
used.
134. The Variance
• Variance indicates the degree of spread by
comparing every score to the mean of the
distribution. The formula is

S² = Σ(x - x̄)² / N

E.g. find the variance for the data set 9, 8, 8, 7, 7, 7, 4, 4, 3, 3.
Solution: N = 10, mean = 6, and the sum of squared differences
is 46. Hence S² = 46/10 = 4.6.
135. Standard deviation
• The standard deviation is the square
root of the variance. The formula is

S = √S²
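The variance example above (S² = 4.6) and its standard deviation can be verified with a short Python sketch:

```python
from math import sqrt

def variance(scores):
    """Population variance: S^2 = sum((x - mean)^2) / N."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n

data = [9, 8, 8, 7, 7, 7, 4, 4, 3, 3]
print(variance(data))                  # → 4.6
print(round(sqrt(variance(data)), 2))  # → 2.14
```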
136. Unit 8
Characteristics of good tests
• 8.1. Reliability
• Reliability can be defined as the degree of
consistency between two measures of the same test.
• The classical theory of reliability can best be
explained by starting with observed scores (X).
• An observed score is made up of a "true score" (T)
and an "error score" (E), such that

X = T + E

where X = observed score, T = true score, and E = error score.
137. • A good test measures what it intends to
measure.
• The most important criteria for evaluating
tests are validity and reliability.
Measurement tools can be judged on a
variety of merits. These include practical
issues as well as technical issues. All
instruments have strengths and weaknesses.
No instrument is perfect for every task.
138. • Some of the practical issues that need to
be considered include: cost, availability,
training required, ease of administration,
scoring, analysis, time and effort required
for respondent to complete measure.
• Along with the practical issues,
measurement tools may be judged on the
following characteristics:
139. • Validity: a test is considered valid when it
measures what it is supposed to measure.
• Reliability: a test is reliable if, when it is taken
again by the same students under the
same circumstances, the scores are
almost constant, taking into
consideration that the time between the
test and retest is of reasonable length.
140. • Objectivity: objectivity means that if the
test is marked by different persons, the
scores will be the same. In other words,
the marking process should not be affected by
the marker's personality.
• Comprehensiveness: a good test should
include items from the different areas of
material assigned for the test.
141. • Simplicity: simplicity means that the test
should be written in clear, correct, and
simple language. It is important to keep
the method of testing as simple as
possible while still testing the skill you
intend to test.
• Scorability: scorability means that each
item in the test has its own mark related to
the distribution of marks given.
142. Approaches to estimate reliability
• The common approaches to estimate reliability are:
1. Measures of stability
2. Measures of equivalence
3. Measures of equivalence and stability
4. Measures of internal consistency
• a. Split-half
• b. Kuder-Richardson estimates
5. Scorer (judge) reliability
143. Measure of stability
• A measure of stability, often called a test-retest
estimate of reliability, is obtained by
administering a test to a group of persons,
re-administering the same test to the same group
at a later date, and correlating the two sets of
scores.
144. Measures of Equivalence
• Measures of equivalence estimate reliability
from equivalent forms: two forms of a test
(with equal content, means, and variances) are
given to the same group on the same day, and
the results are correlated.
145. Measures of Equivalence and Stability
• A coefficient of equivalence and stability
could be obtained by giving one form of
the test and, after some time,
administering the other form and
correlating the results.
146. Measures of Internal Consistency
• The split-half method of estimating reliability is
theoretically the same as the equivalent-forms
method.
• The appropriate formula is a special case of the
Spearman-Brown prophecy formula.
• rxx = 2r(1/2)(1/2) / (1 + r(1/2)(1/2))

where rxx = estimated reliability of the whole test, and
r(1/2)(1/2) = the reliability (correlation) of the half-tests.
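The Spearman-Brown step-up from a half-test correlation to a whole-test estimate is one line of arithmetic; a minimal sketch (the function name is illustrative):

```python
def spearman_brown(half_reliability):
    """Full-test reliability from the half-test correlation:
    r_xx = 2r / (1 + r)."""
    return 2 * half_reliability / (1 + half_reliability)

# If the two halves correlate at 0.60, the whole test's estimated
# reliability is 0.75.
print(round(spearman_brown(0.60), 2))  # → 0.75
```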
147. Kuder-Richardson Estimates
• If items are scored dichotomously (right or wrong), one way to
avoid the problems of how to split the test is to use one of the
Kuder-Richardson formulas. The formulas may be considered
as representative of the average correlation obtained from all
possible split-half reliability estimates. K-R 20 and K-R 21 are
two formulas used extensively.
K-R 20: rxx = [n/(n - 1)] × [1 - (Σpq / Sx²)]

K-R 21: rxx = [n/(n - 1)] × [1 - x̄(n - x̄) / (n × Sx²)]
148. where n = number of items in the test
p = proportion of people who answered the item correctly
q = proportion of people who answered the item incorrectly
(q = 1 - p; if p = .20, q = .80)
pq = variance of a single item scored dichotomously
(right or wrong)
Σ = summation sign indicating that pq is summed over
all items
Sx² = variance of the total test
x̄ = mean of the total test
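A minimal K-R 20 sketch in Python, assuming a small 0/1 score matrix (the data and the helper name `kr20` are illustrative, not from the text):

```python
def kr20(item_scores):
    """K-R 20 from a matrix of dichotomous (0/1) item scores,
    rows = examinees, columns = items."""
    n = len(item_scores[0])                  # number of items
    people = len(item_scores)
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / people
    var_total = sum((t - mean_total) ** 2 for t in totals) / people
    sum_pq = 0.0
    for j in range(n):
        p = sum(row[j] for row in item_scores) / people  # proportion correct
        sum_pq += p * (1 - p)                            # item variance pq
    return (n / (n - 1)) * (1 - sum_pq / var_total)

# Hypothetical 4 examinees x 3 items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(scores))  # → 0.75
```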
149. Scorer (Judge) Reliability
• If a sample of papers has been scored
independently by two different readers, the
traditional Pearson product moment correlation
coefficient (r) can be used to estimate the
reliability of a single reader's scores.
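Estimating scorer reliability is a direct application of Pearson's r to the two readers' scores. A minimal sketch, with hypothetical essay scores:

```python
# Minimal sketch: Pearson r between two readers' scores on the same papers,
# used as an estimate of a single reader's reliability.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

reader_a = [8, 6, 9, 4, 7]   # hypothetical scores from reader A
reader_b = [7, 6, 9, 5, 8]   # the same five papers scored by reader B
print(round(pearson_r(reader_a, reader_b), 3))  # → 0.904
```

A high r indicates the two readers rank the papers similarly, so a single reader's scores can be trusted to a comparable degree.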
150. Factors Influencing Reliability
• Test length: longer tests give more reliable scores.
• Speed: A test is considered a pure speed test if
everyone who reaches an item gets it right but no one
has time to finish all the items. Thus, score
differences depend upon the number of items
attempted. The opposite of a speed test is a power
test.
• Group homogeneity: the more heterogeneous the
group, the higher the reliability.
• Objectivity: the more subjectively a measure is
scored, the lower the reliability of the measure.
151. 8.2 Validity
• What is validity?
• Validity is the extent to which a test measures
what it claims to measure. It is vital for a test
to be valid in order for the results to be
accurately applied and interpreted.
• Validity can best be defined as the extent to
which certain inferences can be accurately made
from, and certain actions appropriately based
on, test scores or other measurements.
152. • Validity (meaningfulness) is the degree to
which an evaluation (test) actually serves
the purpose for which it is intended.
• Three important things about validity are:
• 1. Validity refers to the appropriate use and
interpretation of test scores, not to the
test itself.
153. • 2. Validity is a matter of degree. That
means tests are not absolutely valid or
invalid.
• 3. Validity is specific to a particular use. No
test serves all purposes equally well.
There are three main classes of validity.
These are content validity, criterion-related
validity, and construct validity.
154. Content validity
• Content validity is defined as the extent to
which the instrument measures what it
purports to measure. Content validity pertains
to the degree to which the instrument fully
assesses or measures the contents of interest.
When a test has content validity, the items on
the test represent the entire range of possible
items the test should cover.
155. CONT'D
• Content validity deals with adequate sampling of
items. When an evaluation instrument adequately
samples certain types of situations or subject matter,
it is said to have content validity. If item sampling is
not representative to an adequate degree, the test
lacks content validity.
• Face validity is a form of content validity referring to
whether the measuring instrument appears/seems to
measure what it is intended to measure.
156. Criterion-related validity
• A test is said to have criterion-related validity
when the test has demonstrated its
effectiveness in predicting a criterion or
indicators of a construct. There are two types
of criterion-related validity:
• 1. Concurrent validity
• 2. Predictive validity
157. Concurrent validity
• Concurrent validity occurs when the criterion
measures are obtained at the same time as the
test scores. This indicates the extent to which
the test scores accurately estimate an
individual’s current state with regard to the
criterion. It is a comparison of scores on some
instrument with current scores on another
instrument.
158. Predictive validity
• Predictive validity occurs when the criterion
measures are obtained at a time after the test.
Examples of tests with predictive validity are
career or aptitude tests, which are helpful in
determining who is likely to succeed or fail in
certain subjects or occupations. Predictive
validity is a comparison of scores on some
instrument with some future behavior or
future scores on another instrument.
159. Construct validity
• Construct validity is the degree to which an
instrument measures the trait or theoretical
construct that it is intended to measure. A
construct is an unobservable human
characteristic (a psychological concept such as
intelligence or anxiety). A test has
construct validity if it demonstrates an
association between the test scores and the
theoretical trait it is intended to measure.