EDUCATIONAL
MEASUREMENT AND
EVALUATION
A Teaching Note For Regular
Undergraduate Students
12/8/2020
CREATED BY:
Dr. Mengistu Debele Gerbi (Ph.D)
Ambo University
2020
Prepared by Mengistu Debele
UNIT 1: BASIC CONCEPTS AND PRINCIPLES IN
EDUCATIONAL MEASUREMENT AND EVALUATION
1.1 Meaning and Definitions of Basic Terms
Dear students, can you define the following terms?
- TEST
- TESTING
- MEASUREMENT
- ASSESSMENT
- EVALUATION
TEST
According to Gronlund (1981), a test refers to the
presentation of a standardized set of questions to be
answered by pupils. It is also an instrument for
measuring samples of a person's behavior by posing a set
of questions in a uniform manner.
A test is a method used to determine students' ability to
complete certain tasks or demonstrate mastery of a skill
or knowledge of content.
A test is a method of measuring a person's quality, ability,
knowledge or performance in a given domain.
General Classification of Tests
1. Intelligence test
2. Personality test
3. Aptitude test
4. Achievement test
5. Prognostic test
6. Performance test
7. Diagnostic test
8. Preference test
9. Accomplishment test
10. Scale test
11. Criterion-referenced test
12. Speed test
13. Power test
14. Objective test
15. Subjective test
16. Teacher-made test
17. Formative test
18. Summative test
19. Placement test
20. Standardized test
21. Norm-referenced test
TESTING
Testing is the process of administering a test to the
pupils.
It is a technique of obtaining information needed for
evaluation purposes.
Tests, quizzes, and other measuring instruments are devices
used to obtain such information.
The test is the most commonly used method of making
measurements in education.
MEASUREMENT
Students! Are test and measurement the same? Of course, they are not the same.
Measurement refers to giving or assigning a numerical value to certain attributes
or behaviors.
It is a systematic process of obtaining a quantified degree.
It is an assignment of numbers (quantity), using a variety of instruments: tests, rating
scales.
It is the process of obtaining a numerical description of the degree to which an
individual possesses an attribute; it quantifies how much a learner has learned.
E.g. Ujulu scored 8/10 in an M&E test.
Measurement takes place after a test is given and a score is obtained. Thus, tests
are instruments of measurement. We use tests, projects, homework, laboratory work,
quizzes, and assignments as instruments of measurement.
Measurement can also refer to both the score obtained and the process
used.
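The idea that measurement assigns a number to a sample of behavior can be sketched in code. The following is a hypothetical illustration (not part of the original notes): scoring a 10-item test against an answer key yields a quantified result such as Ujulu's 8/10. The answer key and responses are invented.

```python
# Hypothetical sketch (not from the notes): measurement as assigning a
# number to a sample of behavior by scoring responses against a key.
def measure(responses, answer_key):
    """Return the raw score: the number of responses matching the key."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and key must be the same length")
    return sum(r == k for r, k in zip(responses, answer_key))

# Illustrative 10-item answer key and one student's responses.
answer_key = ["B", "A", "D", "C", "A", "B", "D", "C", "A", "B"]
responses  = ["B", "A", "D", "C", "A", "B", "A", "C", "A", "D"]

score = measure(responses, answer_key)
print(f"Score: {score}/{len(answer_key)}")  # Score: 8/10
```

The test itself is the instrument; `measure` is the process that turns observed responses into a numerical description.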
ASSESSMENT
Assessment is a process of gathering and organizing data in order
to monitor the progress of students' learning.
According to Airasian (1997) and Cruikshank, et al. (1999),
assessment refers to the process of collecting (through paper-and-
pencil tests, observation, self-reported questionnaires),
interpreting, and synthesizing information to aid in decision
making. Thus, it is the basis for decision making (evaluation).
It is the process of collecting, recording, scoring, describing, and
interpreting information about learning.
The goal of assessment is to make improvement.
It deals with the nature of the learner (what s/he learned, how s/he
learned).
CON’D
• Assessment is a systematic basis for making
inferences about the learning and development
of students; it is the process of defining, selecting,
designing, collecting, analyzing, interpreting
and using information to increase students'
learning and development.
EVALUATION
Students! Are assessment and evaluation the same? Surely they are
not the same. According to Gage and Berliner (1998), evaluation
refers to the process of using information to judge the goodness,
worth or quality of students' achievement, teaching programs, and
educational programs. It is concerned with making judgments on the
worth or value of a performance, answering the question "how good,
adequate, or desirable?" It is also the process of obtaining, analyzing
and interpreting information to determine the extent to which
students achieve instructional objectives.
It occurs after assessment has been done, because the teacher is then
in a position to make an informed judgment.
CON’D
Hence, evaluation deals with determining the value and
effectiveness of something, often a program. From this
standpoint, an evaluation encompasses an assessment
initiative as the source for making judgments about
program quality.
Evaluation is the process of making a judgment of a
product, a response, or a performance based on
predetermined criteria.
Thus, evaluation is simply the quantitative and/or
qualitative description of a behavior plus a value
judgment. Dear students! Which of the two is more
comprehensive, assessment or evaluation?
1.2. ROLES AND PRINCIPLES OF EVALUATION
Dear students! Can you mention major uses of
evaluation results in improving the teaching-
learning process? Great; according to Gronlund
and Linn (1995), evaluation results are used in:
 Programmed instruction
 Curriculum development
 Marking and reporting
Functions of Evaluation
1. Evaluation assesses or makes an appraisal of:
- Educational objectives, programs, curricula,
instructional materials, facilities
- The teacher
- The learner
- Public relations of the school
- Achievement scores of learners
2. Evaluation produces research
CON’D
• In the view of Gronlund (1981), evaluation results are
used for making decisions related to instruction,
guidance and counseling, administration, and
research.
• Well and systematically organized evaluation results
help teachers to improve teaching by judging the
adequacy and appropriateness of instruction with
reference to objectives, contents, teaching methods
and assessment techniques.
1.3. Principles of evaluation
• Evaluation is effective when it is based on sound
principles. According to Gronlund and Linn (1995), the
major principles of evaluation are:
1. Evaluation should give priority to the determination
of what is to be measured.
2. The type of test item format/evaluation technique
to be used should be determined by the specific
learning outcomes to be measured.
3. Comprehensive evaluation requires a variety of
techniques.
4. A proper use of evaluation techniques requires an
awareness of their limitations and strengths.
CON’D
5. Test items should be based on a representative sample
of the subject content and the specified learning
outcomes to be measured.
6. Test items should be of the appropriate level of
difficulty.
7. Test items should be constructed so that extraneous
factors do not prevent students from responding.
8. Test items should be constructed so that students
obtain the correct answer only if they have attained the
desired learning outcomes.
9. Test items should be constructed so that they contribute
to the improvement of the teaching-learning process.
To summarize the principles of evaluation
• Evaluation should be:
1. Based on clearly stated objectives
2. Comprehensive
3. Cooperative
4. Used judiciously
5. Continuous and integral part of teaching-
learning process
Unit -2
THE ROLE OF OBJECTIVES IN
EDUCATIONAL MEASUREMENT AND
EVALUATION
2.1. Definition of important terms
Dear students! Can you define outcome,
aims/goals, objectives, learning outcomes?
• According to Gronlund and Linn(1995),
outcome is what occurs as a result of an
educational experience.
• Goals are broad statements of educational intent
that give the overall purpose and desired
outcomes (Ellington & Shirley, 1997).
CON’D
• Objectives are sets of more detailed statements that
specify the means by which the different aims of the
course are pursued, relating to the activities it involves
and the content it covers (Cruikshank, 1999).
• Learning outcomes are even more detailed statements
that specify the various things that
students will be able to do after successful completion
of the learning process.
2.2. Importance of stating objectives
• Properly stated instructional objectives
serve as guides for both teaching and testing.
The following are functions of objectives: they
identify the intended student outcomes, help the
teacher to plan instruction, provide criteria for
evaluating students' outcomes, help in selecting
appropriate instruction, provide a public record
of intent, and help the teacher to guide and monitor
students' learning.
2.3. Guideline for writing instructional objectives
1. Objectives should be stated in terms of student
performance/ behavior
2. Objectives should be stated in terms of the learning
outcomes and not in terms of the learning process.
3. Objectives should be sufficiently free from the
influence of course content
4. Statement of the objectives should be an
amalgamation of subject-matter and desired behavior.
CON’D
5. Avoid the use of more than one type of learning
outcome in each objective.
6. Begin each specific objective with an action verb
which indicates what the students have to do or
demonstrate.
7. Write a sufficient number of specific objectives (5-
8) for each general objective so as to adequately
describe the students' behavior for achieving the
general objective.
2.4. Methods of stating instructional objectives
• According to Gronlund (1981), there are two
levels of stating instructional objectives:
1. Stating general objectives as intended
learning outcomes
2. Listing a sample of specific objectives
corresponding to, and representative of, each
general objective.
2.5. Taxonomy of Educational objectives
• Taxonomy means 'a set of classification
principles', or 'structure'.
• Domain means 'category' or distinct area.
The best-known description of learning
domains was developed by Benjamin Bloom
and is known as Bloom's Taxonomy.
The three domains of Bloom’s taxonomy of
educational objectives
Cognitive domain
• RATIONAL LEARNING: THINKING
 Emphasis upon knowledge, using the
mind, and intellectual abilities.
 Often referred to as Instructional or
behavioral objectives that begin with
verbs
CON’D
• Behaviors are thought
to be cumulative,
going from simple to
more complex
mental behaviors.
CON’D
• Bloom and Krathwohl (1956)
divided the cognitive domain into
six distinct levels, each level
building on those below and
representing a progressively
higher level of cognition:
Knowledge
Comprehension
Application
Analysis
Synthesis
Evaluation
1. Knowledge
Remembering previously learned
material/information.
• Emphasizes facts, information, and specifics.
Involves remembering material in a form very
close to how it was originally presented.
• Depends upon memorizing or identifying facts
without going beyond them.
2. Comprehension
Defined as the ability to understand the
meaning of material.
• Understanding and interpreting information.
• Grasping the meaning and intent of the
material.
• Deals with content and involves ability to
understand what is being communicated.
3. Application
Applying procedures/systems/rules in specific
situations.
• It is the ability to use learned material in new
situations.
• Using what is remembered and comprehended.
• Applies learning to real life, new, and/or
concrete situations.
• It is ability to use knowledge and learned
material in meaningful ways.
4. Analysis
Reasoning
• Breaking a system/material down into its
constituent elements or parts so as to see its
organizational structures and determining the
relationships of these parts to each other and to
the whole.
• It is ability to break material down into
specific parts so that the overall organizational
structure may be comprehended.
5. Synthesis
Creating
• It is ability to put parts together to form a
whole.
• Putting together diverse skills, abilities, and
knowledge to accomplish a particular new task
or form a new whole.
• Organizing ideas into new patterns and putting
materials together in a structure which was not
there before.
6. Evaluation
Evaluating
• It is the ability to judge the worth of material
for a given purpose.
• Making judgment/critical comparisons on the
basis of agreed criteria.
• Judging the values of ideas, methods,
materials, procedures, and solutions by
developing and/or using appropriate criteria.
Affective domain
• EMOTIONAL LEARNING: FEELING
 Concerned with feelings, attitudes,
appreciations, interests, values and
adjustments.
Bloom and Krathwohl (1964) divided the affective
domain into five distinct hierarchical levels.
CON’D
• The affective domain (Krathwohl, Bloom &
Masia, 1973) includes the manner in which we
deal with things emotionally, such as feelings,
values, appreciation, enthusiasm, motives and
attitudes.
• Since the affective domain is concerned with
students' attitudes, personal beliefs, and values,
measuring educational objectives in this
domain is not easy.
Bloom and Krathwohl (1964)’s levels
of affective domain
• The five major categories of the affective domain,
listed from simplest to most complex, are as follows:
Receiving - developing an awareness of something.
Responding - showing active interest in something.
Valuing - taking up a positive attitudinal position.
Organization - making adjustments to the value system.
Characterization - integrating one's attitudes into a
total, all-embracing philosophy.
Educational objectives and State of mind
Educational objective - State of mind
• Receiving - Willingness to pay attention
• Responding - Reacts voluntarily or complies
• Valuing - Acceptance
• Organization - Rearrangement of the value system
• Characterization - Incorporates values in life
1. Receiving: the lowest level; the student passively pays
attention. Without this level no learning can occur.
CON’D
• It is the ability of the student to be attentive to
particular stimuli.
• It includes awareness (giving the learner appropriate
opportunity to be merely conscious
of something), willingness to receive (tolerating a
new stimulus, not avoiding it), and controlled or
selected attention (attending to a differentiating
stimulus among competing and distracting
stimuli).
2. Responding
• The student is an active participant,
showing new behaviors beyond mere attending. It
implies actively attending to something with or about
phenomena.
1. Acquiescence in responding: obedience or compliance
are words used to describe this behavior.
2. Willingness to respond: capacity for voluntary activity.
3. Satisfaction in response: feelings of satisfaction and
enjoyment.
3.Valuing
• The worth the student attaches to some entity.
• Implies perceiving phenomena as having worth and
consequently valuing them, showing consistency in
behaviors related to them and some definite
involvement or commitment.
• It is behavior motivated by the individual's commitment
to the underlying value guiding it.
• It can be described as acceptance of a value, preference
for a value, and commitment.
4. Organization
• Bringing things together into a whole.
• Organization is defined as conceptualization of the value
and integration of a new value into one's general set of
values, relative to other priorities.
• A value system refers to an individual's life style that has
been built on his/her values and that controls his/her behavior.
1. Conceptualization of a value: permits the individual to see
how the value relates to the existing values he/she holds.
2. Organization of the value system: bringing a complex of
various values together into a harmonious system.
5. Characterization by value
• Acting consistently with new value; person is
known by the value. It is the level at which a
person integrates one’s beliefs, ideas, and
attitudes into a total all-embracing life
philosophy.
Psychomotor domain
• PHYSICAL LEARNING: DOING
 Emphasizes speed, accuracy, dexterity, and
physical skills.
• This domain includes objectives related to
muscular or motor skill, manipulation of
material and objects, and neuromuscular
coordination.
• Taxonomies for this domain were developed by Harrow (1972) and Simpson (1972).
Harrow’s levels of Psychomotor domain
Reflex movements
Basic-fundamental movements
 Perceptual abilities
Physical abilities
Skilled movements and
Non-discursive communication.
Dave’s level of psychomotor domain
 Imitation: observing and patterning behavior after
someone else.
 Manipulation: being able to perform certain actions
following instructions and practicing.
 Precision: refining, becoming more exact; few
errors are apparent.
 Articulation: coordinating a series of actions,
achieving harmony and internal consistency.
 Naturalization: high-level performance becomes
natural, without the need to think much about it.
Simpson’s hierarchical taxonomy
 Perception: the process of becoming aware of
objects, qualities, etc. by way of the senses. This
is basic in the situation-interpretation-action chain
leading to motor activity.
 Set: readiness for a particular kind of action or
experience; may be mental, physical or
emotional.
 Guided response: an overt behavioral act under the
guidance of an instructor, or following a model or
set of criteria.
 Mechanism: the learned response becomes habitual;
the learner has achieved a certain confidence and
proficiency of performance.
 Complex overt response: performance of a
motor act considered complex because of the
movement pattern required.
 Adaptation: altering motor activities to meet the
demands of problematic situations.
 Origination: creating new motor acts or ways of
manipulating materials out of the skills, abilities,
and understandings developed in the psychomotor
area.
UNIT 3
Planning Classroom Tests
• The development of good questions or items must
follow a number of principles.
• The development of valid, reliable and usable
questions involves proper planning.
• The validity, reliability and usability of such
tests depend on the care with which the tests are
planned and prepared.
3.1. Some pitfalls in teacher-made tests
• Most teacher-made tests are not appropriate to the
different levels of learning outcomes.
• Many of the test exercises fail to measure what
they are supposed to measure; in other words,
most teacher-made tests are not valid.
• Some classroom tests do not cover the topics taught
comprehensively (lacking content validity).
• Most tests prepared by teachers lack clarity in
wording.
• Most teacher-made tests fail item analysis.
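Item analysis, mentioned in the last point, commonly involves computing a difficulty index (the proportion of examinees who answer an item correctly) and a discrimination index (how much better high scorers do on the item than low scorers). The sketch below uses these standard formulas with invented data; the 27% group fraction is a common convention, not something stated in the notes.

```python
# Sketch of two standard item-analysis statistics; data are illustrative.
def difficulty(item_scores):
    """Difficulty index p: the proportion of examinees answering correctly."""
    return sum(item_scores) / len(item_scores)

def discrimination(item_scores, total_scores, fraction=0.27):
    """Discrimination index D: p in the upper group minus p in the lower
    group, groups taken as the top/bottom fraction ranked by total score."""
    n = max(1, round(len(total_scores) * fraction))
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = ranked[:n], ranked[-n:]
    p_lower = sum(item_scores[i] for i in lower) / n
    p_upper = sum(item_scores[i] for i in upper) / n
    return p_upper - p_lower

# One item's 0/1 scores for 10 examinees, and their total test scores.
item   = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1]
totals = [9, 8, 8, 3, 7, 2, 6, 4, 3, 9]

print(difficulty(item))             # 0.6
print(discrimination(item, totals)) # 1.0
```

An item that "fails item analysis" would typically show an extreme difficulty value (near 0 or 1) or a low or negative discrimination value.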
3.2. Considerations in Planning Classroom Tests
Guide in planning a classroom test
• Determine the purpose of the test.
• Describe the instructional objectives and content to be measured.
• Determine the relative emphasis to be given to each learning
outcome.
• Select the most appropriate item formats (essay or objective).
• Develop the test blue print to guide the test construction.
• Prepare test items that are relevant to the learning outcomes
specified in the test plan.
• Decide on the pattern of scoring and the interpretation of result.
• Decide on the length and duration of the test, and
• Assemble the items into a test, prepare direction and administer
the test.
3.3. Steps in planning classroom tests
• Before constructing a test, test constructors
should ask themselves the following
questions:
• What should I measure?
• What knowledge, skills, and attitudes do I
want to measure?
• Would I test for factual knowledge or the
application of this factual knowledge?
The answers to these questions mainly depend on:
Instructional objectives
Intended learning outcomes
The nature of the subject-matter imparted, and
The emphasis given to the content.
As suggested by Gronlund (1981), the construction of
good tests requires adequate and extensive planning
so that instructional objectives, teaching strategies,
textual materials, and evaluation procedures are all
inter-related in some meaningful fashion.
3.4. Planning stage of test development
The planning stage of test development has
the following major steps
1. Determining the purpose of testing
2. Developing the table of specifications (TOS)
3. Selecting appropriate item types
4. Preparing/writing relevant test items.
3.5. Table of specification /TOS/
• The TOS is a test blueprint.
• A table of specifications is a two-dimensional table that
specifies the levels of objectives in relation to the
content of the course.
• A well-planned TOS enhances the content validity of the
test for which it is planned.
• The two dimensions (content and objectives) are put
together in a table by listing the objectives across the
top of the table (horizontally) and the content down the
table (vertically) to provide the complete framework for
the development of the test items.
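The two-dimensional structure described above can be sketched as a small data structure. This is a hypothetical example (topics, levels, and item counts are invented, not taken from the notes): each row is a content area, each column an objective level, and each cell the number of items planned.

```python
# Hypothetical TOS: content topics (rows) crossed with objective levels
# (columns); each cell is the number of items planned for that cell.
tos = {
    "Basic concepts":   {"Knowledge": 3, "Comprehension": 2, "Application": 1},
    "Bloom's taxonomy": {"Knowledge": 2, "Comprehension": 3, "Application": 2},
    "Test planning":    {"Knowledge": 1, "Comprehension": 2, "Application": 4},
}

def tos_totals(table):
    """Return items planned per content area and the grand total."""
    per_topic = {topic: sum(cells.values()) for topic, cells in table.items()}
    return per_topic, sum(per_topic.values())

per_topic, grand_total = tos_totals(tos)
print(per_topic["Test planning"], grand_total)  # 7 20
```

The row and column totals make the relative emphasis on each content area and objective level explicit, which is exactly what planning the blueprint is meant to ensure.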
3.6. Item writing
Guide for item writing
• Follow your TOS when you are writing the test items.
• Generate more items than specified in the table of
specification.
• Use unambiguous language so that the demands of the item
would be clearly understood.
• Endeavor to generate the items at the appropriate levels of
difficulty as specified in the table of specification.
• Give enough time to allow an average student to complete the
task.
• Build in a good scoring guide at the point of writing the test
items.
• Have the test exercises examined and critiqued by one or more
colleagues.
• Review the items and select the best according to the laid
down table of specification/test blue print.
Unit 4
Construction of Classroom Tests
• Based upon the type of item format used,
teacher-made classroom tests are divided into
two types: objective versus subjective tests.
• Objective tests are tests with definite answers.
• One and only one correct answer is available
for a given item on an objective test.
• Thus, they have high scorer reliability.
4.1. Writing Objective Test Items
• Objective tests are of two types:
• 1. Selection-type items (the student is required to select
the answer) - multiple choice, true-false, matching type.
• Selection-type items are a fixed-response type.
• 2. Supply-type items (the student is required to supply
the answer) - completion, short answer.
Supply-type items are a free-response type.
CON’D
• To obtain the correct answer, students must
demonstrate the specific knowledge, understanding,
or skill. They are not free to redefine the problem or
to organize and present the answer in their own
words.
• This type of method contributes to scoring that is
quick, easy, and accurate.
• Negative side: inappropriate for measuring the ability
to formulate problems and choose an approach to
solving them or the ability to select, organize, and
integrate ideas.
Types of Objective Test
Objective test items:
- Supply test items: short answer, completion
- Selection test items: arrangement, true-false, matching, multiple choice
CON’D
• There are five main types of objective test items:
– Short answer
• What is the capital of Ethiopia?
– Completion
• In the equation 2X+5=9, X=____
– Matching
• Color of the sky A) Brown
• Color of the dirt B) Blue
• Color of the trees C) Green
– True-False or Alternative Response
• T F An atom is the smallest particle of matter.
• Yes No Acid turns litmus paper red.
– Multiple Choice
• In the equation 2X+5=9, 2X means
A)2 plus X
B)2 multiplied by X
C)2 minus X
4.1.1.Supply Item Type
• A supply or free-response objective test requires
the testee to give very brief answers to the
questions. These answers may be a word, a
short phrase, a number, or a symbol.
• If test items consist of direct questions, they
require short answers (short-answer type).
• If test items consist of incomplete statements, they
require responses that must be supplied by the
examinee (completion type).
Examples
1. What is the largest lake in Ethiopia?
2. Who was the first woman athlete in Africa to win a
gold medal in the Olympics?
3. The largest lake in Ethiopia is ____.
4. The name of the first woman athlete in Africa to win
a gold medal in the Olympics is ____.
Questions 1 & 2 are direct questions ending with a question
mark, so they are of the short-answer type.
Questions 3 & 4 are incomplete sentences, so they are of the
completion type.
Uses of supply item types
• They are suitable for measuring a wide variety
of relatively simple learning outcomes, such as
recall of memorized information and the problem-
solving outcomes measured in mathematics
and the sciences.
• They are used when they are most effective for
measuring a specific learning outcome, such as
computational learning outcomes in
mathematics and the sciences.
Advantages of supply test items
• Reduce problems of guessing: they minimize
guessing because the examinees must supply
the answer by either thinking and recalling the
information or making the necessary
computations to solve the problem presented.
• Easy to construct.
• Effective in measuring the spelling ability of
students.
• Encourage intensive study.
Weakness of supply items types
• Do not measure complex learning outcomes.
• Scoring is relatively difficult.
• Excessive use can encourage rote memory and
poor study habits.
• Limited to questions that can be answered by a
word, phrase, symbol or number.
• Sometimes it is difficult to phrase the question
as an incomplete statement.
Suggestions to construct supply item types
• There must be only one correct answer
• The wording must be clear and specific to avoid
ambiguous responses
• Avoid too many blank spaces in the same sentence
• Do not take statements directly from textbooks
• Direct question is generally more desirable than an
incomplete statement.
• Do not use articles ‘the, a, an’ immediately preceding
a response.
• The length of blanks should be approximately equal.
Tip to write Completion Items
• Avoid statements that are so indefinite which
may be answered by several terms.
• Be sure that the language used in the question
is precise and accurate.
• Word the statement such that the blank is near
the end of the sentence rather than near the
beginning.
• If the problem requires a numerical answer,
indicate the units in which it is to be expressed.
4.1.2.Selective item types
• True-false/alternative-response items
• Consist of a declarative statement to which
students are asked to respond true or false, right
or wrong, correct or incorrect, yes or no, agree
or disagree.
• Used to measure simple learning outcomes.
Suggestions for Writing True-False items
• The desired method of marking true or false
should be clearly explained.
• Construct statements that are definitely true or
definitely false.
• Use relatively short statements.
• Keep true and false statements at
approximately the same length.
• Be sure that there are approximately equal
numbers of true and false items.
• Avoid using double negative statements.
Avoid the following when writing true-false items:
• Verbal clues and complex statements.
• Broad general statements that are usually not true or
false without further qualification.
• Terms denoting indefinite degree (e.g. large, long
time, regularly).
• Absolute terms like never, only, always, etc.
• Placing items in a systematic order (TTFFTT,
TFTFTF, TTFTTFTT, etc.).
• Taking statements directly from the text.
• Trivial statements.
• Statements that are partly true and partly false.
Writing matching items
• A matching exercise has two columns. The question or
problem column is called the premises; the answer
column is called the responses.
• A short, homogeneous list of premises is written on the
left-hand side under column A, whereas the
responses are written on the right-hand side under
column B.
• Examinees are required to make some sort of
association between each premise and a response.
Advantages of matching exercise
• Measure knowledge of terms, definitions, dates, and
events that involves simple association.
• Well suited for the 'who, what, when, and where'
types of learning.
• Can be scored easily and objectively, and amenable to
machine scoring.
• It is like a "fun game" for teaching young children.
• Measure a large amount of related factual material.
• Many questions can be asked in a limited
amount of testing time.
Limitations of matching exercise
• Restricted to measuring factual material based
on rote learning.
• If care is not taken, it can encourage serial
memorization rather than association.
• It is difficult to get homogeneous material.
• It is time consuming for the students.
Suggestions for Writing Matching items
• Keep both the list of descriptions and the list of options
fairly short and homogeneous.
• Put both descriptions and options on the same page.
• Make sure that all the options are plausible distracters.
• Arrange the answers in some systematic fashion.
• Use longer phrases or statements as premises, and shorter
phrases, words or symbols as responses.
• Each description in the list should be numbered and list
of options should be identified by letters.
• Include more options than descriptions
• In the directions, specify clearly the basis for matching
and whether the options can be used once or more than
once or not at all.
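The guidelines above lend themselves to a small structural check. The sketch below is hypothetical (the exercise content is invented for illustration): premises are numbered, options are lettered, and there are more options than premises, as the notes suggest.

```python
# Hypothetical matching exercise: numbered premises (column A) and
# lettered options (column B); content is illustrative.
premises = {1: "Capital city of Ethiopia", 2: "Largest lake in Ethiopia"}
options  = {"A": "Addis Ababa", "B": "Lake Tana", "C": "Lake Victoria"}

def follows_guidelines(premises, options):
    """Check three guidelines from the notes: more options than
    premises, numbered premises, and single-letter option labels."""
    more_options = len(options) > len(premises)
    numbered = all(isinstance(k, int) for k in premises)
    lettered = all(isinstance(k, str) and len(k) == 1 for k in options)
    return more_options and numbered and lettered

print(follows_guidelines(premises, options))  # True
```

Having an extra option ("Lake Victoria" here) prevents the last premise from being answerable by elimination alone.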
Multiple choice items
• The multiple-choice item consists of two parts: (1)
the stem which contains the problem; and (2) a list of
suggested answers (responses or options).
• The incorrect responses are often called decoys, foils
or distracters (distractors).
• The correct response is called the key.
• The stem may be stated as a direct question or an
incomplete statement. From the list of responses
provided, the student selects the one that is correct (or
best).
Advantages of multiple choice items
• It is adaptable (versatile) to most learning
outcomes.
• It has greater test reliability per item than do true-false
items.
• It affords excellent content sampling, which generally
leads to more content-valid score interpretations.
• Less prone to ambiguity than the short-answer item.
• Can be scored quickly and accurately by machines,
clerks, teacher aides, and even students themselves.
12/8/2020 79Prepared by Mengistu Debele
Deficiencies or limitations of multiple-choice items
• It is difficult to construct; it is not easy for
teachers to write plausible-sounding distracters.
• There is a tendency for teachers to write multiple-
choice items demanding only factual recall.
• Test-wise students perform better on multiple-choice
items than do non-test-wise students; thus
multiple-choice tests may favor test-wise students.
Suggestions for Writing Multiple Choice Items
• The stem of the item should clearly formulate a problem.
• Keep the response options as short as possible.
• Be sure that distracters are plausible.
• Include from three to five options.
• It is not necessary to provide additional distracters for an item
simply to maintain the same number of distracters for each item.
• Be sure that there is one and only one correct or clearly best answer.
• To increase the difficulty of items, increase the similarity of content
among the options.
• Use the option “none of the above” sparingly. Do not use this option
when asking for the best answer.
• Avoid using “all of the above”.
• The stem and the options must be written on the same page.
• Ensure that the correct responses form an essentially random pattern,
appearing in each of the possible response positions about the same
percentage of the time.
Guide for Preparing the Objective Test
• Begin writing items far enough in advance of the test date.
• Match items to intended outcomes at the proper difficulty level.
• The wording of the item should be clear and as explicit as possible.
• Avoid setting interrelated items (be sure that each item is independent of all
other items).
• Items should be designed to test important, not trivial, facts or
knowledge.
• Write each item to elicit discriminately the extent of the examinees'
possession of only the desired behavior.
• Ensure that there is one and only one correct or best answer to each item.
• Avoid unintentionally giving away the answer through irrelevant
clues.
• Use language appropriate to the level of the examinees.
• Items in an achievement test should be constructed to elicit specific course
content, not to measure general intelligence.
• Have an independent reviewer examine your test items.
4.2. Subjective/Essay Tests
• Essay test items should be used primarily for the
measurement of those learning outcomes that cannot
be measured by objective test items.
• Based on the amount of freedom given to the
student to organize his/her ideas and write the answer,
essay questions are subdivided into two major types:
extended response and restricted response.
Extended-response essay type
• In the extended-response type of essay question,
virtually no bounds are placed on the student as to the
points the student will discuss and the type of
organization the student will use. The student has
complete freedom in giving a response.
• E.g. Describe what you think should be included in a
school testing program. Illustrate your answer with
specific tests, giving reasons for your test selection.
Your essay should be about 300-400 words or more.
CON’D
• Extended-Response
– Permits students to decide which facts they think are most
pertinent, to select their own method of organization, and
to write as much as seems necessary for a comprehensive
answer.
– Tends to reveal the ability to evaluate ideas, to relate them
coherently, and to express them succinctly.
– Although they are valuable for measuring complex skills and
understanding of concepts and principles, they have three
weaknesses:
a) Inefficient for measuring knowledge of factual
material.
b) Scoring criteria are not as apparent to the student.
c) Scoring is difficult and unreliable because responses
vary in factual content, organization, legibility,
and conciseness.
Restricted-response essay type
• In the restricted-response essay question, the student is more limited in the
form and scope of his answer because he is told specifically the form that
his answer is to take.
E.g. Write how plants make food. Your answer should be about one-half
page long.
– These questions minimize some of the weaknesses of the extended-
response type, for three reasons:
• It is easier to measure knowledge of factual material.
• The scoring criteria are clearer to the student.
• The difficulty of scoring is reduced.
– On the negative side, they are less effective as a measure of the ability
to select, organize, and integrate ideas.
– In addition, if the restrictions become too tight, the questions reduce to
nothing more than an objective-type test.
Guide for Preparing the Essay Test
• Restrict the use of essay questions to only
those learning outcomes that cannot be
satisfactorily measured by objective items.
• Formulate questions that call forth the
behavior specified in the learning outcomes.
Essay questions should be designed to elicit only
the skill the item was intended to measure.
• Phrase each question to clearly indicate the
examinee's task.
• Indicate an approximate time limit for each
question.
• Avoid the use of optional questions.
4.3. Performance Assessment
• A performance test is another teacher-made test
(technique) that tries to establish what a person can do.
• It permits the student to organize and construct the
answer.
• Other types of performance may require the student
to use equipment, generate hypotheses, make
observations, construct a model, or perform for an
audience.
• Most performance assessments do not have a
single right or best response; there may be a variety of
acceptable responses.
CON’D
• Performance assessment tasks are needed to measure
a student’s ability to engage in hands-on activities,
such as conducting an experiment, designing and
conducting a survey, or writing an essay.
• Performance tests generally fall into one of
three categories:
(1) Tests under simulated conditions.
(2) Work sample tests.
(3) Recognition tests.
4.4. Authentic Assessment
Authentic assessment is a type of
assessment aimed at evaluating
students' abilities in 'real-world' contexts.
In authentic assessment, students are
asked to demonstrate practical skills and
concepts they have learned.
CON’D
• Authentic assessment does not encourage rote
learning and passive test-taking.
• It focuses on students' analytical skills, ability
to integrate what they learn, creativity, ability
to work collaboratively, and written and oral
expression skills.
• It is also called performance assessment.
Tools to employ authentic assessment
Observation and observation tools
- Checklist: a checklist enables the observer to note, quickly and
effectively, only whether or not a behavior occurred. It does
not permit the observer to rate the quality, degree, or
frequency of occurrence of a particular behavior. It is a list of
items you need to verify.
- Anecdotal records: anecdotal records are the least structured
observational tool. They depict actual behavior in natural
situations: short stories that describe students' behaviors and
actual performance.
- Rating scales: rating scales can be used for single observations
or over long periods of time. They are a set of categories
designed to elicit information about a quantitative or a
qualitative attribute.
A. Running records
• A running record is a tool that helps teachers
identify patterns or sequences in students'
practical work, reading behaviors, laboratory
experiment procedures, drawing procedures, and
so on.
B. Project Work Assessment
• Project work assessment can be conducted in
two ways.
1. Process assessment
2. Product assessment
C. Portfolio assessment
• Portfolio is a collection of a student's work
specifically selected to tell a particular story
about the student’s work accomplishments.
• Portfolio is the systematic collection of student
work measured against predetermined scoring
criteria.
D. Self-Assessment
• Self-assessment is the process of looking at
oneself in order to assess aspects that are
important to one's identity. It is one of the
motives that drive self-evaluation, along with
self-verification and self-enhancement.
E. Reflection
• Reflection is an active process of witnessing
one’s own experience in order to take a closer
look at it, sometimes to direct attention to it
briefly, but often to explore it in greater depth.
F. Rubric
• A rubric is a scoring scale used to assess student
performance along a task-specific set of
criteria.
• A rubric is composed of two components:
criteria and levels of performance.
Each rubric has at least two criteria and at least
two levels of performance.
• In short, a rubric is a set of criteria prepared for scoring.
Types of rubric
1. Analytic rubric: it articulates levels of
performance for each criterion so the teacher
can assess student performance on each
criterion.
2. Holistic rubric: it does not list separate
levels of performance for each criterion.
Instead, a holistic rubric assigns a level of
performance by assessing performance across
multiple criteria as a whole.
Unit 5: Assembling, Administering and Scoring
Classroom Tests
5.1. Assembling the Test Items
It is the process of arranging items on the test so that they are
easy to read.
• Plan the layout of the test so that it is convenient for
recording answers, and for scoring when the test items are
answered on separate answer sheets.
• Group items of the same format together (true-false,
matching items, completion/short answer items,
multiple choice, essay) and arrange them in a logical order,
each group with relevant directions on what the testees
need to do.
• Group items dealing with the same content together within
item types.
CON’D
• Arrange the test items in progressive order of
difficulty, starting from simple and moving to complex
questions.
• Ensure that one item does not provide clues to the
answer of another item or items in the same or
another section of the test.
• For multiple-choice items, ensure that the correct
responses form an essentially random pattern,
appearing in each of the possible response positions
about the same percentage of the time.
5.2. Administration of classroom tests
Test administration refers to the procedure of
actually presenting the learning task that the
examinees are required to perform, in order to
ascertain the degree of learning that has taken
place during the teaching-learning process.
- The validity and reliability of test scores can be
greatly reduced when a test is poorly
administered.
Consider the following during test administration
• Ensure that the physical and psychological
environment is conducive for testees.
• Avoid malpractice by both testees and the invigilator.
• Avoid unnecessary threats from test administrators.
• Stick to the instructions regarding the conduct of the
test and avoid giving hints to testees who ask about
particular items, but make corrections or
clarifications to the testees whenever necessary.
• Keep interruptions during the test to a minimum.
Ensuring quality in test administration
• Collect the question papers from the custodian in time
to start the test at the stipulated time.
• Ensure compliance with the stipulated sitting arrangements
to prevent collusion between or among the testees.
• Ensure orderly and proper distribution of question papers
to the testees.
• Do not talk unnecessarily before the test. Testees’ time
should not be wasted at the beginning of the test with
unnecessary remarks, instructions, or threats that may
create test anxiety.
• Remind the testees of the need to avoid malpractice
before they start, and make it clear that cheating will be
penalized.
Credibility and Civility in Test Administration
• Credibility deals with the value the eventual
recipients and users of the assessment results
place on them, with respect to the grades
obtained, the certificates issued, or the issuing
institution.
• Civility deals with whether the persons being assessed
are in such conditions as to give their best, without
hindrances and encumbrances, in the attributes
being assessed, and whether the exercise is seen as
integral to or external to the learning process.
Points need consideration in test administration
1. Instruction: both for test administrators and
testees
2. Duration of the test
3. Venue and sitting arrangement
4. Other necessary conditions
5.3. Scoring the Test
• A marking scheme (key) should be prepared
alongside the construction of the test items in
order to score the test objectively.
Scoring Essay Tests
In the essay test, the examiner is an active part
of the measurement instrument. Therefore, the
variabilities within and between examiners affect
the resulting score of an examinee.
Methods of scoring essay questions
There are two common methods of scoring essay questions.
These are:
1. The Point or Analytic Method: in this method each answer is
compared with an already prepared ideal marking scheme (scoring
key), and marks are assigned according to the adequacy of the
answer.
2. The Global/Holistic Rating Method: in this method the
examiner first sorts the responses into categories of varying quality
based on his general or global impression on reading each response.
The standard of quality helps to establish a relative scale, which
forms the basis for ranking responses from those with the poorest
quality to those with the highest quality. Usually between five and
ten categories are used, with each of the piles representing a degree
of quality that determines the credit to be assigned.
Scoring Objective Tests
• Objective tests can be scored with ease by various
methods.
1. Manual Scoring: in this method answers are scored
by direct comparison of the examinee's answers
with the marking key.
2. Stencil Scoring: in this method answers are scored
by laying a stencil over each answer sheet; the
number of answer checks appearing through the
holes is counted.
3. Machine Scoring: in this method answers are
scored by machine, with computers or other scoring
devices, using a certified answer key prepared for
the test items.
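Manual and machine scoring both reduce to comparing each examinee's answers with the marking key. A minimal sketch in Python (the function name and the key/answer lists are illustrative, not part of any standard library):

```python
# Illustrative sketch: scoring an objective test against a marking key.
# The key and the answer list below are hypothetical examples.

def score_against_key(answers, key):
    """Count the answers that match the marking key, position by position."""
    return sum(1 for given, correct in zip(answers, key) if given == correct)

key = ["C", "A", "D", "B", "C"]
print(score_against_key(["C", "A", "D", "D", "C"], key))  # 4
```

A stencil or scoring machine performs exactly this comparison mechanically.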
Unit 6: Interpreting, Describing, and Analyzing Test Scores
6.1. Methods of interpreting classroom test scores
Norm-referenced interpretation: a norm-referenced
test score interpretation tells us how an
individual student's score compares with the
scores of other students in the group who have
taken the same test.
6.2. Judging the Quality of a Classroom Test
Item Analysis
Item analysis is the process of “testing the item” to
ascertain specifically whether the item is functioning
properly in measuring what the entire test is
measuring.
Purpose and Uses of Item Analysis
- To discriminate between high and low achievers in a
norm-referenced test (discrimination power).
- To determine whether items have the desirable qualities
of a measuring instrument (difficulty level).
The Process of Item Analysis for Norm-
Referenced Classroom Tests
• In a norm-referenced test, special emphasis is
placed on item difficulty and item discriminating
power.
• The process of item analysis begins after the test
has been administered (or trial tested), scored,
and recorded.
• Item analysis is carried out using two
contrasting groups composed of the upper and
lower 25%, 27%, or 33% of the testees to whom
the items were administered.
Computing Item Difficulty
• The item difficulty index is denoted by P.
• The difficulty index (P) for each item is
obtained using the formula:

P = R/T × 100

where R = number of testees who got the item right, and
T = total number of testees responding to the item.
The item difficulty indicates the percentage of testees
who got the item right in the two groups used for the
analysis. For example, R/T = 0.7 gives P = 0.7 × 100% = 70%.
Example: the responses of the upper and lower groups
(15 testees each) to a five-option item are shown below,
where C* is the correct answer.

Alternatives     A    B    C*   D    E    Omits
Upper group      0    0    15   0    0    0
Lower group      4    2    8    1    0    0

1) Calculate the item difficulty index (P).
2) Find the good plausible distracter.
3) Calculate the item discrimination power (D).
Solution
P = (Upper right + Lower right) / (Total students in analysis) × 100
P = (15 + 8)/30 × 100
= 23/30 × 100
= 0.766 × 100
= 76.66%
Alternative A is a good plausible distracter because
it attracts more students from the lower group than
from the upper group.
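The computation above can be sketched in a few lines of Python (assuming, as in the worked example, that difficulty is computed over the combined upper and lower analysis groups; the function name is illustrative):

```python
def item_difficulty(upper_right, lower_right, total_in_analysis):
    """Difficulty index P: percentage of the analysis groups answering correctly."""
    return (upper_right + lower_right) / total_in_analysis * 100

# The worked example: 15 correct in the upper group, 8 in the lower, 30 testees.
print(round(item_difficulty(15, 8, 30), 2))  # 76.67
```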
Compute the item-discrimination index
Item discrimination power is denoted by D.

D = (RU − RL) / (½T)

where RU = number right in the upper group,
RL = number right in the lower group, and
T = total students in the upper and lower groups.

D = (15 − 8) / (½ × 30)
= 7/15
= 0.47
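The same calculation as a small Python function (the name is illustrative):

```python
def discrimination_index(upper_right, lower_right, total_both_groups):
    """D = (RU - RL) / (T/2); values range from -1.00 to +1.00."""
    return (upper_right - lower_right) / (total_both_groups / 2)

# The worked example: 15 right in the upper group, 8 in the lower, 30 testees.
print(round(discrimination_index(15, 8, 30), 2))  # 0.47
```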
Interpretation
• Item discrimination values range from −1.00 to +1.00.
• The higher the discrimination index, the better an item
differentiates between high and low achievers.
Item discriminating power has a:
• Positive value when a larger proportion of those in the high-
scoring group get the item right compared to those in the low-
scoring group. If D has a positive value, the item has positive
discrimination.
• Negative value when more testees in the lower group than in the
upper group get the item right. If D has a negative value, the
item has negative discrimination.
• Zero (0) value when an equal number of testees in both groups
get the item right; and
• One (1) when all testees in the upper group get the item right
and all the testees in the lower group get the item wrong.
Compute the Sensitivity to Instructional Effects
Sensitivity to instructional effects is denoted by S.
S requires two test administrations, i.e. a pretest and
a posttest.

S = (RA − RB) / T

where RA = number of pupils who got the item
right after instruction,
RB = number of pupils who answered the
item correctly before instruction, and
T = total number of pupils who attempted
the item both times.
Following is an example of the behavior of four items
before (B, pretest) and after (A, posttest) instruction.
+ means the item was answered correctly; - means it was
answered incorrectly.

Item:      1       2       3       4
           B  A    B  A    B  A    B  A
Abebe      -  +    -  -    +  -    -  +
Balcha     -  +    -  -    +  -    -  +
Chaltu     -  +    -  -    +  -    +  +
Dawit      -  +    -  -    +  -    +  +
Eliyas     -  +    -  -    +  -    -  -
Analysis from the above table
Item 1. This is what would be expected in an "ideal" criterion-
referenced test (CRT). No one answered the item correctly
before instruction; everybody did after. This suggests that the
instruction was effective.
Item 2. We would not want these results in a CRT. No pupil
answered the item correctly either before or after
instruction.
Item 3. We would not want these results in any test, be it CRT
or NRT. Why should pupils who answered correctly before
instruction answer incorrectly after instruction?
Item 4. This is how we would expect good test items to
behave. We would expect some pupils to know the correct
answer before instruction but a larger number to answer the
item correctly after "effective" instruction.
Calculate the sensitivity to instructional effects for items
1, 2, 3, and 4 in the above example.
Solution
S = (RA − RB) / T
1. S = (5 − 0)/5 = 1.00
2. S = (0 − 0)/5 = 0.00
3. S = (0 − 5)/5 = −1.00
4. S = (4 − 2)/5 = 0.40
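The four solutions above can be reproduced with a short Python sketch (the function name is illustrative):

```python
def sensitivity_index(right_after, right_before, total):
    """S = (RA - RB) / T for an item given on both pretest and posttest."""
    return (right_after - right_before) / total

# The four items from the worked example (5 pupils each):
for ra, rb in [(5, 0), (0, 0), (0, 5), (4, 2)]:
    print(sensitivity_index(ra, rb, 5))  # 1.0, 0.0, -1.0, 0.4
```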
Unit 7: Describing Educational Data
• 7.1. Units of measurement
• Data differ in terms of what properties of the
real number series (order, distance, or origin)
we can attribute to the scores. The most
refined classification of scores is one
suggested by Stevens (1946), who classified
measurement scales as nominal, ordinal,
interval, and ratio scales.
Nominal scale
• A nominal scale is the simplest scale of
measurement. Nominal scale is about naming.
E.g. Room 1, Room 2.
• It involves the assignment of different
numerals to categories that are qualitatively
different.
• For example, for purposes of storing data on
computer cards, we might use the symbol 0 to
represent a female and the symbol 1 to
represent a male.
CON’D
• If measurement is defined as "the assignment
of numerals to objects or events according to
rules", then nominal data indicate
measurement. If, on the other hand,
measurement implies a quantitative difference,
then nominal data do not indicate
measurement.
Ordinal scale
• An ordinal scale has the order property of a real
number series and gives an indication of rank order.
Thus, magnitude is indicated, if only in a very gross
fashion. Rankings in a music contest or in an athletic
event would be examples of ordinal data. The ordinal
scale is about ranking in order, e.g. Grade 1, Grade 2,
Grade 3.
• It helps us to know who is best, second best, third
best, and so on, to select the top pupils for some
task.
Interval scale
• With interval data we can interpret the
distances between scores: the scale provides
information about the differences between
the scores. The interval scale is about differences
and equal distances between the attributes
under measurement. There is no absolute zero
on an interval scale, e.g. the Fahrenheit/Celsius
temperature scales.
Ratio scale
• If one measures with a ratio scale, the ratio of
the scores has meaning. Thus, a person who is
86" tall is twice as tall as a person who is 43". We
can make this statement because a
measurement of 0 (zero) actually indicates no
height; that is, there is a meaningful zero
point. Thus, a ratio scale has an absolute or
true zero.
7.2. Measures of central tendency
• Measures of central tendency give some
idea of the average or typical score in the
distribution.
• Central tendency describes the center
location of distribution. There are three
commonly used measures of central
tendency. These are the mode, the
median and the mean.
The mode
• The mode is defined as the score value which is
obtained most often, i.e. the most frequently
occurring value. A set of scores may have one,
two, more than two, or no modes (mono-modal,
bimodal, multimodal, or no mode). E.g. the set
1, 2, 3, 4, 5 has no mode; in the set 6, 7, 7, 7, 8, 8, 9
the mode is 7 (mono-modal); in the set
4, 5, 5, 6, 6, 7, 8 the modes are 5 and 6 (bimodal);
in the set 4, 4, 5, 5, 6, 6, 7, 8 the modes are
4, 5, and 6 (multimodal).
The median
• The median (Mdn) is the point below which 50
percent of the scores lie. An approximation to that
point is obtained from ordered data by simply finding
the score in the middle of the distribution.
• It is the 50th percentile of the distribution, having
half of the cases above it and half of the cases below it.
CON’D
• When the number of scores in an ordered set is odd, the
median is the middle score: Mdn = the ((N+1)/2)th case.
• E.g. 3, 4, 4, 7, 7, 7, 8, 8, 9: N = 9, (9+1)/2 = 5th case.
The 5th case is the median, i.e. 7.
If the number of scores in an ordered set is even, the
median is the average of the (N/2)th case and the next case.
E.g. 3, 4, 4, 7, 7, 7, 8, 8, 9, 10: N = 10, 10/2 = 5th case.
The 5th case is 7; the next (6th) case is also 7.
Thus, Mdn = (7 + 7)/2 = 7.
The mean (x̄)
• The mean (x̄) is the arithmetic average of
a set of scores. It is calculated by adding
all the scores in the distribution and
dividing by the total number of scores (N).
The formula is

x̄ = Σx / N

where x = a raw score, Σx = the sum of the
scores, and N = the number of scores.
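All three measures of central tendency can be computed with Python's standard `statistics` module (`multimode` requires Python 3.8 or later); the score list is the odd-N example from the median slide:

```python
from statistics import mean, median, multimode

scores = [3, 4, 4, 7, 7, 7, 8, 8, 9]
print(multimode(scores))       # [7]  (mono-modal)
print(median(scores))          # 7    (the 5th of 9 ordered scores)
print(round(mean(scores), 2))  # 6.33 (57/9)
```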
7.3. Measures of variability
Measures of variability are used to know how the scores are
spread or dispersed. The common measures of
variability are the range (R), the variance (σ2 = population
variance, S2 = sample variance), the standard deviation
(S), and percentiles.
• The range (R) of a set of measurements is defined as
the difference between the largest and the smallest
measurements of the set.
• Occasionally the inclusive range (high score − low score + 1) is
used.
The Variance
• Variance indicates the degree of spread by
comparing every score to the mean of the
distribution. The formula is

S2 = Σ(x − x̄)2 / N

E.g. find the variance for the set of data 9, 8, 8, 7, 7, 7, 4, 4, 3, 3.
Solution: N = 10, mean = 6, and the sum of squared deviations
is 46. Hence S2 = 46/10 = 4.6.
Standard deviation
• The standard deviation is the square
root of the variance. The formula is

S = √S2
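The three measures of spread can be computed together; this sketch uses the variance example's data, with the population variance as in the formula above (the function name is illustrative):

```python
def spread(scores):
    """Return the range, population variance, and standard deviation."""
    n = len(scores)
    m = sum(scores) / n
    variance = sum((x - m) ** 2 for x in scores) / n  # population variance
    return max(scores) - min(scores), variance, variance ** 0.5

r, var, sd = spread([9, 8, 8, 7, 7, 7, 4, 4, 3, 3])
print(r, var, round(sd, 2))  # 6 4.6 2.14
```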
Unit 8
Characteristics of good tests
• 8.1. Reliability
• Reliability can be defined as the degree of
consistency between two measures of the same test.
• The classical theory of reliability can best be
explained by starting with observed scores (X).
• An observed score is made up of a "true
score" (T) and an "error score" (E), such that

X = T + E

where X = observed score, T = true score, and
E = error score.
• A good test measures what it intends to
measure.
• The most important criteria for evaluating
tests are validity and reliability.
Measurement tools can be judged on a
variety of merits. These include practical
issues as well as technical issues. All
instruments have strengths and weaknesses.
No instrument is perfect for every task.
• Some of the practical issues that need to
be considered include: cost, availability,
training required, ease of administration,
scoring, analysis, and the time and effort
required for a respondent to complete the measure.
• Along with the practical issues,
measurement tools may be judged on the
following characteristics:
• Validity: a test is considered valid when it
measures what it is supposed to measure.
• Reliability: a test is reliable if, when it is taken
again by the same students under the
same circumstances, the scores are
almost constant, taking into
consideration that the time between the
test and retest is of reasonable length.
• Objectivity: objectivity means that if the
test is marked by different persons, the
score will be the same. In other words,
the marking process should not be affected by
the marker’s personality.
• Comprehensiveness: a good test should
include items from the different areas of the
material assigned for the test.
• Simplicity: simplicity means that the test
should be written in clear, correct, and
simple language. It is important to keep
the method of testing as simple as
possible while still testing the skill you
intend to test.
• Scorability: scorability means that each
item in the test has its own mark, related to
the distribution of marks given.
Approaches to estimate reliability
• The common approaches to estimate reliability are:
1. Measures of stability
2. Measures of equivalence
3. Measures of equivalence and stability
4. Measures of internal consistency
• a. Split-half
• b. Kuder-Richardson estimates
5. Scorer (judge) reliability
Measure of stability
• A measure of stability, often called a test-retest
estimate of reliability, is obtained by
administering a test to a group of persons, re-
administering the same test to the same group
at a later date, and correlating the two sets of
scores.
Measures of Equivalence
• Measures of equivalence estimate reliability
from equivalent forms, obtained by
giving two forms of a test (with equal content,
means, and variances) to the same group on
the same day and correlating the results.
Measures of Equivalence and Stability
• A coefficient of equivalence and stability
could be obtained by giving one form of
the test and, after some time,
administering the other form and
correlating the results.
Measures of Internal Consistency
• The split-half method of estimating reliability is
theoretically the same as the equivalent-forms
method.
• The appropriate formula is a special case of the
Spearman-Brown prophecy formula:

rxx = 2r(1/2)(1/2) / (1 + r(1/2)(1/2))

where rxx = estimated reliability of the whole test, and
r(1/2)(1/2) = reliability of the half-test (the correlation
between the two halves).
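The Spearman-Brown correction is a one-line computation; as a sketch (the function name and the example value 0.6 are illustrative):

```python
def spearman_brown(half_test_r):
    """Estimated whole-test reliability from the correlation between two half-tests."""
    return 2 * half_test_r / (1 + half_test_r)

# E.g. a half-test correlation of 0.60 projects to a whole-test reliability of 0.75.
print(round(spearman_brown(0.6), 2))  # 0.75
```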
Kuder-Richardson Estimates
• If items are scored dichotomously (right or wrong), one way to
avoid the problem of how to split the test is to use one of the
Kuder-Richardson formulas. The formulas may be considered
as representative of the average correlation obtained from all
possible split-half reliability estimates. K-R 20 and K-R 21 are
two formulas used extensively.

K-R 20: rxx = [n/(n−1)] × (1 − Σpq/Sx2)

K-R 21: rxx = [n/(n−1)] × (1 − x̄(n−x̄)/(nSx2))
where n = number of items in the test,
p = proportion of people who answered an item correctly,
q = proportion of people who answered an item incorrectly
(q = 1 − p; if p = .20, q = .80),
pq = variance of a single item scored dichotomously
(right or wrong),
Σ = summation sign indicating that pq is summed over
all items,
Sx2 = variance of the total test, and
x̄ = mean of the total test.
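K-R 20 can be sketched directly from the formula above. Here `item_scores` is a hypothetical table with one row of 0/1 item scores per examinee, and the population variance of the totals is used, matching the notation above:

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for dichotomously scored (0/1) items."""
    n_items = len(item_scores[0])
    n_people = len(item_scores)
    # Sum of p*q over items, where p is the proportion answering the item right.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(person[i] for person in item_scores) / n_people
        sum_pq += p * (1 - p)
    # Population variance of the total scores.
    totals = [sum(person) for person in item_scores]
    m = sum(totals) / n_people
    var_total = sum((t - m) ** 2 for t in totals) / n_people
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)

data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # 4 examinees, 3 items
print(round(kr20(data), 2))  # 0.75
```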
Scorer (Judge) Reliability
• If a sample of papers has been scored
independently by two different readers, the
traditional Pearson product moment correlation
coefficient (r) can be used to estimate the
reliability of a single reader's scores.
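The Pearson product-moment correlation between two raters' score lists can be computed as follows (the two score lists are made-up illustrations):

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two raters' scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rater1 = [8, 6, 7, 9, 5]  # hypothetical scores from reader 1
rater2 = [7, 6, 8, 9, 4]  # hypothetical scores from reader 2
print(round(pearson_r(rater1, rater2), 2))
```

Values near +1 indicate that the two readers rank the papers consistently.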
Factors Influencing Reliability
• Test length: longer tests give more reliable scores.
• Speed: a test is considered a pure speed test if
everyone who reaches an item gets it right but no one
has time to finish all the items. Thus, score
differences depend upon the number of items
attempted. The opposite of a speed test is a power
test.
• Group homogeneity: the more heterogeneous the
group, the higher the reliability.
• Objectivity: the more subjectively a measure is
scored, the lower the reliability of the measure.
8.2 Validity
• What is validity?
• Validity is the extent to which a test measures what it claims
to measure. A test must be valid for its results to be applied
and interpreted accurately.
• Validity can best be defined as the extent to which certain
inferences can be made accurately from, and certain actions
appropriately based on, test scores or other measurements.
• Validity (meaningfulness) is the degree to which an evaluation
(test) actually serves the purpose for which it is intended.
• Three important points about validity are:
• 1. Validity refers to the appropriate use and interpretation of
test scores, not to the test itself.
• 2. Validity is a matter of degree; tests are not absolutely
valid or invalid.
• 3. Validity is specific to a particular use. No test serves all
purposes equally well.
There are three main classes of validity: content validity,
criterion-related validity, and construct validity.
Content validity
• Content validity is the extent to which an instrument measures
what it purports to measure; more specifically, the degree to
which the instrument fully assesses the content of interest.
When a test has content validity, its items represent the entire
range of possible items the test should cover.
CON’D
• Content validity deals with adequate sampling of items. When an
evaluation instrument adequately samples certain types of
situations or subject matter, it is said to have content
validity. If the item sample is not representative to an
adequate degree, the test lacks content validity.
• Face validity: a form of content validity referring to whether
the measuring instrument appears (seems) to measure what it is
intended to measure.
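One way to think about adequate sampling is to compare a test against its table of specifications. The sketch below is purely illustrative; the blueprint, item counts, and function name are invented, not taken from the note.

```python
# Planned vs. actual item counts per content area (invented numbers).
blueprint = {"knowledge": 10, "comprehension": 8, "application": 7}
test_items = {"knowledge": 12, "comprehension": 9, "application": 4}

def coverage(planned, actual):
    """Proportion of each planned content area actually represented, capped at 1."""
    return {area: min(actual.get(area, 0) / n, 1.0)
            for area, n in planned.items()}
```

Running coverage(blueprint, test_items) flags "application" as under-sampled (4/7 ≈ 0.57), the kind of non-representative sampling that weakens content validity.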
Criterion-related validity
• A test is said to have criterion-related validity when it has
demonstrated its effectiveness in predicting a criterion, or
indicators, of a construct. There are two types of
criterion-related validity:
• 1. Concurrent validity
• 2. Predictive validity
Concurrent validity
• Concurrent validity applies when the criterion measures are
obtained at the same time as the test scores. It indicates the
extent to which the test scores accurately estimate an
individual’s current standing with regard to the criterion: a
comparison of scores on one instrument with current scores on
another instrument.
Predictive validity
• Predictive validity applies when the criterion measures are
obtained at a time after the test. Examples of tests with
predictive validity are career or aptitude tests, which help
determine who is likely to succeed or fail in certain subjects
or occupations. Predictive validity is a comparison of scores on
one instrument with future behavior or future scores on another
instrument.
Construct validity
• Construct validity is the degree to which an instrument
measures the trait or theoretical construct it is intended to
measure. A construct is a human characteristic (such as
intelligence or anxiety) that cannot be observed directly. A
test is said to be valid in terms of construct when it actually
measures the psychological concept it is built on. A test has
construct validity if it demonstrates an association between the
test scores and the theoretical trait it is supposed to reflect.

Weitere ähnliche Inhalte

Was ist angesagt?

Measurement and evaluation
Measurement and evaluationMeasurement and evaluation
Measurement and evaluation
Brien Naco
 
Characteristics of a good test
Characteristics of a good testCharacteristics of a good test
Characteristics of a good test
cyrilcoscos
 
Importance of evaluation
Importance of evaluationImportance of evaluation
Importance of evaluation
Jhunisa Agustin
 

Was ist angesagt? (20)

Types of Assessment in Classroom
Types of Assessment in ClassroomTypes of Assessment in Classroom
Types of Assessment in Classroom
 
Classroom Assessment
Classroom AssessmentClassroom Assessment
Classroom Assessment
 
Measurement and evaluation
Measurement and evaluationMeasurement and evaluation
Measurement and evaluation
 
Rubric
RubricRubric
Rubric
 
Characteristics of a good test
Characteristics of a good testCharacteristics of a good test
Characteristics of a good test
 
Specific techniques of curriculum evaluation
Specific techniques of curriculum evaluationSpecific techniques of curriculum evaluation
Specific techniques of curriculum evaluation
 
Validity and objectivity of tests
Validity and objectivity of testsValidity and objectivity of tests
Validity and objectivity of tests
 
Definition of Assessment,
Definition of Assessment,Definition of Assessment,
Definition of Assessment,
 
Education planning types
Education planning typesEducation planning types
Education planning types
 
Measurement and evaluation
 Measurement and evaluation Measurement and evaluation
Measurement and evaluation
 
Construction of Test
Construction of TestConstruction of Test
Construction of Test
 
Achievement Test,
Achievement Test,Achievement Test,
Achievement Test,
 
Meaning, need and characteristics of evaluation
Meaning, need and characteristics of evaluationMeaning, need and characteristics of evaluation
Meaning, need and characteristics of evaluation
 
achievement test
achievement testachievement test
achievement test
 
Norms Referenced and Criteria Referenced Evaluation
Norms Referenced and Criteria Referenced EvaluationNorms Referenced and Criteria Referenced Evaluation
Norms Referenced and Criteria Referenced Evaluation
 
Purpose, Principle, Scope of Test and Evaluation
Purpose, Principle, Scope of Test and EvaluationPurpose, Principle, Scope of Test and Evaluation
Purpose, Principle, Scope of Test and Evaluation
 
Importance of evaluation
Importance of evaluationImportance of evaluation
Importance of evaluation
 
Types of Grading and Reports
Types of Grading and ReportsTypes of Grading and Reports
Types of Grading and Reports
 
Distinction among measurement, assessment and evaluation
Distinction among measurement, assessment and evaluationDistinction among measurement, assessment and evaluation
Distinction among measurement, assessment and evaluation
 
Importance of test
Importance of testImportance of test
Importance of test
 

Ähnlich wie Ppt on educational measurement and evaluation

Assessment and Evaluation 2011.pptx
Assessment and Evaluation 2011.pptxAssessment and Evaluation 2011.pptx
Assessment and Evaluation 2011.pptx
Sendafa edget
 
kto12classroomassessmentppt-150405021132-conversion-gate01 (1).pdf
kto12classroomassessmentppt-150405021132-conversion-gate01 (1).pdfkto12classroomassessmentppt-150405021132-conversion-gate01 (1).pdf
kto12classroomassessmentppt-150405021132-conversion-gate01 (1).pdf
RosellMaySilvestre3
 
Ccepresentation 2-101016010554-phpapp02
Ccepresentation 2-101016010554-phpapp02Ccepresentation 2-101016010554-phpapp02
Ccepresentation 2-101016010554-phpapp02
Palaxayya Hiremath
 
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxassessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
Marjorie Malveda
 

Ähnlich wie Ppt on educational measurement and evaluation (20)

Educational assessment
Educational assessment Educational assessment
Educational assessment
 
Curriculum (formative & summative) evaluation
Curriculum (formative & summative) evaluationCurriculum (formative & summative) evaluation
Curriculum (formative & summative) evaluation
 
Assessment (1)
Assessment (1)Assessment (1)
Assessment (1)
 
Assessment
AssessmentAssessment
Assessment
 
CHAPTER 4 Assessing Student Learning Outcomes.pptx
CHAPTER 4 Assessing Student Learning Outcomes.pptxCHAPTER 4 Assessing Student Learning Outcomes.pptx
CHAPTER 4 Assessing Student Learning Outcomes.pptx
 
ASSESSMENT and EVALUATION.pptx
ASSESSMENT and EVALUATION.pptxASSESSMENT and EVALUATION.pptx
ASSESSMENT and EVALUATION.pptx
 
K to 12 classroom assessment ppt
K to 12 classroom assessment pptK to 12 classroom assessment ppt
K to 12 classroom assessment ppt
 
continous assessment (LH) for Jinela Teachers.pdf
continous assessment (LH)   for Jinela Teachers.pdfcontinous assessment (LH)   for Jinela Teachers.pdf
continous assessment (LH) for Jinela Teachers.pdf
 
Determinants of Lecturers Assessment Practice in Higher Education in Somalia
Determinants of Lecturers Assessment Practice in Higher Education in SomaliaDeterminants of Lecturers Assessment Practice in Higher Education in Somalia
Determinants of Lecturers Assessment Practice in Higher Education in Somalia
 
2015 PGDT 423 (1).pptx
2015 PGDT 423 (1).pptx2015 PGDT 423 (1).pptx
2015 PGDT 423 (1).pptx
 
Assessing student learning outcomes
Assessing student learning outcomesAssessing student learning outcomes
Assessing student learning outcomes
 
Module 6-L A & E, Weekend.pptx
Module 6-L A & E, Weekend.pptxModule 6-L A & E, Weekend.pptx
Module 6-L A & E, Weekend.pptx
 
Assessment and Evaluation 2011.pptx
Assessment and Evaluation 2011.pptxAssessment and Evaluation 2011.pptx
Assessment and Evaluation 2011.pptx
 
Types of evaluation by dr.thanuja.k
Types of evaluation by dr.thanuja.k Types of evaluation by dr.thanuja.k
Types of evaluation by dr.thanuja.k
 
Using assessment to support the curriculum
Using assessment to support the curriculumUsing assessment to support the curriculum
Using assessment to support the curriculum
 
DO 8, s2015.pptx
DO 8, s2015.pptxDO 8, s2015.pptx
DO 8, s2015.pptx
 
kto12classroomassessmentppt-150405021132-conversion-gate01 (1).pdf
kto12classroomassessmentppt-150405021132-conversion-gate01 (1).pdfkto12classroomassessmentppt-150405021132-conversion-gate01 (1).pdf
kto12classroomassessmentppt-150405021132-conversion-gate01 (1).pdf
 
Ccepresentation 2-101016010554-phpapp02
Ccepresentation 2-101016010554-phpapp02Ccepresentation 2-101016010554-phpapp02
Ccepresentation 2-101016010554-phpapp02
 
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxassessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
 
Basic concepts in Assessments (Educ 9)
Basic concepts in Assessments (Educ 9)Basic concepts in Assessments (Educ 9)
Basic concepts in Assessments (Educ 9)
 

Kürzlich hochgeladen

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 

Kürzlich hochgeladen (20)

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 

Ppt on educational measurement and evaluation

  • 1. EDUCATIONAL MEASUREMENT AND EVALUATION A Teaching Note For Regular Undergraduate Students 12/8/2020 1 CREATED FOR BY: Dr. Mengistu Debele Gerbi (Ph.D) Ambo University 2020 Prepared by Mengistu Debele
  • 2. UNIT 1: BASIC CONCEPTS AND PRINCIPLES IN EDUCATIONAL MEASUREMENT AND EVALUATION 1.1 Meaning and Definitions of Basic Terms Dear students, can you define the following terms? - TEST - TESTING - MEASUREMENT - ASSESSMENT - EVALUATION 212/8/2020 Prepared by Mengistu Debele
  • 3. TEST According to Gronlund (1981), test refers to the presentation of a standardized set of questions to be answered by pupils. It is also an instrument for measuring samples of a person’s behavior by posing a set of questions in a uniform manner. Test is a method used to determine students’ ability to complete certain tasks or demonstrate mastery of a skill or knowledge of content. Test is a method of measuring person’s quality, ability, knowledge or performance in a given domain. 312/8/2020 Prepared by Mengistu Debele
  • 4. 1. Intelligence test, 12. Speed test 2. Personality test, 13. Power test 3. Aptitude test, 14. Objective test 4. Achievement test, 15. subjective test 5. Prognostic test, 16. Teacher-made test 6. Performance test, 17. Formative test 7. Diagnostic test, 18. Summative test 8. Preference test, 19. Placement test 9. Accomplishment test, 20. Standardize test 10. Scale test , 21. Norm-reference test 11. Criterion-reference test, General Classification of test 12/8/2020 4Prepared by Mengistu Debele
  • 5. Testing is the process of administering the test to the pupils. a technique of obtaining information needed for evaluation purposes. Tests, quizzes, measuring instruments- are devices used to obtain such information. Test is the most commonly used method of making measurements in education. TESTING 12/8/2020 5Prepared by Mengistu Debele
  • 6. MEASUREMENT Students! Are test and measurement the same? Of course, they are not the same. Measurement refers to giving or assigning a numerical value to certain attributes or behaviors. It is a systematic process of obtaining the quantified degree . It is an assignment of numbers (quantity), uses variety of instruments: test, rating scale. It is the process of obtaining numerical description of the degree of individual possesses. Quantifying of how much does learner learned. E.g. Ujulu scored 8/10 in M &E test. A measurement is takes place after a test is given and score is obtained. Thus, test are instrument of measurement. We use test, project, home work, laboratory work, quiz, and assignment as instruments of measurement. Measurement can also refers to both the score obtained and the process used. 612/8/2020 Prepared by Mengistu Debele
  • 7. ASSESSMENT Assessment is a process of gathering and organizing data in order to monitor the progress of students’ learning. According to Airasia(1997) and Kruikshank, et al.(1999), assessment refers to the process of collecting (through paper-and- pencil tests, observation, self-reported questionnaires) , interpreting, and synthesizing information to aid in decision making. So that, it is the basis for decision making (evaluation) It is the process of collecting, recoding, scoring, describing, and interpreting information about learning. The goal of assessment is to make improvement. It deals with the nature of the learner/(what s/he learned, how s/he learned). 712/8/2020 Prepared by Mengistu Debele
  • 8. CON’D • Assessment is a systematic basis for making inference about the learning and development of students, the process of defining, selecting, designing, collecting, analyzing, interpreting and using information to increase students’ learning and development. 812/8/2020 Prepared by Mengistu Debele
  • 9. EVALUATION Students! Are assessment and evaluation the same? Surely they are not the same. According to Gage and Berliner(1998), evaluation refers to the process of using information to judge the goodness, worth or quality of students’ achievement, teaching programs, and educational programs. It is concerned with making judgments on the worth or value of a performance, answer the question “how good, adequate, or desirable”. It is also the process of obtaining, analyzing and interpreting information to determent the extent to which students achieve instructional objective. It occurs after assessment has been done because teacher here is in a position to make informed judgment. 912/8/2020 Prepared by Mengistu Debele
  • 10. CON’D Hence, Evaluation, deals with determining the value and effectiveness of something-often a program. From this standpoint, an evaluation encompass an assessment initiative as the source for making judgments about program quality. Evaluation is the process of making judgment of a product, a response, or a performance based on a predetermined criteria. Thus, evaluation is simply the quantitative description and/or qualitative description of a behavior plus value judgment. Dear students! Which of the two is a more comprehensive, assessment or evaluation? 12/8/2020 10Prepared by Mengistu Debele
  • 11. 1.2. ROLES AND PRINCIPLES OF EVALUATION Dear students! Can you mention major uses of evaluation results in improving teaching- learning process? Great, according to Gronlund and Linn (1995), evaluation used in:  Programmed instruction  Curriculum development Marking and reporting 12/8/2020 11Prepared by Mengistu Debele
  • 12. Functions of Evaluation 1. Evaluation assesses or make appraisal of - Educational objectives, programs, curriculums, instructional materials, facilities - Teacher - Learner - Public relations of the school - Achievement scores of learner 2. Evaluation products research 12/8/2020 12Prepared by Mengistu Debele
  • 13. CON’D • In view of Gronlund (1981), evaluation results are used for making decisions related to instruction, guidance and counseling, administration, and research. • Well and systematically organized results of evaluation help teachers to improve teaching through judging the adequacy and appropriateness of instruction referring to objectives, contents, teaching methods and assessment techniques. 12/8/2020 13Prepared by Mengistu Debele
  • 14. 1.3. Principles of evaluation • Evaluation is effective when it is based on sound principles. As to Gronlund and Linn(1995) major principles of evaluation are 1. Evaluation should give priority to the determination of what to be measured. 2. The types of test items format/evaluation techniques to be used should be determined by the specific learning outcomes to be measured. 3. Comprehensive evaluation requires variety of techniques. 4. A proper use of evaluation techniques requires an awareness of their limitation and strengths.12/8/2020 14Prepared by Mengistu Debele
  • 15. CON’D 5. Test items should be based on a representative sample of the subject contents and the specified learning outcomes to be measured, 6. Test items should be of the appropriate level of difficulty. 7. Test items should be constructed so that extraneous factors do not prevent students from responding. 8. Test items should constructed so that the students obtain the correct answer only if they attained the desired learning outcomes. 9. Test items should be constructed so that it contributes to improvement of the teaching –learning process.12/8/2020 15Prepared by Mengistu Debele
  • 16. To summarize the principles of evaluation • Evaluation should be: 1. Based on clearly stated objectives 2. Comprehensive 3. Cooperative 4. Used judiciously 5. Continuous and integral part of teaching- learning process 12/8/2020 16Prepared by Mengistu Debele
  • 17. Unit -2 THE ROLE OBJECTIVE IN EDUCATIONAL MEASUREMENT AND EVALUATION 12/8/2020 17Prepared by Mengistu Debele
  • 18. 2.1. Definition of important terms Dear students! Can you define outcome, aims/goals, objectives, learning outcomes? • According to Gronlund and Linn(1995), outcome is what occurs as a result of an educational experience. • Goals are broad statements of educational intent that give the overall purpose and desired outcomes (Ellington & Shirley, 1997). 12/8/2020 18Prepared by Mengistu Debele
  • 19. CON’D • Objectives are sets of more detailed statement that specify the means by which different aims of the course that relate to the activities that it involves and content that it covers(Cruikshank, 1999). • Learning outcomes are sets of statement even more detailed statements that specify the various things that students will be able to do after successful completion of the learning process. 12/8/2020 19Prepared by Mengistu Debele
  • 20. 2.2. Importance of stating objectives • The properly stated instructional objectives serve as guides for both teaching and testing. The following are functions of objectives: identify the intended students’ outcome, help teacher to plan instruction, provide criteria for evaluate students’ outcomes, help in selecting appropriate instruction, provide a public record of intent, help teacher to guide & monitor students’ learning. 12/8/2020 20Prepared by Mengistu Debele
  • 21. 2.3. Guideline for writing instructional objectives 1. Objectives should be stated in terms of student performance/ behavior 2. Objectives should be stated in terms of the learning outcomes and not in terms of the learning process. 3. Objectives should be sufficiently free from the influence of course content 4. Statement of the objectives should be an amalgamation of subject-matter and desired behavior. 12/8/2020 21Prepared by Mengistu Debele
  • 22. CON’D 5. Avoid the use of more than one type of learning outcomes in each objectives. 6. Begin each specific objectives with an action verb which indicates what the students have to do , or demonstrate 7. Write sufficient number of specific objectives (5- 8)for each general objectives so as to adequately describe the students behavior for achieving the general objective. 12/8/2020 22Prepared by Mengistu Debele
  • 23. 2.4. Methods of stating instructional objectives • According to Gronlund (1981), there are two levels of stating instructional objectives. 1. Stating general objectives as intended learning outcomes 2. Listing a sample of specific objectives corresponding to and representative of each other objectives. 12/8/2020 23Prepared by Mengistu Debele
  • 24. 2.5. Taxonomy of Educational objectives • Taxonomy means 'a set of classification principles', or 'structure’. • Domain means 'category’ or distinct area. The most well known description of learning domains was developed by Benjamin Bloom that known as Bloom’s Taxonomy. 12/8/2020 24Prepared by Mengistu Debele
  • 25. The three domains of Bloom’s taxonomy of educational objectives 12/8/2020 25Prepared by Mengistu Debele
  • 26. Cognitive domain • RATIONAL LEARNING: THINKING  Emphasis upon knowledge, using the mind, and intellectual abilities.  Often referred to as Instructional or behavioral objectives that begin with verbs 12/8/2020 26Prepared by Mengistu Debele
  • 27. CON’D • Behaviors are taught to be cumulative, going from simple to more complex mental behaviors. 12/8/2020 27Prepared by Mengistu Debele
  • 28. CON’D • Bloom and Krathwohl (1956) divided cognitive domain into six distinct levels, each level building on those below and representing progressively higher level of cognition. Knowledge Evaluation Synthesis Analysis Application Comprehension 12/8/2020 28Prepared by Mengistu Debele
  • 29. 1. Knowledge Remembering of previously learned material/information. • Emphasizing facts, information, and specifics. Involves remembering material in form very close to how it was originally presented. • Depends upon memorizing or identifying facts without asking beyond. 12/8/2020 29Prepared by Mengistu Debele
  • 30. 2. Comprehension Defined as the ability to understand the meaning of material. • Understanding and interpreting information. • Grasping the meaning and intent of the material. • Deals with content and involves ability to understand what is being communicated. 12/8/2020 30Prepared by Mengistu Debele
  • 31. 3. Application Applying procedures/systems/rules/ in specific situation. • It is the ability to use learned material in new situations. • Using what is remembered and comprehended. • Applies learning to real life, new, and/or concrete situations. • It is ability to use knowledge and learned material in meaningful ways. 12/8/2020 31Prepared by Mengistu Debele
  • 32. 4. Analysis Reasoning • Breaking a system/material down into its constituent elements or parts so as to see its organizational structures and determining the relationships of these parts to each other and to the whole. • It is ability to break material down into specific parts so that the overall organizational structure may be comprehended. 12/8/2020 32Prepared by Mengistu Debele
  • 33. 5. Synthesis Creating • It is ability to put parts together to form a whole. • Putting together diverse skills, abilities, and knowledge to accomplish a particular new task or form a new whole. • Organizing ideas into new patterns and putting materials together in a structure which was not there before. 12/8/2020 33Prepared by Mengistu Debele
  • 34. 6. Evaluation Evaluating • It is the ability to judge the worth of material for a given purpose. • Making judgment/critical comparisons on the basis of agreed criteria. • Judging the values of ideas, methods, materials, procedures, and solutions by developing and/or using appropriate criteria. 12/8/2020 34Prepared by Mengistu Debele
  • 35. 12/8/2020 35Prepared by Mengistu Debele
  • 36. Affective domain • EMOTIONAL LEARNING: FEELING  Concerned with feeling, attitudes, appreciations, interests, values and adjustments. Bloom and Krathwohl (1964) divided affective domain into five distinct hierarchical levels. 12/8/2020 36Prepared by Mengistu Debele
  • 37. CON’D • The affective domain (Krathwohl, Bloom & Masia, 1973) includes the manner in which we deal things emotionally, such as feelings, values, appreciation, enthusiasm, motives and attitudes. • Since affective domain is concentrated with a student attitudes, personal beliefs, and values, measuring educational objectives in this domain is not easy. 12/8/2020 37Prepared by Mengistu Debele
  • 38. Bloom and Krathwohl (1964)’s levels of affective domain • The five major categories of affective domain listed from simplest to most complex as follow: Receiving-developing an awareness of sth. Responding-showing active interest in sth. Valuing- take up positive attitudinal position. Organization-making adjustment to the value Characterization-integrating one’s attitude into a total all self-embracing philosophy. 12/8/2020 38Prepared by Mengistu Debele
  • 39. Educational objectives and State of mind Educational objectives State of mind • Receiving Willingness to pay attention • Responding React voluntarily or complies • Valuing Acceptance • Organization Rearrangement of value system • Characterization Incorporates values in life 1.Receiving: the lowest level; the student passively pay attention. Without this level no learning can occur. 12/8/2020 39Prepared by Mengistu Debele
  • 40. CON’D • It is the ability of the student to be attentive to particular stimuli. • It includes awareness(giving appropriate opportunity to learner to be merely conscious of something), willingness to receive (tolerate new stimulus, not avoid it), and control or select attention(attention to differentiating stimulus from competing and distracting stimuli) 12/8/2020 40Prepared by Mengistu Debele
  • 41. 2. Responding • The student being an active participant. • Showing some new behaviors beyond mere attending. It implies active attending of something with or about phenomena. 1. Acquiescence in responding: obedience or compliance are word to describe this behavior. 2. Willingness to respond: capacity for voluntary activity. 3. Satisfaction in response: feeling of satisfaction and enjoyment. 12/8/2020 41Prepared by Mengistu Debele
  • 42. 3.Valuing • The worth the student attaches to some entity. • Implies perceiving them as having worth and consequently revaluing, consistency in behaviors related to this phenomena and showing some definite involvement or commitment. • It is a motivated behavior by individual commitment to the underlying value guiding behavior. • It can be described as acceptance of value, preference for a value and commitment. 12/8/2020 42Prepared by Mengistu Debele
  • 43. 4. Organization • Bringing things together into a whole. • Organization is defined as conceptualizing a value and integrating a new value into one’s general set of values relative to other priorities. • A value system refers to an individual's life style that has been built on his/her values and that controls his/her behavior. 1. Conceptualization of a value: permits the individual to see how the new value relates to the existing values he/she holds. 2. Organization of the value system: bringing the values of various value systems together into a new, harmonious whole. 12/8/2020 43Prepared by Mengistu Debele
  • 44. 5. Characterization by value • Acting consistently with new value; person is known by the value. It is the level at which a person integrates one’s beliefs, ideas, and attitudes into a total all-embracing life philosophy. 12/8/2020 44Prepared by Mengistu Debele
  • 45. Psychomotor domain • PHYSICAL LEARNING: DOING • Emphasizes speed, accuracy, dexterity, and physical skills. • This domain includes objectives related to muscular or motor skills, manipulation of materials and objects, and neuromuscular coordination. • Developed by Harrow (1972) and Simpson (1972). 12/8/2020 45Prepared by Mengistu Debele
  • 46. Harrow’s levels of Psychomotor domain Reflex movements Basic-fundamental movements  Perceptual abilities Physical abilities Skilled movements and Non-discursive communication. 12/8/2020 46Prepared by Mengistu Debele
  • 47. Dave’s levels of the psychomotor domain • Imitation: observing and patterning behavior after someone else. • Manipulation: being able to perform certain actions by following instructions and practicing. • Precision: refining, becoming more exact; few errors are apparent. • Articulation: coordinating a series of actions, achieving harmony and internal consistency. • Naturalization: high-level performance becomes natural, without the need to think much about it. 12/8/2020 47Prepared by Mengistu Debele
  • 48. Simpson’s hierarchical taxonomy • Perception: the process of becoming aware of objects, qualities, etc. by way of the senses. This is basic in the situation-interpretation-action chain leading to motor activity. • Set: readiness for a particular kind of action or experience; may be mental, physical or emotional. • Guided response: an overt behavioral act under the guidance of an instructor, or following a model or a set of criteria. 12/8/2020 48Prepared by Mengistu Debele
  • 49. • Mechanism: the learned response becomes habitual; the learner has achieved a certain confidence and proficiency of performance. • Complex overt response: performance of a motor act considered complex because of the movement pattern required. • Adaptation: altering motor activities to meet the demands of problematic situations. 12/8/2020 49Prepared by Mengistu Debele
  • 50. • Origination: creating new motor acts or ways of manipulating materials out of the skills, attributes, and understandings developed in the psychomotor area. 12/8/2020 50Prepared by Mengistu Debele
  • 51. UNIT 3 Planning Classroom Tests • The development of good questions or items must follow a number of principles. • The development of valid, reliable and usable questions involves proper planning. • The validity, reliability and usability of a test depend on the care with which the test is planned and prepared. 12/8/2020 51Prepared by Mengistu Debele
  • 52. 3.1. Some pitfalls in teacher-made tests • Most teacher-made tests are not appropriate to the different levels of learning outcomes. • Many of the test exercises fail to measure what they are supposed to measure; in other words, most teacher-made tests are not valid. • Some classroom tests do not comprehensively cover the topics taught (lacking content validity). • Most tests prepared by teachers lack clarity in their wording. • Most teacher-made tests fail item analysis. 12/8/2020 52Prepared by Mengistu Debele
  • 53. 3.2. Considerations in Planning Classroom Tests Guide in planning a classroom test • Determine the purpose of the test. • Describe the instructional objectives and content to be measured. • Determine the relative emphasis to be given to each learning outcome. • Select the most appropriate item formats (essay or objective). • Develop the test blue print to guide the test construction. • Prepare test items that are relevant to the learning outcomes specified in the test plan. • Decide on the pattern of scoring and the interpretation of result. • Decide on the length and duration of the test, and • Assemble the items into a test, prepare direction and administer the test. 12/8/2020 53Prepared by Mengistu Debele
  • 54. 3.3. Steps in planning classroom tests • Before constructing a test, test constructors should ask themselves the following questions: • What should I measure? • What knowledge, skills, and attitudes do I want to measure? • Would I test for factual knowledge or the application of this factual knowledge? 12/8/2020 54Prepared by Mengistu Debele
  • 55. The answers to these questions mainly depend on: Instructional objectives Intended learning outcomes The nature of the subject-matter imparted, and The emphasis given to the content. As suggested by Gronlund (1981), the construction of good tests requires adequate and extensive planning so that instructional objectives, teaching strategies, textual materials, and evaluation procedures are all inter-related in some meaningful fashion. 12/8/2020 55Prepared by Mengistu Debele
  • 56. 3.4. Planning stage of test development The planning stage of test development has the following major steps 1. Determining the purpose of testing 2. Developing the test specification / table of specification (TOS) 3. Selecting appropriate item types 4. Preparing/writing relevant test items. 12/8/2020 56Prepared by Mengistu Debele
  • 57. 3.5. Table of specification /TOS/ • TOS is a test blue print • Table of specification is a two dimensional table that specifies the level of objectives in relation to the content of the course. • A well planned TOS enhances content validity of that test for which it is planned. • The two dimensions (content and objectives) are put together in a table by listing the objectives across the top of the table (horizontally) and the content down the table (vertically) to provide the complete framework for the development of the test items. 12/8/2020 57Prepared by Mengistu Debele
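A TOS of the kind described above can be sketched as a small two-dimensional structure: content areas down one axis, objective levels across the other, with the number of items planned for each cell. The topics, levels, and item counts below are hypothetical examples, not taken from any particular course.

```python
# A minimal sketch of a table of specification (TOS): content areas
# (rows) crossed with objective levels (columns). All topics, levels,
# and item counts are hypothetical.
tos = {
    "Photosynthesis": {"Knowledge": 3, "Comprehension": 2, "Application": 1},
    "Respiration":    {"Knowledge": 2, "Comprehension": 2, "Application": 2},
    "Cell division":  {"Knowledge": 1, "Comprehension": 2, "Application": 1},
}

# Row totals show the emphasis given to each content area;
# column totals show the emphasis given to each objective level.
row_totals = {topic: sum(levels.values()) for topic, levels in tos.items()}
col_totals = {}
for levels in tos.values():
    for level, n in levels.items():
        col_totals[level] = col_totals.get(level, 0) + n

total_items = sum(row_totals.values())
print(row_totals)   # items planned per content area
print(col_totals)   # items planned per objective level
print(total_items)  # overall test length
```

Checking that the row and column totals match the intended emphasis is exactly what a well-planned TOS buys: the resulting test samples both content and objectives in the proportions decided in advance.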
  • 58. 3.6. Item writing Guide for item writing • Follow your TOS when you are writing the test items. • Generate more items than specified in the table of specification. • Use unambiguous language so that the demands of the item would be clearly understood. • Endeavor to generate the items at the appropriate levels of difficulty as specified in the table of specification. • Give enough time to allow an average student to complete the task. • Build in a good scoring guide at the point of writing the test items. • Have the test exercises examined and critiqued by one or more colleagues. • Review the items and select the best according to the laid down table of specification/test blue print. 12/8/2020 58Prepared by Mengistu Debele
  • 59. Unit 4 Construction of Classroom Tests • Based upon the type of item format used, teacher-made classroom tests are divided into two types: objective versus subjective tests. • Objective tests are tests with definite answers. • One and only one correct answer is available for a given item on an objective test. • Thus, they have high scorer reliability. 12/8/2020 59Prepared by Mengistu Debele
  • 60. 4.1. Writing Objective Test Items • Objective tests are of two types • 1. Selection-type items (the student is required to select the answer) - multiple choice, true-false, matching. • Selection-type items are a fixed-response type. • 2. Supply-type items (the student is required to supply the answer) - completion, short answer. Supply-type items are a free-response type. 12/8/2020 60Prepared by Mengistu Debele
  • 61. CON’D • To obtain the correct answer, students must demonstrate the specific knowledge, understanding, or skill. They are not free to redefine the problem or to organize and present the answer in their own words. • This type of method contributes to scoring that is quick, easy, and accurate. • Negative side: inappropriate for measuring the ability to formulate problems and choose an approach to solving them or the ability to select, organize, and integrate ideas. 12/8/2020 61Prepared by Mengistu Debele
  • 62. Types of Objective Test Objective test items: 1) Supply test items - Short Answer, Completion; 2) Selection test items - Arrangement, True-False, Matching, Multiple Choice. 12/8/2020 62Prepared by Mengistu Debele
  • 63. CON’D • There are five main types of objective test items, such as – Short answer • What is the capital of Ethiopia? – Completion • In the equation 2X+5=9, X=____ – Matching • Color of the sky A) Brown • Color of the dirt B) Blue • Color of the trees C) Green – True-False or Alternative Response • T F An atom is the smallest particle of matter. • Yes No Acid turns litmus paper red. – Multiple Choice • In the equation 2X+5=9, 2X means A)2 plus X B)2 multiplied by X C)2 minus X 12/8/2020 63Prepared by Mengistu Debele
  • 64. 4.1.1. Supply Item Type • Supply or free-response objective tests require the testee to give very brief answers to the questions. These answers may be a word, a short phrase, a number, or a symbol. • If test items consist of direct questions, they require short answers (short-answer type). • If test items consist of incomplete statements, they require responses that must be supplied by the examinee (completion type). 12/8/2020 64Prepared by Mengistu Debele
  • 65. Examples 1. What is the largest lake in Ethiopia? 2. Who is the first woman athlete in Africa to win a gold medal in the Olympics? 3. The largest lake in Ethiopia is ____. 4. The name of the first woman athlete in Africa to win a gold medal in the Olympics is ____. Questions 1 & 2 are direct questions ending with a question mark, so they are short-answer type. Questions 3 & 4 are incomplete sentences, so they are completion type. 12/8/2020 65Prepared by Mengistu Debele
  • 66. Uses of supply item types • They are suitable for measuring a wide variety of relatively simple learning outcomes, such as recall of memorized information, as well as the problem-solving outcomes measured in mathematics and the sciences. • They are used when they are the most effective way to measure a specific learning outcome, such as computational learning outcomes in mathematics and the sciences. 12/8/2020 66Prepared by Mengistu Debele
  • 67. Advantages of supply test items • Reduce the problem of guessing: guessing is minimized because the examinee must supply the answer, by either thinking and recalling the information or making the necessary computations to solve the problem presented. • Easy to construct. • Effective in measuring students’ spelling ability. • Encourage intensive study. 12/8/2020 67Prepared by Mengistu Debele
  • 68. Weaknesses of supply item types • Do not measure complex learning outcomes • Scoring is relatively difficult • Excessive use can encourage rote memorization and poor study habits • Limited to questions that can be answered by a word, phrase, symbol or number. • Sometimes it is difficult to phrase the question as an incomplete statement. 12/8/2020 68Prepared by Mengistu Debele
  • 69. Suggestions for constructing supply item types • There must be only one correct answer • The wording must be clear and specific to avoid ambiguous responses • Avoid too many blank spaces in the same sentence • Do not take statements directly from textbooks • A direct question is generally more desirable than an incomplete statement. • Do not use the articles ‘the, a, an’ immediately preceding a response. • The lengths of blanks should be approximately equal. 12/8/2020 69Prepared by Mengistu Debele
  • 70. Tips for writing Completion Items • Avoid statements that are so indefinite that they may be answered by several terms. • Be sure that the language used in the question is precise and accurate. • Word the statement so that the blank is near the end of the sentence rather than near the beginning. • If the problem requires a numerical answer, indicate the units in which it is to be expressed. 12/8/2020 70Prepared by Mengistu Debele
  • 71. 4.1.2. Selection item types • True-false/alternative response items • Consist of a declarative statement that students are asked to respond to as true or false, right or wrong, correct or incorrect, yes or no, agree or disagree. • Used to measure simple learning outcomes. 12/8/2020 71Prepared by Mengistu Debele
  • 72. Suggestions for Writing True-False items • The desired method of marking true or false should be clearly explained. • Construct statements that are definitely true or definitely false. • Use relatively short statements. • Keep true and false statements at approximately the same length. • Be sure that there are approximately equal number of true and false items. • Avoid using double negative statements. 12/8/2020 72Prepared by Mengistu Debele
  • 73. Avoid the following when writing True-False items • Verbal clues and complex statements • Broad general statements that are usually not true or false without further qualification • Terms denoting indefinite degree (e.g. large, long time, regularly) • Absolute terms like never, only, always, etc. • Placing items in a systematic order (TTFFTT, TFTFTF, TTFTTFTT, etc.) • Taking statements directly from the text • Trivial statements • Statements that are partly true and partly false 12/8/2020 73Prepared by Mengistu Debele
  • 74. Writing matching items • A matching exercise has two columns. The question or problem column is called the premises; the answer column is called the responses. • A short homogeneous list of premises is written on the left-hand side under column A, whereas the responses are written on the right-hand side under column B. • Examinees are required to make some sort of association between each premise and a response. 12/8/2020 74Prepared by Mengistu Debele
  • 75. Advantages of matching exercises • Measure knowledge of terms, definitions, dates, and events that involve simple association. • Well suited to the ‘who, what, when, and where’ types of learning. • Can be scored easily and objectively, and are amenable to machine scoring. • Like a "fun game" for teaching young children. • Measure a large amount of related factual material. • Many questions can be asked in a limited amount of testing time. 12/8/2020 75Prepared by Mengistu Debele
  • 76. Limitations of matching exercises • Restricted to measuring factual material based on rote learning. • If care is not taken, they can encourage serial memorization rather than association. • It is difficult to get homogeneous materials. • They can be time consuming for students. 12/8/2020 76Prepared by Mengistu Debele
  • 77. Suggestions for Writing Matching items • Keep both the list of descriptions and the list of options fairly short and homogeneous. • Put both descriptions and options on the same page. • Make sure that all the options are plausible distracters. • Arrange the answers in some systematic fashion. • Use longer phrases or statements as premises, and shorter phrases, words or symbols as responses. • Each description in the list should be numbered, and the list of options should be identified by letters. • Include more options than descriptions. • In the directions, specify clearly the basis for matching and whether each option can be used once, more than once, or not at all. 12/8/2020 77Prepared by Mengistu Debele
  • 78. Multiple choice items • The multiple-choice item consists of two parts: (1) the stem which contains the problem; and (2) a list of suggested answers (responses or options). • The incorrect responses are often called decoys, foils or distracters (distractors). • The correct response is called the key. • The stem may be stated as a direct question or an incomplete statement. From the list of responses provided, the student selects the one that is correct (or best). 12/8/2020 78Prepared by Mengistu Debele
  • 79. Advantages of multiple choice items • Adaptable (versatile) to most learning outcomes. • Greater test reliability per item than true-false items. • Affords excellent content sampling, which generally leads to more content-valid score interpretations. • Less prone to ambiguity than the short-answer item. • Can be scored quickly and accurately by machines, clerks, teacher aides, and even students themselves. 12/8/2020 79Prepared by Mengistu Debele
  • 80. Deficiencies or limitations of multiple-choice items • They are very difficult to construct; it is not easy for teachers to find plausible-sounding distracters. • There is a tendency for teachers to write multiple-choice items demanding only factual recall. • Test-wise students perform better on multiple-choice items than do non-test-wise students. 12/8/2020 80Prepared by Mengistu Debele
  • 81. Suggestions for Writing Multiple Choice Items • The stem of the item should clearly formulate a problem. • Keep the response options as short as possible. • Be sure that distracters are plausible. • Include from three to five options. • It is not necessary to provide additional distracters for an item simply to maintain the same number of distracters for each item. • Be sure that there is one and only one correct or clearly best answer. • To increase the difficulty of an item, increase the similarity of content among the options. • Use the option “none of the above” sparingly, and don’t use it when asking for the best answer. • Avoid using “all of the above”. • The stem and the options must be written on the same page. • Ensure that the correct responses form an essentially random pattern, appearing in each of the possible response positions about the same percentage of the time. 12/8/2020 81Prepared by Mengistu Debele
  • 82. Guide for Preparing the Objective Test • Begin writing items far enough in advance of the test date. • Match items to the intended outcomes at the proper difficulty level. • The wording of each item should be as clear and explicit as possible. • Avoid interrelated items (be sure that each item is independent of all other items). • Items should be designed to test important facts or knowledge, not trivia. • Write each item to discriminate the extent to which examinees possess only the desired behavior. • Ensure that there is one and only one correct or best answer to each item. • Avoid unintentionally giving away the answer through irrelevant clues. • Use language appropriate to the level of the examinees. • Items in an achievement test should be constructed to elicit specific course content, not to measure general intelligence. • Have an independent reviewer examine your test items. 12/8/2020 82Prepared by Mengistu Debele
  • 83. 4.2. Subjective/Essay Tests • Essay test items should be used primarily for the measurement of those learning outcomes that cannot be measured by objective test items. • Based on the amount of freedom given to the student to organize his/her ideas and write the answer, essay questions are subdivided into two major types: extended response and restricted response. 12/8/2020 83Prepared by Mengistu Debele
  • 84. Extended-response essay type • In the extended-response type of essay question, virtually no bounds are placed on the student as to the points the student will discuss and the type of organization the student will use. The student has complete freedom in giving a response. • E.g. Describe what you think should be included in a school testing program. Illustrate your answer with specific tests, giving reasons for your test selection. Your essay should be about 300-400 words or more. 12/8/2020 Prepared by Mengistu Debele 84
  • 85. CON’D • Extended-Response – Permits students to decide which facts they think are most pertinent, to select their own method of organization, and to write as much as seems necessary for a comprehensive answer. – Tends to reveal the ability to evaluate ideas, to relate them coherently, and to express them succinctly. – Although they are valuable for measuring complex skills and understanding of concepts and principles, they have three weaknesses: a) Inefficient for measuring knowledge of factual material b) Scoring criteria are not as apparent to the student c) Scoring is difficult and unreliable because responses vary in their array of factual material, organization, legibility, and conciseness. 12/8/2020 Prepared by Mengistu Debele 85
  • 86. Restricted-response essay type • In the restricted-response essay question, the student is more limited in the form and scope of the answer because he is told specifically the form the answer is to take. E.g. Write how plants make food. Your answer should be about one-half page long. – These questions minimize some of the weaknesses of the extended-response type, for three reasons: • Easier to measure knowledge of factual material. • Scoring is clearer to the student. • Reduces the difficulty of scoring. – The negative side: less effective as a measure of the ability to select, organize, and integrate ideas. – In addition, if the restrictions become too tight, the question reduces to nothing more than an objective-type test item. 12/8/2020 Prepared by Mengistu Debele 86
  • 87. Guide for Preparing the Essay Test • Restrict the use of essay questions to only those learning outcomes that cannot be satisfactorily measured by objective items. • Formulate questions that call forth the behavior specified in the learning outcomes; essay questions should be designed to elicit only the skill the item is intended to measure. • Phrase each question to clearly indicate the examinee’s task. • Indicate an approximate time limit for each question. • Avoid the use of optional questions. 12/8/2020 87Prepared by Mengistu Debele
  • 88. 4.3. Performance Assessment • A performance test is another teacher-made technique that tries to establish what a person can do. • It permits students to organize and construct their answers. • Some performance tasks may require the student to use equipment, generate hypotheses, make observations, construct a model, or perform for an audience. • Most performance assessments do not have a single right or best response; there may be a variety of acceptable responses. 12/8/2020 88Prepared by Mengistu Debele
  • 89. CON’D • Performance assessment tasks are needed to measure a student’s ability to engage in hands-on activities, such as conducting an experiment, designing and conducting a survey, or writing an essay. • Performance tests generally fall into one of three categories: (1) Tests under simulated conditions. (2) Work sample tests. (3) Recognition tests. 12/8/2020 89Prepared by Mengistu Debele
  • 90. 4.4. Authentic Assessment Authentic assessment is the type of assessment that aims at evaluating students' abilities in 'real-world' contexts. In authentic assessment students are asked to demonstrate the practical skills and concepts they have learned. 12/8/2020 90Prepared by Mengistu Debele
  • 91. CON’D • Authentic assessment does not encourage rote learning and passive test-taking. • It focuses on students' analytical skills, ability to integrate what they learn, creativity, ability to work collaboratively, and written and oral expression skills. • It is also called performance assessment. 12/8/2020 91Prepared by Mengistu Debele
  • 92. Tools to employ authentic assessment Observation and observation tools - Checklist: a checklist enables the observer to note, albeit very quickly and very effectively, only whether or not a behavior occurred. It does not permit the observer to rate the quality, degree, or frequency of occurrence of a particular behavior. It is a list of items you need to verify. - Anecdotal records: anecdotal records are the least structured observational tool. They depict actual behavior in natural situations: short stories that describe students’ behaviors and actual performance. - Rating scales: rating scales can be used for single observations or over long periods of time. They are a set of categories designed to elicit information about a quantitative or a qualitative attribute. 12/8/2020 92Prepared by Mengistu Debele
  • 93. A. Running records • A running record is a tool that helps teachers to identify patterns or sequences in students' practical work, reading behaviors, laboratory experiment procedures, drawing procedures, and so on. 12/8/2020 93Prepared by Mengistu Debele
  • 94. B. Project Work Assessment • Project work assessment can be conducted in two ways. 1. Process assessment 2. Product assessment 12/8/2020 94Prepared by Mengistu Debele
  • 95. C. Portfolio assessment • Portfolio is a collection of a student's work specifically selected to tell a particular story about the student’s work accomplishments. • Portfolio is the systematic collection of student work measured against predetermined scoring criteria. 12/8/2020 95Prepared by Mengistu Debele
  • 96. D. Self-Assessment • Self-assessment is the process of looking at oneself in order to assess aspects that are important to one's identity. It is one of the motives that drive self-evaluation, along with self-verification and self-enhancement. 12/8/2020 96Prepared by Mengistu Debele
  • 97. E. Reflection • Reflection is an active process of witnessing one’s own experience in order to take a closer look at it, sometimes to direct attention to it briefly, but often to explore it in greater depth. 12/8/2020 97Prepared by Mengistu Debele
  • 98. F. Rubric • A rubric is a scoring scale used to assess student performance along a task-specific set of criteria. • A rubric comprises two components: criteria and levels of performance. Each rubric has at least two criteria and at least two levels of performance. • A rubric is a set of criteria prepared for scoring. 12/8/2020 98Prepared by Mengistu Debele
  • 99. Types of rubric 1. Analytic rubric: it articulates levels of performance for each criterion so the teacher can assess student performance on each criterion. 2. Holistic rubric: it does not list separate levels of performance for each criterion. Instead, a holistic rubric assigns a level of performance by assessing performance across multiple criteria as a whole. 12/8/2020 99Prepared by Mengistu Debele
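The analytic/holistic distinction above can be illustrated with a short script. This is a minimal sketch: the criteria, the level labels, and the point values are hypothetical, and real rubrics would attach a descriptor to each level.

```python
# Sketch of analytic rubric scoring: each criterion is rated on its own
# performance levels and the criterion scores are summed. The criteria
# and point values are hypothetical examples.
analytic_rubric = {
    "Content accuracy": {"Poor": 1, "Good": 2, "Excellent": 3},
    "Organization":     {"Poor": 1, "Good": 2, "Excellent": 3},
    "Language use":     {"Poor": 1, "Good": 2, "Excellent": 3},
}

def analytic_score(ratings):
    """Sum the level points awarded for each criterion."""
    return sum(analytic_rubric[criterion][level]
               for criterion, level in ratings.items())

# One student's ratings (hypothetical):
ratings = {"Content accuracy": "Excellent",
           "Organization": "Good",
           "Language use": "Good"}
print(analytic_score(ratings))  # 3 + 2 + 2 = 7

# A holistic rubric, by contrast, would assign a single overall level
# across all criteria at once, e.g. {"Poor": 1, "Good": 2, "Excellent": 3}.
```

The design difference shows up directly: the analytic version yields a score per criterion (useful for feedback), while the holistic version yields only one overall judgment (faster to apply).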
  • 100. Unit 5: Assembling, Administering and Scoring Classroom Tests 5.1. Assembling the Test items This is the process of arranging items on the test so that they are easy to read. • Plan the layout of the test in such a way as to be convenient for recording answers and also for scoring the test items on separate answer sheets. • Group items of the same format and arrange them in a logical order (true-false, matching, completion/short answer, multiple choice, essay), together with relevant directions on what the testees need to do. • Group items dealing with the same content together within item types. 12/8/2020 Prepared by Mengistu Debele 100
  • 101. CON’D • Arrange the test items in progressive order of difficulty, starting from simple and moving to complex questions. • Ensure that one item does not provide clues to the answer of another item in the same or another section of the test. • For multiple choice items, ensure that the correct responses form an essentially random pattern, appearing in each of the possible response positions about the same percentage of the time. 12/8/2020 Prepared by Mengistu Debele 101
  • 102. 5.2. Administrations of classroom tests Test administration refers to the procedure of actually presenting the learning task that the examinees are required to perform in order to ascertain the degree of learning that has taken place during the teaching-learning process. - validity and reliability of test scores can be greatly reduced when test is poorly administered. 12/8/2020 102Prepared by Mengistu Debele
  • 103. Consider the following during test administration • Ensure that the physical and psychological environment is conducive for testees. • Avoid malpractice by both testees and invigilators. • Avoid unnecessary threats from test administrators. • Stick to the instructions regarding the conduct of the test and avoid giving hints to testees who ask about particular items, but make corrections or clarifications to the testees whenever necessary. • Keep interruptions during the test to a minimum. 12/8/2020 103Prepared by Mengistu Debele
  • 104. Ensuring quality in test administration • Collect the question papers from the custodian in time to be able to start the test at the stipulated time. • Ensure compliance with the stipulated sitting arrangements to prevent collusion between or among the testees. • Ensure orderly and proper distribution of question papers to the testees. • Do not talk unnecessarily before the test; testees’ time should not be wasted at the beginning of the test with unnecessary remarks, instructions or threats that may create test anxiety. • It is necessary to remind the testees of the need to avoid malpractice before they start, and to make it clear that cheating will be penalized. 12/8/2020 104Prepared by Mengistu Debele
  • 105. Credibility and Civility in Test Administration • Credibility deals with the value that the eventual recipients and users of the assessment results place on them, with respect to the grades obtained, the certificates issued, or the issuing institution. • Civility deals with whether the persons being assessed are in conditions that let them give their best, without hindrances and encumbrances in the attributes being assessed, and whether the exercise is seen as integral to or external to the learning process. 12/8/2020 105Prepared by Mengistu Debele
  • 106. Points need consideration in test administration 1. Instruction: both for test administrators and testees 2. Duration of the test 3. Venue and sitting arrangement 4. Other necessary conditions 12/8/2020 106Prepared by Mengistu Debele
  • 107. 5.3. Scoring the Test • Marking schemes (keys) should be prepared alongside the construction of the test items in order to score the test objectively. Scoring Essay Tests In an essay test the examiner is an active part of the measurement instrument. Therefore, variability within and between examiners affects the resulting score of the examinee. 12/8/2020 107Prepared by Mengistu Debele
  • 108. Methods of scoring essay questions There are two common methods of scoring essay questions. These are: 1. The Point or Analytic Method: each answer is compared with an already prepared ideal marking scheme (scoring key) and marks are assigned according to the adequacy of the answer. 2. The Global/Holistic Rating Method: the examiner first sorts the responses into categories of varying quality based on his general or global impression on reading each response. The standard of quality helps to establish a relative scale, which forms the basis for ranking responses from those of the poorest quality to those of the highest quality. Usually between five and ten categories are used with the rating method, with each of the piles representing a degree of quality that determines the credit to be assigned. 12/8/2020 108Prepared by Mengistu Debele
  • 109. Scoring Objective Tests • Objective tests can be scored with ease by various methods. 1. Manual Scoring: answers are scored by direct comparison of the examinee's answers with the marking key. 2. Stencil Scoring: answers are scored by laying a stencil over each answer sheet and counting the number of answer checks appearing through the holes. 3. Machine Scoring: answers are scored by machine, using computers and other scoring devices with a certified answer key prepared for the test items. 12/8/2020 109Prepared by Mengistu Debele
  • 110. Unit 6: Interpreting, Describing, and Analyzing Test Scores 6.1. Methods of interpreting classroom test scores Norm-referenced interpretation: a norm-referenced test score interpretation tells us how an individual student’s score compares with the scores of the other students in the group who have taken the same test. 12/8/2020 110Prepared by Mengistu Debele
  • 111. 6.2 Judging the Quality of a Classroom Test Item Analysis Item analysis is the process of "testing the item" to ascertain specifically whether the item is functioning properly in measuring what the entire test is measuring. Purpose and Uses of Item Analysis - To discriminate between high and low achievers in a norm-referenced test (discrimination power). - To determine whether items have the desirable qualities of a measuring instrument (difficulty level). 12/8/2020 Prepared by Mengistu Debele 111
  • 112. The Process of Item Analysis for Norm-Referenced Classroom Tests • In a norm-referenced test, special emphasis is placed on item difficulty and item discriminating power. • The process of item analysis begins after the test has been administered (or trial tested), scored, and recorded. • The process of item analysis is carried out by using two contrasting test groups composed of the upper and lower 25%, 27%, or 33% of the testees to whom the items were administered. 12/8/2020 112Prepared by Mengistu Debele
  • 113. Computing Item Difficulty • The item difficulty index is denoted by P. • The difficulty index (P) for each item is obtained by using the formula: P = (R / T) x 100, where R = number of testees who got the item right and T = total number of testees responding to the item. The item difficulty index indicates the percentage of testees who got the item right in the two groups used for the analysis. For example, if R/T = 0.7, then P = 0.7 x 100% = 70%. 12/8/2020 113Prepared by Mengistu Debele
  • 114. Example: for the item below (where C* is the correct answer), 1) calculate the item difficulty index (P), 2) find the good plausible distractor, 3) calculate the item discrimination power (D).
  Alternatives:   A    B    C*   D    E    Omits
  Upper Group     0    0    15   0    0    0
  Lower Group     4    2    8    1    0    0
  12/8/2020 114Prepared by Mengistu Debele
  • 115. Solution P = (Upper right + Lower right) / (Total students in analysis) x 100 = (15 + 8) / 30 x 100 = 0.766 x 100 = 76.66%. Alternative A is the good plausible distractor because it attracts more students from the lower group than from the upper group. 12/8/2020 115Prepared by Mengistu Debele
  • 116. Compute the item discrimination index Item discrimination power is denoted by D. D = (RU - RL) / (½T), where RU = number right in the upper group, RL = number right in the lower group, and T = total students from the upper and lower groups. D = (15 - 8) / (½ x 30) = 7/15 = 0.47 12/8/2020 116Prepared by Mengistu Debele
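The difficulty and discrimination computations above can be sketched in Python. This is a minimal illustration using the counts from the worked example; the function names are ours, not from the note.

```python
# Item analysis for a norm-referenced test, using the upper/lower-group
# counts from the worked example: 15 right in the upper group, 8 right
# in the lower group, 30 testees in the analysis.

def item_difficulty(upper_right, lower_right, total):
    """Difficulty index P = (R / T) x 100: percentage who got the item right."""
    return (upper_right + lower_right) / total * 100

def discrimination_power(upper_right, lower_right, total):
    """Discrimination index D = (RU - RL) / (T / 2)."""
    return (upper_right - lower_right) / (total / 2)

print(round(item_difficulty(15, 8, 30), 2))       # 76.67
print(round(discrimination_power(15, 8, 30), 2))  # 0.47
```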
  • 117. Interpretation • Item discrimination values range from -1.00 to +1.00. • The higher the discrimination index, the better the item differentiates between high and low achievers. Item discriminating power is: • A positive value when a larger proportion of those in the high-scoring group get the item right compared to those in the low-scoring group. If D has a positive value, the item has positive discrimination. • A negative value when more testees in the lower group than in the upper group get the item right. If D has a negative value, the item has negative discrimination. • Zero (0) when an equal number of testees in both groups get the item right. • One (1) when all testees in the upper group get the item right and all testees in the lower group get the item wrong. 12/8/2020 117Prepared by Mengistu Debele
  • 118. Compute the Sensitivity to Instructional Effects Sensitivity to instructional effects is denoted by S. S requires two test administrations, i.e., a pretest and a posttest. S = (RA - RB) / T • where RA = number of pupils who got the item right after instruction, RB = number of pupils who answered the item correctly before instruction, and T = total number of pupils who attempted the item both times. 12/8/2020 118Prepared by Mengistu Debele
  • 119. Following is an example of the behavior of four items before (B, pretest) and after (A, posttest) instruction. + means the item was answered correctly; - means it was answered incorrectly.
  Item:      1       2       3       4
             B  A    B  A    B  A    B  A
  Abebe      -  +    -  -    +  -    -  +
  Balcha     -  +    -  -    +  -    -  +
  Chaltu     -  +    -  -    +  -    +  +
  Dawit      -  +    -  -    +  -    +  +
  Eliyas     -  +    -  -    +  -    -  -
  12/8/2020 119Prepared by Mengistu Debele
  • 120. Analysis from the above table Item 1. This is what would be expected in an "ideal" criterion-referenced test (CRT). No one answered the item correctly before instruction; everybody did after. This suggests that the instruction was effective. Item 2. We would not want these results in a CRT. No pupil answered the item correctly either before or after instruction. Item 3. We would not want these results in any test, be it CRT or NRT (norm-referenced test). Why should pupils who answered correctly before instruction answer incorrectly after instruction? Item 4. This is how we would expect good test items to behave. We would expect some pupils to know the correct answer before instruction but a larger number to answer the item correctly after "effective" instruction. 12/8/2020 120Prepared by Mengistu Debele
  • 121. Calculate the sensitivity to instructional effects for items 1, 2, 3, and 4 in the above example Solution S = (RA - RB) / T 1. S = (5 - 0)/5 = 1.00 2. S = (0 - 0)/5 = 0.00 3. S = (0 - 5)/5 = -1.00 4. S = (4 - 2)/5 = 0.40 12/8/2020 121Prepared by Mengistu Debele
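The same solution can be expressed as a short Python sketch (the function name is ours):

```python
# Sensitivity to instructional effects: S = (RA - RB) / T,
# applied to the four items of the pretest/posttest table above.

def sensitivity(right_after, right_before, total):
    """S close to +1 suggests the instruction was effective for the item."""
    return (right_after - right_before) / total

# (RA, RB) pairs for items 1-4; T = 5 pupils in each case
for ra, rb in [(5, 0), (0, 0), (0, 5), (4, 2)]:
    print(sensitivity(ra, rb, 5))  # 1.0, 0.0, -1.0, 0.4
```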
  • 122. Unit 7: Describing Educational Data • 7.1. Units of measurement • Data differ in terms of what properties of the real number series (order, distance, or origin) we can attribute to the scores. The most refined classification of scores is one suggested by Stevens (1946), who classified measurement scales as nominal, ordinal, interval, and ratio scales. 12/8/2020 Prepared by Mengistu Debele 122
  • 123. Nominal scale • A nominal scale is the simplest scale of measurement. Nominal scale is about naming. E.g. Room 1, Room 2. • It involves the assignment of different numerals to categories that are qualitatively different. • For example, for purposes of storing data on computer cards, we might use the symbol 0 to represent a female and the symbol 1 to represent a male. 12/8/2020 Prepared by Mengistu Debele 123
  • 124. CON’D • If measurement is defined as "the assignment of numerals to objects or events according to rules", then nominal data indicate measurement. If, on the other hand, measurement implies a quantitative difference, then nominal data do not indicate measurement. 12/8/2020 Prepared by Mengistu Debele 124
  • 125. Ordinal scale • An ordinal scale has the order property of a real number series and gives an indication of rank order. Thus, magnitude is indicated, if only in a very gross fashion. Rankings in a music contest or in an athletic event would be examples of ordinal data. An ordinal scale is about ranking in order. E.g. Grade 1, Grade 2, Grade 3. • It helps us to know who is best, second best, third best, and so on, to select the top pupils for some task. 12/8/2020 Prepared by Mengistu Debele 125
  • 126. Interval scale • With interval data we can interpret the distances between scores. It provides information about the differences between the scores. An interval scale is about differences and equal distances between the attributes under measurement. There is no absolute zero in an interval scale. E.g. the Fahrenheit/Celsius temperature scales. 12/8/2020 Prepared by Mengistu Debele 126
  • 127. Ratio scale • If one measures with a ratio scale, the ratio of the scores has meaning. Thus, a person who is 86" tall is twice as tall as a person who is 43". We can make this statement because a measurement of 0 (zero) actually indicates no height. That is, there is a meaningful zero point. Thus, a ratio scale has an absolute or true zero. 12/8/2020 Prepared by Mengistu Debele 127
  • 128. 7.2. Measures of central tendency • Measures of central tendency give some idea of the average or typical score in the distribution. • Central tendency describes the center location of distribution. There are three commonly used measures of central tendency. These are the mode, the median and the mean. 12/8/2020 Prepared by Mengistu Debele 128
  • 129. The mode • The mode is defined as the score value which is obtained most often, or the most frequently occurring value. A set of scores may have one mode, two modes, more than two modes, or no mode (mono-modal, bimodal, multimodal, or no mode). E.g. the set 1,2,3,4,5 has no mode; in 6,7,7,7,8,8,9 the mode is 7 (mono-modal); in 4,5,5,6,6,7,8 the modes are 5 and 6 (bimodal); in 4,4,5,5,6,6,7,8 the modes are 4, 5, and 6 (multimodal). 12/8/2020 Prepared by Mengistu Debele 129
  • 130. The median • The median (Mdn) is the point below which 50 percent of the scores lie. An approximation to that point is obtained from ordered data by simply finding the score in the middle of the distribution. • It is the 50th percentile of the distribution, having half of the cases above it and half of the cases below it. 12/8/2020 Prepared by Mengistu Debele 130
  • 131. CON’D • When the number of scores in an ordered set is odd, the median is the middle score: Mdn = the (N+1)/2 th case. E.g. 3,4,4,7,7,7,8,8,9: N = 9, (9+1)/2 = 5th case. The 5th case is the median, i.e. 7. If the number of scores in the ordered set is even, the median is the average of the N/2 th and (N/2 + 1)th cases. E.g. 3,4,4,7,7,7,8,8,9,10: N = 10, 10/2 = 5th case. The 5th case is 7, and the next case is also 7. Thus, Mdn = (7+7)/2 = 7. 12/8/2020 Prepared by Mengistu Debele 131
  • 132. The mean (x̄) • The mean (x̄) is the arithmetic average of a set of scores. It is calculated by adding all the scores in the distribution and dividing by the total number of scores (N). The formula is x̄ = Σx / N, where x = a raw score, Σx = the sum of the scores, and N = the number of scores. 12/8/2020 Prepared by Mengistu Debele 132
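The three measures of central tendency can be computed with Python's standard statistics module; a minimal sketch using the score sets from these slides:

```python
import statistics

# Mode: the most frequent value(s); multimode returns all of them
print(statistics.multimode([6, 7, 7, 7, 8, 8, 9]))      # [7] (mono-modal)
print(statistics.multimode([4, 5, 5, 6, 6, 7, 8]))      # [5, 6] (bimodal)

# Median: middle score of the ordered data (N = 9, so the 5th case)
print(statistics.median([3, 4, 4, 7, 7, 7, 8, 8, 9]))   # 7

# Mean: sum of the scores divided by N
print(statistics.mean([9, 8, 8, 7, 7, 7, 4, 4, 3, 3]))  # 6
```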
  • 133. 7.3. Measures of variability Measures of variability are used to know how the scores are spread or dispersed. The common measures of variability are the range (R), variance (σ² = population variance, S² = sample variance), standard deviation (S), and percentiles. • The range (R) of a set of measurements is defined to be the difference between the largest and the smallest measurements of the set. • Occasionally the range (high score - low score + 1) is used. 12/8/2020 Prepared by Mengistu Debele 133
  • 134. The Variance • Variance indicates the degree of spread by comparing every score to the mean of the distribution. The formula is • S² = Σ(x - x̄)² / N E.g. find the variance for the set of data 9,8,8,7,7,7,4,4,3,3. Solution: N = 10, mean = 6, and the sum of squared differences is 46. Hence S² = 46/10 = 4.6. 12/8/2020 Prepared by Mengistu Debele 134
  • 135. Standard deviation • The standard deviation is the square root of the variance. The formula is S = √S² 12/8/2020 Prepared by Mengistu Debele 135
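A short sketch of both formulas, reproducing the worked variance example (S² = 4.6); the function name is ours:

```python
import math

def population_variance(scores):
    """S^2 = sum of squared deviations from the mean, divided by N."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)

data = [9, 8, 8, 7, 7, 7, 4, 4, 3, 3]
s2 = population_variance(data)   # 46 / 10 = 4.6
s = math.sqrt(s2)                # standard deviation = square root of variance
print(s2)           # 4.6
print(round(s, 2))  # 2.14
```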
  • 136. Unit 8 Characteristics of good tests • 8.1. Reliability • Reliability can be defined as the degree of consistency between two measures of the same test. • The classical theory of reliability can best be explained by starting with observed scores (Xs). • An observed score is made up of a "true score" (T) and an "error score" (E) such that • X = T + E, where X = observed score • T = true score • E = error score 12/8/2020 Prepared by Mengistu Debele 136
  • 137. • A good test measures what it intends to measure. • The most important criteria for evaluating tests are validity and reliability. Measurement tools can be judged on a variety of merits. These include practical issues as well as technical issues. All instruments have strengths and weaknesses. No instrument is perfect for every task. 12/8/2020 Prepared by Mengistu Debele 137
  • 138. • Some of the practical issues that need to be considered include: cost, availability, training required, ease of administration, scoring, analysis, and the time and effort required for respondents to complete the measure. • Along with the practical issues, measurement tools may be judged on the following characteristics: 12/8/2020 Prepared by Mengistu Debele 138
  • 139. • Validity: a test is considered valid when it measures what it is supposed to measure. • Reliability: a test is reliable if, when it is taken again by the same students under the same circumstances, the score is almost constant, taking into consideration that the time between the test and retest is of reasonable length. 12/8/2020 Prepared by Mengistu Debele 139
  • 140. • Objectivity: objectivity means that if the test is marked by different persons, the score will be the same. In other words, the marking process should not be affected by the marker’s personality. • Comprehensiveness: a good test should include items from the different areas of the material assigned for the test. 12/8/2020 Prepared by Mengistu Debele 140
  • 141. • Simplicity: simplicity means that the test should be written in clear, correct, and simple language; it is important to keep the method of testing as simple as possible while still testing the skill you intend to test. • Scorability: scorability means that each item in the test has its own mark related to the distribution of marks given. 12/8/2020 Prepared by Mengistu Debele 141
  • 142. Approaches to estimate reliability • The common approaches to estimate reliability are: 1. Measures of stability 2. Measures of equivalence 3. Measures of equivalence and stability 4. Measures of internal consistency • a. Split-half • b. Kuder-Richardson estimates 5. Scorer (judge) reliability 12/8/2020 Prepared by Mengistu Debele 142
  • 143. Measure of stability • A measure of stability, often called a test-retest estimate of reliability, is obtained by administering a test to a group of persons, re-administering the same test to the same group at a later date, and correlating the two sets of scores. 12/8/2020 Prepared by Mengistu Debele 143
  • 144. Measures of Equivalence • Measures of equivalence estimate reliability based on equivalent forms; the estimate is obtained by giving two forms (with equal content, means, and variances) of a test to the same group on the same day and correlating the results. 12/8/2020 Prepared by Mengistu Debele 144
  • 145. Measures of Equivalence and Stability • A coefficient of equivalence and stability could be obtained by giving one form of the test and, after some time, administering the other form and correlating the results. 12/8/2020 Prepared by Mengistu Debele 145
  • 146. Measures of Internal Consistency • The split-half method of estimating reliability is theoretically the same as the equivalent-forms method. • The appropriate formula is a special case of the Spearman-Brown prophecy formula: • rxx = 2r(1/2)(1/2) / (1 + r(1/2)(1/2))  rxx = estimated reliability of the whole test  r(1/2)(1/2) = reliability (correlation) of the half-tests 12/8/2020 Prepared by Mengistu Debele 146
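The Spearman-Brown correction is easy to apply in code. A sketch with a hypothetical half-test correlation of 0.60 (the value is illustrative only):

```python
def spearman_brown(r_half):
    """rxx = 2r / (1 + r): reliability of the full-length test
    estimated from the correlation between its two halves."""
    return 2 * r_half / (1 + r_half)

# hypothetical: the two half-tests correlate at 0.60, so the
# whole test's estimated reliability is higher than either half's
print(round(spearman_brown(0.60), 2))  # 0.75
```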
  • 147. Kuder-Richardson Estimates • If items are scored dichotomously (right or wrong), one way to avoid the problem of how to split the test is to use one of the Kuder-Richardson formulas. The formulas may be considered as representative of the average correlation obtained from all possible split-half reliability estimates. K-R 20 and K-R 21 are two formulas used extensively. K-R 20: rxx = [n/(n-1)] x (1 - Σpq/S²x) K-R 21: rxx = [n/(n-1)] x (1 - x̄(n - x̄)/(nS²x)) 12/8/2020 Prepared by Mengistu Debele 147
  • 148. where n = number of items in the test p = proportion of people who answered the item correctly q = proportion of people who answered the item incorrectly (q = 1 - p; if p = .20, q = .80) pq = variance of a single item scored dichotomously (right or wrong) Σ = summation sign indicating that pq is summed over all items S²x = variance of the total test x̄ = mean of the total test 12/8/2020 Prepared by Mengistu Debele 148
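K-R 20 can be sketched directly from these definitions. The response matrix below is hypothetical (rows = examinees, columns = items, 1 = right, 0 = wrong), and the function name is ours:

```python
def kr20(responses):
    """K-R 20: rxx = [n/(n-1)] * (1 - sum(pq) / S^2x)
    for dichotomously scored items."""
    n_items = len(responses[0])
    n_people = len(responses)
    # total score per examinee and the (population) variance of the totals
    totals = [sum(row) for row in responses]
    mean_t = sum(totals) / n_people
    var_t = sum((t - mean_t) ** 2 for t in totals) / n_people
    # sum of item variances p * q over all items
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_t)

# hypothetical responses of 5 examinees to 4 items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))  # 0.8
```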
  • 149. Scorer (Judge) Reliability • If a sample of papers has been scored independently by two different readers, the traditional Pearson product moment correlation coefficient (r) can be used to estimate the reliability of a single reader's scores. 12/8/2020 Prepared by Mengistu Debele 149
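A sketch of the Pearson product-moment correlation between two raters' independent scores on the same five papers (the scores themselves are hypothetical):

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # covariance term and the two standard-deviation terms
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rater1 = [8, 6, 9, 5, 7]  # hypothetical essay scores from reader 1
rater2 = [7, 6, 9, 4, 7]  # the same papers scored by reader 2
print(round(pearson_r(rater1, rater2), 2))  # 0.96
```

A value near 1.0 indicates that the two readers rank the papers very similarly, i.e. high scorer reliability.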
  • 150. Factors Influencing Reliability • Test length: longer tests give more reliable scores. • Speed: a test is considered a pure speed test if everyone who reaches an item gets it right but no one has time to finish all the items. Thus, score differences depend upon the number of items attempted. The opposite of a speed test is a power test. • Group homogeneity: the more heterogeneous the group, the higher the reliability. • Objectivity: the more subjectively a measure is scored, the lower the reliability of the measure. 12/8/2020 Prepared by Mengistu Debele 150
  • 151. 8.2 Validity • What is validity? • Validity is the extent to which a test measures what it claims to measure. It is vital for a test to be valid in order for the results to be accurately applied and interpreted. • Validity can best be defined as the extent to which certain inferences can be made accurately from, and certain actions appropriately based on, test scores or other measurements. 12/8/2020 Prepared by Mengistu Debele 151
  • 152. • Validity (meaningfulness) is the degree to which an evaluation (test) actually serves the purpose for which it is intended. • Three important things about validity are: • 1. Validity refers to the appropriate use of the scores, not the test itself. This means it refers to the interpretation of test scores rather than the test itself. 12/8/2020 Prepared by Mengistu Debele 152
  • 153. • 2. Validity is a matter of degree. That means tests are not absolutely valid or invalid. • 3. Validity is specific to a particular use. No test serves all purposes equally well. There are three main classes of validity. These are content validity, criterion-related validity, and construct validity. 12/8/2020 Prepared by Mengistu Debele 153
  • 154. Content validity • Content validity is defined as the extent to which the instrument measures what it purports to measure. Content validity pertains to the degree to which the instrument fully assesses or measures the content of interest. When a test has content validity, the items on the test represent the entire range of possible items the test should cover. 12/8/2020 Prepared by Mengistu Debele 154
  • 155. CON’D • Content validity deals with adequate sampling of items. When an evaluation instrument adequately samples certain types of situations or subject matter, it is said to have content validity. If item sampling is not representative to an adequate degree, the test lacks content validity. • Face validity: a form of content validity referring to whether the measuring instrument appears/seems to measure what it is intended to measure. 12/8/2020 Prepared by Mengistu Debele 155
  • 156. Criterion-related validity • A test is said to have criterion-related validity when the test has demonstrated its effectiveness in predicting a criterion or indicators of a construct. There are two types of criterion-related validity: • 1. concurrent validity • 2. predictive validity 12/8/2020 Prepared by Mengistu Debele 156
  • 157. Concurrent validity • Concurrent validity occurs when the criterion measures are obtained at the same time as the test scores. This indicates the extent to which the test scores accurately estimate an individual’s current state with regard to the criterion. It is a comparison of scores on some instrument with current scores on another instrument. 12/8/2020 Prepared by Mengistu Debele 157
  • 158. Predictive validity • Predictive validity occurs when the criterion measures are obtained at a time after the test. Examples of tests with predictive validity are career or aptitude tests, which are helpful in determining who is likely to succeed or fail in certain subjects or occupations. Predictive validity is a comparison of scores on some instrument with some future behavior or future scores on another instrument. 12/8/2020 Prepared by Mengistu Debele 158
  • 159. Construct validity • Construct validity is the degree to which an instrument measures the trait or theoretical construct that it is intended to measure. A construct is a human characteristic. A test is said to be valid in terms of construct when it measures a psychological concept. A test has construct validity if it demonstrates an association between the test scores and the theoretical trait it is meant to reflect. 12/8/2020 Prepared by Mengistu Debele 159