2. What is Assessment of Learning?
• It focuses on the development and utilization of
assessment tools to improve the teaching-learning
process.
• It emphasizes the use of tests to measure knowledge,
comprehension and other thinking skills.
• It allows the students to go through the standard steps
in the construction of quality assessments.
• Students will experience how to develop rubrics for
performance-based and portfolio assessment.
3. MEASUREMENT
•Refers to the quantitative aspect of
evaluation. It involves outcomes that can
be quantified statistically. It can also be
defined as the process of determining and
differentiating information about the
attributes or characteristics of things.
4. EVALUATION
•Is the qualitative aspect of determining
outcomes of learning. It involves value
judgment. Evaluation is more comprehensive
than measurement; in fact, measurement is
one aspect of evaluation.
9. According to the nature of test:
• Personality test
• Intelligence test
• Aptitude test
• Achievement or summative test
• Sociometric test
• Diagnostic or formative test
• Trade or vocational test
12. Diagnostic Tests
•Are used to measure a student’s strengths and
weaknesses, usually to identify deficiencies in
skills or performance.
13. Formative and Summative Tests
• Are terms often used with evaluation, but they may
also be used with testing. Formative testing is done
to monitor students’ attainment of the instructional
objectives. Formative testing occurs over a period
of time and monitors students' progress.
Summative testing is done at the conclusion of
instruction and measures the extent to which
students have attained the desired outcomes.
14. Standardized Tests
•Are already valid, reliable and objective.
Standardized tests are tests for which contents
have been selected and for which norms or
standards have been established.
Psychological tests and government national
examinations are examples of standardized
tests.
15. Standards or Norms
•Are goals to be achieved expressed in terms of
the average performance of the population
tested.
16. Criterion-referenced measure
•Is a measuring device with a predetermined
level of success or standard on the part of the
test-takers. For example, a score of 75 percent
on all the test items could be considered
a satisfactory performance.
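The cutoff idea above can be sketched in a few lines of Python; the 75 percent cutoff and the score figures are illustrative assumptions.

```python
# Criterion-referenced scoring sketch: a fixed cutoff (here 75%) decides
# the result, independent of how other test-takers perform.
# Cutoff and scores are illustrative assumptions, not from the text.

def criterion_referenced_result(correct: int, total: int,
                                cutoff: float = 0.75) -> str:
    """Return 'satisfactory' if the percent-correct meets the cutoff."""
    return "satisfactory" if correct / total >= cutoff else "needs improvement"

print(criterion_referenced_result(45, 60))  # 45/60 = 75% -> satisfactory
print(criterion_referenced_result(40, 60))  # about 67% -> needs improvement
```

Note that the judgment depends only on the individual's own score against the fixed standard, never on the group's performance.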
17. Norm-referenced measure
•Is a test that is scored on the basis of the
norm or standard level of accomplishment by
the whole group taking the test. The grades of
the students are based on the normal curve of
distribution.
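A minimal sketch of the norm-referenced idea, assuming an illustrative score list: each raw score is converted to a z-score, that is, its position relative to the group mean in standard-deviation units, so interpretation depends on the whole group.

```python
# Norm-referenced sketch: a score is re-expressed relative to the group's
# mean and standard deviation. The score list is illustrative.
from statistics import mean, pstdev

scores = [58, 62, 70, 75, 80, 84, 91]

def z_score(x: float, group: list) -> float:
    """Standing of x within the group, in standard-deviation units."""
    return (x - mean(group)) / pstdev(group)

for s in scores:
    print(s, round(z_score(s, scores), 2))
```

Scores above the group mean get positive z-scores and scores below it get negative ones, which is the basis for curve-based grading.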
22. Nominal Measurement
•Merely classifies objects or events by assigning
numbers to them.
For example, one could nominally designate
baseball positions by assigning the pitcher the
number 1, the catcher the number 2, the first
baseman the number 3, and so on.
23. Ordinal Measurement
•Ordinal scales classify, but they also assign
rank order. Ranking individuals according to
their test scores is an example of ordinal
measurement.
24. Interval Measurement
•In order to be able to add and subtract scores,
we use interval scales, sometimes called
equal-interval or equal-unit measurement.
This scale has the nominal and ordinal
properties and is also characterized by equal
units between score points.
25. Ratio Measurement
•It includes all the preceding properties, but in
a ratio scale the zero point is not arbitrary; a
score of zero indicates the absence of what is
being measured.
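The four scales and the cumulative properties each one supports can be summarized in a small lookup structure; the example values and the exact labels are illustrative assumptions, not from the text.

```python
# Sketch of the four measurement scales. Each scale keeps the properties
# of the one before it and adds one more. Example data are illustrative.
scales = {
    "nominal":  {"example": ["pitcher", "catcher", "first base"],
                 "supports": ["count/mode"]},
    "ordinal":  {"example": [1, 2, 3],
                 "supports": ["count/mode", "rank/median"]},
    "interval": {"example": [20.0, 25.0, 30.0],
                 "supports": ["count/mode", "rank/median",
                              "add/subtract, mean"]},
    "ratio":    {"example": [0.0, 5.0, 10.0],
                 "supports": ["count/mode", "rank/median",
                              "add/subtract, mean", "ratios (true zero)"]},
}

for name, info in scales.items():
    print(name, "->", ", ".join(info["supports"]))
```

The nesting makes the hierarchy explicit: everything valid at one level remains valid at every level above it.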
26. Norm-referenced and Criterion referenced
Measurement
• When we contrast norm-referenced measurement (or
testing) with criterion-referenced measurement, we are
basically referring to two different ways of interpreting
information. However, Popham (1988, page 135) points
out that certain characteristics tend to go with each type
of measurement, and it is likely that results of norm-
referenced tests are interpreted in criterion-referenced
ways and vice versa.
27. Norm-referenced Interpretation
•An individual score is interpreted by
comparing it to the scores of a defined group,
often called the normative group. Norms
represent the scores earned by one or more
groups of students who have taken the test.
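One common norm-referenced statistic is the percentile rank: the percentage of the normative group scoring at or below a given score. A minimal sketch, with an illustrative norm group:

```python
# Percentile-rank sketch: interpret a score by its position within a
# normative group. The norm-group scores are illustrative.
def percentile_rank(score: float, norm_group: list) -> float:
    """Percentage of the norm group scoring at or below `score`."""
    at_or_below = sum(1 for s in norm_group if s <= score)
    return 100.0 * at_or_below / len(norm_group)

norm_group = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(75, norm_group))  # 5 of 10 scores at or below -> 50.0
```

The same raw score would earn a different percentile rank against a different norm group, which is exactly what makes the interpretation relative rather than absolute.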
28. Achievement Test as an Example
• Most standardized achievement tests, especially those covering several skills
and academic areas, are primarily designed for norm-referenced
interpretations. However, the form of results and the interpretations of these
tests are somewhat complex and require concepts not yet introduced in this
text. Scores on teacher-constructed tests are often given norm-referenced
interpretations. Grading on the curve, for example, is a norm-referenced
interpretation of test scores on some type of performance measure. Specified
percentages of scores are assigned the different grades, and an individual's
score is positioned in the distribution of scores. (We mention this only as an
example; we do not endorse this procedure.)
29. Criterion-referenced Interpretation
• The concepts of criterion-referenced testing have
developed with a dual meaning for criterion-referenced.
On one hand, it means referencing an individual’s
performance to some criterion that is a defined
performance level. The individual's score is interpreted in
absolute rather than relative terms. The criterion, in this
situation, means some level of specified performance that
has been determined independently of how others might
perform.
30. Distinctions between Norms-referenced and
Criterion-referenced Tests
• Although interpretations, not characteristics, provide the
distinction between norm-referenced and criterion-referenced
tests, the two types do tend to differ in some ways. Norm-
referenced tests are usually more general and comprehensive and
cover a large domain of content and learning tasks. They are used
for survey testing, although this is not their exclusive use.
31. • Criterion-referenced tests focus on a specific group of learner
behaviors. To show the contrast, consider an example. Arithmetic
skills represent a general and broad category of student outcomes
and would likely be measured by a norm-referenced test. On the
other hand, behaviors such as solving addition problems with two
five-digit numbers or determining the multiplication products of
three- and four-digit numbers are much more specific and may be
measured by criterion-referenced tests.
32. • A criterion-referenced test tends to focus more on
subskills than on broad skills. Thus, criterion-referenced tests
tend to be shorter. If mastery learning is involved,
criterion-referenced measurement would be used.
• Norm-referenced test scores are transformed to positions
within the normative group. Criterion-referenced test
scores are usually given as the percentage of correct
answers or another indicator of mastery or the lack
thereof.
33. STAGES IN TEST CONSTRUCTION
• I. Planning the Test
• A. Determining the Objectives
• B. Preparing the Table of Specifications
• C. Selecting the Appropriate Item Format
• D. Writing the Test Items
• E. Editing the Test Items
34. • II. Trying Out the Test
A. Administering the First Tryout - then Item
Analysis
B. Administering the Second Tryout - then
Item Analysis
C. Preparing the Final Form of the Test
III. Establishing Test Validity
IV. Establishing Test Reliability
V. Interpreting the Test Score
35. MAJOR CONSIDERATIONS IN TEST
CONSTRUCTION
Type of Test
Our usual idea of testing is an in-class test that is administered by
the teacher.
However, there are many variations on this theme: group tests,
individual tests, written tests, oral tests, speed tests, power tests,
pretests and post tests. Each of these has different characteristics
that must be considered when the tests are planned.
36. • Test Length
A major decision in the test planning is how many items
should be included on the test. There should be enough to
cover the content adequately, but the length of the class
period or the attention span or fatigue limits of the
students usually restrict the test length. Decisions about
test length are usually based on practical constraints more
than on theoretical considerations.
37. • Item Formats
Determining what kind of items to include on the test is a major
decision. Should they be objectively scored formats such as
multiple choice or matching type? Should they cause the students
to organize their own thoughts through short-answer or essay
formats? These are important questions that can be answered only
by the teacher in terms of the local context, his or her students, his
or her classroom, and the specific purpose of the test. Once the
planning decisions are made, the item writing begins. This task is
often the most feared by beginning test constructors.
However, the procedures are more common sense than formal
rules.
38. POINTS TO BE CONSIDERED IN PREPARING A TEST
1. Are the instructional objectives clearly defined?
2. What knowledge, skills and attitudes do you want to measure?
3. Did you prepare a table of specifications?
4. Did you formulate well defined and clear test items?
5. Did you employ correct English in writing the items?
6. Did you avoid giving clues to the correct answer?
7. Did you test the important ideas rather than the trivial?
8. Did you adapt the test's difficulty to your students' ability?
9. Did you avoid using textbook jargon?
10. Did you cast the items in positive form?
11. Did you prepare a scoring key?
12. Does each item have a single correct answer?
13. Did you review your items?
39. General Principles in Constructing Different Types of
Tests
1. The test items should be selected very carefully. Only important facts should
be included.
2. The test should have an extensive sampling of items.
3. The test items should be carefully expressed in simple, clear, definite, and
meaningful sentences.
4. There should be only one possible correct response for each test item.
5. Each item should be independent.
6. Lifting sentences from books should be avoided, to encourage thinking and
understanding.
7. The first personal pronouns I and we should not be used.
8. Various types of test items should be made to avoid monotony.
9. The majority of the test items should be of moderate difficulty.
40. 10. The test items should be arranged in an ascending order of
difficulty.
11. Clear, concise and complete directions should precede all types
of test.
12. Items which can be answered by previous experience alone
without knowledge of the subject matter should not be included.
13. Catchy words should not be used in the test items.
14. Test items must be based upon the objectives of the course and
upon the course content.
15. The test should measure the degree of achievement or determine
the difficulties of the learners.
16. The test should emphasize ability to apply and use facts as well
as knowledge of facts.
41. 17. The test should be of such length that it can be completed within
the time allotted by all or nearly all of the pupils.
18. Rules governing good language expression, grammar, spelling,
punctuation, and capitalization should be observed in all items.
19. Information on how scoring will be done should be provided.
20. Scoring keys in correcting and scoring tests should be provided.
42. POINTERS TO BE OBSERVED IN CONSTRUCTING AND SCORING THE
DIFFERENT TYPES OF TESTS
A. RECALL TYPES
1. Simple recall type
a. This type consists of questions calling for a single word or
expression as an answer.
b. Items usually begin with who, where, when, and what.
c. Score is the number of correct answers.
43. 2. Completion type
a. Only important words or phrases should be omitted to avoid
confusion.
b. Blanks should be of equal lengths.
c. The blank, as much as possible, is placed near or at the end of the
sentence.
d. Articles a, an, and the should not be provided before the omitted
word or phrase to
avoid clues for answers.
e. Score is the number of correct answers.
44. 3. Enumeration type
a. The exact number of expected answers should be stated.
b. Blanks should be of equal lengths.
c. Score is the number of correct answers
4. Identification type
a. The items should make an examinee think of a word, number, or
group of words
that would complete the statement or answer the problem.
b. Score is the number of correct answers.
45. B. RECOGNITION TYPES
1. True-false or alternate-response type
a. Declarative sentences should be used.
b. The number of "true" and "false" items should be more or less equal.
c. The truth or falsity of the sentence should not be too evident.
d. Negative statements should be avoided.
e. The "modified true-false" is preferable to the "plain true-false".
f. In arranging the items, avoid the regular recurrence of "true" and "false"
statements.
g. Avoid using specific determiners like all, always, never, none, nothing, most,
often, some, etc., and avoid weak statements such as may, sometimes, as a rule, in
general, etc.
h. Minimize the use of qualitative terms like few, great, many, more, etc.
46. i. Avoid leading clues to the answers in all stems.
j. Score is the number of correct answers in "modified true-false" and
right answers minus wrong answers in "plain true-false".
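The two true-false scoring rules above can be sketched as small functions; the answer key and responses are illustrative assumptions.

```python
# Scoring sketch for the two true-false rules: modified true-false counts
# correct answers; plain true-false subtracts wrongs from rights.
# Key and responses are illustrative.

def score_modified(key: list, responses: list) -> int:
    """Modified true-false: score = number of correct answers."""
    return sum(k == r for k, r in zip(key, responses))

def score_plain(key: list, responses: list) -> int:
    """Plain true-false: score = rights minus wrongs."""
    rights = sum(k == r for k, r in zip(key, responses))
    wrongs = len(key) - rights
    return rights - wrongs

key       = [True, False, True, True, False]
responses = [True, False, False, True, True]
print(score_modified(key, responses))  # 3 correct
print(score_plain(key, responses))     # 3 rights - 2 wrongs = 1
```

The rights-minus-wrongs rule penalizes guessing, which is why it is reserved for the plain true-false format.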
47. 2. Yes-No type
a. The items should be in interrogative sentences.
b. The same rules as in “true-false” are applied
48. 3. Multiple-response type
a. There should be three to five choices. The number of choices used
in the first item should be the same in all the items of this type of
test.
b. The choices should be numbered or lettered so that only the
number or letter can be written on the blank provided.
c. If the choices are figures, they should be arranged in ascending
order.
d. Avoid the use of "a" or "an" as the last word prior to the listing of
the responses.
49. e. Random occurrence of responses should be employed
f. The choices, as much as possible, should be at the end of the
statements.
g. The choices should be related in some way or should belong to
the same class.
h. Avoid the use of "none of these" as one of the choices.
i. Score is the number of correct answers.
50. 4. Best answer type
a. There should be three to five choices, all of which are right but
vary in their degree of merit, importance or desirability.
b. The other rules for multiple-response items are applied here.
c. Score is the number of correct answers.
51. 5. Matching type
a. There should be two columns. Under column "A" are the stimuli,
which should be longer and more descriptive than the responses
under column "B". The response may be a word, a phrase, a number,
or a formula.
b. The stimuli under column "A" should be numbered and the
responses under column "B" should be lettered. Answers will be
indicated by letters only on lines provided in column "A".
c. The number of pairs usually should not exceed twenty items.
Fewer than ten introduces chance elements. Twenty pairs may be
used, but more than twenty is decidedly wasteful of time.
52. d. The number of responses in column "B" should be two or more
than the number of items in column "A" to avoid guessing.
e. Only one correct matching for each item should be possible.
f. Matching sets should neither be too long nor too short.
g. All items should be on the same page to avoid turning of pages in
the process of matching pairs.
h. Score is the number of correct answers.
53. C. ESSAY TYPE EXAMINATIONS
Common types of essay questions. (The types are related to the
purposes for which the essay examinations are to be used.)
1. Comparison of two things
2. Explanation of the use or meaning of a statement or passage.
3. Analysis
4. Decisions for or against
5. Discussion
54. How to construct essay examinations
1. Determine the objectives or essentials for each question to be
evaluated.
2. Phrase questions in simple, clear and concise language.
3. Suit the length of the questions to the time available for
answering the essay examination. The teacher should try to answer
the test herself.
55. 4. Scoring
a. Have a model answer in advance.
b. Indicate the number of points for each question.
c. Score a point for each essential.
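The scoring steps above can be sketched as follows. Matching essentials by keyword is a crude stand-in for the teacher's judgment, and the essentials list and sample answer are illustrative assumptions.

```python
# Essay-scoring sketch: a model answer lists the "essentials", and one
# point is scored per essential found in the student's response.
# Keyword matching is only a rough proxy for human judgment.

def score_essay(essentials: list, answer_text: str) -> int:
    """One point for each essential mentioned in the answer."""
    answer = answer_text.lower()
    return sum(1 for e in essentials if e.lower() in answer)

essentials = ["validity", "reliability", "objectivity"]  # illustrative
answer = "A good test must have validity and reliability."
print(score_essay(essentials, answer), "of", len(essentials))  # 2 of 3
```

In practice the teacher compares the response against the model answer directly; the point-per-essential rule is what keeps the scoring consistent across papers.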
56. Advantages and disadvantages of the Objective Type of
Tests
Advantages
a. The objective test is free from personal bias in scoring.
b. It is easy to score. With a scoring key, the test can be corrected by
different individuals without affecting the accuracy of the grades
given.
c. It has high validity because it is comprehensive with wide
sampling of essentials.
d. It is less time-consuming since many items can be answered in a
given time.
e. It is fair to students since slow writers can accomplish the test
as fast as fast writers.
57. Disadvantages
a. It is difficult to construct and requires more time to prepare.
b. It does not afford the students the opportunity for training in
self-expression and thought organization.
c. It cannot be used to test ability in theme writing or journalistic
writing.
58. Advantages and Disadvantages of the Essay type of Tests
Advantages
a. The essay examination can be used in practically all subjects of
the school
curriculum.
b. It trains students for thought organization and self expression.
c. It affords students opportunities to express their originality and
independence of thinking.
d. Only the essay test can be used in some subjects like composition
writing and journalistic writing which cannot be tested by the
objective type test.
59. e. Essay examination measures higher mental abilities like
comparison, interpretation, criticism, defense of opinion and
decision.
f. The essay test is easily prepared.
g. It is inexpensive
60. Disadvantages
a. The limited sampling of items makes the test an unreliable measure
of achievements or abilities.
b. Questions usually are not well prepared.
c. Scoring is highly subjective due to the influence of the corrector’s
personal judgment.
d. Grading of the essay test is an inaccurate measure of pupils'
achievements due to subjectivity of scoring.
61. STATISTICAL MEASURES OR TOOLS USED IN
INTERPRETING NUMERICAL DATA
Frequency Distributions
A simple, common-sense technique for describing a set of
test scores is through the use of a frequency distribution. A
frequency distribution is merely a listing of the possible
score values and the number of persons who achieved each
score. Such an arrangement presents the scores in a more
simple and understandable manner than merely listing all
of the separate scores. Consider a specific set of scores to
clarify these ideas.
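As a concrete sketch, an illustrative set of ten scores can be tallied into a frequency distribution with the standard library:

```python
# Frequency-distribution sketch: list each score value and the number of
# students who earned it. The score list is illustrative.
from collections import Counter

scores = [85, 90, 75, 85, 80, 90, 85, 70, 80, 85]
freq = Counter(scores)

# Print the distribution from the highest score value down.
for value in sorted(freq, reverse=True):
    print(value, freq[value])
```

Ten separate numbers collapse into five value-frequency pairs, which is exactly the simplification the paragraph describes.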
62. MEASURES OF CENTRAL TENDENCY
Frequency distributions are helpful for indicating
the shape of a distribution of scores, but
we need more information than the shape to
describe a distribution adequately. We also need to
know where on the score scale the distribution is
located, which is what a measure of central
tendency tells us.
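The three usual measures of central tendency can be computed directly with Python's standard library; the score list is illustrative.

```python
# Central-tendency sketch: mean, median, and mode of one illustrative
# set of test scores.
from statistics import mean, median, mode

scores = [70, 75, 80, 80, 85, 90, 95]
print("mean:", mean(scores))      # about 82.1
print("median:", median(scores))  # 80, the middle score
print("mode:", mode(scores))      # 80, the most frequent score
```

Each measure locates the distribution on the score scale in a slightly different sense, which is why all three are reported.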
63. MEASURES OF DISPERSION
Measures of central tendency are useful for summarizing average
performance, but they tell us nothing about how the scores are
distributed or “spread out” around the averages. Two sets of test
scores may have equal measures of central tendency, but they might
differ in other ways. One of the distributions may have the scores
tightly clustered around the average, and the other distribution may
have scores that are widely separated. As you may have anticipated,
there are descriptive statistics that measure dispersion, which also
are called measures of variability. These measures indicate how
spread out the scores tend to be.
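A minimal sketch of this point: two illustrative distributions with the same mean but very different spread, summarized by the range and the standard deviation.

```python
# Dispersion sketch: equal central tendency, unequal spread.
# Both score lists are illustrative.
from statistics import mean, pstdev

clustered = [78, 79, 80, 81, 82]   # tightly clustered around 80
spread    = [60, 70, 80, 90, 100]  # widely separated around 80

for name, data in [("clustered", clustered), ("spread", spread)]:
    print(name, "mean:", mean(data),
          "range:", max(data) - min(data),
          "stdev:", round(pstdev(data), 2))
```

The identical means and sharply different standard deviations show why a measure of dispersion must accompany a measure of central tendency.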
64. Graphing Distributions
A graph of a distribution of test scores is often better understood
than is the frequency distribution or a mere table of numbers. The
general pattern of scores, as well as any unique characteristics of
the distribution, can be seen easily in simple graphs. There are
several kinds of graphs that can be used, but a simple bar graph, or
histogram, is as useful as any.
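A bar graph of this kind can even be sketched as plain text, one asterisk per student at each score value; the scores are illustrative.

```python
# Text-histogram sketch of an illustrative frequency distribution:
# each row is a score value followed by one '*' per student.
from collections import Counter

scores = [85, 90, 75, 85, 80, 90, 85, 70, 80, 85]
freq = Counter(scores)

for value in sorted(freq, reverse=True):
    print(f"{value}: {'*' * freq[value]}")
```

Even this crude bar graph makes the general pattern of the scores visible at a glance, which is the point of graphing a distribution.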