ASSESSMENT OF LEARNING
Mary Joy D. Antiola

ASSESSMENT OF LEARNING
It focuses on the development and utilization of
assessment tools to improve the teaching-learning process.
MEASUREMENT: refers to the quantitative aspect of evaluation.
EVALUATION: refers to the qualitative aspect of determining the outcomes of learning.
TEST: consists of questions, exercises, or other devices for
measuring the outcomes of learning.

CLASSIFICATION OF TEST
1. According to manner of
response
a. oral
b. written
2. According to method of
preparation
a. subjective
b. objective
3. According to nature of answer
a. personality test
b. intelligence test
c. aptitude test
d. achievement or summative test
e. sociometric test
f. diagnostic or formative test
g. trade or vocational test

• Objective test: a test whose items have definite answers and
are therefore not subject to personal bias.
• Teacher-made test: constructed by the teacher based on the
content of the different subjects taught.
• Diagnostic test: used to measure a student's strengths and
weaknesses.
• Formative test: done to monitor students' attainment of the
instructional objectives.
• Summative test: done at the conclusion of instruction.
• Standardized test: a test for which the contents have been
selected and for which norms and standards have been
established.

• Criterion-referenced measure: a measuring
device with a predetermined level of success or
standard on the part of the test takers.
• Norm-referenced measure: a test that is scored
on the basis of the norm or standard level of
accomplishment of the whole group taking the
test.

CRITERIA OF A GOOD EXAMINATION
VALIDITY:
Refers to the degree to which the test measures what it is intended
to measure. It is the usefulness of the test for a given purpose.
RELIABILITY:
Pertains to the degree to which a test consistently measures what it is supposed
to measure. The test of reliability is the consistency of the results when the test
is administered to different groups of individuals with similar
characteristics in different places at different times.

OBJECTIVITY:
is the degree to which personal bias is eliminated in the
scoring of the answers.
When we refer to the quality of measurement, we essentially
mean the amount of information contained in a score
generated by the measurement.
Measurements may differ in the amount of information the
numbers contain. The levels of measurement are:
Nominal Measurement
Nominal scales are the least sophisticated; they merely classify objects or
events by assigning numbers to them. These numbers are
arbitrary and imply no quantification, but the categories must be
mutually exclusive and exhaustive.

Ordinal Measurement
Ordinal scales classify, but they also assign rank order. An example is
ranking individuals in a class according to their test scores. Their scores
are ordered from highest to lowest.
Interval Measurement
In order to be able to add and subtract scores, we use interval
scales. This measurement contains the nominal and ordinal properties
and is also characterized by equal units between score points.
Ratio Measurement
The most sophisticated type of measurement includes all the
preceding properties, but in a ratio scale the zero point is not
arbitrary; a score of zero indicates the absence of what is being
measured.
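
The four levels can be illustrated with a short Python sketch. Every name and value below (section codes, student names, minutes spent) is hypothetical and chosen only to show which operation is meaningful at each level.

```python
from collections import Counter

# Nominal: the numbers are arbitrary labels, so only counting is meaningful.
section_codes = [1, 2, 2, 3, 2, 1]  # hypothetical class-section codes
most_common_section = Counter(section_codes).most_common(1)[0][0]

# Ordinal: scores can be ranked from highest to lowest.
scores = {"Ana": 48, "Ben": 42, "Cara": 50}  # hypothetical test scores
ranking = sorted(scores, key=scores.get, reverse=True)

# Interval: equal units between score points, so differences are meaningful.
gap = scores["Cara"] - scores["Ben"]

# Ratio: zero means none of the quantity, so ratios are meaningful.
minutes_spent = {"Ana": 30, "Ben": 60}  # hypothetical time on task
ratio = minutes_spent["Ben"] / minutes_spent["Ana"]

print(most_common_section, ranking, gap, ratio)
```

Each level keeps the operations of the levels before it and adds one more.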

Norm-referenced and Criterion-referenced Measurement
Norm-referenced measurement has been used in education, and
norm-referenced tests continue to comprise a substantial portion of
the measurement in today’s schools. Criterion-referenced
measurement emphasizes that the type of measurement or
testing depends on how the scores are interpreted. Both types
can be used effectively by the teacher.
Norm-referenced interpretation
It stems from the desire to differentiate among individuals or to
discriminate among the individuals of some defined group on
whatever is being measured.

Criterion-referenced interpretation
It means referencing an individual’s performance to some criterion
that is a defined performance level. A second meaning of this term
involves the idea of a defined behavioral domain, that is, a defined body
of learner behaviors.
Distinction
Norm-referenced tests are usually more general and comprehensive and
cover a large domain of content and learning tasks.
Criterion-referenced tests focus on a specific group of learner behaviors.
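
The two interpretations can be contrasted with a minimal Python sketch. The group scores, the cutoff of 45, and the function names are assumptions made for illustration; the source does not prescribe a particular formula or passing score.

```python
def percentile_rank(score, group_scores):
    """Norm-referenced: percent of the group scoring at or below `score`."""
    at_or_below = sum(1 for s in group_scores if s <= score)
    return 100 * at_or_below / len(group_scores)


def meets_criterion(score, cutoff):
    """Criterion-referenced: compare the score with a predetermined standard."""
    return score >= cutoff


group = [34, 38, 40, 42, 44, 45, 46, 48, 48, 50]  # hypothetical class scores
print(percentile_rank(44, group))       # 50.0
print(meets_criterion(44, cutoff=45))   # False
```

The same score of 44 is average relative to this group yet falls short of the assumed standard, which is why the interpretation, not the test itself, determines the type of measurement.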

STAGES IN TEST CONSTRUCTION
I. Planning the test
A. Determining the objectives
B. Preparing the table of specifications
C. Selecting the appropriate item format
D. Writing the test items
E. Editing the test items
II. Trying out the test
A. Administering the first tryout, then item analysis
B. Administering the second tryout, then item analysis
C. Preparing the final form of the test
III. Establishing test validity
IV. Establishing test reliability
V. Interpreting the test scores

MAJOR CONSIDERATIONS IN TEST
CONSTRUCTION
TYPE OF TEST
If it is a take-home test rather than an in-class test, how do you
make sure that students work independently, have equal access to
sources and resources, or spend a sufficient but not enormous amount
of time on the task? The test plan must address a wide array of issues.
Anticipating these potential problems allows the test constructor to develop
positions or policies that are consistent with his/her testing philosophy.

TEST LENGTH
A major decision in test planning is how many items
should be included on the test. There should be enough to
cover the content adequately, but the length of the class period
or the attention span or fatigue limits of the students usually
restrict the test length.

ITEM FORMATS
Determining what kind of items to include on the test
is a major decision. Once the planning decisions are
made, the item writing begins. This task is often the
one most feared by beginning test constructors.
However, the procedures are more common sense
than formal rules.

POINTS TO BE CONSIDERED IN PREPARING
A TEST
1. Are the instructional objectives clearly defined?
2. What knowledge, skills, and attitudes do you want to measure?
3. Did you prepare a table of specifications?
4. Did you formulate well-defined and clear test items?
5. Did you employ correct English in writing the items?
6. Did you avoid giving clues to the correct answer?
7. Did you test the important ideas rather than the trivial?
8. Did you adapt the test's difficulty to your students' ability?
9. Did you avoid using textbook jargon?
10. Did you cast the items in positive form?
11. Did you prepare a scoring key?
12. Does each item have a single correct answer?
13. Did you review your items?

DIFFERENT TYPES OF TEST
1. The test items should be selected very carefully. Only important facts should
be included.
2. The test should have extensive sampling of items.
3. The test items should be carefully expressed in simple, clear, definite, and
meaningful sentences.
4. There should be only one possible correct response for each test item.
5. Each item should be independent. Leading clues to other items should be
avoided.
6. Sentences should not be lifted from books, so as to encourage thinking and
understanding.
7. The first-person pronouns I and we should not be used.
8. Various types of test items should be made to avoid monotony.
9. The majority of test items should be of moderate difficulty. A few difficult and a few easy items should
be included.
10. The test items should be arranged in ascending order of difficulty. Easy items should be at the
beginning to encourage the examinee to pursue the test, and the most difficult items at the end.
11. Clear, concise, and complete directions should precede the test. Sample test items may be provided for
expected responses.
12. Items which can be answered by previous experience alone, without knowledge of the subject
matter, should not be included.
13. Catchy words should not be used in the test items.
14. Test items must be based upon the objectives of the course and upon the course content.
15. The test should measure the degree of achievement or determine the difficulties of the learners.
16. The test should emphasize the ability to apply and use facts as well as knowledge of facts.
17. The test should be of such length that it can be completed within the time allotted by all or nearly
all of the pupils. The teacher should perform the test herself to determine its approximate time
allotment.
18. Rules governing good language expression, grammar, spelling, punctuation, and
capitalization should be observed in all items.
19. Information on how scoring will be done should be provided.
20. Scoring keys for correcting and scoring tests should be provided.

POINTERS TO BE OBSERVED IN
CONSTRUCTING AND SCORING THE DIFFERENT
KINDS OF TEST
A. RECALL TYPES
1. Simple recall type
a. This type consists of questions calling for a single word or expression as an answer.
b. Items usually begin with who, where, when, and what.
c. Score is the number of correct answers.
2. Completion type
a. Only important words or phrases should be omitted to avoid confusion.
b. Blanks should be of equal length.
c. The blank, as much as possible, is placed near the end of the sentence.
d. Articles a, an, and the should not be provided before the omitted word or phrase, to avoid clues for
answers.
e. Score is the number of correct answers.

3. Enumeration type
a. The exact number of expected answers should be
stated.
b. Blanks should be of equal length.
c. Score is the number of correct answers.
4. Identification type
a. The items should make an examinee think of a word, number, or group of
words that would complete the statement or answer the problem.
b. Score is the number of correct answers.

B. RECOGNITION TYPES
1. True-false or alternate-response type
a. Declarative sentences should be used.
b. The number of "true" and "false" items should be more or less
equal.
c. The truth or falsity of the sentence should not be too evident.
d. Negative statements should be avoided.
e. The "modified true-false" is preferable to the "plain true-
false".
f. In arranging the items, avoid the regular recurrence of "true" and
"false" statements.
g. Avoid using specific determiners like all, always, never, none,
nothing, most, often, some, etc., and avoid weak statements such as may,
sometimes, as a rule, in general, etc.
h. Minimize the use of qualitative terms like few, great, many, more, etc.
i. Avoid leading clues to answers in all items.
j. Score is the number of correct answers in "modified true-false" and

2. Yes-No type
a. The items should be interrogative sentences.
b. The same rules as in “true or false” are applied.

3. Multiple-response type
a. There should be three to five choices. The number of choices
should be the same in all items.
b. The choices should be numbered or lettered so that only the
number or letter can be written on the space provided.
c. If the choices are figures, they should be arranged in ascending
order.
d. Avoid the use of “a” or “an” as the last word prior to the listing of the
choices.
e. Random occurrence of responses should be employed.
f. The choices, as much as possible, should be at the end of the
statement.
g. The choices should be related in some way or should belong to the
same class.
h. Avoid the use of “none of these” as one of the choices.
i. Score is the number of correct answers.

4. Best answer type
a. There should be three to five choices, all of which are
right but vary in their degree of merit, importance, or
desirability.
b. The other rules for multiple-response items are applied
here.
c. Score is the number of correct answers.

5. Matching type
a. There should be two columns. Under “A” are the stimuli, which
should be longer and more descriptive than the responses under
column “B”. The response may be a word, a phrase, a number, or a
formula.
b. The stimuli under column “A” should be numbered and the
responses under column “B” should be lettered. Answers will be
indicated by letters only on lines provided in column “A”.
c. The number of pairs usually should not exceed twenty items.
d. The number of responses in column “B” should exceed
the number of items in column “A” by two or more to avoid
guessing.
e. Only one correct answer for each item should be possible.
f. Matching sets should be neither too long nor too short.
g. All items should be on the same page.
h. Score is the number of correct answers.

C. ESSAY TYPE EXAMINATION
Types of essay questions:
1. Comparison of two things
2. Explanation of the use or meaning of a statement or
passage
3. Analysis
4. Decision for or against
5. Discussion

How to construct essay examinations:
1. Determine the objectives or essentials for each question to be evaluated.
2. Phrase questions in simple, clear, and concise language.
3. Suit the length of the questions to the time available for answering the essay
examination. The teacher should try to answer the test herself.
4. Scoring:
a. Have a model answer in advance.
b. Indicate the number of points for each question.
c. Score a point for each essential.

ADVANTAGES AND DISADVANTAGES OF THE
OBJECTIVE TYPE OF TEST
Advantages
a. The objective test is free from personal bias in scoring.
b. It is easy to score. With a scoring key, the test can be corrected by different
individuals without affecting the accuracy of the grades given.
c. It has high validity because it is comprehensive, with wide sampling of
essentials.
d. It is less time-consuming since many items can be answered in a given time.
e. It is fair to students since slow writers can accomplish the test as fast as
fast writers.

Disadvantages
a. It is difficult to construct and requires more time to prepare.
b. It does not afford students the opportunity to train in self-expression
and thought organization.
c. It cannot be used to test ability in theme writing or
journalistic writing.

ESSAY TYPE OF TEST
Advantages
a. The essay examination can be used in practically all subjects of the school
curriculum.
b. It trains students in thought organization and self-expression.
c. It affords students opportunities to express their originality and independence of
thinking.
d. Only the essay test can be used in some subjects, like composition writing and
journalistic writing, which cannot be tested by the objective type of test.
e. The essay examination measures higher mental abilities such as comparison, interpretation,
defense of opinion, and decision.

Disadvantages
a. The limited sampling of items makes the test an unreliable measure of
achievements or abilities.
b. Questions usually are not well prepared.
c. Scoring is highly subjective due to the influence of the corrector's
personal judgment.
d. Grading of the essay test is an inaccurate measure of pupils'
achievements due to the subjectivity of scoring.

STATISTICAL MEASURES OR TOOLS USED IN
INTERPRETING NUMERICAL DATA
FREQUENCY DISTRIBUTION
A simple, common-sense technique for
describing a set of test scores is the use
of a frequency distribution. It is merely a listing of the
possible score values and the number of persons
who achieved each score.

STEPS INVOLVED IN CREATING THE
FREQUENCY DISTRIBUTION
First, list the possible score values
in rank order, from highest to lowest. Then, in a
second column, indicate the frequency or number
of persons who received each score.
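
The two steps above can be sketched in Python. The function name and the sample scores are hypothetical, and the sketch assumes the "possible score values" run from the lowest to the highest observed score.

```python
from collections import Counter


def frequency_distribution(scores):
    """List each score value from highest to lowest with its frequency."""
    counts = Counter(scores)
    highest, lowest = max(scores), min(scores)
    # A Counter returns 0 for values nobody achieved, so gaps show up as zeros.
    return [(value, counts[value]) for value in range(highest, lowest - 1, -1)]


scores = [50, 48, 48, 47, 45, 45, 45]  # hypothetical test scores
for value, frequency in frequency_distribution(scores):
    print(value, frequency)
```

Values that no one achieved (49 and 46 here) appear with a frequency of zero, which is what makes long distributions hard to read.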

When there is a wide range of scores in a frequency distribution, the distribution can
be quite long, with a lot of zeros in the column of frequencies. Such a frequency
distribution can make interpretation difficult and confusing. A grouped frequency
distribution would be more appropriate in this kind of situation. Groups of score
values are listed rather than each separate possible score value.
If we were to change the frequency distribution in Table 2 into a grouped frequency
distribution, we might choose intervals such as 48-50, 45-47, and so forth. The
frequency corresponding to the interval 48-50 would be 9 (1 + 3 + 5). The choice of
the width of the interval is arbitrary, but it must be the same for all intervals. In
addition, it is a good idea to have an odd-numbered interval width so that the midpoint
of the interval is a whole number. This strategy will simplify subsequent graphs and
descriptions of the data. The grouped frequency distribution is presented in Table 3.
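
A minimal Python sketch of the grouping step, using an interval width of 3. The scores are hypothetical: the top interval is arranged, as an assumption, as one 50, three 49s, and five 48s to mirror the 1 + 3 + 5 quoted above, and the remaining scores are invented.

```python
def grouped_frequency(scores, highest, width=3):
    """Return rows of (lower, upper, midpoint, frequency), top interval first."""
    lowest = min(scores)
    rows = []
    upper = highest
    while upper >= lowest:
        lower = upper - width + 1
        frequency = sum(1 for s in scores if lower <= s <= upper)
        midpoint = (lower + upper) // 2  # a whole number when the width is odd
        rows.append((lower, upper, midpoint, frequency))
        upper = lower - 1
    return rows


scores = [50, 49, 49, 49, 48, 48, 48, 48, 48, 47, 46, 45, 45]
for lower, upper, midpoint, frequency in grouped_frequency(scores, highest=50):
    print(f"{lower}-{upper}  midpoint {midpoint}  frequency {frequency}")
```

With an odd width such as 3, each midpoint (49, 46, and so on) is itself a possible score value.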

Measures of Central Tendency
Frequency distributions are helpful for indicating the shape of a
distribution of scores, but we need more information than the
shape to describe the distribution adequately. We need to know where on
the scale of measurement a distribution is located and how the scores are
dispersed in the distribution. For the former, we compute measures of
central tendency, and for the latter we compute measures of dispersion.
Measures of central tendency are points on the scale of measurement,
and they are representative of how the scores tend to average. There are
three commonly used measures of central tendency: the mean, the
median, and the mode. The mean is by far the most widely used.

The Mean
The mean of a set of scores is the arithmetic mean.
It is found by summing the scores and dividing the sum
by the number of scores. The mean is the most
commonly used measure of central tendency because it
is easily understood and is based on all the scores in
the set; hence, it summarizes a lot of information. The
formula for the mean is as follows:
Mean = ΣX / N
where ΣX is the sum of the scores and N is the number of scores.
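
As a small illustration, the computation can be written in Python. The 15 scores below are hypothetical; they are not the data of Table 2, which is not reproduced in this text.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


# Hypothetical scores of 15 examinees.
scores = [50, 49, 48, 48, 48, 47, 46, 46, 45, 44, 42, 40, 38, 35, 34]
print(mean(scores))  # 44.0
```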

The Median
Another measure of central tendency is the median, which is the point that divides the
distribution in half; that is, half of the scores fall above the median and half of the
scores fall below the median.
When there are only a few scores, the median can often be found by inspection. If
there is an odd number of scores, the middle score is the median. When there is
an even number of scores, the median is halfway between the two middle scores.
However, when there are tied scores in the middle of the distribution, or when the
scores are in a frequency distribution, the median may not be so obvious.
Consider again the frequency distribution in Table 2. There were 25 scores in the
distribution, so the middle score should be the median. A straightforward way to find
this median is to augment the frequency distribution with a column of cumulative
frequencies. Cumulative frequencies indicate the number of scores at or below each
score. Table 4 indicates the cumulative frequencies for the data in Table 2.

For example, 7 persons scored at or below a score of 40, and
21 persons scored at or below a score of 48.
To find the median, we need to locate the middle score in
the cumulative frequency column, because this score is the
median. Since there are 25 scores in the distribution, the
middle one is the 13th, a score of 46. Thus, 46 is the median of
this distribution; half of the people scored above 46 and half
scored below.
When there are ties in the middle of the distribution,
there may be a need to interpolate between scores to get the
exact median. However, such precision is not needed for most
classroom tests. The whole number closest to the median is
usually sufficient.
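
The cumulative-frequency procedure can be sketched in Python as follows. The function names and the scores are hypothetical, and the sketch deliberately skips interpolation within tied scores, in line with the remark that a whole number is usually sufficient.

```python
from collections import Counter


def cumulative_table(scores):
    """Rows of (score, frequency, cumulative frequency), lowest score first."""
    counts = Counter(scores)
    running = 0
    rows = []
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], running))
    return rows


def score_at_position(rows, position):
    """Score held by the person at a 1-based position counted from the lowest."""
    for value, _, cumulative in rows:
        if cumulative >= position:
            return value
    raise ValueError("position is larger than the number of scores")


def median(scores):
    """Middle score, or halfway between the two middle scores for an even count."""
    rows = cumulative_table(scores)
    n = len(scores)
    if n % 2 == 1:
        return score_at_position(rows, (n + 1) // 2)
    return (score_at_position(rows, n // 2) + score_at_position(rows, n // 2 + 1)) / 2


# Hypothetical scores of 15 examinees; the middle one is the 8th.
scores = [50, 49, 48, 48, 48, 47, 46, 46, 45, 44, 42, 40, 38, 35, 34]
print(median(scores))  # 46
```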

The Mode
The measure of central tendency that is the easiest to
find is the mode. The mode is the most frequently occurring
score in the distribution. The mode of the scores in Table 1 is
48. Five persons had scores of 48, and no other score
occurred as often.
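
A minimal Python sketch of finding the mode. The data are hypothetical, and returning every tied value when two scores are equally frequent is a design choice of this sketch, not something the text specifies.

```python
from collections import Counter


def modes(values):
    """Return the most frequently occurring value(s), sorted."""
    counts = Counter(values)
    highest_frequency = max(counts.values())
    return sorted(v for v, freq in counts.items() if freq == highest_frequency)


scores = [50, 49, 48, 48, 48, 47, 46, 46, 45, 44, 42, 40, 38, 35, 34]
print(modes(scores))                   # [48]
print(modes(["oral", "written", "oral"]))  # ['oral']
```

Because it only counts occurrences, the same function works on category labels as well as on scores.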

Each of the three measures of central tendency, the mean, the median, and the
mode, is a legitimate definition of “average” performance on this test. However,
each provides different information. The arithmetic average was 44; half the
people scored at or below 46; and more people received 48 than any other score.
When a distribution has a small number of very extreme scores, though, the
median may be a better definition of central tendency. The mode provides the least
information and is used infrequently as an “average”. The mode can be used with
nominal scale data, just as an indicator of the most frequently appearing category.
The mean, the median, and the mode all describe central tendency:
The mean is the arithmetic average
The median divides the distribution in half
The mode is the most frequent score
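
The effect of an extreme score on the mean and the median can be seen in a short Python sketch using the standard `statistics` module. Both score lists are hypothetical.

```python
import statistics

typical = [44, 45, 46, 47, 48]
with_extreme = [5, 45, 46, 47, 48]  # same set, but one very low score

# Mean and median agree when there are no extreme scores (both are 46).
print(statistics.mean(typical), statistics.median(typical))

# One extreme score pulls the mean down to 38.2; the median stays at 46.
print(statistics.mean(with_extreme), statistics.median(with_extreme))
```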

Measures of Dispersion
Measures of central tendency are useful for summarizing average
performance, but they tell us nothing about how the scores are
distributed or spread out around the averages. Two sets of test scores
may have equal measures of central tendency, but they may differ in
other ways. One of the distributions may have the scores tightly
clustered around the average, and the other distribution may have
scores that are widely separated. As you may have anticipated, there are
descriptive statistics that measure dispersion, which are also called
measures of variability. These measures indicate how spread out the
scores tend to be.

The Range
The range indicates the difference between the highest and
lowest scores in a distribution. It is simple to calculate, but it provides
limited information. We subtract the lowest from the highest score and
add 1 so that we include both scores in the spread between them. For
the scores of Table 2 the range is 50 - 34 + 1 = 17.
A problem with using the range is that only the two most extreme
scores are used in the computation. There is no indication of the spread
of scores between the highest and lowest. Measures of dispersion that take
into consideration every score in the distribution are the variance and
the standard deviation. The standard deviation is used a great deal in
interpreting scores from standardized tests.
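
The range as defined here (highest minus lowest, plus 1) takes one line of Python. The function name and the sample scores are hypothetical; only the highest score of 50 and the lowest of 34 are taken from the text.

```python
def inclusive_range(scores):
    """Highest score minus lowest score, plus 1 so both end scores are counted."""
    return max(scores) - min(scores) + 1


print(inclusive_range([50, 47, 46, 41, 34]))  # 17
```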

The Variance
• The variance measures how
widely the scores in the
distribution are spread about
the mean. In other words,
the variance is the average
squared difference between
the scores and the mean.
• The computation of the

The Standard Deviation
The standard deviation also indicates how spread out the scores are, but it is
expressed in the same units as the original scores. The standard deviation is computed
by finding the square root of the variance.
S = √(S²)
For the data in Table 1, the variance is 22.8. The standard deviation is √22.8, or
4.77. The scores of most norm groups have the shape of a normal distribution, a
symmetrical bell-shaped distribution with which most people are familiar. With the
normal distribution, about 95% of the scores are within 2 standard deviations of the
mean.
Even if the scores are not normally distributed, most of the scores will be within 2
standard deviations of the mean. In the example, the mean minus 2 standard
deviations is 34.46, and the mean plus 2 standard deviations is 53.54. Therefore only
one score is outside of this interval; the lowest score, 34, is slightly more than two
standard deviations from the mean.
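
The variance, the standard deviation, and the two-standard-deviation interval can be sketched in Python. The sketch assumes the variance divides by the number of scores N, following the description of it as the average squared difference; the source formula is not shown, so a divisor of N - 1 is also possible. The function names are hypothetical.

```python
import math


def variance(scores):
    """Average squared difference between the scores and the mean (divides by N)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)


def standard_deviation(scores):
    """Square root of the variance, in the same units as the scores."""
    return math.sqrt(variance(scores))


def two_sd_interval(mean, sd):
    """Interval from 2 standard deviations below to 2 above the mean."""
    return (mean - 2 * sd, mean + 2 * sd)


# Reproducing the figures quoted in the text: variance 22.8 and mean 44.
sd = round(math.sqrt(22.8), 2)
low, high = two_sd_interval(44, sd)
print(sd, round(low, 2), round(high, 2))  # 4.77 34.46 53.54
```

A score of 34 lies just below the lower limit of 34.46, which matches the statement that it is slightly more than two standard deviations from the mean.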

Graphing Distributions
A graph of a distribution of test scores is often better understood than is the
frequency distribution or a mere table of numbers. The general pattern of scores, as
well as any unique characteristics of the distribution, can be seen easily in simple
graphs. There are several kinds of graphs that can be used, but a simple bar graph,
or a histogram, is as useful as any.
The general shape of the distribution is clear from the graph. Most of the
scores in this distribution are high, at the upper end of the graph. Such a shape is
quite common for the scores of classroom tests. That is, test scores will be grouped
toward the right end of the measurement scale.
A normal distribution has most of the test scores in the middle of the
distribution and progressively fewer scores toward the extremes. The scores of norm
groups are seldom graphed, but they could be if we were concerned about seeing the
specific shape of the distribution of scores. Usually, we know or assume that the
scores are normally distributed.
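
A simple bar graph can be drawn as plain text in Python, with one row per score value and one asterisk per person. The function name and the scores are hypothetical, and the bars run sideways, so high scores appear at the bottom rather than at the right.

```python
from collections import Counter


def text_histogram(scores):
    """One line per score value from lowest to highest, one '*' per person."""
    counts = Counter(scores)
    lines = []
    for value in range(min(scores), max(scores) + 1):
        lines.append(f"{value:>3} | {'*' * counts[value]}")
    return "\n".join(lines)


# Hypothetical classroom scores, bunched toward the high end.
scores = [50, 49, 49, 49, 48, 48, 48, 48, 48, 47, 46, 45, 42]
print(text_histogram(scores))
```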
