2. INTRODUCTION
Assessment and evaluation play an important role in classroom instructional
planning, execution, and final instructional decision making.
They provide not only evidence about student performance but also a
concrete basis for giving feedback and for making decisions about the future
course of the teaching-learning process.
3. CONCEPT OF MEASUREMENT AND
EVALUATION
In educational assessment, the concepts of assessment, measurement, and
evaluation are interrelated yet distinct, and students are often unable to
adequately explain the differences among them.
Measurement
Educational measurement is the process of assigning numbers to individuals
or their characteristics according to specified rules.
Measurement requires the use of numbers but does not require that value
judgments be made about the numbers obtained from the process.
For example, achievement is measured with a test by counting the number of
test items a student answers correctly and using exactly the same rule to
assign a number to the achievement of each student in the class.
Measurements are useful for describing the amount of certain abilities that
individuals have (Farooq, 2013).
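The counting rule described above can be sketched in a few lines of Python. This is only an illustration: the answer key, the student names, and the function name score_test are hypothetical and do not come from any particular test.

```python
from typing import Sequence


def score_test(responses: Sequence[str], answer_key: Sequence[str]) -> int:
    """Count the items answered correctly, one point per item."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer key must have the same length")
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)


# Hypothetical answer key and class responses, for illustration only.
ANSWER_KEY = ["B", "D", "A", "C", "A"]
CLASS_RESPONSES = {
    "Student 1": ["B", "D", "A", "C", "A"],
    "Student 2": ["B", "A", "A", "C", "D"],
    "Student 3": ["C", "D", "B", "C", "A"],
}

# The same rule is applied to every student in the class.
scores = {name: score_test(answers, ANSWER_KEY) for name, answers in CLASS_RESPONSES.items()}
print(scores)
```

The result is a number for each student and nothing more; no judgment about passing or failing is involved at this stage.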
4. Cont…
Measurement is therefore a process of assigning numerals to objects,
quantities or events in order to give quantitative meaning to such qualities. In
the classroom, to determine a child's performance, one needs to obtain
quantitative measures on the individual scores of the child.
If the child scores 80 in Mathematics, the number carries no further
interpretation by itself: on the basis of measurement alone, it cannot be said
that he/she has passed or failed.
5. Evaluation
Evaluation, on the other hand, adds the ingredient of value judgment to assessment. It is
concerned with the application of its findings and implies some judgment of the
effectiveness, social utility or desirability of a product, process or progress in terms of
carefully defined and agreed upon objectives or values.
Evaluation often includes recommendations for constructive action.
Thus, evaluation is a qualitative measure of the prevailing situation. It calls for evidence of
effectiveness, suitability, or goodness of the programme.
A classroom average temperature of 75 degrees is simply information. It is the context of the
temperature for a particular purpose that provides the criteria for evaluation.
A temperature of 75 degrees may not be very good for some students, while for others, it is
ideal for learning.
6. Good Evaluation Achieves Several Goals
Describe: it will describe the characteristics of the object of study, its strengths and
weaknesses, the intended and unintended effects, and the critical issues that have to
be understood to place the object in the correct context.
Appraise: this is the section where the evaluator addresses the focus questions. It is
not enough to describe the object – the evaluator has to make judgments about the
object and provide the hard answers.
Advise: again, evaluation is no place for those who are risk avoiders. Evaluators put
their reputation on the line because they must not only address the questions, but
also provide advice on whether to discontinue a teacher's favorite programme or
amend it in some quite specific ways, taking into account what is known in the field.
7. TYPES OF TESTS
Standardized Achievement Tests
A standardized test is any form of test that:
requires all test takers to answer the same questions, or a selection of
questions from a common bank of questions, in the same way, and that is
scored in a "standard" or consistent manner, which makes it possible to
compare the relative performance of individual students or groups of
students.
While different types of tests and assessments may be "standardized" in this
way, the term is primarily associated with large-scale tests administered to
large populations of students, such as a multiple-choice test given to all the
eighth-grade public-school students in a particular state or province.
8. Examples of standardized tests
Achievement tests
are designed to measure the knowledge and skills students learned in school or
to determine the academic progress they have made over a period of time.
Aptitude tests
attempt to predict a student's ability to succeed in an intellectual or physical
endeavor by, for example, evaluating mathematical ability, language
proficiency, abstract reasoning, motor coordination, or musical talent.
College-admissions/entry tests
are used in the process of deciding which students will be admitted to a
collegiate program.
9. Cont…
International-comparison tests
are administered periodically to representative samples of students in a
number of countries, for the purposes of monitoring achievement trends in
individual countries and comparing educational performance across countries.
Psychological tests
including IQ tests, are used to measure a person's cognitive abilities and
mental, emotional, developmental, and social characteristics. Trained
professionals, such as school psychologists, typically administer the tests,
which may require students to perform a series of tasks or solve a set of
problems.
10. Characteristics of standardized achievement
tests
Unlike teacher-made tests, which are administered to students regularly, often
on a weekly basis, standardized tests are scheduled in advance. But the
schedule is not the primary reason why such tests are described as
standardized. Such tests meet the technical qualities established by the
American Psychological Association (APA).
Test Construction and Evaluation
The test should be constructed in a way that minimizes the influence of
guessing and of misunderstanding of an item's question. This is where a
well-structured format is crucial in test construction.
11. Cont…
Test Use
The way the test is used should meet professional and ethical
considerations. For example, some tests are inappropriate for children who
are younger than two years, and some tests cannot be used with students who
have particular disabilities.
Particular Applications
The use of scores obtained from the standardized test should be clearly
specified. A standardized test that measures intelligence and achievement
should not be used to measure social skills.
Administrative Procedures
Each standardized test should have a systematic procedure for administration,
scoring, and interpretation.
12. Criteria
Validity
This refers to the appropriateness and meaningfulness of the test. The test should
measure what it is supposed to measure.
Reliability
This refers to the consistency of the scores that can be obtained from the test.
For example, the score obtained by a student from a particular test should be
roughly the same as the score that the same student will get when he/she takes
the test again.
Norms
This refers to the comparison of a student's score on a test to the scores of a
reference group of students. Norms that can be relied on with confidence yield
a sound comparison.
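One common way to express such a comparison is a percentile rank. The sketch below is illustrative only: the norm-group scores are made up, and the convention of counting half of the tied scores is one of several in use, chosen here as an assumption.

```python
from typing import Sequence


def percentile_rank(score: float, norm_scores: Sequence[float]) -> float:
    """Percentage of the reference group scoring below `score`,
    counting half of the scores that tie with it (assumed convention)."""
    if not norm_scores:
        raise ValueError("the norm group must contain at least one score")
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Hypothetical reference-group scores, for illustration only.
NORM_GROUP = [42, 55, 61, 61, 68, 70, 74, 80, 85, 93]

student_rank = percentile_rank(74, NORM_GROUP)
print(f"Percentile rank: {student_rank:.1f}")
```

The same raw score would receive a different percentile rank against a different reference group, which is why the quality of the norms matters.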
13. Technical Characteristics of
Standardized Tests
Reliability
In popular use, reliability refers to the extent to which one obtains consistent
results with some object or process. For example, a reliable automobile is one that
consistently starts when the ignition is turned, a reliable employee consistently
shows up for work when scheduled, and a reliable performer consistently yields
good (or bad) performances.
Test–Retest Estimates of Reliability
The test–retest technique for estimating reliability entails administering the
same test to a group of examinees on two distinct occasions.
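In practice, the test–retest estimate is usually reported as the correlation between the two sets of scores. The following is a minimal sketch of that calculation using the Pearson correlation coefficient; the scores are invented for illustration, and the function name pearson_r is an assumption rather than part of any testing package.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical scores of the same five examinees on two occasions.
first_occasion = [70, 82, 65, 90, 75]
second_occasion = [72, 80, 68, 88, 77]

test_retest_r = pearson_r(first_occasion, second_occasion)
print(f"Test-retest reliability estimate: {test_retest_r:.2f}")
```

A value close to 1 suggests that examinees keep roughly the same relative standing across the two occasions.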
Reliability Estimates Based on Equivalent Forms
It is not uncommon for test developers to have multiple forms of the same test.
This is necessary for purposes of makeup exams, educational research using a
pre–post design, and so on.
14. Cont…
Inter-rater Estimates of Reliability
Because many of the current commercially developed tests have open-ended
items or give respondents latitude in constructing a response, judges
must be used to evaluate the appropriateness of a given response.
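Agreement between two judges can be summarized in several ways. The sketch below shows two common ones, simple percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The ratings are hypothetical, and the choice of these two indices is an assumption made for illustration; the source does not name a specific statistic.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Proportion of responses on which the two judges gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both judges must rate the same non-empty set of responses")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Agreement between two judges, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both judges use a single category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings (1 to 3) given by two judges to eight open-ended responses.
judge_a = [2, 3, 1, 2, 3, 2, 1, 3]
judge_b = [2, 3, 1, 2, 2, 2, 1, 3]

agreement = percent_agreement(judge_a, judge_b)
kappa = cohens_kappa(judge_a, judge_b)
print(f"Percent agreement: {agreement:.3f}, Cohen's kappa: {kappa:.3f}")
```

Kappa is lower than raw agreement here because some of the matches would be expected even if the judges rated at random.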
Reliability Estimates for Classifications of Examinees
The minimum-competency and criterion-referenced test (CRT) movements
of the early 1970s emerged in response to the widespread use of
standardized tests designed to facilitate comparisons among students. These
latter tests were not particularly informative about what examinees could or
could not do. CRTs were designed to accomplish this end.
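When a test is used to classify examinees (for example, as having met or not met a cut score), one simple reliability index is the proportion of examinees who receive the same classification on two administrations. The sketch below illustrates this idea; the scores, the cut score of 60, and the function name are all hypothetical.

```python
from typing import Sequence


def classification_consistency(
    first_scores: Sequence[float],
    second_scores: Sequence[float],
    cut_score: float,
) -> float:
    """Proportion of examinees classified the same way (at or above the
    cut score, or below it) on both administrations."""
    if len(first_scores) != len(second_scores) or not first_scores:
        raise ValueError("need two non-empty score lists of the same length")
    same = sum(
        (a >= cut_score) == (b >= cut_score)
        for a, b in zip(first_scores, second_scores)
    )
    return same / len(first_scores)


# Hypothetical scores of six examinees on two administrations of a CRT.
first_administration = [55, 62, 70, 58, 61, 45]
second_administration = [59, 65, 68, 63, 57, 50]

consistency = classification_consistency(
    first_administration, second_administration, cut_score=60
)
print(f"Classification consistency: {consistency:.2f}")
```

Here the index depends only on whether each examinee lands on the same side of the cut score, not on how close the two scores are.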
15. Standard Error of Measurement
The reliability statistics mentioned above are group-based statistics. A
measure of the amount of error associated with a specific examinee is given
by the standard error of measurement. This index can be interpreted as the
typical amount of error associated with the scores of individuals.
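In classical test theory, the standard error of measurement is commonly computed as the standard deviation of the scores multiplied by the square root of one minus the reliability coefficient. The source does not give this formula, so the sketch below is offered as an assumption, with made-up values for the standard deviation, the reliability, and the observed score.

```python
from math import sqrt


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), the classical test theory formula."""
    if score_sd < 0:
        raise ValueError("the standard deviation cannot be negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * sqrt(1.0 - reliability)


# Hypothetical values: score SD of 10 and a reliability coefficient of 0.91.
sem = standard_error_of_measurement(score_sd=10.0, reliability=0.91)

# A band of one SEM on either side of a hypothetical observed score of 75.
observed_score = 75
band = (observed_score - sem, observed_score + sem)
print(f"SEM: {sem:.1f}, score band: {band[0]:.1f} to {band[1]:.1f}")
```

With these values the SEM is about 3 points, so an observed score of 75 is better read as a band of roughly 72 to 78 than as an exact figure.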
16. Validity
The issue of validity is the most basic of all measurement concepts. It poses
the question of whether the instrument measures that which the user
intends. This concept is so fundamental that it precedes the question of
reliability in importance. If a test is not valid, then its reliability is of no
importance to the user.
Construct Validity
There are many psychological constructs which are of concern to educators.
These include intelligence, motivation, self-concept, and anxiety.
Commercially developed measures of these constructs are not as common in
schools as achievement tests, yet they are important.
17. Cont…
Content Validity
Evidence that a test has content validity is based, as one might expect, on
the items in the test. The question of content validity concerns the extent to
which the items on a test reflect the type of content and cognitive skills
expected by the user.
Predictive Validity
This type of validity is usually reported for tests designed to predict some
future event. Many college admissions tests, for example, report validity
coefficients related to their ability to predict freshman grade point averages.
18. Cont…
Consequential Validity
A final issue discussed in the context of validity is the notion of consequential
validity. Again, building on the dependence of validity on test use, this type of
validity is concerned with the consequences of test use.
The general idea is that the use of any measure should be evaluated in terms
of its potential impact on those involved.