2. DEFINITION OF TERMS – test, measurement,
evaluation, and assessment
• A test is a subset of assessment intended to measure a test-taker's language
proficiency, knowledge, performance or skills
• Brown defined a test as a process of quantifying a test-taker’s performance
according to explicit procedures or rules.
• Assessment: the process of observing and measuring learning. It is an ongoing
process in educational practice which involves a multitude of methodological
techniques.
• It can consist of tests, projects, and portfolios.
• Evaluation involves the interpretation of information. When a tester or marker
evaluates, s/he “values” the results in such a way that the worth of the
performance is conveyed to the test-taker.
• Measurement is the assigning of numbers to certain attributes of objects, events,
or people according to a rule-governed system.
4. Stages/phases of development of the
examination system in our country
• Pre-Independence
• Razak Report
• Rahman Talib Report
• Cabinet Report
• Malaysia Education Blueprint (2013-2025)
8. Assessment OF Learning
• the use of a task or an activity to measure, record, and report on a
student’s level of achievement with regard to specific learning
expectations.
• This type of assessment is also known as summative assessment.
• provides the focus to improve student achievement, gives everyone the
information they need to improve student achievement, and applies the
pressure needed to motivate teachers to work harder at teaching and
learning.
10. Assessment FOR learning
• the use of a task or an activity for the purpose of determining student
progress during a unit or block of instruction.
• Is roughly equivalent to formative assessment: assessment intended to
promote further improvement of student learning during the learning
process.
• commonly known as formative and diagnostic assessments.
• students are provided valuable feedback on their own learning.
11. Importance of AFL
• reflects a view of learning in which assessment helps students learn better,
rather than just achieve a better mark
• involves assessment activities as part of learning and to inform the
planning of future learning
• includes clear goals for the learning activity
• provides effective feedback that motivates the learner and can lead to
improvement
• reflects a belief that all students can improve
• encourages self-assessment and peer assessment as part of the regular
classroom routines
• involves teachers, students and parents reflecting on evidence
• is inclusive of all learners.
12. Types of tests
Henning (1987) identifies six kinds of information that tests provide
about students. They are:
o Diagnosis and feedback
o Screening and selection
o Placement
o Program evaluation
o Providing research criteria
o Assessment of attitudes and socio-psychological differences
13. Types of tests
1. Proficiency tests - designed to assess the overall language ability of students at varying levels.
- usually developed by external bodies such as examination boards like Educational
Testing Services (ETS) or Cambridge ESOL.
- Standardized
2. Achievement tests - to see what a student has learned with regard to stated course outcomes
- usually administered at mid-and end- point of the semester or academic year.
- generally based on the specific course content or on the course objectives.
- cumulative, covering material drawn from an entire course or semester.
3. Diagnostic tests - seek to identify those language areas in which a student needs further help.
- crucial for planning further course activities and providing students with remediation.
- placement tests often serve a dual function of both placement and diagnosis (Harris &
McCann, 1994; Davies et al., 1999).
4. Aptitude tests - designed to measure general ability or capacity to learn a foreign language a priori
(before taking a course) and ultimate predicted success in that undertaking.
- designed to apply to the classroom learning of any language
5. Progress tests - measure the progress that students are making towards defined course or programme
goals.
- administered at various stages throughout a language course to see what the students
have learned
14. Types of tests (ctd.)
6. Placement tests - designed to assess students’ level of language
ability for placement in an appropriate course or
class.
- indicates the level at which a student will learn
most effectively
- main aim is to create groups which are
homogeneous in level.
15. Malaysian Context (KSSR)
• School-Based Assessment
Purpose:
1. to realign the education system from one that focuses on academic
excellence to a more holistic one
2. To ensure a more systematic mastery of knowledge by emphasising
assessment of each child.
3. To achieve the aspiration of National Philosophy of Education
towards developing well rounded learners (JERIS)
4. to reduce exam-oriented learning among learners
5. to evaluate learners’ learning progress
16. Malaysian context ctd..
SBE features:
• Assessment for and of learning
• Standard-referenced Assessment (Performance Standard)
• *Formative tests which are assessed using Bands 1 to 6, HOTS (Higher
Order Thinking Skills)
• Holistic
• Integrated
18. SBE Instrument (WHO):
• Teachers
• Rationale:
- Can continuously monitor their pupils’ growth
- Can provide constructive feedback to help improve pupils’ learning abilities
- Better understand the context and environment most conducive to assessing pupils
- Appraise and provide feedback based on the Performance Standards
HOW:
Observation, Performance, Project, Product, Hands-on, Written Essays,
Pencil and Paper, Worksheet, Open-ended Discussion, Quizzes,
Checklist, Homework.
19. Performance Standard:
a set of statements detailing the achievement and mastery of an individual within a
certain discipline, in a specific period of study based on an identified benchmark.
22. Norm-Referenced Test (NRT) vs Criterion-Referenced Test (CRT; mastery tests)
Definition
- NRT: a test that measures a student’s achievement as compared to other students in the group; designed to yield a normal curve, 50% above and 50% below the mean.
- CRT: an approach that provides information on a student’s mastery based on a criterion specified by the teacher; anyone who meets the criterion can get a high score.
Purpose
- NRT: determine performance differences among individuals and groups.
- CRT: determine learning mastery based on a specified criterion and standard.
Test Item
- NRT: ordered from easy to difficult and able to discriminate examinees’ ability.
- CRT: guided by minimum achievement in the related objectives.
Frequency
- NRT: continuous assessment in the classroom.
- CRT: continuous assessment.
Appropriateness
- NRT: summative evaluation.
- CRT: formative evaluation.
Example
- NRT: public exams: UPSR, PMR, SPM, and STPM.
- CRT: mastery tests: monthly tests, coursework, projects, exercises in the classroom.
23. Norm-Referenced Test vs Criterion-Referenced Test
Purpose
- NRT: to rank each pupil with respect to the achievement of others in broad areas of knowledge; to discriminate between high and low achievers; to show how a student’s performance compares to that of other test-takers.
- CRT: to determine whether each student has achieved specific skills or concepts; to find out how much students know before instruction begins and after it has finished; to classify students according to whether they have met an established standard.
Content
- NRT: measures broad skill areas sampled from a variety of textbooks, syllabi, and the judgment of curriculum experts.
- CRT: measures the specific skills which make up a designated curriculum; each skill is expressed as an instructional objective.
Item characteristics
- NRT: each skill is usually tested by only a few items; items vary in difficulty; items are selected that discriminate between high and low achievers.
- CRT: each skill is tested by at least 4 items in order to obtain an adequate sample of pupil performance and to minimize guessing; the items which test any given skill are parallel in difficulty.
SET B: Q1a)
24. Norm-Referenced test (Normal Curve)
• represents the norm or average performance of a population
and the scores that are above and below the average within
that population.
• include percentile ranks, standard scores, and other statistics
for the norm group on which the test was standardized.
• A certain percentage of the norm group falls within various
ranges along the normal curve.
• Depending on the range within which test scores fall, scores
correspond to various descriptors ranging from deficient to
superior.
• An examinee's test score is compared to that of a norm group
by converting the examinee's raw scores into derived or scale
scores.
• Test-makers design the test so that most students will score
near the middle, and only a few will score low (the left side of the
curve) or high (the right side of the curve).
• Scores are usually reported as percentile ranks.
• The scores range from 1st percentile to 99th percentile, with
the average students scores set at the 50th percentile.
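The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name and the norm group's mean and SD are invented; a raw score is turned into a z-score (a derived score) and then into a percentile rank via the normal CDF.

```python
import math

def percentile_rank(raw_score, norm_mean, norm_sd):
    """Convert a raw score to a percentile rank against a norm group,
    assuming scores in the norm group are normally distributed."""
    z = (raw_score - norm_mean) / norm_sd          # derived (scale) score
    return 50 * (1 + math.erf(z / math.sqrt(2)))   # normal CDF x 100

# Hypothetical norm group: mean 50, SD 10
print(round(percentile_rank(50, 50, 10)))  # 50: the average score sits at the 50th percentile
print(round(percentile_rank(60, 50, 10)))  # 84: one SD above the mean
```

Note how a score exactly at the norm-group mean lands at the 50th percentile, matching the slide's description.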
25. Positive Skew
• Positive skew is when the long
tail is on the positive side of the
peak, and some people say it is
"skewed to the right".
• The mean is on the right of the
peak value.
• the mean is greater than the
mode.
• distribution has scores clustered
to the left, with the tail
extending to the right.
26. Negative Skew
• The majority of the scores fall toward the upper end.
• Curves are not symmetrical and have more scores on
the higher ends of distribution which will tend to
reduce the reliability of the test.
• Also called the mastery curve.
Problem:
• Scores are scrunched up around one point and thus
making it difficult to make decisions as many pupils
will be around that same point.
• Skewed distributions will also create problems as they
indicate violations of the assumption of normality that
underlies many of the other statistics that are used to
study test validity. (James Dean Brown, 1997)
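The mean/median/mode relationships in the two skew slides can be checked numerically. A minimal sketch with invented mastery-test scores (a negatively skewed sample: most pupils score near the top, with a long tail toward the low end):

```python
import statistics

# Hypothetical mastery-test scores (negative skew: tail on the low side).
scores = [95, 92, 90, 90, 90, 88, 85, 85, 80, 75, 60, 40]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# For this negatively skewed sample the tail pulls the mean below the peak:
print(mean < median < mode)  # True
```

For a positively skewed sample the inequality reverses: the tail on the high side pulls the mean above the mode, as slide 25 states.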
27. Characteristics: Formative vs Summative
Relation to instruction
- Formative: occurs during instruction.
- Summative: occurs after instruction.
Frequency
- Formative: occurs on an ongoing basis.
- Summative: occurs at a particular point in time to determine what students know.
Relation to grading
- Formative: not graded; information is used as feedback to students and teachers. Mastery is not expected when students are first introduced to a concept.
- Summative: graded.
Student’s role
- Formative: active engagement (self-assessment).
- Summative: passive engagement in design and monitoring.
Requirements for use
- Formative: clearly defined learning targets that students understand; clearly defined criteria for success that students understand; use of descriptive versus evaluative feedback.
- Summative: a well-designed assessment blueprint that outlines the learning targets; well-designed test items using best practices.
Examples
- Formative: a process: observations, interviews, evidence from work samples, paper-and-pencil tasks.
- Summative: final assessment.
Purpose
- Formative: designed to provide information needed to adjust teaching and learning.
- Summative: designed to provide information about the amount of learning that has occurred at a particular point.
28. Formative vs Summative: Assessment For Learning (AFL) vs Assessment Of Learning (AOL)
Definition
- AFL: involves both teachers and students in ongoing dialogue, descriptive feedback, and reflection throughout instruction.
- AOL: evaluates student learning at the end of an instructional unit by comparing it against some standard or benchmark.
Elaboration
- AOL: specific learning outcomes and standards are reference points; grade levels may be the benchmarks for reporting; rubrics can be given to students before they begin working on a particular project so they know what is expected of them for each of the criteria.
- AFL: helps students identify their strengths and weaknesses and target areas that need work; recognizes where students are struggling so problems can be addressed immediately; gathers as much information as possible about what the student has achieved, what has not been achieved, and what the student requires to best facilitate further progress; involves students and gives them opportunities to express their understandings.
Benefit
- AFL: creates clear expectations.
- AOL: includes different levels of difficulty; makes a judgment of student competency.
29. Formative vs Summative Examples
Formative examples:
Exit slips: Ask students to solve one problem or answer one question
on a small piece of paper. Students hand in the slips as “exit tickets” before they
pass to their next class, go to lunch, or transition to another activity.
The slips give teachers a way to quickly check progress toward skills
mastery.
Graphic organizers: When students complete mind maps or graphic
organizers that show relationships between concepts, they’re engaging
in higher level thinking. These organizers will allow teachers to monitor
student thinking about topics and lessons in progress.
Self-assessments: One way to check for student understanding is to
simply ask students to rate their learning. They can use a numerical
scale, a thumbs up or down, or even smiley faces to show how
confident they feel about their understanding of a topic.
Think-pair-share: Ask a question, give students time to think about it,
pair students with a partner, and have students share their ideas. By
listening in on the conversations, teachers can check student
understanding and assess any misconceptions. Students learn from
each other when discussing their ideas on a topic.
Observation: Watching how students solve a problem can lead to
further information about misunderstanding.
Discussion: Hearing how students reply to their peers can help a
teacher better understand a student’s level of understanding.
Categorizing: Let students sort ideas into self-selected categories. Ask
them to explain why such concepts go together. This will give you some
insight into how students view topics.
Summative examples:
Multiple choice, True/false, Matching
Short answer
Fill in the blank
One- or two-sentence response
Portfolios: Portfolios allow students to collect evidence of
their learning throughout the unit, quarter, semester, or
year, rather than being judged on a number from a test
taken one time.
Projects: Projects allow students to synthesize many
concepts into one product or process. They require students
to address real world issues and put their learning to use to
solve or demonstrate multiple related skills.
Performance Tasks: Performance tasks are like mini-projects.
They can be completed in a few hours, yet still require
students to show mastery of a broad topic.
30. Set A Q1b) Benefits of integrating formative
and summative assessment
• The integration of summative assessments with formative practices
can make the assessment process more meaningful for students by
providing regular feedback that supports learning whilst also
contributing towards an overall picture of their learning.
• Integrated assessment practices can also help learners to understand
connections between learning and assessment. Developing students’
active involvement as assessors of their own learning supports them
in life-long learning beyond formal education.
• The integration of assessments facilitates the accumulation of
evidence which can be used for both formative and summative
purposes over time, reducing ‘teaching to the test’.
31. Objective vs Subjective
Objective items
- have a single correct response; regardless of who scores a set of responses, an identical score will be obtained.
- subjective judgment of the scorer does not influence an individual’s score.
- also known as “selected-response” and “structured-response” items.
- include multiple-choice, matching, and alternative-choice items.
- assess lower-level skills such as knowledge and comprehension.
- relatively easy to administer, score, and analyse.
Subjective items
- typically do not have a single correct response.
- subjective judgments of the scorer are an integral part of the scoring process.
- also known as “free-response”, “constructed-response” and “supply-type” items.
- include short-answer and essay items.
- require students to produce what they know.
- easy to construct.
32. 5 Basic Terminology in Objective test
1. Receptive or selective response
Items in which the test-taker chooses from a set of given responses rather than creating a response (the latter is commonly
called a supply type of response).
2. Stem
Every multiple-choice item consists of a stem (the ‘body’ of the item that presents a stimulus). Stem is the question or
assignment in an item. It is in a complete or open, positive or negative sentence form. Stem must be short or simple,
compact and clear. However, it must not easily give away the right answer.
3. Options or alternatives
They are known as a list of possible responses to a test item. There are usually between three and five options/alternatives
to choose from.
4. Key
This is the correct response. The response can either be correct or the best one. Usually for a good item, the correct answer
is not obvious as compared to the distractors.
5. Distractors
This is known as a ‘disturber’ that is included to distract students from selecting the correct answer. An excellent distractor
is almost the same as the correct answer but it is not.
33. SET B Q1B Objective tests
Strengths
- Quick grading
- High inter-rater reliability: requires no judgment from the scorer
- Easy to administer, especially for a big group
- Wide coverage of topics in the outlined curriculum
- Precision in testing specific skills
Weaknesses
- Difficult to design; good distractors have to be considered
- Considerable guessing effect: guessing is possible
- Low validity
- Difficult to construct HOTS questions
- Tests the skill rather than the content
34. General Guidelines for Objective Test Items
MCQ:
i. Design each item to measure a single objective.
ii. State both stem and options as simply and directly as possible.
iii. Make certain that the intended answer is clearly the one correct one.
iv. (Optional) Use item indices to accept, discard or revise items.
1. Must have only one correct answer.
2. Format the items vertically, not horizontally.
3. Avoid using “All of the above”, “None of the above”, or other special distractors.
4. Use the author’s examples as a basis for developing your items.
5. Avoid trick items which will mislead or deceive examinees into answering incorrectly.
Alternate-choice items:
An alternate-choice test item is a simple declarative sentence, one portion of which is given with two different wordings. The examinee’s task is to choose the alternative that makes the sentence most nearly true.
E.g.: Ali seems to be (a) eager (b) hesitant in making the decision to further his studies.
- The rate of guessing is high: it is difficult to write good alternate choices that cover all aspects.
- Takes a shorter time: examiners take a shorter time to evaluate the examinee.
- Trick questions are seldom appropriate: examiners need to test the examinee directly.
- Avoid taking statements directly from the text and placing them out of context: this avoids confusion, since such items test not the examinee’s understanding but their ability to find answers.
- Use symbols other than T/F, Y/N: for example, examiners could have the examinee underline the correct answer.
35. General Guidelines for Subjective Test Items
Short answer:
Short-answer questions are open-ended questions that require students to create an answer. They are commonly used in examinations to assess basic knowledge and understanding (low cognitive levels) of a topic before more in-depth assessment questions are asked on the topic.
- Design short-answer items which are an appropriate assessment of the learning objective.
- Make sure the content of the short-answer question measures knowledge appropriate to the desired learning goal.
- Express the questions in clear wording and language appropriate to the student population.
- Ensure there is only one clearly correct answer to each question.
- Ensure that the item clearly specifies how the question should be answered.
- Write the instructions clearly so as to specify the desired knowledge and specificity of response.
- Set the questions explicitly and precisely.
- Direct questions are better than those which require completing a sentence.
- Let the students know what your marking style is like: is bullet-point format acceptable, or does it have to be essay format?
- Prepare a structured marking sheet; allocate marks or part-marks for acceptable answer(s).
- Do not make the correct answer a “giveaway” word that could be guessed by students who do not really know the information.
- Avoid giving grammatical cues or other cues to the correct answer.
- Avoid using statements taken directly from the curriculum.
- Develop grading criteria that list all acceptable answers to the test item; have subject-matter experts determine the acceptable answers.
Essay items:
- Clearly state questions, not only to make essay tests easier for students to answer but also to make the responses easier to evaluate.
- Specify and define what mental process you want the students to perform (e.g., analyze, synthesize, compare, contrast).
- Do not assume the learner is practised with the process.
- Avoid writing essay questions that require only factual knowledge, such as questions beginning with interrogative pronouns (who, when, why, where).
- Avoid vague, ambiguous, or non-specific verbs (consider, examine, discuss, explain) unless you include specific instructions for developing responses.
- Have each student answer all the questions; do not offer a choice of questions.
- Structure the question to minimize subjective interpretations.
37. Reliability (Brown)
• Consistent and dependable
- If the test is given to the same pupil or a matched pupil on two different occasions,
it should yield similar results
• Consistent in its conditions across two or more administrations
• Gives clear directions for scoring / evaluation
• Has uniform rubrics for scoring / evaluation
• Lends itself to consistent application of those rubrics by the scorer
• Contains item / tasks that are unambiguous to the test-taker
38. Factors leading to UNRELIABILITY of a test
1. Student-related reliability
- Temporary illness, fatigue, a ‘bad day’, anxiety, etc., which make an observed score
deviate from one’s true score.
2. Rater reliability - human error and bias while scoring.
- Inter-rater unreliability occurs when two or more scorers award inconsistent scores
on the same test.
- Causes: unclear scoring criteria, fatigue, bias, carelessness.
3. Test administration reliability - conditions in which the test is administered:
- noise, room lighting, variation in temperature, condition of tables and chairs.
4. Test reliability - the nature of the test itself can cause measurement errors:
- duration of the test (too long, timed), poorly written test items, i.e.
ambiguous, generic, or having more than one answer.
39. Validity
• second characteristic of good tests is validity, which refers to whether the
test is actually measuring what it claims to measure.
• The extent to which inferences made from assessment results are
appropriate, meaningful and useful in terms of the purpose of the
assessment (Gronlund, 1998)
A valid test:
1. Measures exactly what it proposes to measure
2. Does not measure irrelevant or ‘contaminating’ variable
3. Relies as much as possible on empirical evidence (performance)
4. Involves performance that samples the test’s criterion (objective)
5. Offers useful, meaningful information about a test-taker’s ability
6. Is supported by a theoretical rationale or argument
40. Face Validity: Do the assessment items appear to be appropriate?
• “determined impressionistically; for example by asking students whether the
examination was appropriate to the expectations” (Henning, 1987).
• as the degree to which a test looks right, and appears to measure the knowledge or
abilities it claims to measure, based on the subjective judgement of the examinees who
take it, the administrative personnel who decide on its use, and other psychometrically
unsophisticated observers.
High face validity if:
1. Well-constructed, expected format with familiar tasks
2. Clearly doable within the allotted time limit
3. Items that are clear and uncomplicated
4. Directions that are crystal clear
5. Tasks that relate to their course work
6. A difficulty level that presents a reasonable challenge
41. Content Validity - Does the assessment content cover what you want to assess?
Have satisfactory samples of language and language skills been selected for testing?
• whether or not the content of the test is sufficiently representative and
comprehensive for the test to be a valid measure of what it is supposed to
measure” (Henning, 1987).
• “If a test samples the subject matter about which conclusions are to be drawn,
and if it requires the test-taker to perform the behaviour that is being measured”
(Mousavi,2002).
• Validity can be verified through the use of a Table of Test Specifications, which:
1. makes sure all content domains are represented in the test;
2. gives detailed information on each content area;
3. specifies the level of skills;
4. specifies the level of difficulty;
5. specifies the number of items and item representation for each content area, skill, or topic.
42. Construct Validity –
Are you measuring what you think you're measuring? Is the test based
on the best available theory of language and language use?
• The extent to which a test measures a theoretical construct or
attribute
• Proficiency, communicative competence, and fluency are examples of
linguistic constructs;
• Self-esteem and motivation are psychological constructs.
43. Criterion-Related Validity
is usually expressed as a correlation between the test in question and the criterion measure.
-Concurrent (parallel) validity: Can you use the current test score to estimate
scores of other criteria? Does the test correlate with other existing
measures?
• The extent to which procedure correlates with the current behaviour of
subjects
• the use of another more reputable and recognised test to validate one’s
own test.
-Predictive validity: Is it accurate for you to use your existing students’ scores
to predict future students’ scores? Does the test successfully predict future
outcomes?
• The extent to which a procedure allows accurate prediction about a
subject’s future behaviour
44. Consequential Validity
• Encompasses all of the consequences of a test, including considering
its accuracy in measuring intended criteria, its impact in the
preparation of test takers, its effect on the learner, and the social
consequences of a test’s interpretation and use.
45. Practicality
• Refers to the logistical, administrative issues involved in making, giving, and
scoring an assessment instrument.
• Includes “cost, time to construct and administer, ease of scoring and ease of
reporting the results” (Mousavi, 2009)
Practical test:
1. Stays within budgetary limits
2. Stays within appropriate time constraint
3. Relatively easy to administer
4. Appropriately utilizes available human resources
5. Does not exceed available material resources
6. Has a scoring/evaluation procedure that is specific and time-efficient
46. Objectivity
• refers to the ability of teachers/examiners who mark the answer
scripts.
• The extent to which examiners award the same score to the same
answer script.
High objective if:
1. Examiners are able to give the same score to the similar answers
guided by the marking scheme
Objective test = highest objectivity
Subjective test = lowest objectivity
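The claim that objective tests have the highest objectivity can be illustrated with a sketch: scoring reduces to comparing responses against an answer key, leaving no room for the scorer's judgment. The key and responses below are invented for illustration:

```python
# Hypothetical answer key for a 4-item multiple-choice test.
KEY = {1: "B", 2: "D", 3: "A", 4: "C"}

def score(responses):
    """Any scorer applying the same key obtains the identical score:
    one mark per item whose response matches the key."""
    return sum(1 for item, answer in responses.items() if KEY.get(item) == answer)

print(score({1: "B", 2: "D", 3: "C", 4: "C"}))  # 3
```

A subjective (essay) item has no such key, which is why the scorer's judgment becomes an integral part of the score.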
47. Authenticity
• “the degree of correspondence of the characteristics of a given
language test task to the features of a target language task”
High authenticity if:
1. The language in the test is as natural as possible
2. Items are contextualized
3. Topics are meaningful
4. Some thematic organization to items is provided
5. Tasks represent real-world tasks
48. Washback
• refers to the impact that tests have on teaching and learning
Teacher
- Positive: induces teachers to cover their subject more thoroughly; improves teaching strategies; encourages a positive teaching-learning process.
- Negative: encourages teachers to build a “teaching to the test” curriculum; teachers do not fulfil the curriculum standard; teaching of skills is neglected.
Student
- Positive: makes students work harder.
- Negative: brings anxiety and distorts performance; makes students form a negative judgment towards tests.
Decision makers
- Positive: use the authority of high-stakes testing to achieve goals; to drive improvement and the introduction of a new curriculum.
- Negative: overwhelmingly use tests to promote their political agendas and seize…
49. Interpretability
• The test should be written in clear, correct and simple language
• Avoid ambiguous questions and instructions
• Clarity is essential to enable the pupils to know exactly what the
examiner wants them to do
• Difficulty: the test questions should be appropriate in difficulty,
neither too hard nor too easy
• Should be progressive to reduce stress and tension
51. Stages of Test Construction
Determining:
1) What it is one wants to know
2) For what purpose
Aspects (questions that need to be answered):
- Examinees
- Kind of test
- Purpose (State)
- Abilities tested
- Accuracy of results
- Importance of backwash effect
- Scope of test
- Constraints set by the unavailability of expertise, facilities, time of construction, administration, and
scoring
Planning:
1) Determine the content
Aspects:
- Purpose (Describe)
- Characteristics of the test takers, the nature of the population of the examinees for whom the test
is being designed
- A plan for evaluating the qualities of test usefulness (reliability, validity, authenticity, practicality
inter-activeness, and impact)
52. Stages of Test Construction (ctd.)
Planning ctd - Nature of the ability we want measured
- Identify resources
- A plan for allocation and management of resources
- Format and timing
- Criteria
- Levels of performance
- Scoring procedures
Writing Test items writers’ characteristics:
• Experienced in test construction.
• Quite knowledgeable of the content of the test.
• Have the capacity in using language clearly and economically.
• Ready to sacrifice time and energy.
Other aspects:
• Sampling: test constructors choose widely from the whole area of the course content (not
including EVERYTHING under course content in one version of the test).
• Decision regarding content validity and beneficial backwash
You’ve written it well when..
(/) It is a representative sample of the course material
53. Stages of Test Construction (ctd.)
Preparing You have to…
(/) Understand the major principles, techniques and experience
…before preparing test items.
AVOID preparing
• Test items which can be answered through test-wiseness.
Test-wiseness: examinees utilise the characteristics and formats of the test to guess the correct answer.
Reviewing Principles for reviewing test items:
• The test should not be reviewed immediately after its construction, but after some considerable
time.
• Other teachers or testers should review it. In a language test, it is preferable if native speakers are
available to review the test.
Pre-testing • The tester should administer the newly-developed test to a group of examinees similar to the target
group. PURPOSE: analyse every individual item as well as the whole test.
• Numerical data (test results) should be collected to check the efficiency of the item, it should include
item facility and discrimination.
54. Stages of Test Construction (ctd.)
Validating • Identify IF
• Item Facility (IF) shows to what extent the item is easy or difficult:
IF = number of correct responses (Σc) / total number of candidates (N)
• To measure item difficulty (the complement of IF):
1 − IF = number of wrong responses (Σw) / N
The results of these equations range from 0 to 1. An item with a facility index of 0 is too difficult, and
one with an index of 1 is too easy. The ideal item has a value of 0.5, and the acceptable range for item
facility is [0.37, 0.63]; i.e. less than 0.37 is difficult, and above 0.63 is easy.
Too easy/Too hard = Low reliability
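The IF computation and the acceptability range above can be sketched directly. A minimal illustration (the function names and the response data are invented; each response is recorded as correct/wrong):

```python
def item_facility(responses):
    """IF = number of correct responses / total number of candidates.
    Each response is True (correct) or False (wrong)."""
    return sum(responses) / len(responses)

def classify(if_value):
    # Acceptability range from the notes: [0.37, 0.63]; ideal is 0.5.
    if if_value < 0.37:
        return "too difficult"
    if if_value > 0.63:
        return "too easy"
    return "acceptable"

# Hypothetical item answered correctly by 6 of 10 candidates.
responses = [True, True, False, True, False, False, True, True, False, True]
if_value = item_facility(responses)
print(if_value, classify(if_value))  # 0.6 acceptable
```

Items flagged "too easy" or "too difficult" are candidates for revision or removal at the pre-testing stage, since they lower reliability as the notes state.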
55. Preparing Test Blueprint / Test Specifications
• Test specs = an outline of your test /what it will “look like” + your guiding
plan for designing an instrument that effectively fulfils your desired
principles, especially validity.
• They include the following:
a description of its content
item types (methods, such as multiple-choice, cloze, etc.)
tasks (e.g. written essay, reading a short passage, etc.)
skills to be included
how the test will be scored
how it will be reported to students
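The spec components listed above can be captured as a simple data structure, which keeps the blueprint explicit and checkable before item writing begins. The field names and the sample values are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class TestSpec:
    content: str            # description of the test's content
    item_types: list[str]   # methods, e.g. multiple-choice, cloze
    tasks: list[str]        # e.g. written essay, reading a short passage
    skills: list[str]       # skills to be included
    scoring: str            # how the test will be scored
    reporting: str          # how results will be reported to students

# A made-up example blueprint for a classroom test:
spec = TestSpec(
    content="Unit 3: describing events in the past",
    item_types=["multiple-choice", "cloze"],
    tasks=["written essay", "reading a short passage"],
    skills=["reading", "writing"],
    scoring="analytic rubric, 4 categories x 5 points",
    reporting="score report with per-skill breakdown",
)
print(spec.item_types)
```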
56. What is an item?
• A tool, an instrument, instruction or question used to get feedback
from test-takers
• Evidence of something that is being measured.
• Provides useful information for measuring or assessing a
construct.
• Can be classified as a recall or a thinking item.
• Recall item : item that requires one to recall in order to answer
• Thinking item : item that requires test-takers to use their thinking
skills to attempt.
57. Sequential steps in designing test specs
• A broad outline of how the test will be organised
• Which of the eight sub-skills you will test
• What the various tasks and item types will be
• How results will be scored, reported to students, and used in future class
(washback)
Remember to…
Know the purpose of the test you are creating
Know as precisely as possible what it is you want to test
Not conduct a test hastily
Examine the objectives for the unit you are testing carefully
58. Bloom’s Taxonomy (Revised)
• Def : A systematic way of describing how a learner’s performance
develops from simple to complex levels in the affective,
psychomotor and cognitive domains of learning.
65. Categories & Cognitive
Processes
Definition
Factual Knowledge The basic elements students must know to be acquainted
with a discipline or solve problems in it
Conceptual Knowledge The interrelationships among the basic elements within a
larger structure that enable them to function together
Procedural Knowledge How to do something, methods of inquiry, and criteria for
using skills, algorithms, techniques, and methods
Metacognitive Knowledge Knowledge of cognition in general as well as awareness
and knowledge of one’s own cognition
The Knowledge Domain
66. SOLO Taxonomy
• Def : (Structure of the Observed Learning Outcome) a systematic way
of describing how a learner’s performance develops from simple to
complex levels in their learning.
• There are 5 stages, namely :
Prestructural, Unistructural and Multistructural, which are in the quantitative
phase, and Relational and Extended Abstract, which are in the qualitative
phase (Refer Figure 1.0)
• A means of classifying learning outcomes in terms of their complexity,
enabling teachers to assess students’ work in terms of its quality.
69. Functions of SOLO taxonomy
• An integrated strategy, to be used
In lesson design (learning outcomes intended)
In task guidance
In formative and summative assessment
In deconstructing exam questions to understand marks awarded
As a vehicle for self-assessment and peer-assessment
70. Advantages of SOLO taxonomy
Aspect
Structure of the taxonomy • Encourages viewing learning as an on-going process, moving from simple recall of facts
towards a deeper understanding; that learning is a series of interconnected webs that can
be built upon and extended.
• Consists of a series of cycles (especially between the Unistructural, Multistructural and
Relational levels), which allows for the development of breadth of knowledge as well
as depth.
In turn..
• Creating students who are… “self-regulating, self-evaluating learners who were well motivated
by learning.”
SOLO based techniques • Use of constructive alignment encourages teachers to be more explicit when creating
learning objectives, focusing on what the student should be able to do and at which level.
In turn..
• Students will be able to make progress, and it allows for the creation of rubrics, for use in
class, to make the process explicit to the student.
Its HOTS properties • Scaffolds in-depth discussion
In turn..
• Encouraging students to develop interpretations, use research and critical thinking effectively to
develop their own answers, and write essays that engage with the critical conversation of
the field.
• May also be helpful in providing a range of techniques for differentiated learning.
71. Proponents of the SOLO taxonomy say..
• A model of learning outcomes that helps schools develop a common
understanding.
• A ‘framework for developing the quality of assessment’ and that it is
‘easily communicable to students’.
• Hattie outlines three levels of understanding: surface, deep and
conceptual. He indicates that:
“The most powerful model for understanding these three levels and
integrating them into learning intentions and success criteria is the
SOLO model.”
72. Critics of the SOLO taxonomy say…
• There is potential to misjudge the level of functioning.
• It has ‘conceptual ambiguity’; that the ‘categorisation’ is ‘unstable’.
• The structure is referred to as a hierarchy, hence concerns arise when
complex processes, such as human thought, are categorised in this
manner.
73. Guidelines for constructing test items
Guideline Elaboration
Aim of test • Developed to precisely measure the objectives prescribed by the blueprint
• Meet quality standards
Range of the topics to be
tested
Measure the test-takers’ ability or proficiency in applying the knowledge and principles on the
topics that they have learnt
Range of skills to be tested • Have cognitive characteristics exemplifying understanding, problem-solving, critical
thinking, analysis, synthesis, evaluation and interpretation rather than just declarative
knowledge.
• (Bloom’s taxonomy as a tool to use in item writing)
Test format Needs to be a logical and consistent stimulus format
Why?
For test item writers : help expedite the laborious process of writing test items as well as supply
a format for asking basic questions.
For test-takers :
• So that the questioning process in itself does not give unnecessary difficulty to answering
questions
• test takers can quickly read and understand the questions, since the format is expected
74. Guideline Elaboration
International and Cultural
Considerations (bias)
refrain from…
the use of slang
geographic references
historical references or dates (holidays)
…that may not be understood by an international examinee.
Level of difficulty Ensure that the test…
Has a planned number of questions at each level of difficulty
Is able to determine mastery and non-mastery performance states
Weak students can answer easy items
Intermediate language proficiency students can answer easy and moderate items
High language proficiency students can answer easy, moderate and advanced test items
Encompasses all three levels of difficulty
75. Test format
• Refers to the layout of questions on a test. For example, the format of
a test could be two essay questions, 50 multiple- choice questions,
etc.
*Note : If you wish to know on the outlines of some large-scale
standardised tests, please refer to pages 64 & 65 in the PPG Module
77. Types of test items to assess language skills
Language Skills Elaboration
Listening Two kinds of listening tests:
• Tests that test specific aspects of listening, like sound discrimination
• Task based tests which test skills in accomplishing different types of listening tasks considered
important for the students being tested
Four types of listening performance from which assessment could be considered.
Intensive Listening for perception of the components (phonemes, words, intonation, discourse markers, etc.) of a larger stretch of
language.
Responsive Listening to a relatively short stretch of language ( a greeting, question, command, comprehension check, etc.) in order
to make an equally short response
Selective Processing stretches of discourse such as short monologues for several minutes in order to “scan” for certain
information. For example, to listen for names, numbers, grammatical category, directions (in a map exercise), or certain
facts and events.
Extensive Listening to develop a top-down, global understanding of spoken language. For example, listening to a conversation
and deriving a comprehensive message or purpose, listening for the gist, and making inferences.
78. Language Skills Elaboration
Speaking Objective test : tests skills such as …
• Pronunciation
• Knowledge of what language is appropriate in different situations
• Language required in doing different things like describing, giving directions, giving instructions,
etc
Integrative task-based test : involves finding out if pupils can perform different tasks using spoken
language that is appropriate for the purpose and the context.
For example :
• Describing scenes shown in a picture
• Participating in a discussion about a given topic
• Narrating a story, etc.
CATEGORIES FOR ORAL ASSESSMENT (Refer yellow table)
79. Category Elaboration
Imitative • Ability to imitate a word or phrase or possibly a sentence/ pronunciation
• A number of prosodic (intonation, rhythm, etc.), lexical, and grammatical properties of language may be
included
Intensive • The production of short stretches of oral language designed to demonstrate competence in a narrow band of
grammatical, phrasal, lexical, or phonological relationships.
• Eg :directed response tasks (requests for specific production of speech), reading aloud, sentence and dialogue
completion, limited picture-cued tasks including simple sentences, and translation up to the simple sentence
level.
Responsive • Interaction and test comprehension but at somewhat limited level of very short conversation, standard
greetings, and small talk, simple requests and comments.
• The stimulus is almost always a spoken prompt (to preserve authenticity) with one or two follow-up questions or
retorts
Interactive • Increased length + complexity from responsive.
• May include multiple exchanges and/or multiple participants.
• Two types : (a) transactional language, which has the purpose of exchanging specific information, and (b)
interpersonal exchanges, which have the purpose of maintaining social relationships.
Extensive • Speeches, oral presentations, and storytelling, during which the opportunity for oral interaction from listeners is
either highly limited (perhaps to nonverbal responses) or ruled out altogether.
• Language style is more deliberative (planning is involved)
• May include informal monologue such as casually delivered speech (e.g., recalling a vacation in the mountains,
80. Language Skills Elaboration
Reading
Meaning conveyed through reading text
Type Elaboration
Skimming Inspect lengthy passage rapidly
Scanning Locate specific information within a short
period of time
Receptive/ Intensive A form of reading aimed at discovering exactly
what the author seeks to convey
Responsive Respond to some point in a reading text
through writing or by answering questions
81. Meaning conveyed through reading text
Grammatical meaning Meanings that are expressed through
linguistic structures such as complex and
simple sentences and the correct
interpretation of those structures.
Informational meaning The concept or messages contained in the
text. May be assessed through various means
such as summary and précis writing.
Discourse meaning The perception of rhetorical functions
conveyed by the text.
Writer’s tone The writer’s tone – whether it is cynical,
sarcastic, sad, etc.
82. Language Skills Elaboration
Writing
Imitative • The ability to spell correctly and to perceive phoneme-grapheme correspondences in the English spelling
system
• The mechanics of writing
• Form is the primary focus while context and meaning are of secondary concern.
Intensive (controlled)
• Producing appropriate vocabulary within a context, collocation and idioms, and correct grammatical features
up to the length of a sentence.
Responsive • Perform at a limited discourse level, connecting sentences into a paragraph and creating a logically connected
sequence of two or three paragraphs.
• Tasks relate to pedagogical directives, lists of criteria, outlines, and other guidelines.
• Eg : brief narratives and descriptions, short reports, lab reports, summaries, brief responses to reading, and
interpretations of charts and graphs.
• Form-focused attention is mostly at the discourse level, with a strong emphasis on context and meaning.
Extensive • Implies successful management of all the processes and strategies of writing for all purposes, up to the length
of eg : an essay,
• Focus is on achieving a purpose, organizing and developing ideas logically, using details to support or
illustrate ideas, demonstrating syntactic and lexical variety and engaging in the process of multiple drafts to
achieve a final product.
• Focus on grammatical form is limited to occasional editing and proofreading of a draft
83. Brown’s (Assessing Skills)
Skill Type Test item
Listening Intensive Listening • Recognizing phonological and morphological elements
• Paraphrase recognition
Responsive Listening • Responding to a stimulus; conversation, requests
Selective Listening • Listening cloze
• Information transfer
• Sentence repetition
Extensive Listening • Dictation
• Communicative stimulus-response tasks
• Authentic listening tasks
Speaking Intensive Speaking • Directed response tasks
• Read-Aloud tasks
• Sentence/dialogue completion tasks and oral questionnaires
• Picture-cued tasks
Responsive Speaking • Q & A
• Giving instructions and directions
• Paraphrasing
Interactive Speaking • Interview
• Role-play
• Discussions and conversations
• Games
Extensive speaking • Oral presentations
• Picture-cued storytelling
• Retelling a story, news event
84. Skill Type Test item
Reading Perceptive reading • Reading aloud
• Written response
• Multiple-choice
• Picture-cued items
Selective reading • Matching tasks
• Editing tasks
• Picture-cued tasks
• Gap-filling tasks
Interactive reading • Cloze tasks
• Impromptu reading + comprehension questions
• Short answer tasks
• Editing longer texts
• Scanning
• Ordering tasks
• Information transfer; reading charts, maps, graphs, diagrams
Extensive reading • Skimming tasks
• Summarizing and responding
• Notetaking and outlining
Writing Imitative writing • Writing letters, words and punctuation
• Spelling tasks and detecting phoneme – grapheme correspondences
Intensive (Controlled) writing • Dictation and dicto-comp
• Grammatical transformation tasks
• Picture-cued tasks
• Vocabulary assessment tasks
• Ordering tasks
• Short answer and sentence completion tasks
85. Skill Type Test item
Writing Responsive and extensive writing • Paraphrasing
• Guided Q & A
• Paragraph constructions tasks
• Strategic options
• Standardized tests of responsive writing
Grammar &
Vocabulary
Selected response • Multiple-choice tasks
• Discrimination tasks
• Noticing tasks or consciousness-raising tasks
Limited production • Gap-filling tasks
• Short-answer tasks
• Dialogue-completion tasks
Extended production • Information gap tasks
• Role-play or simulation tasks
86. Objective and Subjective Test
Objective test • Tests that are graded objectively
• Include the multiple choice test, true false items
and matching items
• Similar to select type tests where students are
expected to select or choose the answer from a list
of options
Subjective test • Involve subjectivity in grading
• Include essays and short answer questions
• Similar to supply type as the students are expected
to supply the answer through their essay
Subjective + objective • Dictation test, filling in the blank type tests, as well
as interviews and role plays
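Because objective items have exactly one right answer, they can be graded mechanically, which is what makes them objective. A minimal sketch (the answer key and the student responses are made-up examples):

```python
def score_objective(answer_key: dict[str, str],
                    responses: dict[str, str]) -> int:
    """Count the items where the response matches the key exactly.
    Unanswered items simply score zero."""
    return sum(1 for item, key in answer_key.items()
               if responses.get(item) == key)

key = {"Q1": "B", "Q2": "D", "Q3": "A"}
student = {"Q1": "B", "Q2": "C", "Q3": "A"}
print(score_objective(key, student))  # 2
```

Subjective items (essays, short answers) cannot be reduced to a lookup like this, which is why they need rater judgement and the rubric-based approaches discussed later.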
87. Type of test : according to how students are
expected to respond
Selected response:
Do not create any language but rather
select the answer from a given list
Constructed response:
Produce language by writing, speaking,
or doing something else
Personal response:
Produce language but also allows each
student’s response to be different from
one another and for students to
“communicate what they want to
communicate”
Selected response: true-false, matching, multiple choice
Constructed response: fill-in, short answer, performance test
Personal response: conferences, portfolios, self and peer assessments
89. Discrete vs Integrative
Discrete : Language is seen to be made up of smaller units, and it may be possible to test language by testing each unit at a time.
Integrative : Language is an integrated whole which cannot be broken up into smaller units or elements.
90. Communicative test
• Students have to produce the language in an interactive setting involving
some degree of unpredictability, which is typical of any language
interaction situation.
91. The three principles of communicative tests are :
• involve performance;
• are authentic; and
• are scored on real-life outcomes
92. Limitation in applying the communicative test
• Issues of practicality, involving especially the amount of time and
extent of organisation to allow for such communicative elements to
emerge.
Advantages in applying the communicative
test
• Uses valid, purposeful language and can stimulate positive
washback in teaching and learning.
94. Scoring approaches
Objective • Relies on quantified methods of evaluating
students’ writing
Holistic • The reader (examiner) reacts to the students’
compositions as a whole and a single score is
awarded to the writing
• Each score on the scale will be accompanied with
general descriptors of ability
• Related : Primary trait scoring
Analytical • Raters assess students’ performance on a variety of
categories which are hypothesised to make up the
skill of writing
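The contrast between the two rater-based approaches above can be sketched numerically: a holistic rater awards one overall band, while an analytic rater scores each category and the parts are summed. The category names and point values are illustrative assumptions, not a prescribed rubric.

```python
def holistic_score(band: int) -> int:
    """Holistic: the rater reacts to the composition as a whole and
    awards a single band against general descriptors."""
    return band

def analytic_score(category_scores: dict[str, int]) -> int:
    """Analytic: the script is rated on each hypothesised component
    of writing skill, and the component scores are summed."""
    return sum(category_scores.values())

print(holistic_score(4))  # e.g. band 4 on a 6-band holistic scale
print(analytic_score({"content": 18, "organisation": 15,
                      "language use": 17, "mechanics": 4}))  # 54
```

The analytic breakdown is what supplies the diagnostic feedback that a single holistic band masks, at the cost of splitting writing ability into components.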
95. Comparison between approaches
Scoring Approach Advantages Disadvantages
Holistic
Advantages:
• Quickly graded
• Provides a public standard that is understood by teachers and students alike
• Relatively higher degree of rater reliability
• Applicable to the assessment of many different topics
• Emphasises the students’ strengths rather than their weaknesses
Disadvantages:
• The single score may actually mask differences across individual compositions
• Does not provide a lot of diagnostic feedback
Analytical
Advantages:
• Provides clear guidelines for grading in the form of the various components
• Allows the graders to consciously address important aspects of writing
Disadvantages:
• Writing ability is unnaturally split up into components
Objective
Advantages:
• Emphasises the students’ strengths rather than their weaknesses
Disadvantages:
• Still some degree of subjectivity involved
• Accentuates negative aspects of the learner’s writing without giving credit for what they can do well
96. Questions you can attempt..
• Describe with examples how holistic and analytical rubrics can be
used to assess Year 6 pupils’ writing based on the following skill
- Write simple factual descriptions of things, events, scenes and what
one saw and did.
- Characteristics of each approach
98. Purposes of reporting
• Main purpose of tests is to obtain information concerning a particular
behaviour or characteristic.
• Evaluate the effectiveness of one’s own teaching or instructional
approach and implement the necessary changes
• Based on information obtained from tests, several different types of
decisions can be made.
100. Reporting methods
Norm - Referenced Assessment and Reporting Assessing and reporting a student's achievement and
progress in comparison to other students.
Criterion - Referenced Assessment and Reporting Assessing and reporting a student's achievement and
progress in comparison to predetermined criteria.
An outcomes-approach to assessment will provide
information about student achievement to enable
reporting against a standards framework.
An outcomes-approach Acknowledges that students, regardless of their class
or grade, can be working towards syllabus outcomes
anywhere along the learning continuum.
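The two reporting methods above can be illustrated side by side: a percentile rank compares a student to the cohort (norm-referenced), while a cut-score decision compares the student to a predetermined criterion regardless of how others performed. The cut score of 60 and the cohort values are made-up examples.

```python
def percentile_rank(score: int, cohort: list[int]) -> float:
    """Norm-referenced: percentage of the cohort scoring below this student."""
    below = sum(1 for s in cohort if s < score)
    return 100 * below / len(cohort)

def criterion_report(score: int, cut_score: int = 60) -> str:
    """Criterion-referenced: compare against a predetermined standard,
    not against other students."""
    return "mastery" if score >= cut_score else "non-mastery"

cohort = [45, 52, 60, 68, 75, 81, 90, 95]
print(percentile_rank(75, cohort))   # 50.0 -> half the cohort scored lower
print(criterion_report(75))          # mastery
```

Note that the same raw score can look very different under the two methods: a score of 75 is only middling within this cohort but comfortably clears the criterion.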
101. Principles of effective and informative
assessment and reporting
Has clear, direct links with outcomes
Is integral to teaching and learning
Is balanced, comprehensive and varied
Is valid
Is fair
Engages the learner
Values teacher judgement
Is time efficient and manageable
Recognises individual achievement and progress
Involves a whole school approach
Actively involves parents
Conveys meaningful and useful information
103. Components of PBS
School assessment Refers to written tests that assess subject learning. The test questions and marking
schemes are developed, administered, scored, and reported by school teachers based on
guidance from LP.
Central assessment Refers to written tests, project work, or oral tests (for languages) that assess subject
learning. LP develops the test questions and marking schemes. The tests are, however,
administered and marked by school teachers
Psychometric assessment Refers to aptitude tests and a personality inventory to assess students’ skills, interests,
aptitude, attitude and personality. Aptitude tests are used to assess students’
innate and acquired abilities, for example in thinking and problem solving. The personality
inventory is used to identify key traits and characteristics that make up the students’
personality. LP develops these instruments and provides guidelines for use.
Physical, sports, and co-
curricular activities assessment
Refers to assessments of student performance and participation in physical and health
education, sports, uniformed bodies, clubs, and other non-school sponsored activities
104. Benefits of PBS
• enables students to be assessed on a broader range of output over a
longer period of time.
• Provides teachers with more regular information to take the
appropriate remedial actions for their students.
• Will hopefully reduce the overall emphasis on teaching to test, so that
teachers can focus more time on delivering meaningful learning as
stipulated in the curriculum.