TOPIC 1 – OVERVIEW OF ASSESSMENT (CONTEXT, ISSUES AND TRENDS)
DEFINITIONS OF TERMS
1. Test: A subset of assessment intended to measure a test-taker’s language proficiency, knowledge, performance or skills. A technique of assessment.
2. Assessment: A comprehensive process of planning, collecting, analyzing, reporting and using information on students over time. Appraising the level or magnitude of a person’s attributes. Consists of tests, projects, portfolios, anecdotal information and student self-reflection.
3. Evaluation: Evaluation ≠ testing. Involves the interpretation of information. Involves conveying marks or test results as a performance report for test-takers.
4. Measurement: Assigning numbers or values to certain attributes of objects, events or people according to a rule-governed system. Measurement must be conducted according to explicit rules and procedures, e.g. test specifications, criteria and marking procedures. The process of quantifying the observable performance of classroom learners. Test scores are measurements; conveying the meaning of those scores is evaluation. Evaluation, however, can also take place WITHOUT measurement (e.g. responses in the classroom by learners).
Trends and Issues – read Module.
TOPIC 2 – ROLE AND PURPOSES OF ASSESSMENT IN TEACHING AND LEARNING
REASONS/PURPOSE OF ASSESSMENT
To determine the effective teaching strategies to be used in the classroom.
To improve classroom practice and instruction.
To provide information to children, parents and administrators.
To measure students’ achievement and identify students’ strengths and weaknesses.
To identify the difficult topics in a given unit that need to be retaught.
To measure proficiency, place students into course levels, and diagnose students’ strengths and weaknesses.
ASSESSMENT OF LEARNING (SUMMATIVE ASSESSMENT)
A summary of learning.
Measures, records and reports on students’ level of achievement with regard to specific learning expectations.
Tells the teacher the current status of the students’ learning.
Provides focus to improve students’ achievement.
ASSESSMENT FOR LEARNING (FORMATIVE / DIAGNOSTIC ASSESSMENT)
Conducted to improve students’ learning during the teaching and learning process.
Finds out alternative teaching strategies.
Finds out students’ understanding of the instruction.
ASSESSMENT TYPES
Formal
o Public exams (UPSR, PMR, SPM, STPM)
o Year-end exams / semester exams
o Monthly tests
Informal
o Projects
o Question and answer
o Quizzes
o Students’ self-reflection
SIX TYPES OF INFORMATION TESTS PROVIDE ABOUT STUDENTS (HENNING, 1987)
Diagnosis and feedback
Screening and selection
Placement
Program evaluation
Providing research criteria
Assessment of attitudes and socio-psychological differences
CLASSIFICATION OF TESTS (ALDERSON, CLAPHAM & WALL, 1995)
Proficiency Tests
Not based on a particular curriculum or language program.
Assess overall language ability of students at varying levels.
Describes what students are capable of doing in a language.
E.g. tests from ETS and Cambridge ESOL, such as the American TOEFL and IELTS.
Achievement Tests
Similar to progress tests.
Administered at the mid-point and end-point of the semester or academic year.
Content of achievement tests is generally based on the specific course content.
Diagnostic Tests
Seek to identify those language areas in which students need further help.
Information gained is crucial for further course activities and providing students with remediation.
Aptitude Tests
Measure general ability or capacity to learn a foreign language a priori.
Predict success in that undertaking.
Progress Tests
Measure the progress the students are making towards defined course or program goals.
Teacher-produced, with a narrow focus.
Cover a smaller amount of material and assess fewer objectives.
Placement Tests
Assess students’ level of language ability for placement in an appropriate course or class.
Indicates the level at which a student will learn most effectively.
Create groups that are homogeneous in level.
TOPIC 3 – BASIC TERMINOLOGY IN TESTING
NORM-REFERENCED TEST (NRT)
In NRTs, an individual test-taker’s score is interpreted in relation to
o a mean (average score),
o median (middle score),
o standard deviation (extent of variance in scores)
o and/or percentile rank
Purpose: to place test-takers along a mathematical continuum in rank order.
Scores commonly reported back to test-taker in the form of a numerical score.
Administered to compare an individual’s performance with that of his or her peers or a group.
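As a rough illustration of the statistics listed above, the following Python sketch computes them for one small group. The scores are invented for illustration only, and the helper `percentile_rank` uses one common definition (the percentage of scores strictly below the given score); other definitions exist.

```python
import statistics


def percentile_rank(scores, score):
    """Percentage of scores in the group that fall strictly below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Hypothetical class scores, invented for illustration only.
scores = [45, 52, 58, 60, 60, 67, 71, 75, 82, 90]

mean = statistics.mean(scores)      # average score
median = statistics.median(scores)  # middle score
spread = statistics.pstdev(scores)  # standard deviation of the whole group

print(f"mean={mean}, median={median}, sd={spread:.2f}")
print(f"percentile rank of a score of 75: {percentile_rank(scores, 75)}")
```

A norm-referenced report would quote a test-taker's score alongside these group figures, so the same raw score can rank differently in different groups.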
In SBE, NRTs are used for summative evaluation, e.g. year-end examinations for streaming and selection.
CRITERION-REFERENCED TEST (CRT)
A collection of information about student progress or achievement in relation to a specified criterion.
Standards serve as the criteria or yardstick for measurement.
Provide:
o Feedback to test-takers mostly in the form of grades, on specific course or lesson objectives.
o Information on students’ mastery based on outcomes or objectives as specified in the syllabus.
Advantage: allows test-takers to make inferences about how much language proficiency they originally had and their successive gains over time.
NRT vs CRT
Definition / Purpose
o NRT: Test that measures student achievement as compared to others. Determines performance differences among individuals and groups.
o CRT: Approach that provides information on students’ mastery based on a specified criterion. Determines learning mastery based on a specified criterion.
Test item / Frequency
o NRT: Easy to difficult; discriminates examinees’ ability. Continuous assessment in classroom.
o CRT: Guided by minimum achievement in the related objectives. Continuous assessment.
Appropriateness / Examples
o NRT: Summative evaluation. UPSR, PMR, SPM, STPM.
o CRT: Formative evaluation. Monthly test, coursework, project.
OBJECTIVE TEST
A test that consists of items with right or wrong answers or responses.
Can be marked objectively.
Popular because it is easy to prepare and quick to mark, providing a quantifiable and concrete result.
Focus more on specific facts than general ideas and concepts.
Types:
o MCQ
o True/false items
o Matching items
o Fill-in-the-blank items.
Drawbacks of MCQ:
o Limit beneficial washback.
o Enable cheating
o Challenging to write successful items.
o Strictly limits what can be tested.
o Test only recognition knowledge.
o Encourage guessing.
SUBJECTIVE TEST
Evaluated by giving an opinion based on agreed criteria.
Include essay, short-answer questions, vocabulary tests, and take-home tests.
Provides more opportunity for test-takers to demonstrate their understanding and in-depth knowledge and skills in the subject matter.
Test takers might provide some acceptable answers that might not be predicted.
Enable students to be more creative and critical.
E.g. extended-response items, restricted-response items, essay.
TOPIC 4 – BASIC PRINCIPLES OF ASSESSMENT
RELIABILITY
The degree to which an assessment tool produces stable and consistent results.
A reliable test is stable, consistent and dependable.
The same test, given to the same test-takers on two occasions, should yield the same results.
A lack of reliability threatens the item’s validity.
A reliable test:
o Is consistent in its conditions across two or more administrations.
o Gives clear directions for scoring / evaluation.
o Has uniform rubrics for scoring / evaluation.
o Is unambiguous to test-takers.
Two types of reliability
o Rater reliability (markers)
- Inter-rater reliability – degree of similarity between the scores of two raters who mark without influencing one another.
- Intra-rater reliability – consistency within a single rater.
o Test administration reliability
- Deals with interference and conditions during the test administration.
- Outside interference – noise, temperature, variations in photocopying, lighting, conditions of chairs and tables.
Factors affecting the reliability of a test:
o Test length factor
- Longer tests are more reliable – they reduce the effect of guessing.
o Teacher – student factor
- Teacher’s encouragement.
- Familiarity of test formats.
o Environmental factors
- Lighting
- Ventilation
- Comfortable setting
o Test administration factor
- Clear instruction
- Enough time
o Marking factors
- Markers’ objectivity and perception of the task of marking itself.
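Inter-rater reliability, described above as the degree of similarity between two raters, can be estimated in several ways. The sketch below assumes two simple measures that are not named in these notes: the proportion of exact agreement and the Pearson correlation between the two sets of marks. The marks and function names are hypothetical.

```python
from math import sqrt


def exact_agreement(rater_a, rater_b):
    """Proportion of scripts on which both raters awarded the same mark."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of marks."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical marks (band 1-5) given independently by two raters
# to the same six scripts.
rater_a = [4, 3, 5, 2, 4, 3]
rater_b = [4, 3, 4, 2, 5, 3]

print(f"exact agreement: {exact_agreement(rater_a, rater_b):.2f}")
print(f"correlation:     {pearson_r(rater_a, rater_b):.2f}")
```

The same two functions could be applied to one rater's marks on two occasions as a rough check of intra-rater reliability.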
VALIDITY
Whether the assessment really measures what it claims to measure.
Types of validity
Face validity
o A test should look right, so that students take it seriously.
o Should measure the knowledge or skills it is made to measure.
o Affects how the students see and respond to the test (seriously or otherwise).
Content validity
o Is all the content learnt represented in the test?
o Can be checked using the table of test specifications:
- Skills
- Level
- Topics with all the items
Construct validity
o Does the test measure what it claims to?
- Linguistic construct – fluency, proficiency.
- Oral test – speed and rhythm.
Concurrent validity
o Compare students’ performance on the test being set with their performance on another reputable test.
Predictive validity
o Predicts results on a public exam from the trial paper, e.g. Pre-SPM to SPM.
o High predictive validity = predictable results in the latter measure.
PRACTICALITY
Is the assessment tool practical to use, and can it be set up within the available means?
If carried out, would the effort of setting it up be worth the objectives of having the test in the first place?
Could the results be easily interpreted?
OBJECTIVITY
Refers to the ability of the examiners to be objective, non-partisan and unbiased.
Scores must not be awarded under the influence of examiners’ emotions, opinions or skills.
WASHBACK EFFECT
Impact that tests have on teaching and learning.
Students’ mistakes can indicate areas of improvement.
Compliment students’ achievements.
More important tests have greater washback effects.
AUTHENTICITY
Assessment should reflect the actual features of the target language task.
Students are motivated when tasks involve real-world situations and contexts.
INTERPRETABILITY
Involves how meaning is assigned to the scores.
Take into consideration these factors:
o Reliability.
o Validity
o Scores, norms and related technical features.
o Administration and scoring variation.
TOPIC 5 – DESIGNING CLASSROOM LANGUAGE TEST
Stages of Test Planning
Determining
o Who is to be tested?
o What is the purpose of testing?
o What is to be tested?
o What is the scope of the test?
o How detailed and accurate must the results be?
Planning
o Set a specification for the test (Table of Specifications etc.).
o Include information on content, format and timing, criteria and level of performance, and determine them.
Writing
o Construct test items.
o Collaborate with colleagues to find faults in the test paper.
Preparing
o Prepare the test items based on principles and techniques.
Reviewing
o The test should not be reviewed straight away after construction.
o Other teachers / native speakers need to review the test paper prepared.
Pre-testing
o Test on a similar target group to analyse individual items as well as the whole test.
Validating
o Validate the test using the formulas:
$$IF\ (\text{item facility}) = \frac{\sum c\ (\text{no. of correct answers})}{N\ (\text{no. of students})}$$
$$IF\ (\text{item difficulty}) = \frac{\sum w\ (\text{no. of wrong answers})}{N\ (\text{no. of students})}$$
o Values range from 0 to 1 (< 0.37 – difficult, > 0.63 – easy).
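The validating step can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: it assumes each response is coded 1 (correct) or 0 (wrong), that the 0.37 and 0.63 cut-offs apply to item facility, and that items between the two cut-offs can be labelled "moderate". The function names and response data are hypothetical.

```python
def item_facility(responses):
    """IF = number of correct answers / number of students.

    `responses` holds one entry per student: 1 for correct, 0 for wrong.
    """
    return sum(responses) / len(responses)


def item_difficulty(responses):
    """Number of wrong answers / number of students."""
    wrong = sum(1 for r in responses if r == 0)
    return wrong / len(responses)


def classify_item(facility, low=0.37, high=0.63):
    """Label an item using the 0.37 / 0.63 cut-offs (assumed to apply to IF)."""
    if facility < low:
        return "difficult"
    if facility > high:
        return "easy"
    return "moderate"  # label for the middle band is an assumption


# Hypothetical responses of ten students to three items.
items = {
    "item_1": [1, 1, 1, 0, 1, 1, 1, 1, 0, 1],
    "item_2": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    "item_3": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}

for name, responses in items.items():
    facility = item_facility(responses)
    print(name, facility, item_difficulty(responses), classify_item(facility))
```

When every student answers every item, the two indices are complementary: item difficulty equals 1 minus item facility.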
PREPARING A TEST SPECIFICATION / BLUEPRINT
Outlines the test construct and what the test will look like.
Covers:
o Content description
o Item types (MCQ, cloze, etc.)
o Tasks (essay, short passage)
o Skills included
o Scoring
o Reporting
BLOOM’S TAXONOMY
Cognitive dimensions, definitions and key words
Remembering
o Definition: Recalling information; retrieving knowledge from long-term memory.
o Key words: Recognizing, Naming, Listing, Describing
Understanding
o Definition: Explaining ideas or concepts; constructing meaning from instructional messages.
o Key words: Interpreting, Summarizing, Classifying, Explaining
Applying
o Definition: Using information in another familiar situation; applying a procedure to a familiar task.
o Key words: Implementing, Carrying out, Executing
Analyzing
o Definition: Breaking materials into parts to explore understandings and relationships.
o Key words: Comparing, Organizing, Deconstructing
Evaluating
o Definition: Justifying a decision or making judgments based on criteria and standards.
o Key words: Checking, Hypothesizing, Critiquing, Experimenting, Evaluating
Creating
o Definition: Generating new ideas; putting elements together to form a coherent whole or new pattern.
o Key words: Designing, Constructing, Planning
SOLO (STRUCTURE OF THE OBSERVED LEARNING OUTCOME) TAXONOMY
Describes how a learner’s performance develops from simple to complex levels.
5 stages:
o Quantitative phase
- Pre-structural
- Uni-structural
- Multi-structural
o Qualitative phase
- Relational
- Extended abstract
Stages, descriptions and sample verbs
Prestructural
o Description: Incompetence
o Sample verbs: Fail, Incompetent, Misses the point
Unistructural
o Description: One relevant aspect
o Sample verbs: Identify, Name, Follow simple procedure
Multistructural
o Description: Several relevant independent aspects
o Sample verbs: Combine, Describe, Enumerate, Perform serial skills, List
Relational
o Description: Integrated into a structure
o Sample verbs: Analyse, Apply, Argue / compare, Criticize, Relate
Extended abstract
o Description: Generalized into a new domain
o Sample verbs: Create, Formulate, Generate, Hypothesize
GUIDELINES FOR CONSTRUCTING TEST ITEMS
Constructing test items must take the following into consideration:
Aims of the test
Range of the topics to be tested
Range of skills to be tested.
Test format
Level of difficulty
International and cultural considerations
TEST FORMATS
Depends on the test itself
UPSR - paper 1 and paper 2
TOEFL – internet-based test and paper-based test
IELTS – four language skills
MUET – similar to IELTS
TOPIC 6 – ASSESSING LANGUAGE SKILLS CONTENT
OBJECTIVE AND SUBJECTIVE TEST
Select type tests – objective (MCQ, TRUE/FALSE)
Supply type tests – subjective (ESSAY, DICTATION)
Types of tests according to students’ expected response
o Selected response
- True/false
- Matching
- Multiple choice
o Constructed response
- Fill in
- Short answer
- Performance test
o Personal response
- Conferences
- Portfolios
- Self and peer assessments
TYPES OF TEST ITEMS TO ASSESS LANGUAGE CONTENT
Discrete Point Test
o Examines one element at a time.
o Language is seen to be made up of smaller units, and it may be possible to test language by testing each unit one at a time.
o Example – MCQ
Integrative Test
o Requires the candidates to combine many language elements in the completion of a task.
o A simultaneous measure of knowledge and ability of a variety of language features, modes or skills.
o Example – essays
Communicative Test
o Testing involves many aspects that revolve around communicative elements and meaningful content.
o Involves performance and authenticity, and is scored on real-life outcomes (usually behavioral).
TOPIC 7 – SCORING, GRADING AND ASSESSMENT CRITERIA
APPROACHES TO SCORING
1. OBJECTIVE APPROACH / METHOD
Relies on quantified methods of evaluating students’ writing
o Establish standardization – limit length of assessment (word count)
o Identify elements to be assessed – identify mistakes (spelling error, grammar mistakes, vocabulary, etc)
o Operationalize the assessment – assign a weighted score to each error, according to how much it distorts readability or flow.
o Quantify the assessment using the correctness score:
$$\text{Correctness score} = \frac{\text{word limit}}{\text{sum of error scores}}$$
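The four steps of the objective approach can be sketched as below. This is an illustration under stated assumptions: it reads the correctness score as the word limit divided by the sum of the weighted error scores, and the 250-word limit, the error weights and the function name are all hypothetical.

```python
def correctness_score(word_limit, error_weights):
    """Word limit divided by the sum of the weighted error scores.

    Returns None when no errors were recorded, because the ratio is
    undefined in that case (how to report it is left open here).
    """
    total_error = sum(error_weights)
    if total_error == 0:
        return None
    return word_limit / total_error


# Step 1: standardize length (hypothetical limit of 250 words).
word_limit = 250

# Steps 2-3: each identified error gets a weight; here 3 is assumed to
# mean a serious distortion of readability and 1 a minor slip.
error_weights = [3, 1, 2, 1, 1, 2]

# Step 4: quantify.
print(correctness_score(word_limit, error_weights))
```

Under this reading, a composition with fewer or lighter errors receives a higher score.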
2. HOLISTIC APPROACH
Reader reacts to the students’ compositions as a whole and a single score is awarded to the writing.
Each score is accompanied by a general descriptor of abilities (rubric).
Primary trait scoring – a particular functional focus is selected based on the purpose of the writing, and grading is based on how well the student is able to express that function.
Emphasizes functional and communicative ability rather than form and accuracy.
3. ANALYTIC APPROACH
A familiar approach for many evaluators.
Raters assess students’ performance on a variety of categories which are hypothesized to make up the skill of writing
Example:
Components and weights
o Content – 30%
o Organization – 40%
o Vocabulary – 30%
Points assigned to each component reflect the importance of the component in question.
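Using the example weights above, an analytic total can be computed as a weighted sum. The sketch assumes each component is first marked out of 100; the component marks and the function name are hypothetical.

```python
# Weights taken from the example: content 30%, organization 40%, vocabulary 30%.
WEIGHTS = {"content": 0.30, "organization": 0.40, "vocabulary": 0.30}


def analytic_score(component_marks, weights=WEIGHTS):
    """Weighted total of component marks, each assumed to be out of 100."""
    if set(component_marks) != set(weights):
        raise ValueError("marks must cover exactly the weighted components")
    return sum(weights[name] * component_marks[name] for name in weights)


# Hypothetical marks for one composition.
marks = {"content": 70, "organization": 80, "vocabulary": 60}

print(round(analytic_score(marks), 2))
```

Because organization carries the largest weight, a change of ten marks there moves the total more than the same change in content or vocabulary.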
COMPARING THE THREE APPROACHES
Holistic
o Advantages
- Quickly graded.
- Provides a public standard that is understood by teachers and students alike.
- Relatively high degree of rater reliability.
- Applicable to the assessment of many different topics.
- Emphasizes students’ strengths rather than weaknesses.
o Disadvantages
- A single score may mask differences across individual compositions.
- Does not provide a lot of diagnostic feedback.
Analytic
o Advantages
- Provides clear guidelines for grading in the form of the various components.
- Allows graders to consciously address important aspects of writing.
o Disadvantages
- Writing ability is unnaturally split up into components.
Objective
o Advantages
- Emphasizes the students’ strengths rather than their weaknesses.
o Disadvantages
- Still some degree of subjectivity involved.
- Accentuates negative aspects of learners’ writing without giving credit for what they can do well.
TOPIC 9 – REPORTING OF ASSESSMENT DATA
PURPOSES OF REPORTING
To obtain information concerning a particular behaviour or characteristic.
Based on that information, several types of decisions can be made.
o Instructional decisions
o Grading decisions
o Diagnostic decisions
o Selection decisions
o Placement decisions
o Counselling & guidance decisions
o Program or curriculum decisions
o Administrative decisions
REPORTING METHODS
Student achievement and progress can be reported using:
Norm-referenced assessment and reporting
o Assessing and reporting students’ achievement and progress in comparison to other students.
Criterion-referenced assessment and reporting
o Assessing and reporting students’ achievement and progress in comparison to predetermined criteria.
Outcome-based approach
o Provides information showing that students can be working towards syllabus outcomes anywhere along the continuum of learning.
PRINCIPLES OF EFFECTIVE AND INFORMATIVE ASSESSMENT AND REPORTING
Has clear, direct links with outcomes
Is integral to teaching and learning
Is balanced, comprehensive and varied.
Is valid.
Is fair.
Engages the learner.
Values teacher judgment.
Is time efficient and manageable.
Recognizes individual achievement and progress
Involves a whole school approach
Actively involves parents
Conveys meaningful and useful information.
TOPIC 10 – ISSUES AND CONCERNS RELATED TO ASSESSMENT IN MALAYSIAN PRIMARY SCHOOLS
EXAM ORIENTED SYSTEM
The current education system is too examination-oriented and over-emphasizes rote learning, with institutions of higher learning fast becoming diploma mills.
Too focused on public examination results as important determinants of students’ progression to higher levels of education or occupational opportunities.
PTS (defunct) > UPSR > PMR > SPM/STAM/STPM
Summative national examinations themselves should not have any negative impact on students – the issue is that they do not currently test the full range of skills that the education system aspires to produce.
LP has started some reforms – PBS (SCHOOL BASED ASSESSMENT)
o School assessment – assesses students’ learning. Test questions and marking schemes are developed, administered, scored and reported by school teachers based on LP guidance.
o Central assessment – developed wholly by LP itself.
o Psychometric assessment – aptitude tests and personality inventory to assess students’ skills, interests, aptitude and attitude and personality. Assess students’ innate ability and acquired abilities.
o Physical, sports and co-curricular activities assessment – refers to assessments of student performance and participation in physical and health education, sports, uniformed bodies, etc.
o This new format enables a broader range of assessment over a longer period of time.
o Teachers can focus more on delivering meaningful learning as stipulated in the curriculum.
SCHOOL-BASED ASSESSMENT
Traditional assessment methods are no longer adequate in today’s setting.
School-based assessment has immense potential in terms of flexibility and validity, but has potential drawbacks in terms of reliability, quality control and assurance.
Recommended because of the gains in validity that come when students’ performance on assessed tasks can be judged across a greater range of contexts than is possible in time-limited tests.
Malaysian SBA context
o Standard-referenced assessment
o Holistic
o Integrated
o Balanced
o Robust
Components
o Academic
- School assessment (Performance Standards)
- Centralized Assessment
o Non-academic
- Physical activities, co-curricular activities
- Psychometric tests
ALTERNATIVE ASSESSMENTS
Procedures that differ from traditional notions of testing.
PORTFOLIOS
Contain:
o Introductory section
o Academic works section
o Personal section
o Assessment section
SELF ASSESSMENT AND PEER ASSESSMENT
Remind learners that they are not working in isolation.
Help create a community of learners
Improve the product
Improve the process
Help learners to be reflective
Stimulate metacognition
TOPIC 8 – ITEM ANALYSIS AND INTERPRETATION