19. Jun 2013

  1. Summer School 2013
  2. • In today’s language classrooms, the term assessment usually evokes images of an end-of-course paper-and-pencil test designed to tell both teachers and students how much material the student doesn’t know or hasn’t yet mastered. • However, assessment includes a broad range of activities and tasks that teachers use to evaluate students’ progress and growth on a daily basis.
  3. To make the use of evaluation, assessment, and test procedures more effective, it is necessary to clarify what these concepts are and to explain how they differ from one another. Evaluation is all-inclusive and is the widest basis for collecting information in education. It involves looking at all factors that influence the learning process: syllabus, objectives, course design, and materials. Assessment is part of evaluation because it is concerned with the student and with what the student does. It refers to the variety of ways of collecting information on a learner’s language ability or achievement. A test is a subcategory of assessment: it is a formal, systematic procedure used to gather information about student progress.
  4. The most common use of language tests is to identify strengths and weaknesses in students’ abilities. Information gleaned from tests also assists us in deciding who should be allowed to participate in a particular course or program area. Another common use of tests is to provide information about the effectiveness of programs of instruction.
  5. • Placement tests assess students’ level of language ability so that they can be placed in an appropriate course or class. This type of test indicates the level at which a student will learn most effectively. The primary aim is to create groups of learners that are homogeneous in level. • Aptitude tests measure capacity or general ability to learn a foreign language (although they are not commonly used these days). • Diagnostic tests identify language areas in which a student needs further help. The information gained from diagnostic tests is crucial for planning further course activities and providing students with remediation.
  6. • Progress tests measure the progress that students are making toward defined course or program goals. Progress tests are generally teacher-produced because they cover less material and assess fewer objectives. • Achievement tests are similar to progress tests. They are usually administered at the mid-point and end-point of the semester or academic year. • Their content is generally based on the specific course content or on the course objectives. • Proficiency tests assess the overall language ability of students at varying levels. • They tell us how capable a person is in a particular language skill area.
  7. • Objective versus subjective tests: Tests are sometimes distinguished by the manner in which they are scored. An objective test is scored by comparing a student’s responses with an established set of acceptable or correct responses on an answer key. With objectively scored tests, the scorer does not require particular knowledge or training in the examined area. • In contrast, a subjective test, such as writing an essay, requires scoring by opinion or personal judgment, so the human element is very important. • Even experienced scorers need moderated training sessions to ensure inter-rater reliability.
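The slide does not say how inter-rater reliability is checked. One common approach, assumed here rather than taken from the source, is to have two raters score the same scripts and compute simple percent agreement together with Cohen's kappa, which corrects agreement for chance. The function names and the band scores below are hypothetical; this is a minimal sketch, not a moderation procedure.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of scripts on which two raters gave the identical score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of scripts")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(rater_a, rater_b)  # also validates the inputs
    n = len(rater_a)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected chance agreement: sum over score categories of the
    # product of each rater's marginal proportion for that category.
    categories = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    if p_e == 1:  # both raters used a single identical category throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical band scores (1-4) given by two raters to the same eight essays.
rater_a = [3, 2, 4, 3, 1, 2, 3, 4]
rater_b = [3, 2, 3, 3, 1, 2, 4, 4]
print(f"percent agreement: {percent_agreement(rater_a, rater_b):.2f}")  # 0.75
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.2f}")      # 0.65
```

Here the raters agree on six of eight essays, but because some agreement would occur by chance, kappa is noticeably lower than raw agreement; a low value is one signal that a moderated training session may be needed.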
  8. Criterion-referenced tests versus standardized tests: • Criterion-referenced tests are usually developed to measure mastery of well-defined instructional objectives specific to a particular course or program. Their purpose is to measure how much learning has occurred. Students’ performance is compared only to the amount or percentage of material learned. • Standardized tests are designed to measure global language abilities. Students’ scores are interpreted relative to those of all other students who take the exam. Their purpose is to spread students out along a continuum of scores, so that those with low abilities in a certain skill are at one end of the normal distribution and those with high scores are at the other end, with the majority of students falling between the extremes.
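The difference between the two interpretations can be seen by reading one raw score both ways. The sketch below is illustrative only: the function names, the cohort, and the maximum score of 40 are hypothetical, and the percentile-rank definition used (percentage of the cohort scoring strictly below) is one common convention among several.

```python
from bisect import bisect_left

def criterion_referenced(raw_score, max_score):
    """Criterion-referenced reading: percentage of the material mastered."""
    return 100 * raw_score / max_score

def percentile_rank(raw_score, cohort_scores):
    """Norm-referenced reading: percentage of the cohort scoring strictly
    below this score (other definitions also count half of the ties)."""
    ordered = sorted(cohort_scores)
    below = bisect_left(ordered, raw_score)  # number of scores < raw_score
    return 100 * below / len(ordered)

# Hypothetical raw scores out of 40 for a cohort of ten test takers.
cohort = [18, 22, 25, 27, 28, 30, 31, 33, 35, 38]
student = 30
print(f"criterion-referenced: {criterion_referenced(student, 40):.1f}% of the material")
print(f"norm-referenced: scored higher than {percentile_rank(student, cohort):.0f}% of the cohort")
```

The same score of 30 reads as 75% mastery in criterion-referenced terms but as the 50th percentile in norm-referenced terms; the second figure would change if the cohort changed, while the first would not.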
  9. Summative versus formative tests: • Tests or tasks administered at the end of a course to determine whether students have achieved the objectives set out in the curriculum are called summative assessments. They are often used to decide which students move on to a higher level. • Formative assessments, however, are carried out with the aim of using the results to improve instruction, so they are given during the course and feedback is provided to students. High-stakes versus low-stakes tests: • High-stakes tests are those whose results are likely to have a major impact on the lives of a large number of individuals or on large programs. • Low-stakes tests are those whose results have a relatively minor impact on the lives of individuals or on small programs. In-class progress tests and short quizzes are examples of low-stakes tests.
  10. • Practicality • Reliability • Validity • Authenticity • Washback
  11. Reliability • Test reliability: designed items, subjective tests, rating, the test item itself • Test administration reliability: conditions of administration (noise, light, temperature, desks, chairs) • Rater reliability: human error, subjectivity, bias toward “good” and “bad” students, inexperience, inattention • Student-related reliability: temporary illness, fatigue, anxiety, and other physical and psychological factors
  12. Validity: A valid test - measures exactly what it proposes to measure; - involves performance that samples the test’s criterion; - offers useful, meaningful information about a test-taker’s abilities; - is supported by an argument. Types of validity: content-related validity, criterion-related validity, construct-related validity, and consequential validity (impact).
  13. Introduction • Objectives will include 4 distinct components: Audience, Behavior, Condition and Degree. • Objectives must be both observable and measurable to be effective. • Use of words like understand and learn in writing objectives are generally not acceptable as they are difficult to measure. • Written objectives are a vital part of instructional design because they provide the roadmap for designing and delivering curriculum. • Throughout the design and development of curriculum, a comparison of the content to be delivered should be made to the objectives identified for the program. This process, called performance agreement, ensures that the final product meets the overall goal of instruction identified in the first level objectives.
  14. Audience: - Describes the intended learner or end user of the instruction. - The audience is often identified only in the first-level objective, to avoid redundancy. Behavior: - Describes learner capability. - Must be observable and measurable (the measurement itself is defined elsewhere in the objective). - If it is a skill, it should be a real-world skill. - The behavior can include demonstration of knowledge or skills in any of the domains of learning: cognitive, psychomotor, affective, or interpersonal. Condition: - Equipment or tools that may (or may not) be used in completing the behavior. - Environmental conditions may also be included. Degree: - States the standard for acceptable performance (time, accuracy, proportion, quality, etc.).
  15. The common mistakes have been grouped into four categories as follows: - General examination characteristics. - Item characteristics. - Test validity concerns. - Administrative and scoring issues.
  16. • General examination characteristics: too difficult or too easy; insufficient number of items; redundancy of test type; lack of confidence measure; negative washback through non-occurrent forms • Item characteristics: tricky questions; redundant wording; divergence cues; convergence cues; option number • Test validity concerns: mixed content; wrong medium; common knowledge; syllabus mismatch; content matching • Administrative and scoring issues: lack of cheating control; inadequate instruction; administrative inequities; lack of piloting; subjectivity of scoring
  17. Traditional assessment • Pencil-and-paper tests • Students answer questions • Students choose or produce a correct grammatical form or vocabulary item • Good for checking reading and listening comprehension ability. Alternative assessment • Reveals what students can do with language • Is scored differently • Lets students evaluate their own learning and learn from the evaluation process • Gives instructors a way to connect assessment with review of learning strategies
  18. - They are built around topics of interest to the students. - They replicate real-world communication contexts and situations. - They require students to produce a quality product or performance. - The evaluation criteria and standards are known to the students. - They involve multi-stage tasks and real problems that require creative use of language rather than simple repetition. - They involve interaction between the assessor and the person assessed. - They allow for self-evaluation.
  19. Rubrics provide a measure of the quality of a performance on the basis of established criteria. There are four main types of rubrics: • Holistic rubrics • Analytic rubrics • Primary trait rubrics • Multi-trait rubrics
  20. In holistic evaluation, raters make judgments by forming an overall impression of a performance and matching it to the best fit from among the descriptions on the scale. Advantages: • They are often written generically and can be used with many tasks. • They emphasize what learners can do, rather than what they cannot do. • They save time by minimizing the number of decisions raters must make. • Trained raters tend to apply them consistently, resulting in more reliable measurement. • They are easily understood by younger learners. Disadvantages: • They do not provide specific feedback to test takers about the strengths and weaknesses of their performance. • Performances may meet criteria in two or more categories, making it difficult to select the one best description. (If this occurs frequently, the rubric may be poorly written.)
  21. Analytic scales are usually associated with generic rubrics and tend to focus on broad dimensions of writing or speaking performance. These dimensions may be the same as those found in a generic, holistic scale, but they are presented in separate categories and rated individually. Points may be assigned for performance on each of the dimensions and a total score calculated. Advantages: • They provide useful feedback to learners on areas of strength and weakness. • Their dimensions can be weighted to reflect relative importance. • They can show learners that they have made progress over time in some or all dimensions when the same rubric categories are used repeatedly. Disadvantages: • They take more time to create and use.
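The slide notes that analytic dimensions are rated separately, can be weighted, and can be combined into a total score. A minimal sketch of that arithmetic follows; the five dimensions, their weights, the 1-5 band range, and the function name are all hypothetical choices for illustration, not taken from any particular published rubric.

```python
# Hypothetical analytic rubric: each dimension is rated 1-5 and carries a weight.
WEIGHTS = {
    "content": 0.30,
    "organization": 0.25,
    "vocabulary": 0.20,
    "grammar": 0.15,
    "mechanics": 0.10,
}
MAX_BAND = 5

def analytic_score(ratings, weights=WEIGHTS, max_band=MAX_BAND):
    """Return the weighted total as a percentage of the maximum possible score."""
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the rubric's dimensions")
    for dim, band in ratings.items():
        if not 1 <= band <= max_band:
            raise ValueError(f"{dim}: band {band} outside 1-{max_band}")
    weighted = sum(weights[d] * ratings[d] for d in weights)
    # Normalize by the best achievable weighted score so the result is 0-100.
    return 100 * weighted / (sum(weights.values()) * max_band)

essay = {"content": 4, "organization": 3, "vocabulary": 4, "grammar": 2, "mechanics": 5}
for dim, band in essay.items():
    print(f"{dim:<13} {band}/{MAX_BAND}")  # per-dimension feedback for the learner
print(f"total: {analytic_score(essay):.1f}%")  # total: 71.0%
```

Printing the per-dimension bands alongside the total reflects the advantage listed above: the learner sees that grammar is the weakest area even though the overall score is moderate. Changing the weights changes the total without changing the feedback.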
  22. Primary trait scoring is strictly task-specific: performance is evaluated on only one trait, such as persuading an audience. Example (primary trait: persuading an audience): • 0: Fails to persuade the audience. • 1: Attempts to persuade but does not provide sufficient support. • 2: Presents a somewhat persuasive argument but without consistent development and support. • 3: Develops a persuasive argument that is well developed and supported.
  23. Multiple trait scoring rubrics build on the concepts of primary trait scoring to provide diagnostic feedback to learners about performance on "context-appropriate and task-appropriate criteria" for a specified topic. • The rubrics are aligned with the task and curriculum. • Aligned and well-written primary and multiple trait rubrics can ensure construct and content validity of criterion-referenced assessments. • Feedback is focused on one or more dimensions that are important in the current learning context. • With a multiple trait rubric, learners receive information about their strengths and weaknesses. • Primary and multiple trait rubrics are generally written in language that students understand. • Teachers are able to rate performances quickly. • Many rubrics of this type have been developed by teachers who are willing to share them online, at conferences, and in materials available for purchase.