
http://taesig.8m.com/createxiii.html

Glossary Of Important Testing Terms


Achievement test: measures what a learner knows of what he/she has been taught; this type of test is typically given by the teacher at a particular point in the course and covers a certain amount of material.

Alternative assessment: refers to a non-conventional way of evaluating what students know and can do with the language; it is informal and usually administered in class; examples include self-assessment and portfolio assessment.

Analytical scale: a type of rating scale that requires teachers to allot separate ratings to the different components of language ability, e.g. content, grammar, vocabulary, etc.

Authenticity: refers to evaluation based mainly on real-life experiences; students show what they have learned by performing tasks similar to those required in real-life contexts.

Banding scale: a type of holistic scale that measures language competence via descriptors of language ability; an example is the IELTS bands from UCLES.

Computer-based testing (CBT): is programmed and then administered to students on computer; question formats are frequently objective, discrete-point items; these tests are subsequently scored electronically.

Computer-adaptive testing (CAT): presents language items to the learner via computer; subsequent questions on the exam are "adapted" based on the student's response(s) to previous question(s).

Content validity: refers to testing what you teach, the way you teach it; i.e. testing content covered in some way in the course materials using formats that are familiar to the student.

Cornerstones of good testing practice: the guidelines of effective test writers; they include the concepts of validity, reliability, practicality, transparency, authenticity, security and washback.

Construct validity: refers to the fit between the theoretical and methodological approaches used in a program and the assessment instruments administered.

Criterion-referenced test: compares students' performance to particular outcomes or expectations.

Descriptive statistics: describe the population taking the test; the most common descriptive statistics are the mean, mode, median, standard deviation and range; the mean, mode and median are also known as measures of central tendency (a worked example follows this glossary).

Diagnostic test: a type of formative evaluation that attempts to diagnose students' strengths and weaknesses vis-à-vis the course materials; students receive no grades on diagnostic instruments.

Discrete-point test: an objective test that measures the students' ability to answer questions on a particular aspect of language; discrete-point items are very popular with teachers because they are quick to write and easy to score.

Face validity: refers to the overall appearance of the test; it is the extent to which a test appeals to test takers.

Formative evaluation: refers to tests designed to measure students' achievement of instructional objectives; these tests give feedback on the extent to which students have mastered the course materials; examples include achievement tests and mastery tests.

Holistic scoring: is based on an impressionistic method of scoring; an example is the scoring used with the TOEFL Test of Written English (TWE).

Integrative testing: goes beyond discrete-point test items and contextualizes language ability.

Inter-rater reliability: attempts to standardize the consistency of marks between raters; it is established through rater training and calibration.

Item bank: a large bank or number of items measuring the same skill or competency; item banks are most frequently found in objective testing, particularly CBT and CAT.

Item analysis: a procedure whereby test items and distractors are examined based on the level of difficulty of the item and the extent to which it discriminates between high-achieving and low-achieving students; results of item analyses are used in the upkeep and revision of item banks (see the sketch after this glossary).

Mean: the arithmetic average; to obtain the mean, the scores are added together and then divided by the number of students who took the test; the mean is a descriptive statistic.

Mode: the most frequently received score in a distribution.

Norm-referenced test: measures language ability against a standard or "norm" performance of a group; standardized tests like the TOEFL are norm-referenced because they are normed through prior administrations to large numbers of students.

Objective test: can be scored based solely on an answer key; it requires no expert judgment on the part of the scorer.

Parallel tests: multiple versions of a test, written with test security in mind; they share the same framework, but the exact items differ.

Performance-based test: requires students to show what they can do with the language as opposed to what they know about the language; such tests are often referred to as task-based.

Piloting: a common practice among language testers whereby an item or a format is administered to a small random or representative selection of the population to be tested; information from piloting is commonly used to revise and improve items; also known as field testing.

Placement test: is administered to incoming students in order to place them in the correct ability level; content on placement tests is specific to a given curriculum; placement tests are most successfully produced in-house.

Portfolio assessment: one type of alternative assessment; portfolios are a representative collection of a student's work over an extended period of time; the aim is to document the student's progress in language learning via the completion of such tasks as reports, projects, artwork and essays.

Practicality: one of the cornerstones of good testing practice; refers to the practical issues that teachers and administrators must keep in mind when developing and administering tests, such as time and available resources.

Proficiency test: is not specific to a particular curriculum; it assesses a student's general ability level in the language as compared to all other students who study that language; an example is the TOEFL.

Range: one of the descriptive statistics; the range, or min/max, reports the lowest and highest scores in a distribution.

Rating scale: an instrument used for the evaluation of writing and speaking; rating scales are either analytical or holistic.

Reliability: one of the cornerstones of good testing practice; refers to the consistency of exam results over repeated administrations.

Self-assessment: asks students to judge their own ability level in a language; one type of alternative assessment.

Specifications: a document that states what the test should be used for and whom it is aimed at; test specifications usually contain all instructions, examples of test formats/items, weighting information and pass/fail criteria.

Standardized test: measures language ability against a norm or standard.

Subjective test: requires knowledge of the content area being tested; a subjective test frequently depends on impression and opinion at the time of scoring.

Summative evaluation: refers to a test given at the end of a course or course segment; the aim of summative evaluation is to give the student a grade that represents his/her mastery of the course content.

Validity: one of the cornerstones of good testing practice; refers to the degree to which a test measures what it is supposed to measure.

Washback: one of the cornerstones of good testing practice; refers to the impact a test or testing program may have on the curriculum.

Weighting: refers to the value that is placed on certain skills or sections within the exam.
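The short Python sketch below is not part of the original glossary; it simply illustrates how the descriptive statistics defined above (mean, median, mode, standard deviation, range) and a basic item analysis might be calculated for one class. The scores, the item responses, the equal split into low- and high-scoring halves, and the particular facility and discrimination formulas are illustrative assumptions only.

from statistics import mean, median, mode, stdev

# Hypothetical class scores (invented for illustration).
scores = [62, 70, 74, 74, 78, 81, 85, 88, 90, 95]

print("mean:   ", mean(scores))                   # arithmetic average
print("median: ", median(scores))                 # middle score
print("mode:   ", mode(scores))                   # most frequent score
print("st dev: ", round(stdev(scores), 1))        # spread of scores around the mean
print("range:  ", min(scores), "-", max(scores))  # min/max of the distribution

# Item analysis for a single item: 1 = correct, 0 = incorrect, one entry per
# student, listed from the lowest to the highest total score above.
item_responses = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]

# Facility (difficulty) value: proportion of students answering correctly.
facility = sum(item_responses) / len(item_responses)

# A simple discrimination index: proportion correct in the top-scoring half
# minus the proportion correct in the bottom-scoring half.
half = len(item_responses) // 2
low_half, high_half = item_responses[:half], item_responses[half:]
discrimination = sum(high_half) / half - sum(low_half) / half

print("item facility:      ", facility)        # 0.7 -> a fairly easy item
print("item discrimination:", discrimination)  # 0.6 -> discriminates well

In practice teachers would normally use a spreadsheet or statistics package for this, but the arithmetic is the same.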

The Cornerstones of Testing


Language testing at any level is a highly complex undertaking that must be based on theory as well as practice. Although this PCI focuses on practical aspects of classroom testing, an understanding of the basic principles of good testing is essential. The guiding principles that govern good test design, development and analysis are validity, reliability, practicality, washback, authenticity, transparency and security. Constant references to these important "cornerstones" of language testing will be made throughout the workshop (see the Cornerstones Checklist below).

Validity
The term validity refers to the extent to which a test measures what it says it measures. In other words, test what you teach, how you teach it! Types of validity include content, construct, and face. For classroom teachers, content validity means that the test assesses the course content and outcomes using formats familiar to the students. Construct validity refers to the "fit" between the underlying theories and methodology of language learning and the type of assessment. For example, a communicative language learning approach must be matched by communicative language testing. Face validity means that the test looks as though it measures what it is supposed to measure. This is an important factor for both students and administrators. Other types of validity are more appropriate to large-scale assessment.

Reliability
Reliability refers to the consistency of test scores. It simply means that a test would give similar results if it were given at another time. Three important factors affect test reliability. Test factors such as the formats and content of the questions and the length of the exam must be consistent. For example, testing research shows that longer exams produce more reliable results than very brief quizzes. Administrative factors are also important for reliability. These include the classroom setting (lighting, seating arrangements, acoustics, lack of intrusive noise, etc.) and how the teacher manages the exam administration. Affective factors in the response of individual students can also affect reliability. Test anxiety can be allayed by coaching students in good test-taking strategies.

Practicality
Classroom teachers are familiar with practical issues, but they need to think about how practical matters relate to testing. A good classroom test should be "teacher-friendly". A teacher should be able to develop, administer and mark it within the available time and with available resources. Classroom tests are only valuable to students when they are returned promptly and when the feedback from assessment is understood by the student. In this way, students can benefit from the test-taking process. Practical issues include time, resources (everything from computer access, copying facilities and AV equipment to storage space), and administrative logistics.

Washback

Washback refers to the effect of testing on teaching and learning. Unfortunately, students and teachers tend to think of the negative effects of testing, such as "test-driven" curricula and only studying and learning "what they need to know for the test". Positive washback, or what we prefer to call "guided washback", can benefit teachers, students and administrators. Positive washback assumes that testing and curriculum design are both based on clear course outcomes which are known to both students and teachers/testers. If students perceive that tests are markers of their progress towards achieving these outcomes, they have a sense of accomplishment. In short, tests must be part of the learning experience for all involved.

Authenticity
Language learners are motivated to perform when they are faced with tasks that reflect real world situations and contexts. Good testing or assessment strives to use formats and tasks that mirror the types of situations in which students would authentically use the target language. Whenever possible, teachers should attempt to use authentic materials in testing language skills.

Transparency
Transparency refers to the availability of clear, accurate information to students about testing. Such information should include outcomes to be evaluated, formats used, weighting of items and sections, time allowed to complete the test, and grading criteria. Transparency dispels the myths and mysteries surrounding secretive testing and the adversarial relationship between learning and assessment. Transparency makes students part of the testing process.

Security
Most teachers feel that security is an issue only in large-scale, high-stakes testing. However, security is part of both reliability and validity. If a teacher invests time and energy in developing good tests that accurately reflect the course outcomes, then it is desirable to be able to recycle the tests or similar materials. This is especially important if analyses show that the items, distractors and test segments are valid and discriminating. In some parts of the world, cultural attitudes towards "collaborative test-taking" are a threat to test security and thus to reliability and validity. As a result, there is a trade-off between letting tests into the public domain and giving students adequate information about tests.

Cornerstones Checklist
When developing, administering and grading exams, ask yourself the following questions:
- Does your exam test the curriculum content?
- Does your exam contain formats familiar to the students?
- Does your test reflect your philosophy of teaching?
- Would this test yield the same results if you gave it again?
- Will the administration of your test be the same for all classes?
- Have you helped students reduce test anxiety through test-taking strategies?
- Do you have enough time to write, grade and analyze your test?
- Do you have all the resources (equipment, paper, storage) you need?
- Will this test have a positive effect on teaching and learning?
- Are the exam tasks authentic and meaningful?
- Do students have accurate information about this test?
- Have you taken measures to ensure test security?
- Is your test a good learning experience for all involved?
