
Evaluation in Language Teaching
By The Untamed

Table of contents
Introduction to the topic
Concepts
Types of testing
Techniques
Things to consider when designing a test
What makes a good test?

Table of contents (cont.)
Marking tests
Conclusion
Bibliography

Introduction
Education is a weapon against many of the evils that threaten humankind, such as poverty, illness and ignorance, and evaluation is a vital element of any system of education. It is what makes it possible for educators, teachers, policy makers and the community in general to see what has been accomplished and what is still missing. That is no different for language teachers, which is why this assignment is about evaluation in language teaching.

Concepts of evaluation
Evaluation is the systematic process of collecting and analyzing data in order to determine whether, and to what degree, objectives have been or are being achieved (Osman).
For Tyler (1949), evaluation is the process of determining to what extent the educational objectives are being realized.

Concepts of evaluation (cont.)
Cameron (1945) summed up these definitions by saying that evaluation is the process of making an overall judgment about one's work or a whole school's work.

Testing
Testing is a technique of obtaining information for evaluation purposes.

Types of testing
Placement Tests
Diagnostic Tests
Achievement Tests
Proficiency Tests

Placement tests
Placing new students in the right class in a school is facilitated by the use of placement tests. Usually based on the syllabuses and materials the students will follow and use once their level has been decided on, these test grammar and vocabulary knowledge and assess students' productive and receptive skills.

Diagnostic tests
This type of test can be used to expose learners' difficulties, gaps in their knowledge and skill deficiencies during a course. Thus, once we know what the problems are, we can do something about them.

Progress or achievement tests
These tests are designed to measure learners' language and skill progress in relation to the syllabus they have been following.
Progress tests are often written by teachers and given to students every few weeks to see how well they are doing.

Achievement tests (cont.)
Achievement tests only work if they contain item types the students are familiar with. At the end of a term, the achievement test should reflect progress, not failure. It should reinforce the learning that has taken place, not go out of its way to expose weaknesses.

Proficiency tests
These give a general picture of a student's knowledge and ability. They are frequently used as stages people have to reach if they want to be admitted to a foreign university, get a job or obtain some kind of certificate.

Common test techniques
For Hughes, test techniques are means of eliciting behaviour from candidates that will tell us about their language abilities. Useful techniques should:
Elicit behaviour which is a reliable and valid indicator of the ability in which teachers are interested;
Elicit behaviour which can be reliably scored;
Be as economical of time and effort as possible.

1. Multiple choice items
Multiple choice items can come in many forms, but they all have a stem and a number of options, of which only one is correct; the others are distractors.
Ms. Larson has been teaching __________ a month.
A. during
B. for
C. while
D. since

Advantages
Scoring can be perfectly reliable
Scoring should also be rapid and economical
It is possible to include more items than would otherwise be possible in a given period
It allows the testing of receptive skills

Disadvantages
The technique tests only recognition knowledge. It might give an inaccurate picture of the candidate's ability if there is a gap between their productive and receptive skills. For instance, a person who can identify the correct answer in the item above may not be able to produce the correct form when speaking or writing.

Guessing may have a considerable but unknowable effect on test scores. The chance of guessing the correct answer in a three-option multiple choice item is one in three. On average, someone can be expected to score 33 on a 100-item test purely by guessing. The trouble, then, is that we can never know what part of any particular individual's score has come about through guessing. So if multiple choice is to be used, there should be at least four options.

Cheating may be facilitated. The fact that the responses on a multiple choice test (a, b, c, d) are so simple makes it easy to communicate them to other candidates non-verbally. This can be prevented by having at least two versions of the test, the only difference between them being the order in which the options are presented.

2. YES/NO and TRUE/FALSE items
When test takers have to choose between YES and NO, or between TRUE and FALSE, we are facing a case of multiple choice items with only two options, which means they have a 50% chance of guessing the right answer. True/False items are sometimes extended by requiring test takers to give a reason for their choice. However, this is a potentially difficult writing task when writing is not meant to be tested.

3. Short-answer items
Items in which the test taker has to provide a short answer are common, particularly in listening and reading tests. Examples:
What does "them" in the last sentence refer to?
At what time is the plane scheduled to leave for London?

Advantages:
There will be less guessing
The technique is not restricted by the need for distractors
Cheating is likely to be more difficult
Items should be easier to design

Disadvantages:
Responses may take longer and so reduce the possible number of items;
The test taker has to produce language in order to respond;
Scoring may be invalid or unreliable, if judgment is required;
Scoring may take longer.

4. Gap filling items
This technique can be used for both reading and listening,
E.g. Hannibal particularly liked to eat brains because of their _______ and their _______
as well as for vocabulary and grammar,
E.g. He asked me for money, __________ though he knows I earn a lot less than him.
E.g. Dad has lost his job. We have to tighten our _________ from now on.

Advantages:
Makes use of short answers
Does not call for significant productive skills
It does not take long when the words can be found in the text or are straightforward

Things to consider when designing a test
Assess the test situation
Decide what to test
Balance the elements
Weight the scores
Make the test work

Assess the test situation
Before we start to write the test, we need to be aware of the context in which the test takes place. We have to decide how much time should be given to the test-taking, when and where it will take place, and how much time there is for marking.

Decide what to test
We have to list what we want to include in our test, bearing in mind the purpose of the test and the skills to be tested. We have to know what programme items can be included and what kinds of topics and situations are appropriate for our students.

Balance the elements
Balancing the elements involves estimating how long we want each section of the test to take and then writing test items within those time constraints. The amount of space and time we give to the various elements should also reflect their importance in our teaching.

Weight the Scores
Even after having balanced the elements in our test, we still have no picture of our students' success or failure; that picture depends on how many marks are given to each section of the test. If we have a three-section test with five questions per section, and we give two marks to each question in section one but only one mark to the questions in the remaining sections, it means that it is more important for students to do well in the former than in the latter.
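The example above can be checked with a short Python sketch. The section names and numbers are just the hypothetical three-section, five-question test from this slide.

```python
# Hypothetical weighting from the slide: three sections of five questions,
# two marks per question in section one, one mark per question elsewhere.
marks_per_question = {"section_1": 2, "section_2": 1, "section_3": 1}
QUESTIONS_PER_SECTION = 5

def section_totals(marks_per_question, questions_per_section):
    """Maximum marks available in each section under this weighting."""
    return {name: marks * questions_per_section
            for name, marks in marks_per_question.items()}

totals = section_totals(marks_per_question, QUESTIONS_PER_SECTION)
print(totals)                # {'section_1': 10, 'section_2': 5, 'section_3': 5}
print(sum(totals.values()))  # 20: section one alone is worth half the test
```

Making the weighting explicit like this shows students and markers at a glance where doing well matters most.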

Make the test Work
When we write test items, the first thing to do is to get fellow teachers to try them out, because they may spot problems which we are not aware of and come up with possible answers and alternatives that we had not anticipated.
Later, having made changes based on our colleagues' reactions, we can try out the test on students. Of course, these students will not be the ones we actually intend to test,

Make the test Work (cont.)
but a class that is roughly similar, or even a class one level above. This will allow us to see which items cause unnecessary problems and how long the test takes.
Such trialling is designed to avoid disaster and to yield a whole range of possible answers to the test items. This means that when other people finally mark the test, we can give them a list of possible alternatives and thus ensure reliable scoring.

What makes a good test?


Validity
Reliability

Validity
Criterion-related validity
Concurrent validity
Predictive validity
Face validity
Construct validity
Formative validity
Sampling validity
Validity in Scoring
How to make a test valid?

Criterion-related validity
Criterion-related validity relates to the degree to which results on the test agree with those provided by some independent and highly dependable assessment of the candidates' ability.
There are essentially two kinds of criterion-related validity: concurrent validity and predictive validity.

Face validity
A test is said to have face validity if it looks as if it measures what it is supposed to measure. Face validity is not a scientific notion and is not seen as providing evidence for construct validity, yet it can be very important. A test which does not have face validity may not be accepted by candidates, teachers, education authorities or employers.

Construct validity
Construct validity is used to ensure that the test actually measures what it is intended to measure and not other variables. Using a panel of experts familiar with the construct is one way in which this type of validity can be assessed. The experts examine the items and decide what each specific item is intended to measure; students can be involved in this process to obtain their feedback.

Formative validity
When applied to outcomes assessment, formative validity is used to assess how well a measure is able to provide information that helps improve the program under study.

Sampling validity
Sampling validity ensures that the measure covers the broad range of areas within the concept under study. Not everything can be covered, so items need to be sampled from all of the domains. This may have to be done by reflecting on what an individual personally feels are the most important or relevant areas.

Validity in Scoring
It is worth pointing out that if a test is to have validity, not only the items but also the way the responses are scored must be valid; it is no use having excellent items if they are scored invalidly. A reading test may call for short written responses; if the scoring of these responses takes spelling and grammar into account, it is not valid to assume that the test measures reading ability. By measuring more than one ability, it makes the measurement of the one ability in question less accurate.

How to make a test more valid?
In order to make a test more valid, there is an obligation to carry out a full validation exercise before the test becomes operational.
First, write explicit specifications for the test which take account of all that is known about the constructs that are to be measured. Make sure that you include a representative sample of their content in the test.

How to make a test more valid? (cont.)
Second, whenever feasible, use direct testing. If for some reason it is decided that indirect testing is necessary, reference should be made to the research literature to confirm that measurement of the relevant underlying construct has been demonstrated using the testing techniques that are to be employed.
Third, make sure that the scoring of responses relates directly to what is being tested.

How to make a test more valid? (cont.)
Finally, do everything possible to make the test reliable. If a test is not reliable, it cannot be valid.

Reliability
Test-retest reliability
Parallel forms reliability
Inter-rater reliability
Internal consistency reliability
Reliability coefficient
How to make a test more reliable?

Test-retest reliability
This is a measure of reliability obtained by administering the same test twice, over a period of time, to a group of individuals. The scores from time one and time two can then be correlated in order to evaluate the test's stability over time.
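As a sketch of what "correlating the scores" means in practice, the snippet below computes a Pearson correlation between two administrations of the same test. The scores are invented illustration data, not from the source.

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Invented scores for six students, tested twice some weeks apart.
time_one = [55, 62, 70, 48, 81, 66]
time_two = [58, 60, 73, 50, 79, 68]

r = pearson(time_one, time_two)
print(round(r, 2))  # close to 1.0, suggesting the test is stable over time
```

A coefficient near 1.0 indicates stable results across administrations; a low coefficient would suggest the test is not measuring consistently.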

Parallel forms reliability
This is a measure of reliability obtained by administering different versions of an assessment tool to the same group of individuals. Both versions must contain items that explore the same construct, skill or knowledge base.
The scores from the two versions can then be correlated in order to evaluate the consistency of results across alternate versions.

Inter-rater reliability
This is used to assess the degree to which different judges or raters agree in their assessment decisions. This measure is useful because human observers will not necessarily interpret answers the same way; they may disagree as to how well certain responses or materials demonstrate knowledge of the construct or skill being assessed.
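One minimal way to quantify this agreement is simple percent agreement between two raters, sketched below with invented pass/fail decisions. (Chance-corrected statistics such as Cohen's kappa are stricter, but the idea is the same.)

```python
def percent_agreement(ratings_a, ratings_b):
    """Fraction of candidates to whom two raters gave the same decision."""
    matches = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return matches / len(ratings_a)

# Invented decisions by two raters on the same six oral performances.
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass"]

print(percent_agreement(rater_a, rater_b))  # 5 of 6 decisions agree
```

Low agreement signals that the rating criteria are ambiguous or that raters need further training, both of which the later slides address.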

Internal consistency reliability
Under this measure we find average inter-item correlation and split-half reliability. These measures are used to evaluate the degree to which different test items that explore the same construct produce similar results.

Reliability coefficient
This is a way of confirming how accurate a test or measure is, by giving it to the same subjects more than once and determining whether there is a correlation, which is the strength of the relationship and similarity between the two scores.

How to make a test more reliable?
Take enough samples of behaviour
Exclude items which do not discriminate well between weaker and stronger students
Do not allow candidates too much freedom
Write unambiguous items
Provide clear and explicit instructions
Ensure that tests are well laid out and perfectly legible

How to make a test more reliable? (cont.)
Make candidates familiar with the format and testing techniques
Provide uniform and non-distracting conditions of administration
Use items that permit scoring which is as objective as possible
Make comparisons between candidates as direct as possible
Provide a detailed scoring key

How to make a test more reliable? (cont.)
Train scorers
Agree acceptable responses and appropriate scores at the outset of scoring
Identify candidates by number, not name
Employ multiple, independent scoring

Marking Tests
Training
More than one scorer
Global assessment
Analytic profiles
Scoring and interacting during oral tests

Training
If scorers have examples of scripts at various different levels and discuss what marks they should be given, their marking is likely to be less erratic than if they come to the task fresh. If scorers are allowed to watch and discuss videoed oral tests, they can be trained to rate samples of spoken English accurately and consistently in terms of predefined descriptions of performance.

Having More than one Scorer
Reliability can be greatly enhanced by having more than one scorer. The more people who look at a script, the greater the chance that its true worth will be located somewhere between the various scores it is given. Two examiners watching an oral test are more likely to agree on a reliable score than one.

Use Global assessment
One way of specifying the scores that can be given to productive skill work is to create predefined descriptions of performance, known as global assessment scales. Such descriptions say what students need to be capable of in order to gain the required marks.

Analytic profiles
Marking gets more reliable when a student's performance is analyzed in much greater detail: instead of just a general assessment, marks are awarded for different elements. For oral assessment we can judge a student's speaking in a number of different ways, such as pronunciation, fluency, use of lexis and grammar, and intelligibility. A combination of global and analytic scoring gives the best chance of a reliable mark.

Scoring and interacting during oral tests
Scorer reliability in oral tests is helped not only by global assessment and analytic profiles but also by separating the role of scorer (examiner) from the role of interlocutor (the examiner who guides and provokes conversation).

Scoring and interacting during oral tests (cont.)
In many tests of speaking, students are now put in pairs or groups for certain tasks, since it is felt that this will ensure genuine interaction and will help to relax students in a way that interlocutor-candidate interaction might not.

Conclusion
Evaluation is one of the methods used to test how well knowledge and skills are being delivered and received.
While there is a lot to be said about this topic, the principles are the one aspect that caught our attention the most. We look at evaluation as a system of testing learners, and at the principles as the rules that govern that system, without which evaluation would not be effective.

Bibliography
Cronbach, L. J. 1971. Test Validation. In R. L. Thorndike (Ed.), Educational
Harmer, J. 2007. The Practice of English Language Teaching. England: Pearson Education Ltd.
Hedge, T. 2003. Teaching and Learning in the Language Classroom. UK: OUP.
Hughes, A. 2003. Testing for Language Teachers. UK: CUP.

Thank You
