
Testing Speaking
for the E8 Standards

Technical Report 2012
Claudia Mewald
Otmar Gassner
Rainer Brock
Fiona Lackenbauer
Klaus Siller
Bundesinstitut für Bildungsforschung, Innovation & Entwicklung
des österreichischen Schulwesens
Alpenstraße 121 / 5020 Salzburg
www.bifie.at
Testing Speaking for the E8 Standards. Technical Report 2012.
BIFIE Salzburg (ed.), Salzburg 2013
The text and the sample tasks may be downloaded from the homepage (www.bifie.at), copied,
and distributed for teaching purposes in Austrian schools, and by university colleges of
teacher education and universities for initial, in-service, and continuing teacher
training, to the extent required for the respective course. Reproduction of the texts and
sample tasks on a medium other than paper (e.g. in PowerPoint presentations) is likewise
permitted for teaching purposes.
Contents
1 Speaking to communicate
2 Theoretical Models
2.1 Models of communicative competence
2.2 Communicative competence in the CEFR
2.2.1 Linguistic competences
2.2.2 Sociolinguistic competence
2.2.3 Pragmatic competence
2.3 The nature of language in unplanned speech
3 Test development
3.1 Issues of standardisation
3.2 Standardising the content
3.2.1 Task
3.3 Standardising the setting
3.3.1 Interlocutor/Assessor characteristics
3.3.2 Interlocutor/Assessor training
3.4 The test takers
3.5 Standardising the construct: construct validation
3.5.1 The Assessment Scale
3.5.2 Test taker feedback
4 E8 Speaking Test specifications
4.1 Purpose of the test
4.2 Description of test takers
4.3 Test level
4.4 Test Construct
4.4.1 Construct Space
4.5 Structure of the test
4.6 Time allocation
4.7 Rubrics
4.8 Speaking Assessment Scale
4.9 Prompt samples
5 Washback
Bibliography
Appendix
Abbreviations
ANC       Austrian National Curriculum (österreichischer Lehrplan)
BIFIE     Bundesinstitut für Bildungsforschung, Innovation und Entwicklung des
          österreichischen Schulwesens
E8 BIST   Bildungsstandards Lebende Fremdsprache (Englisch), 8. Schulstufe
CEFR      Common European Framework of Reference for Languages: Learning,
          Teaching, Assessment
EFL       English as a foreign language
FL        Foreign language(s)
ÖSZ       Österreichisches Sprachen-Kompetenz-Zentrum
1 Speaking to communicate
It is commonly acknowledged that foreign language learners as well as most stakeholders
consider speaking, or more comprehensively oral communication, the most required and
important skill to be mastered.
According to Thornbury (2009, p. iv), however, "[i]t is generally accepted that knowing
a language and the ability to speak it are not synonymous". Nevertheless, the teaching of
foreign languages (FL) has long been practised as if knowing and speaking were the same
thing, disregarding the fact that it is a common misbelief that knowing the grammar and
some vocabulary, making sentences and pronouncing them properly (ibid., p. iv) in the
foreign language amounts to the ability to speak it. Therefore, Thornbury maintains, many
courses and teachers still teach how to vocalise grammar rather than how to communicate
effectively.
Modern FL teaching, however, supported by research and the sound judgment of its
receivers, who first and foremost want to become effective FL speakers with the ability to
communicate successfully, has acknowledged that the interactive nature of communication
requires communicative competence. Moreover, since the goal of most language learners is
the ability to communicate comprehensibly, effectively, and naturally, those components of
communicative competence (see p. 7) that are essential for successful communication are at
the heart of modern FL teaching and testing.
The insight that spoken language differs significantly from written text, as determined by
the nature of the speaking process, is comparatively new. This difference has been made
tangible by the CANCODE spoken corpus [1], the Cambridge International Corpus, and modern
English dictionaries, which show how English is really used, not how one is supposed to
use English or how one uses it in writing [2]. It is evident not only in the lexis but
also, to a considerable extent, in the grammar of spoken language (Carter & McCarthy 2006,
McCarthy 2006a), which has to be acknowledged in teaching as well as in testing and
assessment (also see p. 31).
Consequently, teaching speaking as a skill has to consider aspects of communicative
competence (see p. 7), communicative genres relevant for the target group (see p. 18), and
productive strategies which FL speakers apply to communicate according to the nature of
the communicative task and thereby show their available communicative potential. This
seems particularly crucial given that, according to the Austrian National Curriculum for
Foreign Languages (ANC), FL education should primarily aim at communicative competence:
The aim of foreign language teaching is the development of communicative competence in
the skill areas of listening, reading, spoken interaction, oral production, and writing.
The overarching learning objective in all skill areas must always be the ability to
communicate successfully, which is not to be confused with error-free communication.
(bmukk 2009c, pp. 1–2)
[1] CANCODE = the Cambridge and Nottingham Corpus of Discourse in English
[2] See http://www.pearsonlongman.com/dictionaries/corpus/spoken-bnc.html and http://www.natcorp.ox.ac.uk/

The curricular priority on successful communication rather than accuracy suggests a
fluency-oriented approach to teaching and assessing speaking (Brown 1999, Ebsworth 1998,
Krashen & Terrell 1988, Krashen 2003, McCarthy 2006b, Richards 2008). This is also
emphasised by Brock et al. (2008, p. 24), who suggest that the practice of communicative
competence is even possible in large classes if teachers manage to let go of correcting
and adopt the role of facilitators who enable, support, and encourage speech processes
instead. Moreover, they maintain that the explicit demand for all five skills to be
addressed equally intensively brings forth the obligation to assess spoken interaction and
oral production regularly and reliably.
The skill areas of listening, reading, spoken interaction, oral production, and writing
are to be developed and practised regularly, to approximately the same extent, and in as
integrated a manner as possible. (bmukk 2009c, p. 2)
However, since the curriculum explicitly requires oral competences to be included in the
overall assessment, in keeping with the equal status of the skills, CEFR-oriented teaching
must include oral forms of testing and practice that reliably reflect both monologic and
dialogic speaking competences. (Brock et al. 2008, p. 12)
For this reason, the ANC and the E8 Standards (E8 BIST) describe precisely what
language learners should be able to do in spoken interaction and oral production in
can-do descriptors and CEFR levels. Testing speaking in the E8 BIST context thus
relies on an overarching framework which takes the aspects addressed in the three
documents into consideration.
The following chapters describe theoretical models of communicative competence and
language ability, construct specifications, task specifications, and assessment
specifications.
2 Theoretical Models
As early as 1961, Lado (p. 239) suggested that the ability to speak was without doubt the
most highly prized skill, while testing it was the least developed and the least practised
in the field of testing. One might argue that Lado's work on testing is history and that
modern FL education has long overcome this mismatch. However, the current state of the art
of testing and assessing speaking in Austrian classrooms suggests that testing hardly ever
happens in a systematic way and that the ability to speak therefore does not have a strong
formal impact on the learners' final grades. Therefore, it seemed appropriate and
necessary to explore findings from international test development and apply them to the
Austrian context in the development of the E8 Speaking Test.
Lado (1961) argues that the underrepresentation of testing and assessment in speaking
derives from the fact that we lack understanding of what constitutes speaking.
Consequently, this section focuses on theories of speaking in order to make transparent
how the concepts of speaking and communicative competence have been captured by the
literature since the 1960s and finally in the E8 Speaking Test.
Modern testing of speaking draws on competence models that accept the view that
speaking does not happen in a vacuum but that it is a real-life process, co-constructed
between participants talking in specific contexts and situations (Fulcher 2003).
Theoretical models for testing speaking which acknowledge the communicative
function of that skill therefore define competence models.
2.1 Models of communicative competence

According to Johnson & Johnson (1999, p. 62), communicative competence is the knowledge
which enables someone to use a language effectively and their ability to use this
knowledge for communication. The term is most usually attributed to Dell Hymes's paper
"On communicative competence" (1970).
Since the 1970s the concept of communicative competence has been discussed and redefined
by many researchers and authors [3]. Hymes's original proposition, however, usually
remains the starting point of discussions of communicative competence. He suggests that,
in order to be able to communicate effectively, learners of a foreign language have to
have linguistic knowledge, dealing with producing grammatically correct sentences, and
communicative competence, dealing with producing and understanding sentences that are
appropriate and acceptable to a particular situation (Hymes 1972, pp. 284–286). He thus
emphasises the difference between knowledge about language and the competence that enables
a person to communicate functionally and interactively.
[3] Some of these researchers were Bachman 1990, Bachman & Palmer 1996, Canale & Swain
1980, Fulcher 2003, Luoma 2004, Widdowson 1978, and Wilkins 1976. Their publications
quoted in this paper had an impact on the development of the E8 Speaking Test.

In a later publication Hymes (1974, p. 62) specified the components of speech and
suggested grouping them in a mnemonic code that would spell "SPEAKING" in order to make
them memorable:
1. Setting and Scene
(the time and place of a speech act and the psychological setting or cultural
definition of an occasion, e.g. range of formality or sense of seriousness)
2. Participants
(the speakers and the audience of a speech act)
3. Ends
(the purposes, goals, and expected outcomes of a speech act)
4. Acts
(the speech acts and speech events)
5. Key
(the tone, manner, or spirit of the speech act)
6. Instrumentalities
(the forms and styles of speech such as casual register and colloquial features, or
formal register and careful grammatical standard forms)
7. Norms
(the social rules that govern the event and the participants' actions and reactions)
8. Genre
(the kind of speech acts or events, i.e. the types)
(Summarised from Hymes 1974, pp. 52–62)
Most of the above components of speech are discussed in this paper with regard to the
content and the context of the E8 Speaking Test: Setting (see p. 22), Participants (see
p. 23), Purpose (see p. 35), Speech Acts (see p. 40), Key (see p. 35ff), Instrumentalities
(see task types, p. 16), and Genre (see p. 18).
In addition to the already mentioned researchers, several linguists and methodologists
(Brumfit & Johnson 1998, Wilkins 1976, Widdowson 1978) took up the notion of communicative
competence in the development of communicative language teaching during the 1970s and
1980s. Just a few of them will be mentioned in the following discussion, namely those
whose theoretical reflections and empirical work seem to have had the strongest impact on
the theory of communicative competence and on the development of the E8 Speaking Test.
Like Hymes (1974), Widdowson (1978) suggests that knowing a language is more than just
understanding, speaking, reading, and writing sentences. In fact, he proposes that knowing
a language means using sentences to achieve communicative purposes. Additionally,
Widdowson (1983) introduces a distinction between competence and capacity.
He refers to communicative competence as the knowledge of linguistic and sociolinguistic conventions but to procedural or communicative capacity as the ability to use
knowledge as a means of creating meaning in a language. In this way Widdowson has
established the main concern of successful and meaningful communication, which is
a predominant feature in the assessment of spoken performances in the E8 Speaking
Test.
Canale & Swain (1980) see communicative competence as a synthesis of an underlying system
of (conscious or unconscious) knowledge that interacts with other systems of knowledge
(e.g. world knowledge) and that is observable in actual communicative performance. This
system of knowledge includes knowledge of grammatical principles, the use of language in
social contexts to fulfil communicative functions, and the use of discourse principles
(Canale & Swain 1980, p. 27).
Canale & Swain thus ground their concept of communicative competence on the
following components:
Grammatical competence. This type of competence will be understood to include knowledge of
lexical items and rules of morphology, syntax, sentence-grammar semantics, and phonology.
Sociolinguistic competence. This component is made up of two sets of rules: sociocultural
rules of use and rules of discourse.
Strategic competence. This component will be made up of verbal and nonverbal communication
strategies that may be called into action to compensate for breakdowns in communication
due to performance variables or to insufficient competence. Such strategies will be of two
main types: those that relate primarily to grammatical competence (e.g. how to paraphrase
grammatical forms that one has not mastered or cannot recall momentarily) and those that
relate more to sociolinguistic competence (e.g. various role-playing strategies [...]).
(Canale & Swain 1980, pp. 29–30)
Bachman (1990, p. 84) pursues a concept similar to Widdowson's and describes communicative
language ability (CLA) as "consisting of both knowledge, or competence, and the capacity
for implementing, or executing that competence in appropriate, contextualized
communicative language use" [our emphasis]. He therefore proposes a framework including
the following components: language competence, strategic competence, and
psychophysiological mechanisms (i.e. the neurological and psychological processes in the
actual execution of language as a physical phenomenon such as sound).
According to Bachman (ibid.), language competence comprises a set of specific knowledge
components utilised in communication via language, while strategic competence embraces the
mental capacity for implementing the components of language competence in contextualized
communicative language use. He strongly links this competence with sociocultural knowledge
and real-world knowledge. In this respect Bachman is in agreement with Widdowson as well
as Canale & Swain, who also emphasise the procedural and functional notion of
communicative competence with regard to contextualised and meaningful communication.
2.2 Communicative competence in the CEFR

The discussion of the importance of "The user/learner's competences" when carrying out
communicative tasks in the CEFR (Council of Europe 2001, pp. 101–108) had an impact on the
development of the E8 Speaking Test. The CEFR suggests that all human competences
contribute in one way or the other to the ability to communicate and may therefore be
regarded as aspects contributing to communicative competence (ibid., p. 101). Nevertheless,
those competences closely related to language in the description of communicative language
competence are especially emphasised (ibid., pp. 108–130).
According to the CEFR, users/learners employ their general capacities ... together with
more specifically language-related communicative competence in order to fulfil
communicative purposes. Thus, communicative competence has the following components:
linguistic competences;
sociolinguistic competences;
pragmatic competences.
(Council of Europe 2001, p. 108)
2.2.1 Linguistic competences

In the description of linguistic competence, the CEFR refers to the main components of
linguistic competence, defined as "knowledge of, and ability to use, the formal resources
from which well-formed, meaningful messages may be assembled and formulated" (Council of
Europe 2001, p. 109).
Of the six linguistic competences mentioned in the CEFR, the following three have been
selected to be assessed in the E8 Speaking Test: lexical, grammatical, and phonological
competence.
Lexical competence
Lexical competence is described as the "knowledge of, and the ability to use, the
vocabulary of a language, [and] consists of lexical elements and grammatical elements"
(Council of Europe 2001, p. 110).
Lexical elements include fixed expressions and single word forms.
Single word forms comprise single words that may have several distinct meanings (e.g.
tank: container/armoured vehicle), members of open word classes (nouns, verbs, adjectives
etc.), and lexical sets (days of the week, weights and measures etc.) (Council of Europe
2001, p. 111).
Fixed expressions consist of several words that are used and learnt as wholes. In
unplanned speech they are the building blocks of fluency, which demonstrate communicative
capacity. According to the CEFR, fixed expressions include sentential formulae, phrasal
idioms, fixed frames, and fixed phrases (Council of Europe 2001, pp. 110–111).
In speaking, fixed expressions are often called lexical phrases, formulaic language,
conversational routines or prefabs. They range from chunks of language to complete
sentences that are not assembled word by word in the speech act but have been preassembled through repeated use. Therefore, they can be accessed easily and quickly
and thus contribute to fluency (Thornbury 2009, p. 23).
Examples from performances recorded during the piloting phase of the E8 Speaking Test:
How are you? I don't know what you mean. Have a nice day. Good bye.
(sentential formulae, also called social formulas)
What I don't like is ..., Please can I have ..., I'd like to ..., What do you think
(about) ..., I hope I will ...
(fixed frames or phrases, also called sentence frames)
Well ..., Right ..., I agree ..., You see ..., Yeah ...
(discourse markers)
Three times a week, brush my teeth, go to a party/cinema/friend
(collocations)
Finally, grammatical elements that belong to closed word classes range from articles to
prepositions and particles (for a complete list see Council of Europe 2001, p. 111).
Lexical competence is assessed in the dimension Vocabulary in the E8 Speaking Test (see
p. 33f).
Grammatical competence
Grammatical competence is defined in the CEFR as the
knowledge of, and ability to use, the grammatical resources of a language.
ability to understand and express meaning by recognising and producing well-formed
phrases and sentences.
(Summarised from Council of Europe 2001, pp. 112–113)
The CEFR does not provide a model for grammar or for the organisation of words into
sentences, but it identifies parameters [...] which have been widely used in grammatical
description: elements, categories, classes, structures, processes, and relations (Council
of Europe 2001, p. 113).
In the E8 Speaking Test, grammatical competence is assessed in the dimension Grammar (see
p. 31f).
In the assessment of both lexical and grammatical competence, the nature of language in
unplanned speech is acknowledged (see p. 11).
Phonological competence
The CEFR describes phonological competence as the knowledge of, and skill in the
perception and production of:
sound units,
phonetic composition of words,
sentence phonetics, and
phonetic reduction.
(Summarised from Council of Europe 2001, pp. 116–117)
Apart from the test takers' success in making use of an appropriate lexical and grammatical range and the accuracy of the performance, the naturalness and clarity of the
language used are assessed as the third component of linguistic competence in the E8
Speaking Test. A performance is considered natural and clear if the pronunciation is
intelligible and the pronunciation and intonation make it sound natural. In order to
achieve this, performances have to reach a certain level of fluency.
According to McCarthy, fluency is shown through
lexico-grammatical and phonological flow,
apparently effortless accurate selection of elements by individual speakers,
the ability of participants to converse appropriately on topics,
the ability to retrieve chunks,
interactive support by each speaker to the flow of talk, and
helping one another to be fluent.
(Summarised from McCarthy 2006b, p. 5)
In this way, McCarthy maintains, speakers are able to express ideas appropriately and
coherently, speak at a suitable pace, and pause at expected points.
In the E8 Speaking Test, fluency features as phonological flow in the sense that natural
and clear pronunciation and intonation should make it possible for speakers of English to
understand the test takers' utterances without having to guess at the meaning.
Phonological competence as described above is assessed in the dimension Clarity and
Naturalness of Speech (see p. 30f).
Aspects of confluence, which also contribute to fluency, are assessed as discourse
competence and design competence (see p. 10ff) in the dimension Task Achievement and
Communication Skills (see p. 28f).
2.2.2 Sociolinguistic competence
Sociolinguistic competence is described as "the knowledge and skills required to deal with
the social dimension of language use. [T]he matters treated ... are linguistic markers of
social relations; politeness conventions; expressions of folk wisdom; register
differences; and dialect and accent." (Council of Europe 2001, p. 118)
In the context of the E8 Speaking Test the focus is primarily on the linguistic aspects
of sociolinguistic competence. With regard to the limitations of the testing situation,
the FL level and age of the target group, as well as the relationship of interlocutors
and test takers, sociolinguistic competence is restricted to linguistic markers of social relations and politeness conventions.
Therefore, linguistic markers that come into play in the E8 Speaking Test are most
likely greetings on arrival and leaving as well as introductions and conventions for
turntaking. Politeness conventions (Council of Europe 2001, p. 119) are dependent
on the task and the descriptor being tested and therefore restricted to expressing and
responding to feelings such as surprise, happiness, sadness, interest and indifference,
offering things or actions etc.
The use of linguistic markers of social relations and politeness conventions as evidence
for sociolinguistic competence will most likely become observable as lexical elements and
is thus assessed in the dimension Vocabulary, while the test takers' ability to follow
conventions for turntaking is assessed in the dimension Task Achievement and Communication
Skills in the E8 Speaking Test (see p. 28f).
Other aspects of sociocultural and sociolinguistic competence like folk wisdom, register differences or dialect and accent go beyond the level and knowledge of the target
group and are therefore not considered in the assessment of the E8 Speaking Test.
2.2.3 Pragmatic competence
According to the CEFR, "[p]ragmatic competence deals with the ability to organise,
structure and arrange messages (discourse competence), to perform communicative functions
(functional competence), and to sequence turns according to interactional or transactional
schemata (design competence)." (Council of Europe 2001, p. 123)
Discourse competence
In agreement with the definition by Canale & Swain (1980), the CEFR defines discourse
competence as "the ability ... to arrange sentences in sequence so as to produce coherent
stretches of language" (Council of Europe 2001, p. 123).
In the E8 Speaking Test this competence can best be demonstrated in the monologue part
(see p. 17), where the test takers are most likely to produce text that features whole
sentences. In other parts of the test (interview or dialogue, see pp. 16–17) the nature of
interactive talk will primarily trigger the use of short idea units and incomplete
sentences, strings of short phrases, as well as short turns (see also p. 11).
Functional competence
Functional competence refers to "the use of spoken discourse ... for particular functional
purposes" (Council of Europe 2001, p. 125).
In the context of the E8 Speaking Test, functional competence comes into play as the
already mentioned ability to make use of known expressions (see p. 8ff) in meaningful
exchanges, surfacing as communicative capacity or communicative strategies (see pp. 6–7).
According to the CEFR (Council of Europe 2001, p. 128), the qualitative aspects which
determine functional success are fluency, "the ability to articulate, to keep going, and
to cope when one lands in a dead end", and propositional precision, "the ability to
formulate thoughts and propositions so as to make one's meaning clear".
The aspect of fluency which describes "the ability to articulate" is assessed in the
dimension Clarity and Naturalness of Speech, while the ability "to keep going, and to cope
when one lands in a dead end" as well as "the ability to formulate thoughts and
propositions so as to make one's meaning clear" are assessed in the dimension Task
Achievement and Communication Skills (see p. 28f).
Design competence
Design competence describes the ability to sequence turns according to interactional or
transactional schemata (Council of Europe 2001, p. 123). In the context of the E8 Speaking
Test this ability will surface primarily in the dialogue part (see p. 17), where the
possibility for turntaking provides opportunities for the effective use of language to
organise the discourse (also see p. 8ff) and thus the chance to demonstrate discourse
competence typical for interactive and unplanned speech.
2.3 The nature of language in unplanned speech

According to Thornbury speech production takes place in real time and is therefore
essentially linear with planning time ... severely limited. Therefore, in speaking
words follow words, phrases follow phrases etc. and to compensate for limited
planning time ... [speakers] ... use what is called an add-on strategy. This results in
chaining together short phrases and clause-like chunks, which accumulate to form
an extended turn. (Thornbury 2009, pp. 2 & 4)
Similar to Thornbury's description of speech production, Luoma depicts unplanned speech as
spoken spontaneously, mostly in reaction to other speakers;
containing short idea units, incomplete sentences, strings of short phrases, or short
turns;
delivered in a formal to informal register.
(Summarised from Luoma 2004, p. 12)
Although the test takers in the E8 Speaking Test are given a short time to think about
their speech act (see p. 40), their performances cannot be called "planned" in Luoma's
terms. Planned speech, according to Luoma (2004, pp. 12–13), is rehearsed, consists of
well-thought-out points or opinions, and has been said many times before.
Therefore, the nature of language in unplanned speech is considered in the assessment of
spoken performances in the E8 Speaking Test, especially in the assessment of the test
takers' linguistic competence, i.e. Vocabulary and Grammar. It is acknowledged that both
vocabulary and grammar in unplanned speech are limited in their range as well as in their
accuracy compared to writing and that "performance effects [which] include the use of
hesitations (erm, uh), repeats, false starts, incomplete utterances, and syntactic blends"
(i.e. utterances that blend two grammatical structures, as in "I've been to China in
1998") (Thornbury 2009, p. 21) are natural.
3 Test development
The following Model of speaking test performance (Figure 1) describes the components which
constitute the E8 Speaking Test as well as a range of factors and processes that have an
impact on the performance and its assessment and have therefore been considered in the
development of the E8 Speaking Test.
It shows that the construct had to be related to communicative competences, task-specific
knowledge and skills as well as test taker characteristics, which had to be considered in
task development, in decisions on the setting, and in pairing the test takers. Moreover,
it shows that test administration, including setting, interlocutor characteristics and
training, has a bearing on the performance, as does its assessment based on the assessment
scale.
Thus it clarifies that the test performance is at the heart of an interrelated system
which required organised decision making in order to provide testing and assessment
tools that would most likely bring about valid and reliable results. The following
sections will describe the process of test development in the light of these aspects.
Figure 1: Model of speaking test performance (adapted from Fulcher 2003, p. 115)
3.1 Issues of standardisation

In this chapter we will discuss general issues of standardisation, such as reliability and
validity, and of test design, including the rationale for the paired approach towards
testing speaking. Moreover, we will touch on issues of standardisation through training
and the use of the standardised assessment scale for the E8 Speaking Test.
While validity deals with "the appropriateness of a given test or any of its component
parts as a measure of what it is purported to measure" (Henning 1987, p. 89), test
reliability describes "[t]he actual level of agreement between the results of one test
with itself or with another test" (Davies et al. 1999, p. 168).
Bachman suggests considering reliability and validity as complementary aspects to
identify, estimate and control factors that affect scores:
The investigation of reliability is concerned with answering the question, "How much of
an individual's test performance is due to measurement error, or to factors other than the
language ability we want to measure?" and with minimizing the effects of these factors
on test scores. Validity, on the other hand, is concerned with the question, "How much
of an individual's test performance is due to the language abilities we want to measure?"
and with maximizing the effects of these abilities on test scores. (Bachman 1990, pp.
160–161)
Acknowledging the importance of the issues of reliability and validity, it was a major
concern of the E8 testing group to identify sources of error in the assessment of the
test takers' communicative language ability and to develop a test and an assessment
tool that would be capable of identifying the language abilities to be measured as
reliably as possible.
3.2 Standardising the content

According to Kerlinger (1973, p. 458), content validity is "the representativeness or
sampling adequacy of the content – the substance, the matter, the topics – of a measuring
instrument". Additionally, Csépes & Együd (2004, p. 19) maintain that speaking tests
should present test takers with tasks that resemble as closely as possible what people do
with the language in real life, and Davies et al. (1999, p. 34) argue that the test
content must include an adequate sample of the target domain [spoken language] to be
measured. An adequate sample involves ensuring that all major aspects are covered and in
suitable proportions.
Since the E8 BIST and the ANC follow the construct of the CEFR in differentiating
speaking into Oral Production and Spoken Interaction, the teaching and testing of
speaking have to deal with two skills. These should receive equal attention in terms
of tuition time and be weighted appropriately in proportion to the skills of reading,
listening, and writing in the general assessment of learners as suggested in the ANC.
As a consequence, the duality in speaking as a skill has brought about two components in the E8 Speaking Test: the monologue and the dialogue part (see p. 17ff).
The following sections will discuss issues of validity referring to the content of the
E8 Speaking Test with reference to the achievement measures of oral production and
spoken interaction realised in the tasks.
3.2.1 Task
The content of the tasks in the E8 Speaking Test is defined by the topics, the communicative function determining the task types, the spoken text types, and the rubrics.
Topics and context
In real life, speaking occurs in a given context. Therefore, the tasks are based on
topics that provide contexts as close to real life as possible. Topics that might
put some test takers at a disadvantage because task achievement requires specific
knowledge of the world and/or cultural knowledge are avoided. Moreover, topics that require
a great deal of creativity or imagination to accomplish the task or that might easily
trigger stereotypes are not used either.
The topics of the E8 Speaking Test follow curricular guidelines and the contexts the
tasks create reflect the world knowledge and experience of 14-year-old test takers
(also see p. 23). Moreover, great care is taken to design tasks that are interesting for
the test takers in order to support motivation and participation.
The topic and the context determine
the purpose of communication (see Speaking Purpose/Communicative Function, p. 35ff),
the audience to be addressed, which defines the interactional relationship (see Primary Audience, p. 35ff),
the kind of spoken text type to be produced (see p. 18ff), and
the expected content (see pp. 36–40, topics from the ANC).
The constituents of the context are fleshed out in the construct space that defines the
test's framework in terms of its components. The expected content, i.e. the information the test takers are expected to present, is prompted in content points, textual or
visual stimuli.
Prompts are kept short to avoid validity problems through too much reading input,
but no important information needed to complete the task successfully is omitted.
Possible content points are clear and easy to recognise.
For example, if the prompt asks the test takers to describe the village/town where
they live, the prompt could look like the following:
The village/town where you live
Say
what this village/town is like.
what things or places are interesting to see.
what you can do there.
why you like this village/town.
why you do not like this village/town.
what you would change about this village/town.
The prompt and the content points give away as little as possible of the language that
will be needed to accomplish the task, so that the test takers can make use of
their own ideas and language. However, making use of the language in the prompts
is not prohibited and does not have a negative impact on the assessment.
If drawings, graphs, or pictures are used to illustrate the prompt and/or to stimulate
speaking, these are provided in excellent quality so that the test takers are not put at
a disadvantage.
Input texts that are used as a part of the prompt should be authentic. If this is not
possible, adapted texts must provide correct and appropriate English. Input texts
must be as short as possible and they must not exceed 50 words so that reading is
kept to a minimum. The language level of input texts must also be at or preferably
below the tested level and therefore not exceed CEFR level A2 (see Council of
Europe 2001, p. 24).
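The length limit for input texts lends itself to a simple automatic check during prompt writing. The following is a minimal sketch in Python and not part of the E8 procedure: the function names and the whitespace-based word count are assumptions made for illustration, and only the 50-word limit is taken from the specification above. The language level (A2 or below) cannot be verified this way and still requires expert judgement.

```python
MAX_INPUT_WORDS = 50  # limit stated in the specification above


def count_words(text: str) -> int:
    """Count words as whitespace-separated tokens (a simplifying assumption)."""
    return len(text.split())


def input_text_within_limit(text: str, limit: int = MAX_INPUT_WORDS) -> bool:
    """Return True if the input text does not exceed the word limit."""
    return count_words(text) <= limit
```

A prompt writer could run such a check on every adapted input text before a prompt set is passed on for screening.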
Rubrics
All rubrics that provide the instructions for the tasks are written in English (see
p. 41). The language used has been piloted and revised several times. It is below the
candidates' expected level of language competence and therefore easily understandable for test takers at the lower end of CEFR level A2.
Task types
Speaking tasks can be set in a way that the speakers are asked to produce speech events
independently or collaboratively (Kahn 2008 quoted in Wong & Waring 2010). For
this reason the E8 Speaking Test has been developed in a way that the test takers are
given the opportunity to produce language in a monologue and a dialogue part.
The literature (Brooks 2009, Együd & Glover 2001, Taylor 2001) discusses various
aspects of the individual and paired peer-approach towards testing and performance
assessment and emphasizes the advantages of the latter as being more natural and
less stressful for the test takers, thus producing better and more elaborated language.
Therefore, the tasks in the E8 Speaking Test have been designed in a way that the
interactional relationship the test takers are engaged in is symmetrical, i.e. the test
takers communicate with each other about familiar topics and the power-distance
relationship between test takers and an adult interlocutor is reduced to a minimum.
The interview
At the beginning of the test an interview serves as a Warm-up with the goal to
break the ice and to make the participants and the interlocutor familiar with each
other. Following standardised instructions the interlocutor asks three to five interview questions to create a friendly conversation between the interlocutor and the two
test takers, similar to the standard teacher-pupil interaction in class the test takers
should be familiar with. The questions in the interview are global ones that will most
likely elicit short answers; questions that require knowledge of the world, embarrassing or ambiguous questions, or yes/no questions are not used. Typical interview
questions are: What's your name? Where do you live? What are your hobbies? When do
you usually get up in the morning? What do you normally do after school? What do you
normally do at the weekend?
In the interview, questions about topics that feature in the monologue or dialogue
part of the prompt set are not permitted to avoid repetition and putting test takers
at an advantage or disadvantage because they could repeat language used previously.
The monologue
In the monologue part each test taker is offered a choice of three topics. The three
topics vary in E8 BIST descriptors and text types. Moreover, they do not provide any
overlap with the second test taker's topics or with the topics used in the interview or
dialogue part.
Each monologue is triggered by six to eight content points that should provide a
guideline for the test takers. However, they are not restricting, and it is not compulsory to make use of or to cover all of them. That is, the test takers can also follow
their own ideas in the presentation of the selected topic.
Standardised repair questions are provided for all content points and some additional
ones are added. These are used by the interlocutors to support the test takers in case
of breakdown of communication or lack of ideas.
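The formal constraints on the monologue part (three topics per test taker, six to eight content points per topic, no overlap between the two test takers or with the other test parts) can be written down as a checklist. The sketch below is an illustration under stated assumptions and not the tool used by the E8 prompt writers: the class `MonologueTopic`, the function `check_monologue_topics`, and the comparison of topics by title are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class MonologueTopic:
    title: str
    content_points: list[str]


def check_monologue_topics(topics_a, topics_b, other_topics=()):
    """Return a list of problems found; an empty list means all checks passed.

    topics_a / topics_b: topics offered to the first and second test taker.
    other_topics: titles used in the interview or dialogue part.
    Overlap is detected by comparing titles only (an assumption).
    """
    problems = []
    for label, topics in (("A", topics_a), ("B", topics_b)):
        if len(topics) != 3:
            problems.append(f"test taker {label}: expected 3 topics, got {len(topics)}")
        for topic in topics:
            n = len(topic.content_points)
            if not 6 <= n <= 8:
                problems.append(f"'{topic.title}': {n} content points (expected 6 to 8)")
    titles_a = {t.title.lower() for t in topics_a}
    titles_b = {t.title.lower() for t in topics_b}
    others = {t.lower() for t in other_topics}
    for title in sorted(titles_a & titles_b):
        problems.append(f"'{title}' is offered to both test takers")
    for title in sorted((titles_a | titles_b) & others):
        problems.append(f"'{title}' also appears in the interview or dialogue part")
    return problems
```

Whether two differently worded topics overlap in content remains a matter for the screening by tandem pairs and trainers; a title comparison can only catch the most obvious cases.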
The dialogue
In the dialogue part the two test takers communicate with each other. The interaction is triggered by visual and textual cues that provide ideas for the interaction.
However, these do not restrict the test takers in their freedom to make their own
choices in the elaboration of the given topic.
The dialogue part consists of a short and a long dialogue because certain E8 Standards descriptors suggest a short format, while others lend themselves to be used in
long dialogues (see Construct Space p. 36ff).
In both formats visual stimuli and short verbal prompts (see p. 43ff) such as key
words or question starters are used to trigger interaction about the topic. Additionally, the prompts encourage the test takers to use their own ideas.
The test takers are not bound to make use of the question starters or key words in
the prompt, but successful interaction of the test takers is at the heart of E8 Standards assessment. Therefore, the prompts are considered to be a thought-provoking
medium, while the test takers have the freedom to carry out their own solutions. On
the one hand, this gives the test takers the opportunity to make use of their linguistic
and creative potential. On the other hand, test takers who are used to following
guidelines in their interaction are offered the opportunity to make use of the stimuli
offered by the prompt.
There is no standardised prompting by the interlocutor in the short dialogue because
this would result in the interlocutor taking on a part in the dialogue, which is not
desired. Therefore, if one test taker does not communicate, the interlocutor asks the
other test taker to do so. If this also fails, the long dialogue is started.
In the long dialogue the interlocutors are trained to facilitate the interaction in a
standardised way without being intrusive. Contrary to the monologue, where the
interlocutor asks questions or gives stimuli in cases of breakdown, the interlocutor
remains silent and passes repair question slips to test takers who do not ask questions.
This opens up the opportunity for one test taker to read out the question and for the
other to respond. Ample piloting of repair questions has shown that this is less intrusive than the interlocutor's direct repair questions, which re-direct the interaction
into an interlocutor–test taker conversation.
Text types
The communicative genres or more precisely the text types used in the E8 Speaking
Test are listed in the Construct Space (see p. 36ff) in alphabetical order, as they cannot
be automatically matched with any particular E8 BIST descriptor or topic. Instead,
they have to be meaningfully selected in task design to match the E8 BIST descriptor, the topic, and the task type.
The following text types are used in the E8 Speaking Test:
Descriptions
Descriptions say what things, people, places, pets, pictures etc. are like. Mostly descriptions follow a typical structure: first they identify the phenomenon and then
they describe it in parts, qualities, and/or characteristics. In most cases, descriptions
will suggest the use of the present tense, adverbs and adjectives, or comparisons to
help picture the person or object, and employ the five senses in saying how something
or someone looks, sounds, feels, smells, or tastes.
Expository discourse
Expository discourse presents a topic. It does not report events or focus on a
performer's actions, but presents a topic in a static way. The information is logically
organised around a theme, e.g. "Positive and negative sides of life in a big town".
Expository discourse presents a problem, some arguments, a solution, and probably
an evaluation of the solution. In the context of the E8 Speaking Test, expository
discourse is limited to topics that do not require concrete factual knowledge. That is,
the test takers may be asked to present familiar topics such as extreme sports, healthy
nutrition, the life/problems of teenagers, the environment etc. but not a specific
geographical region or place, or an event in history etc.
Narratives or Stories (true or invented)
Narratives and stories are predominantly constructed in past tenses because the events
they relate usually happened before someone tells them. The tenses used can be
simple past, past continuous, and past perfect. Narratives or stories often
focus on a series of events that are mostly presented in a linear sequence.
In speaking, narratives and storytelling often use direct speech to make the listeners
feel, think, and share experiences through the real dialogues of the participants. A
lot of direct speech will change the nature of the language in the monologue (e.g.
incomplete sentences, phrases, chunks, tense switches).
If storytelling is triggered by pictures, present tenses can also be used.
Personal reports
Personal reports describe the features of events within the experience of the test takers
(e.g. reports about holidays, weekends, sports weeks, excursions, family meetings or
feasts etc.). They generally follow a similar structure (what, when, where, with whom,
why, how) and use facts to explain something or give details about a topic. Moreover,
they can be descriptive. Reports are mainly delivered in the past tense. If the reports focus
on rituals in the test takers' daily lives, present tenses can also be used.
Personal statements
In personal statements the test takers present themselves; they give reasons, talk
about their plans and/or give explanations for them. The age and the life experience
of the test takers limit the topics that can be matched with this text type to those
referring to future education/job/life/ideal place to live/ideal partner or family/free
time or holiday preferences etc. A personal statement will mostly feature present
tense, future tense, or the conditional.
Argumentative discourse
Traditionally, argumentative discourse is a form of interaction in which the individuals maintain opposing positions. In the context of the E8 Speaking Test, however,
the test takers will most likely share similar opinions. Thus, argumentative discourse
will trigger arguments between equal actors engaged in personal, social interaction rather
than arguments of an abstract or conflicting nature, and it differs from informal discussion in its
more personal content.
Functional discourse
Functional discourse refers to speech acts that engage the test takers in carrying
out concrete social functions such as greeting and departing, expressing feelings like
surprise, joy, regret, interest etc., making arrangements or transactions in shops, post
offices, getting information about travel, asking and telling the way etc.
Thus, the audience of the functional discourse would normally originate in the
public domain (e.g. shop personnel, police, drivers, conductors, waiters etc.).
Although the test takers are familiar with these audiences from carrying out role
plays in English lessons at school, in the E8 Speaking Test they will not take on the
roles of adults. Functional discourse will therefore exclusively feature tasks that ask
two teenagers to carry out social functions.
Informal conversations
In informal conversations personal information is exchanged between people who
are familiar with each other and who are from the personal or educational domain
about topics arising in their daily lives.
Informal discussions
In an informal discussion the test takers will present arguments and information
about a familiar topic from different points of view and they may also phrase a
recommendation as to how to solve a problem or react to a certain situation. Informal discussions in the E8 context can only touch the personal or educational domain
of the test takers and will exclusively focus on familiar topics. Informal discussions
differ from argumentative discourse in the level of formality and in product orientation (recommendation, problem solving).
Text types and audience
The text types mentioned in the previous sections will require different audiences to
be addressed. Although EFL lessons offer multiple opportunities to simulate situations from the personal, educational and public domain, the testing situation must
not put test takers at a disadvantage by putting them into roles that are very different
from their range of experience or which might make them feel ashamed or shy and
thus prevent them from showing what they know.
Therefore, the E8 Speaking Test does not go beyond the typical scenarios the test
takers are used to experiencing in EFL education. Moreover, they will not be asked
to take on roles that do not reflect their real age, i.e. they will not be asked to speak
as parents, teachers, ticket clerks etc.
Prompt writing and prompt difficulty
Validating the content of a test must also address the question of whether the tasks
are a representative sample of what the test takers are familiar with from lessons that
teach speaking and whether the difficulty of the tasks is similar.
In order to take care of this aspect, the prompts are exclusively written by practising
qualified English teachers who are also trained as interlocutors and assessors. They
are familiar with the test construct, the theoretical model of speaking, and the test
specifications. Moreover, they apply their experience as experts who have current and
intensive contact with the target group.
Prompt writing is carefully trained and follows guidelines. The teachers collaborate
in pairs and produce first drafts, which are screened by tandem pairs. In this way four
qualified teachers have given their feedback on the prompts before they are screened
by expert E8 BIST trainers who moderate editing if necessary. The completed prompt
sets are pre-piloted with learners of the target group by the authors, who function as
interlocutors and assessors in the pre-piloting. During this phase final adjustments to
repair questions and prompts can be made. If this is the case, additional screening by
the trainers is required.
A second pre-piloting takes place during the second interlocutor/assessor training,
when these prompt sets are used with pupils from a school other than that of the
prompt author. Again, adjustments can be made before these prompt sets are stored
in the BIFIE item bank where they are ready to be piloted a last time under real test
conditions in the year before the actual exam.
Prompt writers are instructed to generate prompt sets that are similar in construct
and ideally identical in the anticipated difficulty for both test takers and in comparison to other prompt sets. However, there are some variables that cannot be
controlled. Test takers who have never encountered the topic or even thought about
it or who do not yet have an opinion about it and have to perform on it in the
course of the test may certainly find the task more demanding than test takers who
have already had experiences with the required content. However, if the appropriate
strategic competences asked for in the task (e.g. describing, turn-taking, questioning
etc.) are available to test takers they can succeed in such tasks, even if they do not
have a wide range of linguistic resources available for the topic.
3.3 Standardising the setting

Similarly to the task, the setting of a test has an impact on the performance. It therefore seems important to provide a setting that will interfere as little as possible with
the performance and that will thus create as little measurement error as possible.
By setting we understand the local performance conditions defined by the physical
setting as well as the role of the interlocutors, their characteristics, and their training.
3.3.1 Interlocutor/Assessor characteristics

The interlocutors/assessors working in the E8 Speaking Test project are all practising
teachers with a qualification in English as a main subject and teaching at Austrian
schools. They must be trustworthy and accurate and be able to work under pressure.
In the Austrian E8 Speaking Test project, teachers are carefully trained to act efficiently
in three roles: prompt writer, interlocutor, and assessor.
All three roles require continuous and carefully sequenced input and practice. This
is why all parts of the face-to-face and on-line training sessions feature each role in
a progressive mode, i.e. skills are presented and practised continuously and systematically.
3.3.2 Interlocutor/Assessor training
The training stretches over a period of approximately six months and consists of four
phases:
1. Phase One: Face-to-Face (F2F) Meeting 1
2. Phase Two: Online Training 1
3. Phase Three: Online Training 2
4. Phase Four: F2F Meeting 2
Phase One: F2F Meeting 1
In Phase 1 the trainees are made familiar with communicative competence in the
CEFR, already mentioned in Chapter 2, and the E8 Speaking Test Specifications,
which will be dealt with in detail in the next chapter (see p. 35ff). To set them up
in their role as assessor, the trainees are acquainted with the construct of the test
and the CEFR scales for the assessment of spoken interaction and oral production
before they study the descriptors of the E8 Speaking Assessment Scale (see p. 42),
after which they assess several examples of video-recorded, benchmarked E8 Speaking
Test performances.
Throughout the training assessors are given feedback on their assessor behaviour in
relation to the group and can thus reflect and adjust their assessments towards a more
homogeneous behaviour with the help of anchor performances and justifications
that can be used for individual standardisation practice. Moreover, in the assessment
of the E8 Speaking Test, multiple ratings will be collected through the assessment of
a representative sample of performances by the whole assessor population on-line
and thus assessor behaviour (harshness or leniency) will be adjusted through multifaceted Rasch analysis.
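The idea of comparing each assessor with the group can be illustrated with a much simpler statistic than the multifaceted Rasch analysis named above. The sketch below is an illustration under stated assumptions and not the E8 procedure: it computes, for each assessor, the mean difference between their scores and the group mean on the same performances. The data layout and the function name `severity_offsets` are hypothetical.

```python
from statistics import mean


def severity_offsets(ratings):
    """ratings maps assessor -> {performance_id: score}.

    Returns assessor -> mean deviation from the group mean, computed over the
    performances that assessor rated. Negative values suggest harshness,
    positive values leniency (on a scale where higher scores are better).
    """
    # Collect all scores given to each performance.
    by_performance = {}
    for scores in ratings.values():
        for perf, score in scores.items():
            by_performance.setdefault(perf, []).append(score)
    group_mean = {perf: mean(vals) for perf, vals in by_performance.items()}

    # Average each assessor's deviation from the group mean.
    return {
        assessor: mean(score - group_mean[perf] for perf, score in scores.items())
        for assessor, scores in ratings.items()
    }
```

Unlike a Rasch model, such raw offsets do not separate assessor severity from task difficulty or test taker ability, so they are only suitable as quick feedback during training.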
To prepare for their future role as an interlocutor the trainees are provided with guidelines for interlocutors and interlocutor behaviour, followed by reflected individual
and group analyses of video recordings of perfect and flawed interlocutor behaviour.
In order to practise their dual role as interlocutors and assessors the trainees learn to
set up the seating arrangement according to a standardised plan (see p. 23) and carry
out test simulations with their peers, who provide feedback on individual interlocutor behaviour in group discussions.
Finally, to help them with the tasks they have to carry out in Phase Two, they are
presented with the intricacies of prompt writing discussed in the previous chapter,
and are provided with guidelines on how to go about writing their own prompts.
Phase Two: Online Training 1
In the second phase of training the trainees work together in pairs to design and
produce one speaking prompt set (see p. 43ff). To assist them in this task, a second
pair of trainees (tandem pair) moderate and edit the prompt set before it is sent to
the trainers for a final stage of moderation.
Once the prompt set has been passed by the trainers, the trainees carry out trial
speaking tests in one of their schools with eight pairs of fourth year pupils. A selected
number of these speaking performances showing various competence levels are
assessed, justified and reflected upon in pairs. At this point the interlocutors' and
assessors' experiences with the prompts are discussed and analysed. If trialling has
uncovered flaws in the prompts' quality, more screening and editing takes place.
Phase Three: Online Training 2
During this stage of training the trainees assess eight to ten speaking performances
that are made available to them via a secure online platform. The trainees submit
their scores on the benchmarked performances to the trainers, thereby providing
data to determine inter-rater and intra-rater reliability.
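The report does not state which statistic is computed from these scores. As a hedged illustration only, the sketch below compares a trainee's scores with the benchmark scores using exact and adjacent agreement, two simple measures that are common in rater training. The function name `agreement_rates` and the choice of these two measures are assumptions.

```python
def agreement_rates(trainee_scores, benchmark_scores):
    """Return (exact, adjacent) agreement as proportions between 0 and 1.

    Exact: the trainee awarded the benchmark band.
    Adjacent: the trainee's band differs from the benchmark by at most one.
    """
    if len(trainee_scores) != len(benchmark_scores) or not trainee_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    pairs = list(zip(trainee_scores, benchmark_scores))
    exact = sum(t == b for t, b in pairs) / len(pairs)
    adjacent = sum(abs(t - b) <= 1 for t, b in pairs) / len(pairs)
    return exact, adjacent
```

Intra-rater reliability could be approximated in the same way by comparing a trainee's first and second ratings of the same performances instead of comparing them with the benchmark.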
Phase Four: F2F Meeting 2
In this final phase of training the trainees go through a phase of standardisation with
an emphasis on the implementation of the prompt sets in a mock E8 Speaking Test,
referred to as prompt familiarisation, and on interlocutor behaviour. Three or four
prompt sets from within the whole training group and/or the item bank, selected
by the trainers, are pre-piloted with a larger cohort of pupils that the trainees have
not met before to simulate an authentic E8 testing situation. Each trainee receives at
least one opportunity to act as an interlocutor and conduct a speaking test. During
the subsequent tests, the trainees either assume the role of assessors, whereby they
assess several speaking performances, or they observe their peers acting as interlocutors. They thus receive feedback on their interlocutor behaviour and can adjust it if
necessary.
In a future F2F meeting, shortly before the actual E8 Speaking Tests take place, the
trained interlocutors and assessors go through another standardisation and prompt
familiarisation phase.
3.3.3 Physical setting
In addition to the test procedure that is guided by standardised interlocutor behaviour,
the physical setting of the E8 Standards Test has to be standardised in order to
create an environment that will make the results reliable because all test takers are
tested in a very similar set-up.
The tests are carried out at the test takers' school, which provides them with a familiar
environment. The head teachers of the schools are asked to choose rooms that are
well lit, well-aired, friendly, and undisturbed. They are also asked to leave two chairs
outside the testing room for the next test takers waiting for their turn.
The interlocutors arrange chairs and two tables so that they have enough space to
arrange their testing materials and that the test takers sit facing each other and facing
the interlocutor (see Figure 2). In the dialogue part the arrangement should allow for
the test takers to look at each other.
The assessor sits outside this arrangement but must be able to see the test takers'
faces.
Figure 2: Test seating arrangement (labels: Candidates, Interlocutor, Assessor)
The interlocutors arrange instructions, prompt cards, question cards, and repair slips
in a way that they can find them easily and quickly, and they make sure the repair
slips cannot be seen by the test takers before they need to use them. A (stop-)watch
(with second hand) is brought by the interlocutors for time measurement. Timing is
done as discreetly as possible to avoid creating a feeling of time pressure for the test
takers.
3.4 The test takers

There are many challenges interlocutors and assessors of oral performances are faced
with. The previous sections have provided insight into how the training aims at
minimizing the impact of interlocutor behaviour and physical setting on the performance. This section will look more closely at test taker variables and the factors that
influence their performance as well as how the E8 Standards Test deals with these
issues.
Davies et al. suggest that a wide range of variables may significantly influence test
performance or produce measurement error and thus affect the validity of the assessment. These may include language background, age, sex, educational background,
background knowledge, affective reactions to test taking, level of proficiency in
the target language and familiarity with the test method. (1999, p. 208)
While physical/physiological variables⁴ like age and sex can be considered to have
little bearing on performances because the test takers are all of similar age and attending year 8 classes of Austrian schools, cognitive variables like language background, educational background, and background knowledge may have a stronger
diversifying impact on performances.
Affective or situational reactions to test taking such as motivation, physical disposition, as well as factors such as learning strategies and styles, attitude, extroversion,
introversion, anxiety, personality, or risk taking (Bachman 1990, Davies et al.
1999, Kunnan 1995) can hardly be controlled in a testing situation. Nevertheless,
the following insights from research have been taken into consideration in E8 Standards Testing: Berry (1994) researched the effect of introversion and extroversion on
paired speaking test performance. The results suggest that introverts perform better
in homogeneous pairs and in tests with interlocutors than if paired up with extrovert
test takers. Luoma (2004) suggests that test takers who know each other very well
tend to speak less than those who are not too familiar with each other and that
acquaintanceship has a stronger impact on performance than a mismatch in proficiency level. For this reason, it seemed appropriate that the test takers and/or their
teachers should be allowed to choose the peer partners for the E8 Speaking Test in
order to rule out disadvantages caused by individual characteristics discussed above.
We thus expect the effects of personality, culture-specific variables, proficiency levels,
and acquaintanceship to be reduced to a possible minimum.
As much as introversion may have an impact on a test taker's performance, lack of
motivation may also result in scores that do not match the actual ability of a test
taker. As the E8 Standards Test is a low-stakes test with no direct bearing on the
test takers' school career, lack of interest in a good performance can prevent test takers
from showing what they actually can do. In order to avoid the undesired situation
where examinees do not approach the testing situation in the expected manner and
thus threaten the validity of results (Henning 1987), it must be the aim to take any
possible measure to foster motivation and to avoid hostile or negative reactions to the
content and format (Fulcher 2003). This can be achieved by making the test takers
familiar with the content and the format.
At this point it has to be acknowledged that most test takers will not have experienced many formal tests in speaking. Apart from rote-learnt role-plays, rehearsed
presentations or book and film presentations, which become part of continuous assessment, teachers hardly ever test speaking. Moreover, teachers do not often assess
their pupils' pair work. Therefore, it can be expected that the situation of being tested
in speaking will be a new experience for most of the learners.
However, it is hoped that teachers will make use of published testing materials in
order to support their pupils' familiarity with the test format and the test procedure.
Prompts, video-recorded pilot tests, and the instructions used by the interlocutors
are available on the BIFIE homepage, and it is therefore possible for teachers to show
and practise the testing situation with the learners (available at: https://www.bifie.at/node/1821).
Moreover, test takers who have attended eight years of education in Austria and used
the accredited course books will have encountered similar speaking tasks and should
have reached a level of linguistic competence of at least A2 according to the CEFR in
oral production and spoken interaction in the FL as suggested in the curriculum. In
favourable situations they may even have reached CEFR level B1. Additionally, the
test takers' sociolinguistic competence should cover the linguistic markers of social
relations and politeness conventions asked for in the E8 Standards Test.
4 Provisions for test takers with special needs are still to be developed (e.g. instruction cards in large fonts, technical support for the hearing impaired etc.).
More culture-specific aspects of sociocultural and sociolinguistic competence, which
are part of EFL and certainly important, go beyond the possibilities of standardised
testing and are therefore not a requirement for the E8 Standards Test.
Like linguistic competence, making conscious and strategic use of pragmatic competence (discourse competence, functional competence, and design competence) is
required by the curriculum and thus the test takers are expected to be competent in
engaging in interactive speaking tasks that ask them to carry out various communicative functions. Moreover, the test takers will most likely have held planned presentations in their EFL lessons and thus show design competence.
Finally, the presence of a trained person who encourages communication in a standardised way is the big advantage of the E8 Speaking Test. While in all other skills
the test takers are left alone with the task, speaking provides the opportunity for
the interlocutor to promote participation. Moreover, the contribution of the paired
set-up to motivation and participation generally has a positive impact on the performance.
3.5 Standardising the construct: construct validation

According to Alderson et al. (2004, p. 171), "[c]onstruct validation refers to what the
test scores actually mean, what they tell us about the examinees."
In the following section, the construct space of the E8 Speaking Test will be presented and the assessment criteria will be explained in order to demonstrate the provisions
that have been made to support the construct validity of the E8 Standards Test
and to explain what information can be gained from its results.
As the discussion about the interpretation of the quality of oral performances had to
begin before the selection or development of a test, the purpose of the E8 Standards
Test, i.e. the information on what was hoped to be learned about the test takers'
competences, has guided the development of an assessment scale which describes
how the competences are displayed at certain levels.
In order to link the competences defined in the construct to objectives that could be
judged in a standardised way, a procedure similar to that described by Hanny (2000)
was pursued:
First, the purpose of the assessment (finding out about communicative competences)
was matched with objectives (can-do statements) as suggested by the E8 BIST. This
was achieved by correlating the E8 BIST with the objectives of the ANC and the
can-do statements of the CEFR; next, assessment criteria that would address the objectives were developed. Piloting the criteria of the E8 Speaking Assessment Scale in
Assessor Training sessions and making use of them in a Benchmarking Conference,
the band descriptors were revised several times.
The categorisation of criteria for the assessment of the test takers' communicative
competences as defined by the construct led to the development of a four-dimensional
assessment scale.
Figure 3: Developing assessment criteria
This comprises the following dimensions:
Task achievement and communication skills (assessing pragmatic and sociolinguistic competences),
Clarity and naturalness of speech (assessing linguistic and pragmatic competences),
Vocabulary and Grammar (both assessing linguistic competences).
The above model only partially describes the evidence that is collected in the assessment of spoken performances. The purpose of the assessment encompasses the
evidence that can be judged according to the construct criteria that are based on
the speaking model, i.e. the communicative competences. These criteria surface as
the can-do statements of the E8 BIST and the CEFR respectively. However, the E8
BIST cannot be judged in a vacuum. Thus they appear in combination with content
that has to be judged in combination with the E8 BIST, a list of topics from the
ANC, the text types appropriate for eliciting the content and the competences, the
communicative functions of the tasks, and the audiences to be addressed. All these
components frame the construct space, which gives information about what has to
be considered in the assessment.
Validity evidence on the basis of the construct space was reflected before and examined after the preliminary assessment scale was established. The construct space
and the scale were revised according to information collected during the process of
piloting and benchmarking. Tables 1–4 (see pp. 36ff) display the Construct Space
that was considered in the development of the E8 Speaking Assessment Scale (Table
6, p. 42), which comprises four dimensions and seven bands for each dimension.
In the development of the Speaking Assessment Scale clear criteria for the assessment
in four dimensions and at seven bands, i.e. levels, were established. Descriptions that
have been derived from CEFR scales are available for bands 1, 3, 5 and 7 for each of the
four dimensions. Bands 2, 4 or 6 are awarded if a performance is better than one of the
described bands but not good enough to be awarded the next higher one. These band
descriptors are used to guide the process of assessment. Each dimension receives a score
within the seven bands. Although assessment criteria do not completely eliminate variations between assessors, a well-designed scale can reduce the occurrence of discrepancies
in combination with careful training of the assessors (Moscal 2000).
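The band logic described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the names DIMENSIONS, DESCRIBED_BANDS, has_descriptor and validate_scores are hypothetical and not part of the E8 materials. The only facts taken from the report are the four dimensions, the seven bands, and the availability of descriptors for bands 1, 3, 5 and 7.

```python
# Hypothetical sketch: all names are illustrative, not part of the E8 materials.
DIMENSIONS = (
    "task achievement & communication skills",
    "clarity & naturalness of speech",
    "grammar",
    "vocabulary",
)
DESCRIBED_BANDS = (1, 3, 5, 7)  # bands with CEFR-derived descriptors
ALL_BANDS = range(1, 8)         # seven bands per dimension


def has_descriptor(band: int) -> bool:
    """Return True if the band has its own descriptor (1, 3, 5 or 7)."""
    if band not in ALL_BANDS:
        raise ValueError(f"band must be between 1 and 7, got {band}")
    return band in DESCRIBED_BANDS


def validate_scores(scores: dict) -> dict:
    """Check that exactly one valid band is given for each of the four dimensions."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError("one score per dimension is required")
    for band in scores.values():
        has_descriptor(band)  # raises ValueError for out-of-range bands
    return scores
```

In this sketch a band of 2, 4 or 6 is valid but has no descriptor of its own, which mirrors the rule that these bands are awarded for performances falling between two described bands.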
A sound understanding of the test construct and its assessment scale is likely to improve both inter-rater and intra-rater reliability. Therefore, assessors must be made
familiar with the construct and the scales making use of benchmarked performances
and written justifications which exemplify consistent assessment based on set criteria.
The discussion and comparison of written justifications by the assessors and the
benchmarks are important in two ways: firstly, they help the assessors adjust to the
standardised assessment scales and the common understanding of their bands, and
secondly, they uncover possible implicit criteria that may have been applied by the
assessors but that are not stated in the scales (e.g. "In my class this would be a top performance, so this must be a high band ..."). Identifying the implicit criteria they may
have been using can help the assessors refine their understanding and application of
the scales for future assessments.
In this way, the justifications of the benchmarked performances demonstrate how
the assessment criteria can be directly related to performance criteria. Moreover, they
exemplify the differences between the categories at certain levels through performances. Thus, the aim of the assessor training is to guide the assessors in a way that
they arrive at independent scores based on the band descriptors within a maximum
of +/- one band for a given performance.
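The training target of scores within one band of each other can be checked mechanically. The following is a minimal sketch, assuming that an assessor's scores are compared with the benchmark scores for the same performance; the function names, the shortened dimension keys and the example data are invented for illustration and do not come from the E8 training materials.

```python
def band_deviations(assessor: dict, benchmark: dict) -> dict:
    """Signed difference (assessor minus benchmark) for each dimension."""
    return {dim: assessor[dim] - benchmark[dim] for dim in benchmark}


def within_tolerance(assessor: dict, benchmark: dict, tolerance: int = 1) -> bool:
    """True if every dimension is scored within +/- `tolerance` bands."""
    deviations = band_deviations(assessor, benchmark)
    return all(abs(diff) <= tolerance for diff in deviations.values())


# Invented example data, for illustration only.
benchmark = {"task": 5, "clarity": 4, "grammar": 3, "vocabulary": 4}
assessor = {"task": 6, "clarity": 4, "grammar": 5, "vocabulary": 3}

print(band_deviations(assessor, benchmark))
print(within_tolerance(assessor, benchmark))
```

In this invented example the grammar score deviates by two bands, so the check fails and the justification for that dimension would be the natural starting point for discussion.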
One method of further clarifying the E8 Speaking Assessment Scale and of raising the
level of awareness and recall shortly before the test is the use of anchor performances. Anchor performances are a set of carefully selected benchmarked responses that illustrate the nuances of the categories. These will be presented at the standardisation meeting prior to the mock test, which precedes the actual test, and made available on
a secure platform so that the assessors may refer to the anchor performances shortly
before the assessment process. This should reinforce the standardisation and give
the assessors the chance to remember the anchor performances and the assessments
when this information is really needed. This opportunity for individualised recall is
important because the organisation of the E8 Speaking Test requires a time-slot of
several weeks within which the assessors may have to operate.
3.5.1 The Assessment Scale
In order to make the criteria of the assessment scale result in valid interpretations of a
response it is necessary for the criteria to be related to the purpose of the assessment.
Therefore, the criteria should be defined in a way that any given response would
receive the same assessment regardless of who the assessor is or when the response is
assessed.
Accordingly, the descriptors of the analytic assessment scale that assessors work with in the
context of the E8 Speaking Test have been carefully designed and linked with the construct to report on the test takers' abilities in four dimensions (see p. 41f).
The E8 Speaking Assessment Scale is applied by the assessors in situ, i.e. the assessment has to be achieved during the test takers' performances. This constraint was
considered in the initial development of the E8 Speaking Assessment Scale and taken
seriously in the adaptations of the scale during piloting and benchmarking which
resulted in a shorter and more user-friendly version.
To provide feedback on the test takers' communicative competence, the most significant competences needed for speaking as defined in the test construct (see p. 36)
are assessed in the following dimensions: task achievement & communication skills,
clarity & naturalness of speech, grammar, and vocabulary. Due to the above-mentioned constraint of in situ assessment, the three parts of the E8 Speaking Test (the
monologue, the short dialogue, and the long dialogue) are assessed holistically, i.e. the
test takers are awarded one score on each of the four dimensions. The following interpretative descriptions of the four dimensions of the E8 Speaking Assessment Scale
add to the reliability of results in the sense that the judgements are based on defined
categories and band descriptions.
Task achievement & communication skills
In task achievement and communication skills the information the test takers provide (propositional precision, in all parts), the quality of the narrative (thematic
development, primarily in the monologue part) as well as the ability to interact with
a partner (turntaking, primarily in the dialogue part) are assessed.
Propositional precision refers to the information that is communicated in the performance as well as to the successful completion of a communicative speech act. In
propositional precision we ask ourselves: What is the information we get like? Is it
detailed, concrete, limited, or more or less non-existent?
In the monologue part the test takers are asked to give information about a given
topic. In addition they are provided with content points. Thematic development
primarily refers to the monologue part. It deals with the way the speaker develops a
speech act with respect to the given theme. It is to do with the elaboration of ideas
and the narration. If individual ideas (main points) are expanded with relevant detail, thematic development has been very successful.
At the other end of the scale, in basic statements at word or word group level, themes
cannot be developed.
From the linear design of the prompts we can expect the test takers to address the
content points in the sequence that they appear on the prompt cards. However, the
order is not set and therefore test takers may incorporate them into their spoken
production in a random order. The content points are to be seen as guiding points
for the test takers, to help them to speak freely for two minutes about their chosen
subject, but they are not mandatory and test takers are not penalised for not addressing
them. The assessor must concentrate on the overall amount of information that
the test taker is able to pass on and its quality and evaluate it according to the assessment scale. We expect test takers to talk about the topic they have chosen and to give
information that is relevant to the topic. Test takers may even choose one content
point only, but if they give varied information on it they can still reach high bands.
The repair questions provide a guideline of what we would expect the test takers to
talk about in a sufficiently solved task. As the test takers are supposed to produce a
flow of discourse in the monologue section, and not interact with the interlocutor,
it will not be possible to assess the true level of the candidates' communication skills
here. If, however, they do interact by asking for the translation of a German word in
English (e.g. "What is Schläger in English?"), they should receive the support necessary
to carry on. What we can expect in this section is the use of discourse markers such
as "well", "like", "actually", "generally", "of course", and "you know", which will reflect the level of the test
takers' competence in communication skills.
In turntaking we assess the test takers' ability to interact with each other. This can be
seen as the ability to begin, maintain, and end a conversation. The test takers may
use prefabricated chunks, stock phrases, discourse markers, or formulaic language in
doing so. If the test situation does not allow for beginning or ending the conversation, the lack of evidence for this does not necessarily lead to downgrading. If effective turntaking has been found in the conversation, high bands can still be awarded.
In the short dialogue we can expect the test takers to exhibit turntaking skills in order to achieve the task which may be an invitation, an excuse, a purchase, a decision
making process (e.g. which film to watch) etc.
We can thus expect the test takers to show, in a guided way, the extent to which they
are able to initiate, maintain and close a conversation and how effective they are
when doing this. Good speakers will have no problems formulating the necessary
questions to accomplish the task. Utterances containing suggestions (e.g. "Would
you?"), agreement (e.g. "Me too."), or disagreement (e.g. "No, I don't.") and their
quality will also indicate communicative competence. Other indicators of communicative competence will be the use of stock phrases such as "of course" and "not at
all" and the frequency of their use.
In the short dialogue the test takers are asked to accomplish a functional discourse.
The detail of information may be limited by the task, therefore the successful completion of the communicative function is the element we are assessing. The functional
aspect of the short dialogue requires the test takers to come to a defined result.
Bearing this in mind, it is likely that the test takers will refer to all the points in the
prompt, because they should succeed in fulfilling the function that is required.
The long dialogues are guided by question prompts or key words that serve the same
function as the content points in the monologue. They are stimuli but not compulsory items to be dealt with. If the test takers develop a conversation about the topic
following their own ideas the task can still be rich in the quality of information we
get.
Unlike the linear design of the monologue, the long dialogue prompt is cyclical and
there is no telling which content point (if any) a candidate will address first. As in
the monologue section it is not mandatory to address all the content points. At E8
we can expect good speakers to discuss many of the points, perhaps even all of them,
and to discuss them in some detail. However, because the test takers should
interact with each other, and may in some cases even interrupt each other, it is less
likely that they will have the opportunity to provide very detailed information
before they are confronted with another point by their partner.
As soon as one candidate has started the conversation and the other candidate has
replied, the decision to initiate, maintain, and end parts of the conversation lies in
the hands of the individual candidates, unless there is a marked imbalance or breakdown of communication and the interlocutor intervenes. Speakers with good communication skills will try to keep their own verbal input balanced, using
learnt phrases such as "I think", "In my opinion" etc. We can expect good speakers to
use phrases such as "Me too", "I agree/disagree", "Really?", "Cool" etc. when reacting to
their partner's utterances. Finally, stock phrases such as "And what about you?",
"What do you think?" or "What's your opinion?" will be employed by good speakers to
encourage verbal output from their conversation partners.
Generally speaking the prompts, content points, key words, and question prompts
are there to make the test takers talk. If they find their own ways of solving the task
and the information we get is appropriate and rich this is equally valuable.
In the assessment of task achievement & communication skills the test takers are
allocated one of seven bands.
Band 7 performers give detailed information and are able to expand main points by
relevant new elements. They are effective in turntaking.
Band 5 performers give concrete information that is clear and they develop a straightforward narrative in the monologue part. They achieve basic turntaking and can initiate, maintain and close a conversation using stock phrases.
Band 3 performers give limited information and in the monologue they give a simple
list of points at sentence or word-group level. They can ask questions effectively
in the dialogue parts. The test takers may partly rely on the interlocutor's support
through repair questions to keep going or to come up with some more information.
Band 1 performers give very little information and cannot go beyond simple statements or negations at word or word-group level in the monologue part. This will
mostly result from the fact that they cannot develop a narrative independently and
rely on the interlocutor's repair questions to come up with some information. They
make attempts to ask questions (e.g. raising intonation) but are not effective in
questioning. The interlocutor may have to use the repair question cards to keep the
dialogue going.
Clarity & naturalness of speech
A performance is considered natural and clear if the pronunciation is intelligible and
the intonation makes it sound natural. In order to achieve this, performances have to
reach a certain level of fluency and phonological flow. The natural flow of language
in fluent speech is accompanied by the seemingly effortless selection of elements by
an individual speaker and the ability of the other participant(s) to converse appropriately
on topics. In doing so, the participants of fluent conversations retrieve chunks
and provide interactive support to the flow of talk, helping each other to be fluent and
creating confluence in the conversation. Thus they are able to express ideas appropriately
and coherently, speak at an appropriate pace, and pause at expected points.
In the E8 Standards tests, clarity and naturalness of speech surfaces as phonological
flow in the sense that natural and clear pronunciation and intonation should make it
possible for native speakers of English to understand the test takers' messages without
making compromises or too many guesses on meaning.
In the monologues the speakers are expected to speak fluently and naturally for
two minutes and their narrative should flow in the sense that it is as coherent and
cohesive as unplanned speech can be. That is, we cannot expect elaborated, complex
sentences or backward and forward referencing of the quality of a written text, but
we expect the test takers to use simple connectors ("and", "but", "because", "first", "then", "later",
"at last", personal pronouns etc.) and possibly some stock phrases that highlight the
beginning, the main part, or the end of their presentation ("I have chosen the topic
...", "the most important thing ...", "what I like best is ...", "all in all this was ...", "finally I
would like to say that ..."). In dialogues discourse markers ("well", "you know", "right"
...), formulaic speech ("have a nice day", "see you", "and you" ...) as well as prefabricated
chunks and phrases ("would you like a ...?", "the thing is ...", "are you with me?" ...)
make spoken language fluent and compensate for grammatical or lexical planning.
In the assessment of clarity & naturalness of speech the test takers are allocated one
of seven bands.
Band 7 performances that sound clear and natural are fluent and spontaneous. The
performances are delivered at a fairly even tempo and pauses are naturally placed.
The speakers will produce longer stretches of language (especially in the monologue
part) with pronunciation and intonation that make the performance sound natural
and clear.
The performances of band 5 speakers show some degree of fluency, although some
pausing for lexical or grammatical planning can be necessary. The speakers produce
connected stretches of language that are long enough for pronunciation and intonation to sound intelligible, although sometimes with a foreign accent. At this level
some mispronunciations that do not impair communication can be tolerated.
A band 3 performance is interrupted by noticeable pauses, hesitations and false
starts, which sometimes cause breakdown of communication. The contributions are
short and intelligibly pronounced, too short, however, to develop natural intonation.
Foreign accent or mispronunciation may sometimes impair communication.
In a band 1 performance the speaker is very hesitant which frequently causes breakdown of communication. This may not necessarily be caused by pronunciation
problems, but the very short and isolated utterances or frequent mispronunciations
may either not allow for an evaluation of pronunciation or make it hard for native
speakers to understand the message.
Grammar
The scale for grammar comprises descriptors for range, control, and the clarity of
the message. Therefore, the assessors evaluate the ability to make use of a range of
grammatical structures, the level of their accuracy as well as their impact on the
message. The focus is on grammatical forms that create meaning and that are reasonably
correct to accomplish successful communication. In addition, the assessment
of grammar in the E8 Speaking Test considers the nature of grammar in unplanned
speech (see p. 11).
Although there is some planning time, speech production in the E8 Standards Test
takes place in real time and is therefore considered to show the characteristics typical
of unplanned speech. Thus, the performances are expected to be linear and the test
takers will mostly use an add-on strategy of stringing short idea units together. While
we generally expect complete sentences in the monologue, the dialogues will primarily
feature incomplete sentences, word groups, short phrases, or chunks. We have to
acknowledge that incomplete utterances ("Could be"), ellipsis ("Sounds like a good
idea"), syntactic blends (utterances that blend two grammatical structures as in "I've
been to London last year"), or vague language ("kind of machine") are natural.
Moreover, present, simple, or active verb forms, will, would, can, personal pronouns,
and determiners are frequent; past forms, perfect forms, and the passive are rare.
In the context of E8 Speaking, grammatical range must be seen in relation to the
above-described nature of grammar in unplanned speech and the standardised tasks
of the E8 Speaking Test. On the one hand, we will expect the test takers to use structures that are meaningfully elicited by the task. On the other hand, spoken language
produced in real time has its special features. The speaking prompts focus exclusively
on familiar topics and have been designed in a way that all ability levels have a good
chance to succeed in the speech act. Thus, they are as straightforward in their set-up
as possible. However, this does not suggest that the response cannot exceed the complexity of the stimulus. Even if a task is simple in nature, we expect differentiation
in grammatical forms or sentence types. Verbs, for example, can be modified, can mark
aspect, and can determine various types of sentence function such as statement, question,
negation, command/directive, or exclamation.
In the E8 Speaking Test range overrules accuracy in the sense that rich grammatical range through risk taking is encouraged, while inaccuracies that do not impair
meaning play a minor role. The more varied the grammatical range, the higher the
band. Risk taking which results in rich structures, but reduced control, does not
automatically lead to downgrading.
Local errors that do not hinder communication will not cause downgrading unless
their frequency impairs the message. Only global errors that interfere with the comprehensibility of the text will result in downgrading or the placement of a text at a
low band.
Test takers are encouraged to make use of their full potential and the more creative
the structural features they show, the better. Nevertheless, the use of variation should
not be exaggerated either. The tasks suggest certain scenarios that require special
structural solutions. These should produce authentic and natural variation, but not
artificial language.
The placement of a performance at a certain band reflects the range of grammatical
structures and the level of their correctness within a meaningfully and successfully
accomplished communicative task.
The monologues are designed in a way that test takers at A2⁵ or B1⁶ level have a good
chance to succeed and demonstrate their grammatical range appropriately. Short
dialogues are meant to be A2 tasks and the range of grammatical structures that is
likely to be elicited in such tasks comprises structures typically mastered at A2 level.
Long dialogues have the potential to elicit B1 language and, as a consequence, also
grammatical structures representative of that level.
Band 7 performances feature good grammatical range that creates meaning and
natural language within the framework of the task. The speaker varies the grammatical structures the prompt elicits and may occasionally go beyond the obvious and
expected. However, any enhancement should not make the message sound unnatural
or result in exaggeration of grammatical structures (range for the sake of range). In
addition to good range a relatively high degree of grammatical control is expected. A
few inaccuracies can occur but they will not impair communication.
Band 5 performances show sufficient range of grammatical structures. Sufficient
range is achieved if the speaker makes enough use of the prompt's structural features
to make the required communication successful and if the grammatical forms create
appropriate meaning. Occasional inaccuracies that can impair communication can
be tolerated.
Band 3 performances feature a limited range of simple grammatical structures. This
means that the grammatical structures are just enough to achieve successful communication. Mostly they are very simple, repetitive, and hardly varied. Performances at
band 3 can be frequently inaccurate and may show basic mistakes. However, these
mistakes generally do not cause breakdown of communication.
Band 1 performances feature an extremely limited range of simple structures. This
usually forces the speaker to compromise the message regarding meaning, content,
and naturalness of language. Extremely limited range results in structures that are
repetitive and follow very simple Subject-Predicate-Object sentence patterns. The
structures hardly go beyond the learnt repertoire of beginners. In addition to structural
restrictions, band 1 performances show limited control, which frequently causes
breakdown of communication.
5 For an inventory of grammatical areas at A2 level see KET Handbook, p. 89. Available at: http://www.exams.ru/docs/ket_handbook.pdf.
6 For an inventory of grammatical areas at B1 level see PET Handbook, p. 78. Available at: http://www.exams.ru/docs/pet_schools_handbook.pdf.
Vocabulary
To assess vocabulary in the E8 Speaking Test the assessors look at content words
(nouns, full verbs, adjectives, adverbs), collocations and chunks of language that a
speaker uses to fulfil a communicative task. They assess the range of lexis that creates
meaning and manages to accomplish successful communication and control, i.e.
the level of accuracy. In doing so, the assessment of vocabulary, as the assessment of
grammar above, considers the nature of lexis in unplanned speech (see p. 11).
Vocabulary range refers to the breadth of vocabulary the speakers use in their performances. In the E8 Speaking context, range must be interpreted in relation to
the prompt, as the assessors can assess only the vocabulary actually elicited by the
prompt, and the constraints that real time performances provide.
Although the notion of vocabulary items is not limited to single words but rather
stretched to include lexical phrases, formulaic language, collocations, discourse
markers, and chunks, which provide good opportunities for speakers to show what
they know, we have to acknowledge the fact that the number of words a language
learner at beginner level needs to control in speaking (and listening) is fifty percent
smaller than in writing (and reading) (Thornbury & Slade 2006, Thornbury 2009).
However, even if the E8 Speaking tasks are simple in nature we may expect differentiation within choice of lexical elements. For example, if a task asks for a narrative
description about the first few days at a new school, the oral production will primarily contain words related to school, teachers, subjects, new friends etc., which,
however, can be varied and modified. Equally, spoken interaction can be varied by
the use of stock phrases and well-placed discourse markers. Although the prompt
language is as simple as possible, speakers may well exceed the prompt stimulus in
their performance.
It is not enough for a speaker to use a large number of different words in a performance to achieve a high band in assessment. The words a speaker chooses must be
relevant and appropriate to the topic and used in a way that messages are communicated meaningfully. A top speaker will use vocabulary that is generally accurate
enough to formulate even more complex ideas with clarity. Speakers who stay in absolutely safe language areas (e.g. language picked up in years one and two) and avoid
taking any risk have less evidence of mistakes. However, it is E8 policy to encourage
the test takers to venture out of their safe language zone by rewarding risk taking to
communicate successfully.
In the assessment of vocabulary the test takers are allocated one of seven bands.
Oral performances that show a good range of vocabulary at band 7 contain a good
selection of content words and phrases that demonstrate that the speakers are able to
express clear and precise ideas and occasionally even vary formulations so as not to
appear repetitive. We may well expect one or two expressions to stand out and
exceed what we typically expect from test takers at this level.
Band 5 performances contain a sufficient range of mostly high-frequency words that
again meet the need to communicate clear ideas and are generally used accurately.
There may be some occasional mistakes, particularly when the speaker is trying to
communicate a more complex idea.
In a band 3 performance we expect the lexical range to be limited, containing only
a rather narrow repertoire of high-frequency words, but still the simple ideas that
are communicated are mostly understandable, even if there is a certain amount of
inaccurate vocabulary.
Finally, in a band 1 performance a speaker with extremely limited lexical competence
in English will demonstrate this by including only a few very high-frequency content
words which are more often than not inaccurate and inappropriate. We commonly
expect band 1 speakers to compensate for their lack of lexical range by interspersing
their production with fillers ("ermm", "ahh" ...) or L1 words in order to keep going,
which has the knock-on effect of frequently causing breakdown in communication.
3.5.2 Test taker feedback
The purpose of the E8 Speaking Test is to identify strengths and weaknesses in the
speaking competence of Austrian pupils at the end of year 8 with system monitoring
rather than certification or selection at the level of individual test takers in mind.
Although the test results of the E8 Speaking Test are linked to the CEFR in the feedback, critical cut scores on which to base selection decisions need not be established
by the test constructors.
The information that results from the E8 Standards Speaking Test is reported on the
four dimensions of the Speaking Assessment Scale (Task Achievement & Communication Skills, Clarity & Naturalness of Speech, Grammar, and Vocabulary). The
results for each dimension are reported on a scale from 0 to 7; this enables reference
to the CEFR up to B1. Assessments are adjusted for differences in assessor severity
and task difficulty by means of multi-faceted Rasch analysis. The results are therefore
comparable across all test takers regardless of which assessor rated the performance
and what particular prompt the performance was based on.
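How such an adjustment works in principle can be sketched with a common formulation of the many-facet Rasch model, in which the log-odds of receiving score k rather than k-1 equal the test taker's ability minus the task difficulty, the assessor severity, and a threshold for step k. The report does not state which model variant or software was used, so the rating scale formulation, the function names, and all numerical values below are assumptions chosen for illustration.

```python
import math


def category_probabilities(ability, task_difficulty, assessor_severity, thresholds):
    """Return the probabilities of scores 0..K under a many-facet rating scale model."""
    # The log-numerator of score k is the sum over steps h <= k of
    # (ability - task_difficulty - assessor_severity - thresholds[h]); score 0 has 0.
    log_numerators = [0.0]
    for tau in thresholds:
        step = ability - task_difficulty - assessor_severity - tau
        log_numerators.append(log_numerators[-1] + step)
    peak = max(log_numerators)  # subtracted for numerical stability
    weights = [math.exp(value - peak) for value in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probabilities):
    """Expected raw score implied by the category probabilities."""
    return sum(score * p for score, p in enumerate(probabilities))


# Invented values in logits, for illustration only.
thresholds = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # seven steps for scores 0..7
lenient = category_probabilities(0.5, 0.0, -0.5, thresholds)
severe = category_probabilities(0.5, 0.0, 0.5, thresholds)
print(round(expected_score(lenient), 2), round(expected_score(severe), 2))
```

For the same ability and task, the more severe assessor yields a lower expected raw score. Estimating severity and difficulty as separate parameters is what allows the reported ability measure to be freed from these effects.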
In compliance with political requirements, the way feedback on the results is given
to test takers and other stakeholders is being developed at the moment. The process
of standard setting and CEFR-linking will be described in more detail in a technical
report after the actual test in 2013.
4 E8 Speaking Test Specifications

While the previous sections have focussed on the theoretical background of the E8
Speaking Test and on issues of validity and reliability, this chapter deals with the test
as a physical construct.
4.1 Purpose of the test

The aim of the E8 Speaking Test is to identify the test takers' strengths and weaknesses
in communicating naturally in settings (tasks) that resemble as closely as possible the real life language usage of this age group.
4.2 Description of test takers

The test takers are Austrian pupils in General Secondary School, New Middle School
[Allgemeinbildende Pflichtschule (APS)] and lower Academic Secondary School
[Allgemeinbildende Höhere Schule (AHS), Unterstufe] towards the end of grade 8
(8. Schulstufe). Pupils from all three ability groups in APS will be tested. The majority
of test takers will be aged 14.
4.3 Test level

The difficulty level of the test is supposed to encompass levels A2 to B1 in the
Common European Framework of Reference (CEFR).
4.4 Test Construct

Since the purpose of the test is to provide feedback on the test takers' communicative competence, the most significant competences needed for speaking have to be
defined:
an appropriate response to the task, the adequate use of devices that create coherence and cohesion characteristic of oral communication, and turntaking (task achievement & communication skills);
the ability to produce clear and natural speech by using standard pronunciation and stress and by producing fluent utterances (clarity & naturalness of speech);
the test takers' linguistic competence demonstrated in the choice of vocabulary that has a certain range and is accurate, and the adequate use of a range of grammatical structures reflecting the nature of lexis and grammar in unplanned speech (grammar; vocabulary).
Moreover, the Construct Space, which is to be used to construct tasks, has to be
specified (see Tables 1–4, pp. 36–39). It lists the E8 BIST, the topics from the ANC,
the spoken text types, the speaking purpose/communicative functions, the context/
audience, and the CEFR descriptors that the E8 BIST can be linked with.
4.4.1 Construct Space

Prompt Type: MONOLOGUE PART 1

Descriptor from the BIST-VO (Pupils can ...) with the corresponding CEFR Descriptor:
E8 BIST: tell something real or invented, or report it in the form of a simple list (A2+)
CEFR: Can tell a story or describe something in a simple list of points. (A2+)
E8 BIST: report on facts and processes from their own everyday life, e.g. about people, places, activities (A2+)
CEFR: Can describe everyday aspects of his/her environment e.g. people, places, a job or study experience. (A2+)
E8 BIST: report on personal experiences and observations in simple, connected sentences (A2)
CEFR: Can give short, basic descriptions of events and activities. (A2+)
E8 BIST: briefly describe and compare familiar objects using simple means (A2+)
CEFR: Can use simple descriptive language to make brief statements about and compare objects and possessions. (A2+)

Topic Area (Austrian curriculum for modern foreign languages):
Family and friends
Home and surroundings
Food and drink
Clothing
Body and health
The year and the daily routine
Festivals and celebrations
Childhood and growing up
School and the world of work
Hobbies and interests
Dealing with money
Experiences and the world of imagination
Environment and society
Culture, media and literature
Intercultural and cultural studies aspects

Spoken Text Types:
Narrative or story (true or invented)
Personal report
Personal statement
Expository discourse
Description

Speaking Purpose / Communicative Function:
To describe or compare objects/people/places
To describe dreams/hopes/plans/ambitions/events/activities/reactions
To express feelings/hopes
To give reasons/explanations
To relate a narrative
To report about events/personal experiences/topics
To (re)tell a story

Primary Context / Audience:
Educational: teachers, classmates etc.
Personal: family, friends etc.

Table 1: Construct Space: Monologue Part 1
Prompt Type: MONOLOGUE PART 2

Descriptor (Pupils can ...) with the corresponding CEFR Descriptor:
E8 BIST: describe themselves, their family, friends, familiar places, personal belongings and activities in several simple sentences (A2)
CEFR: Can describe people, places and possessions in simple terms. (A2)
E8 BIST: give brief reasons or explanations for opinions, plans or actions (B1)
CEFR: Can briefly give reasons and explanations for opinions, plans and actions. (B1)
E8 BIST: report on their own experiences in detail, describing their feelings and reactions (B1)
CEFR: Can give detailed accounts of experiences, describing feelings and reactions. (B1)
E8 BIST: describe plans, goals, dreams and hopes (B1)
CEFR: Can describe dreams, hopes and ambitions. (B1)

Topic Area (Austrian curriculum for modern foreign languages):
Family and friends
Home and surroundings
Food and drink
Clothing
Festivals and celebrations
Childhood and growing up
Dealing with money
Experiences and the world of imagination
Culture, media and literature
Intercultural and cultural studies aspects

Spoken Text Types:
Narrative or story (true or invented)
Personal report
Personal statement
Expository discourse
Description

Speaking Purpose / Communicative Function:
To describe or compare objects/people/places
To describe dreams/hopes/plans/ambitions/events/activities/reactions
To express feelings/hopes
To give reasons/explanations
To relate a narrative
To report about events/personal experiences/topics
To (re)tell a story

Primary Context / Audience:
Educational: teachers, classmates etc.
Personal: family, friends etc.

Table 2: Construct Space: Monologue Part 2
Prompt Type: SHORT DIALOGUE

Descriptor (Pupils can ...) with the corresponding CEFR Descriptor:
E8 BIST: make simple arrangements (A2)
CEFR: Can discuss what to do in the evening, at the weekend. (A2)
E8 BIST: cope with familiar everyday situations, e.g. hold conversations in shops, in restaurants and at counters (A2)
CEFR: Can ask about things and make simple transactions in shops, post offices or banks. (A2)
E8 BIST: give simple explanations and instructions, e.g. ask for or give directions (A2+)
CEFR: Can get simple information about travel, use public transport: buses, trains, and taxis, ask and give directions, and buy tickets. (A2)
E8 BIST: express feelings such as surprise, joy, regret and indifference, and respond to such expressions of feeling (B1)
CEFR: Can express and respond to feelings such as surprise, happiness, sadness, interest and indifference. (B1)

Topic Area (Austrian curriculum for modern foreign languages):
Family and friends
Home and surroundings
Food and drink
Clothing
Festivals and celebrations
Childhood and growing up
Dealing with money
Experiences and the world of imagination
Culture, media and literature
Intercultural and cultural studies aspects

Spoken Text Types:
Functional discourse
Informal conversation

Speaking Purpose / Communicative Function:
To agree/accept/disagree
To ask for/express preference
To ask for/give information
To ask for/offer help/attention
To express feelings/attitudes/opinions
To greet/depart
To initiate/maintain/close a conversation
To invite/request to join
To request action
To state ignorance
To suggest
To sympathise

Primary Context / Audience:
Educational: teachers, classmates etc.
Personal: family, friends etc.

Table 3: Construct Space: Short Dialogue
Prompt Type: LONG DIALOGUE

Descriptor (Pupils can ...) with the corresponding CEFR Descriptor:
E8 BIST: express and justify their own views, plans and intentions in simple words (B1)
CEFR: Can give or seek personal views and opinions in discussing topics of interest. (B1)
E8 BIST: begin, sustain and end a simple conversation on familiar topics (e.g. family, friends, school, leisure) (B1)
CEFR: Can initiate, maintain and close simple, face-to-face conversations on topics that are familiar or of personal interest. (B1)
E8 BIST: express agreement or disagreement and make alternative suggestions in a conversation (e.g. a group discussion in class) (A2+)
CEFR: Can agree and disagree with others. (A2+) Can make and respond to suggestions. (A2+)

Topic Area (Austrian curriculum for modern foreign languages):
Family and friends
Home and surroundings
Food and drink
Clothing
Festivals and celebrations
Childhood and growing up
Dealing with money
Experiences and the world of imagination
Culture, media and literature
Intercultural and cultural studies aspects

Spoken Text Types:
Informal discussion
Informal conversation
Argumentative discourse

Speaking Purpose / Communicative Function:
To agree/accept/disagree
To ask for/express preference
To ask for/give information
To ask for/offer help/attention
To express feelings/attitudes/opinions
To greet/depart
To initiate/maintain/close a conversation
To invite/request to join
To request action
To state ignorance
To suggest
To sympathise

Primary Context / Audience:
Educational: teachers, classmates etc.
Personal: family, friends etc.

Table 4: Construct Space: Long Dialogue
4.5 Structure of the test

The test is designed to be carried out by trained interlocutors with paired test takers.
It consists of three sections:
Section one is a Warm-up in which the test takers give basic information about
themselves. The interlocutor asks each test taker three to five interview questions.
In section two, each test taker produces a monologue based on a textual and/or
visual stimulus.
In section three, the two test takers engage in a short and a long dialogue based
on textual and visual stimuli.
Interlocutors follow a standardised procedure and use standardised repair questions
and/or question slips to repair breakdown of communication or prevent long pauses.
The E8 Speaking Test attempts to establish a framework that involves the test takers in
communicative situations requiring spontaneous language performance. To achieve this,
it engages the test takers in language performances that are not rehearsed or prepared in advance.
4.6 Time allocation

The total testing time is 18 minutes, while the test takers' real working time is 15
minutes.
Two minutes have been allocated for administration at the beginning (moving in,
organising the setting); administration at the end (moving out) may last up to one
minute.
The test takers' active work comprises three sections:
Section 1: Each test taker is interviewed for one minute; 2 minutes altogether.
Section 2: The test takers are given one minute to read the prompts; the speaking
time for each monologue is 2 minutes; 5 minutes altogether.
Section 3: The test takers are given oral instructions to carry out the short
dialogue, which they start straight away. After the short dialogue (approx.
1–2 minutes) they are given one minute to read the prompts for the long dialogue;
the speaking time for the short dialogue is one to two minutes, for the long
dialogue it is five minutes (approx. 2.5 minutes speaking time per test taker);
8 minutes altogether.
The E8 Speaking Test: Time allocation

Administration at the beginning: Moving in, registration etc.    2 minutes
Interview                                                        2 minutes
Preparation for Monologue                                        1 minute
Monologues                                                       4 minutes
Short Dialogue                                                   1–2 minutes
Preparation for Long Dialogue                                    1 minute
Long Dialogue                                                    5 minutes
Administration at the end: Moving out                            1 minute
TOTAL                                                            18 minutes
Table 5: Time allocation
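The arithmetic behind the time allocation can be checked with a short sketch. The phase names and minutes are taken from section 4.6; the variable and function names are hypothetical, and storing each duration as a (minimum, maximum) pair is an assumption made here to represent the short dialogue's "one to two minutes".

```python
# Minutes per phase as (minimum, maximum), following section 4.6.
PHASES = [
    ("Administration at the beginning", (2, 2)),
    ("Interview", (2, 2)),
    ("Preparation for Monologue", (1, 1)),
    ("Monologues", (4, 4)),
    ("Short Dialogue", (1, 2)),
    ("Preparation for Long Dialogue", (1, 1)),
    ("Long Dialogue", (5, 5)),
    ("Administration at the end", (1, 1)),
]

# The test takers' working time excludes the two administration phases.
WORKING_PHASES = [p for p in PHASES if not p[0].startswith("Administration")]


def total_minutes(phases):
    """Return the (minimum, maximum) total duration in minutes."""
    shortest = sum(duration[0] for _, duration in phases)
    longest = sum(duration[1] for _, duration in phases)
    return shortest, longest


print(total_minutes(PHASES))          # upper bound is the 18-minute total
print(total_minutes(WORKING_PHASES))  # upper bound is the 15-minute working time
```

With the short dialogue at its two-minute maximum, the phases add up to the stated 18 minutes of testing time and 15 minutes of working time.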
4.7 Rubrics
All rubrics are in English. However, they must be formulated in language that is well
below the test takers' expected level so that they are easily understandable.
Therefore, they must not exceed CEFR level A2. Test takers must not be put at a
disadvantage because they have difficulty understanding the rubrics.
Rubrics referring to the dialogues need to indicate the reason for communication
and the context/audience. The required length of the speaking activity is indicated
in minutes.
4.8 Speaking Assessment Scale

The following table includes the four dimensions of the analytic speaking assessment
scale and the band descriptors for bands 1, 3, 5 and 7 for each dimension. Bands 2,
4 and 6 have no descriptors of their own. They are awarded if a performance is better than the band below and weaker than the band above.
The performances in the monologue and dialogue parts are weighted equally and assessed collectively. Since this happens in situ and because of the relatively short time
frame of the test, it is impossible for the assessors to assess the monologue and the
dialogues separately on four dimensions.
Video recordings are used for multiple-rating to ensure reliability. Differences in
assessor severity will be adjusted for in the process of multi-faceted Rasch analysis.
Speaking Assessment Scale (Oct 2011)

Task Achievement & Communication Skills
Band 7
  detailed information communicated reliably
  description or narrative with main points expanded by relevant, detailed information and examples
  effective turntaking through initiating, maintaining and/or closing discourse, sometimes using stock phrases
Band 5
  clear and concrete information of immediate relevance with main points communicated comprehensibly
  straightforward description or narrative
  basic turntaking through initiating, maintaining or closing discourse, sometimes using stock phrases
Band 3
  limited information on familiar and routine matters communicated in a simple and direct exchange
  description or narrative in a simple list of points on sentence or word-group level
  effective questioning in information exchange
Band 1
  very little information even in simple everyday situations
  basic statements or negations on word or word group level
  attempted questioning to get information
Band 0
  no task achievement

Clarity & Naturalness of Speech
Band 7
  fluent and spontaneous at a fairly even tempo with natural pauses
  longer stretches of language
  clear, natural pronunciation and intonation
Band 5
  some degree of fluency with some pausing for repair or grammatical and lexical planning
  connected stretches of language in a connected, linear sequence of points
  clearly intelligible pronunciation and intonation, sometimes with a foreign accent; occasional mispronunciations
Band 3
  noticeable pauses, hesitation or false starts, sometimes causing breakdown of communication
  short contributions and exchanges linked with some simple connectors
  intelligible pronunciation; foreign accent or mispronunciations which sometimes impair understanding
Band 1
  much hesitation frequently causing breakdown of communication
  very short, isolated, mainly pre-packaged utterances
  frequent mispronunciations; only understood by speakers of English with some effort
Band 0
  no assessable language

Grammar (1)
Band 7
  good range of structures
  relatively high degree of grammatical control and few inaccuracies which do not impair communication
  message clear
Band 5
  generally sufficient range of structures
  occasional inaccuracies which can impair communication
  message clear
Band 3
  limited range of simple structures
  frequently inaccurate with basic mistakes, generally without causing breakdown of communication
  message usually clear
Band 1
  extremely limited range of simple structures
  limited control causing frequent breakdown of communication
  message seldom clear
Band 0
  no assessable language

Vocabulary (2)
Band 7
  good range of vocabulary communicating clear ideas; formulations sometimes varied to avoid repetition
  generally accurate vocabulary
Band 5
  sufficient range of vocabulary communicating clear ideas
  occasionally inaccurate vocabulary; major errors possible when expressing more complex ideas
Band 3
  limited range of vocabulary mostly communicating clear ideas
  frequently inaccurate vocabulary controlling a narrow lexical repertoire
Band 1
  extremely limited range of vocabulary communicating few clear ideas
  mostly inaccurate vocabulary frequently causing breakdown of communication
Band 0
  no assessable language

1 Descriptors referring to range and control reflect the features of the task and the nature of grammar and vocabulary in unplanned speech.
2 See above

Table 6: Assessment Scale
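The banding rule from section 4.8 can be sketched as a small data structure: one band from 0 to 7 per dimension, where bands 1, 3, 5 and 7 have descriptors and bands 2, 4 and 6 fall between their neighbours. The class, field and function names below are hypothetical and serve only as an illustration; treating band 0 as a described band is an assumption based on the "no task achievement" and "no assessable language" entries in the scale.

```python
from dataclasses import dataclass

# The four dimensions of the Speaking Assessment Scale (field names are hypothetical).
DIMENSIONS = (
    "task_achievement_communication",
    "clarity_naturalness",
    "grammar",
    "vocabulary",
)

# Bands with their own descriptor; band 0 is included by assumption.
DESCRIBED_BANDS = {0, 1, 3, 5, 7}


def band_basis(band):
    """Say what an awarded band rests on: a descriptor or its two neighbours."""
    if not 0 <= band <= 7:
        raise ValueError(f"band must be between 0 and 7, got {band}")
    if band in DESCRIBED_BANDS:
        return f"descriptor for band {band}"
    # Bands 2, 4 and 6: better than the band below, weaker than the band above.
    return f"between the descriptors for bands {band - 1} and {band + 1}"


@dataclass(frozen=True)
class Rating:
    """One assessor's rating of one test taker on the four dimensions."""
    task_achievement_communication: int
    clarity_naturalness: int
    grammar: int
    vocabulary: int

    def __post_init__(self):
        # Reject any band outside the 0..7 scale.
        for name in DIMENSIONS:
            band = getattr(self, name)
            if not isinstance(band, int) or not 0 <= band <= 7:
                raise ValueError(f"{name} must be an integer from 0 to 7, got {band!r}")


rating = Rating(5, 4, 3, 6)
for name in DIMENSIONS:
    print(name, getattr(rating, name), band_basis(getattr(rating, name)))
```

Keeping the four bands as separate fields mirrors the analytic character of the scale: each dimension is reported on its own rather than being collapsed into a single score.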
4.9 Prompt samples

The model prompt set below has been developed and used in the piloting phase. The
length of the spoken performances is controlled by trained interlocutors according to
point 4.6 in the specifications. In case of breakdown of communication the standardised
repair questions are used to help the test takers to continue the speaking test.
Model Promptset 04
2011/12
Interlocutor:
Hello, please sit down. I'm ... . I'll do the speaking test with you.
The lady / gentleman in the back is Mrs / Mr ... .
She's / He's listening.
In the first part I will ask you some questions.
Candidate A
What's your name?
How are you today?
Candidate B
And how about you? How are you today?
What's your name?
Candidate A
(Use name), what's your favourite food?
Who cooks it for you?
Candidate B
(Use name), what's your favourite sport?
How often do you practise it?
BIFIE Salzburg I Bildungsstandards E8 Speaking
Time (min:sec)
In part 2 you will give a two-minute talk.
On this card there are three topics.
(Hand out prompt cards.)
0:00 Please choose one and read only that topic text carefully.
You have one minute to prepare.
1:00 [Candidate A] (use name), please start now.
Candidate A:
Choose one topic and read only that topic text carefully.
Topic 1: A star / famous person

Say ...
who it is.
why this person is famous / a star.
what this person does.
what this person looks like.
why you would like to be that person.
why you would not like to be that person.
Topic 2: Your dream room
Say ...
what this room is like.
what is in this room.
what you do in this room.
who can go into this room and what for.
why you like this room.
what you would have to do to make your room your dream room.
Topic 3: Your last weekend
Say ...
where you spent your last weekend.
what you did.
what you liked about it.
what you did not like about it.
who was with you.
what they did.
Would you want to change anything about last weekend?
3:00
(After 2 minutes)
Could you finish, please? /
Thank you, [Candidate A] (use name).
[Candidate B] (use name), would you start, please?
Repair questions for Candidate A

Topic 1: A star/famous person
Who is this person?
What does this person do?
How old is he/she?
Tell me about his/her job/hobbies/music/movies/greatest success
What about his/her hair/height/body/favourite clothes/character?
What does he/she like/not like?
Why would you like to be that/a star?
Why would you not like to be that/a star?
Topic 2: Your dream room
What is in it? ... on the walls?
Tell me about its furniture (chairs, table, wardrobe, shelves, ...).
What about its walls/windows?
Who is allowed to go into this room?
What is your favourite thing/object in the room?
What do you do in this room?
Why do you like this room?
What would you have to do to make your room your dream room?
Topic 3: Your last weekend
Where did you go?
What did you do?
Who did you meet / go with / stay with / invite?
What did the place look like?
What did you eat or drink?
Tell me about the people / place / activities / food.
What was the weather like?
How did you get there?
How long did you stay?
Candidate B:
Choose one topic and read only that topic text carefully.
Topic 1: Your daily routine

Say ...
when it starts.
what you do.
what you like about it.
what you do not like about it.
when it ends.
Would you like to change anything about it? Give reasons.
Topic 2: Your favourite celebration/party
Say ...
what you celebrate.
what it is like.
what you do.
who is also there.
what they do.
why you like this celebration.
Topic 3: Your favourite season/time of year
Say ...
what it is.
what it is like.
what you do.
what clothes you wear.
why you like this season / time of year. Give reasons.
what season / time of year you do not like. Give reasons.
5:00
(After 2 minutes)
[Candidate B] (use name), could you finish, please?/
Thank you, [Candidate B] (use name).
We'll now do part three.
Repair questions for Candidate B

Topic 1: Your daily routine
When do you get up?
What do you do then? What do you do after ...?
Tell me about your morning.
What do you do at lunch time?
Tell me about your afternoon.
What about your evening?
When do you go to bed?
What time of the day do you like best?
Topic 2: Your favourite celebration/party
What is your favourite celebration? What do you celebrate?
When is it?
Are there any special things/activities you do?
What do you eat?
Tell me about the people there.
What do you like best about it?
What is your favourite activity?
What about birthday parties/Halloween/New Year?
Topic 3: Your favourite season/time of year
What is your favourite season? ... time of year?
What is the weather like?
What sports can people do?
What can you do in your free time?
What hobbies can people do?
What hobbies are popular at this time of the year?
What food do you eat in this season?
Tell me about the temperature, the plants (trees, flowers,...)
What clothes do you wear?
Why do you like summer/autumn ...?
Why do you not like summer/autumn ...?
Interlocutor:
You will now have a conversation together.
You are at the kids' flea market. Here are your cards. (Offer
cards and allow the candidates to have a look at the cards for 10
seconds before you carry on.)
[Candidate A] (use name), you go shopping to the flea
market and [Candidate B] (use name) you are selling your
things.
(If necessary) [Candidate A] (use name), please start.
Candidate A:
You BUY.
--------------------------------------------------------------------------------------
Candidate B:
You SELL.
Time (min:sec)
Now look at these cards. They help you think about fashion and trends and ask questions.
(Hand out cards.)
0:00 You have one minute to prepare.
1:00 Now, talk about fashion and trends together (use gestures to encourage the conversation).
[Candidate B] (use name), please start talking to [Candidate A] (use name).
(If candidate B does not start talking)
[Candidate A] (use name), can you say something, please?
6:00 This is the end of the speaking test. Thank you for taking part. Good-bye.
Fashion and Trends
Ask questions and talk with your partner about
Follow trends?
Buy modern clothes? Why (not)?
Fashion and trends and you: What? Why?
What's in? What's out? Why?
Trends teenagers like/do not like?
What to do with things that are out?
Money for buying modern things?
Positive/negative things about trends?
Positive/negative things about fashion?
YOUR OWN IDEAS
All other pictures from: http://www.freedigitalphotos.net [29.05.2012]
http://en.fotolia.com/id/1814695
Repair Slips Long Dialogue
--------------------------
What trends do teenagers like?
What trends do teenagers not like?
What's in at the moment? What's out?
What trends do you follow? Why?
How do you follow trends?
Do you buy modern clothes? Why?
What do you do with things that are out?
What are the good and bad things about trends?
What are the good and bad things about fashion?
How do you get the money for buying modern things?
General repair questions:

(Name) please start. / ... go on
(Name) please ask (Name) about ...
(Name) please ask (Name) what she/he thinks about ...
(Name) please say something about ...
And what do you think, ...?
Do you think so, too?
What about you, ...?
What do you think about ...?
Tell me more about ...
Talk about (it).
What else can you say about ...?

Talk to each other, please! (use gestures to support this)
Tell your friend what you think.
Thank you, ... . ..., can you answer this question / carry on?
Please speak English.
Please speak louder.
5 Washback
Since the publication of Alderson and Wall's (1993) 15 washback hypotheses,
the impact of testing on teaching/teachers and learning/learners has been widely
acknowledged. If we consider teaching and learning closely linked to curriculum,
course design and material production, the effects of testing on those also have
to be taken into account.
Although the E8 Standards Test is a low-stakes test that does not have any
gatekeeping function, it is expected that it will have an impact on the teaching and
learning of speaking in lower secondary foreign language education.
In general, the test, together with the implementation of the E8 Bildungsstandards in
2009 and the revision of the curriculum for modern foreign languages, should already
have changed the teaching of speaking. In accordance with the CEFR (Council of Europe
2001), the skill of speaking comprises the two components of oral production (Council
of Europe 2001, p. 58ff) and spoken interaction (Council of Europe 2001, p. 73ff),
which should be taught and assessed equally intensively and in fair proportion to the
other three skills. Thus, these official documents place more emphasis on the role of
speaking in teaching EFL and on its contribution to the pupils' final grade; whether
that role has also been strengthened in actual teaching has yet to be shown.
Moreover, the implementation of the E8 Bildungsstandards has been supported by official
institutions like BIFIE and ÖSZ as well as by publishers and course book writers,
who have reacted to the new requirements by offering on-line and printed
teaching and training materials that support teaching and assessing speaking. As a result,
the format and the activity types of the E8 Standards Test have found their way into
these materials and some course books.
Finally, sample test papers and video-recorded performances are available on-line.
They offer teachers the opportunity to make their pupils familiar with task-specific
skills according to the E8 Standards Test format, and they provide
a guideline on how teachers could administer speaking tests. In order to give
teachers the opportunity to become familiar with the assessment of speaking performances,
in-service training should be offered. It is also hoped that more materials
that support the teaching and learning of oral production and spoken interaction in
accordance with the E8 Bildungsstandards and E8 Standards Tests will be published in
the future. This would help create the desired positive washback on teaching and
learning by making the learners familiar with the instructions and the task
types.
Bibliography
Alderson, J.C. & Wall, D. 1993. Does washback exist? In: Applied Linguistics, Vol.
14, No. 2, pp. 115–128.
Alderson J.C., Clapham C. & Wall, D. 2004. Language Test Construction and
Evaluation. Cambridge: Cambridge University Press.
Bachman, L. F. 1990. Fundamental Considerations in Language Testing. Oxford:
OUP.
Bachman, L. F. & Palmer, A. 1996. Language Testing in Practice. Oxford:
OUP.
Berry, V. 1994. The Assessment of Spoken Language under Varying Interactional
Conditions. Washington D.C.: ERIC Document ED386065. Available online:
http://eric.ed.gov/PDFS/ED386065.pdf.
Bmukk, 2009a. Verordnung: Bildungsstandards im Schulwesen. Available at:
http://www.bifie.at/sites/default/files/VO_BiSt_2009-01-01.pdf
Bmukk, 2009b. Verordnung: Bildungsstandards im Schulwesen, Anlage. Available
at: http://www.bifie.at/sites/default/files/VO_BiSt_Anlage_2009-01-01.pdf
Bmukk, 2009c. Lehrplan Lebende Fremdsprache. Available at:
http://www.bmukk.gv.at/medienpool/17135/lp_hs_lebende_fremdsprache.pdf
Brock, R., Horak, A., Lang-Heran, H., Moser, W., Schatzl, Z., Schlichtherle, B. &
Schober, M. 2008. Leistungsfeststellung auf der Basis des Gemeinsamen Europäischen Referenzrahmens für Sprachen (GERS). Praxisreihe: Heft 8. Graz: ÖSZ.
Brooks, L. 2009. Interacting in pairs in a test of oral proficiency: Co-constructing a
better performance. Language Testing, 26(3), 341–366.
Brown, K. 1999. Developing critical literacy. Sydney, Australia: National Centre for
English Language Teaching and Research.
Brumfit, C. J. & Johnson K. (eds.) 1998. The Communicative Approach to Language Teaching. Oxford: Oxford University Press.
Canale, M. & Swain, M. 1980. Theoretical bases of communicative approaches to
second language testing and teaching. Applied Linguistics, 1(1): 1-47.
Carter, R. & McCarthy, M. 2006. The criteria for a spoken grammar. In:
McCarthy, M. 2006. Explorations in Corpus Linguistics. Cambridge: CUP, pp.
27–52.
McCarthy, M. 2006a. Explorations in Corpus Linguistics. Cambridge: CUP. Available at: http://www.cambridge.org/other_files/downloads/esl/booklets/McCarthyCorpus-Linguistics.pdf
McCarthy, M. 2006b. Fluency and Confluence: What fluent speakers do. In:
McCarthy, M. Explorations in Corpus Linguistics. Cambridge: CUP, pp. 1–6.
Csépes, I. & Együd, G. 2004. Into Europe. The Speaking Handbook. Budapest:
The Teleki László Foundation.
Council of Europe (Ed.) 2001. Common European Framework of Reference for
Languages: Learning, Teaching, Assessment. Cambridge: Cambridge University
Press.
Davies, A., Brown A., Elder, C. Hill, K., Lumley, T. & McNamara T. 1999.
Dictionary of language testing. Cambridge: CUP.
Ebsworth, M. 1998. Accuracy Vs. Fluency: Which Comes First in ESL Instruction?
ESL Magazine. 1:2, 24-26. March/April 1998.
Együd, G. & Glover, P. 2001. Oral Testing in pairs: A secondary school perspective.
ELT Journal, 55(1), 70–76.
Fulcher, G. 2003. Testing Second Language Speaking. London: Longman.
Hanny, R. J. 2000. Assessing the SOL in classrooms. College of William and Mary.
Available at: http://www.wm.edu/education/SURN/solass.html
Henning, G.1987. A Guide to Language Testing. Cambridge, MA: Newbury House.
Hymes, D. 1972. On Communicative Competence. In J.J. Gumperz & D. Hymes
(eds.), Sociolinguistics. Harmondsworth: Penguin Books, pp. 269–293.
Hymes, D. 1974. Foundations of Sociolinguistics: An Ethnographic Approach. Philadelphia: University of Pennsylvania.
Johnson, K. & Johnson H. (eds). 1999. Encyclopedic Dictionary of Applied
Linguistics. Malden/Oxford/Victoria: Blackwell Publishing.
Kahn, G. 2008. The social unfolding of task, discourse, and development in the
second language classroom. Unpublished doctoral dissertation. Teachers College:
Columbia University.
Kerlinger, F.N. 1973. Foundations of Behavioral Research. New York: Holt, Rinehart & Winston.
Krashen, S. D. 2003. Explorations in Language Acquisition and Use. Portsmouth,
NH: Heinemann.
Krashen, S. D. & Terrell, T. D. 1988. The Natural Approach. New York: Prentice-Hall.
Kunnan, A. J. 1995. Test taker characteristics and test performance. A structural
modeling approach. Cambridge: Cambridge University Press.
Lado, R. 1961. Language Testing. London: Longman.
Luoma, S. 2004. Assessing Speaking. Cambridge: CUP.
Moskal, B. M. 2000. Scoring rubrics: What, when and how? Practical Assessment,
Research & Evaluation, 7(3). Available at: http://pareonline.net/getvn.asp?v=7&n=3
Richards, J. C. 2008. Moving beyond the Plateau: From Intermediate to Advanced
Levels in Language Learning. Cambridge: CUP. Available at: http://www.cambridge.org/other_files/downloads/esl/booklets/Richards-Beyond-Plateau.pdf
Taylor, L. 2001. The paired speaking test format: recent studies. Research Notes, 6,
15-17. Cambridge: University of Cambridge ESOL.
Thornbury, S. 2009. How to Teach Speaking. Harlow: Pearson Longman.
Thornbury, S. & Slade, D. 2006. Conversation: From Description to Pedagogy.
Cambridge: Cambridge University Press.
Widdowson, H.G. 1978. Teaching Language as Communication. London: OUP.
Widdowson, H. G. 1983. Learning Purpose and Language Use. Oxford: OUP.
Wilkins, D.A. 1976. Notional Syllabuses. London: OUP.
Wong, J. & Waring, H. Z. 2010. Conversation Analysis and Second Language
Pedagogy. New York: Routledge.
Appendix
Student information and interview guide
Oral briefing for students before the pre-piloting of the prompts during
interlocutor/assessor training
Welcome; introduction of the team and the accompanying staff (supervisors)

Thank you for agreeing to take part in the piloting of the E8 Speaking Tests and in the training of the teachers who will administer these tests. Today you are helping with the training of the teachers and with the evaluation of the quality of the tasks.
The tests will be conducted in 4 rooms. Ms/Mr ______ will guide you through the morning.
Teachers will also be listening during every test. They are practising how to assess speaking performances and are taking notes on what they will tell their colleagues after the training to help them improve the way they conduct the conversation.
Now to you: you will always be tested in pairs, and the test consists of three parts. First, you will be asked a few warm-up questions. Then you choose one topic out of three options, the one you can say the most about, and talk about it for about 2 minutes. After that, you and your partner talk together about an everyday topic for about 1-2 minutes and then about a second topic for about 5 minutes.
The test is not difficult and nobody needs to be nervous. Your results will not be passed on and will be used only for our work.
When you speak, you do not need to worry much if you occasionally make mistakes. If you cannot think of a word, try to paraphrase it or express your idea in a different way. What matters is that you speak as much as possible, that we understand what you want to say, and that you speak as fluently as possible.
Before the monologue and the long dialogue you have one minute to prepare mentally.
We hope you enjoy the work and look forward to your feedback after the test.
Interview with students after the pre-piloting of the prompts

I would like to ask you a few more things about the four tests. Your answers are important to us: they tell us whether the tests still need to be changed or whether we can continue working as we are.

1. How did you feel during the four tests?
   Optional follow-up: Was that feeling the same in all four rounds?
2. Do you feel that you did better or worse in one of the four tests? If so, why?
   Optional follow-up: Was one of the tasks harder or easier than the others? If so, why?
3. Finally, could you also tell me why you chose certain monologues or preferred not to choose them?

Thank you for your help with this pre-piloting of the E8 Standards Speaking Tests.