May 1997
Foreword
The TOEFL Monograph Series features commissioned papers and reports for TOEFL 2000 and other Test of English as a Foreign Language program development efforts. As part of the foundation for the TOEFL 2000 project, a number of papers and reports were commissioned from experts within the fields of measurement and language teaching and testing. The resulting critical reviews and expert opinions were invited to inform TOEFL program development efforts with respect to test construct, test user needs, and test delivery. Opinions expressed in these papers are those of the authors and do not necessarily reflect the views or intentions of the TOEFL program.
These monographs are also of general scholarly interest, and the TOEFL program is pleased to make them available to colleagues in the fields of language teaching and testing and international student admissions in higher education.
The TOEFL 2000 project is a broad effort under which language testing at ETS will evolve into the 21st century. As a first step in the evolution of TOEFL language testing, the TOEFL program recently revised the Test of Spoken English (TSE) and announced plans to introduce a TOEFL computer-based test (TOEFL CBT) in 1998. The revised TSE test, introduced in July 1995, is based on an underlying construct of communicative language ability and represents a process approach to test validation. The TOEFL CBT will take advantage of the new forms of assessments and improved services made possible by computer-based testing while also moving the program toward its longer-range goals, which include: the development of a conceptual framework that takes into account models of communicative competence; a research agenda that informs and supports this emerging framework; a better understanding of the kinds of information test users need and want from the TOEFL test; and a better understanding of the technological capabilities for delivery of TOEFL tests into the next century.
It is expected that the TOEFL 2000 efforts will continue to produce a set of improved language tests that recognize the dynamic, evolutionary nature of assessment practices and that promote responsiveness to test user needs. As future papers and projects are completed, monographs will continue to be released to the public in this new TOEFL research publication series.
TOEFL Program Office
Educational Testing Service
Abstract
Discussion of TOEFL 2000 in the TOEFL Committee of Examiners' (COE) meetings resulted in a framework representing components believed to be relevant to defining language use in an academic context. The framework, called the COE Model, is comprised of aspects of the context of language use as well as hypothesized capacities of the language user. The COE Model suggests that test development should begin by examining the types of academic contexts in which language is used in order to hypothesize what those abilities may be for any specific context of interest. COE discussions of TOEFL 2000 were motivated by a broad range of validity concerns (e.g., content validity, construct validity, and the social consequences of test use), and the Model may have implications for how validation of TOEFL 2000 is conceived. The COE Model is described to serve as a record of past discussion which can inform future work.
Acknowledgment
The authors are grateful to the members of the TOEFL Committee of Examiners and ETS staff who participated in discussions of TOEFL 2000.
Table of Contents
1. Introduction
2. Background and Assumptions
2.1 Why a model of language use in context?
2.2 COE assumptions about a definition of language ability
2.3 Assumptions about testing language ability
3. COE Model
3.1 Context
3.1.1 Situation
3.1.1.1 Setting
3.1.1.2 Participants
3.1.1.3 Task
3.1.1.4 Text
3.1.1.5 Topic
3.1.2 Performance
3.1.3 Conclusion
3.2 Internal Operations
3.2.1 Internal Goal Setting
3.2.2 Verbal Working Memory
3.2.3 Verbal-Processing Component
3.2.4 Language Competence
3.2.5 World Knowledge
3.2.6 Internal-Processing Output
3.3 Model of Communicative Language Applied
3.3.1 The Skills Described Through the Model
3.3.2 Using the Model for Describing Language Use
3.4 Conclusion
4. Implications for Test Development
4.1 Using the COE Model for Test Development
4.1.1 Identify and Analyze the Academic Context of Interest
4.1.2 Hypothesize the Abilities Required in the Context
4.1.3 Construct Relevant Item/Task Formats
4.1.4 Establish a Scoring Rubric
4.2 Issues Raised by the COE Model for Test Development
4.2.1 Why Not Just Give Them "Authentic" Academic Tasks?
4.2.2 What About the "Four Skills"?
4.2.3 What Is a "Situation"?
4.2.4 What Is Correct?
4.3 Conclusion
5. Implications for Validation
5.1 Construct Validity Evidence
5.1.1 Content Evidence
5.1.2 Empirical Item and Task Analysis
5.1.3 Internal Structure of Tests
5.1.4 External Structure of Tests (correlational evidence)
5.1.5 Experimental Manipulations
5.2 The Consequences of Testing
5.2.1 Evidence Concerning Relevance and Utility
5.2.2 Value Implications
5.2.3 Social Consequences
5.3 Conclusion
6. Evaluation and Evolution of the COE Model
References
List of Figures
Figure 1 (The Working Model of COE Use in an Academic Context)
Figure 2 (Validation)
Figure 3 (The Working Model of COE Use in an Academic Context, April 1992)
List of Appendices
Appendix A (A Chronological Development of the TOEFL 2000 Model at COE Meetings)
Appendix B (A Working List of Definitions of Terms for Language Testing)
Appendix C (The May 1992 Working Model and Lists)
1. Introduction
Over the past several years, the TOEFL Committee of Examiners (COE) has discussed TOEFL 2000, a test whose tentative purpose is the following:
TOEFL 2000 is a measure of communicative language proficiency in English and focuses on academic language and the language of university life. It is designed to be used as one criterion in decision making for undergraduate and graduate admissions.
Because the intended purpose of TOEFL 2000 is to test communicative language proficiency for academic life, the COE discussions of TOEFL 2000 have focused primarily on how to define "communicative language proficiency for academic life." These discussions have produced a framework for such a definition that has been codified as a schematic diagram representing components believed to be relevant, as well as hypothesized relations among the components. This diagram, called the "COE Model," has been useful within the COE meetings to focus discussion on how to define what TOEFL 2000 is intended to measure, and it may be useful for discussion beyond the COE.
The purpose of this paper is to introduce the COE Model. We first present the background of the COE Model and the assumptions committee members brought to these discussions. The major portion of the paper explains the COE Model, defining its components and how they are hypothesized to work together; it also addresses many unresolved issues. We then suggest implications of the model for test development and for validation of TOEFL 2000. We conclude by restating the Model's purposes, which should continue to motivate and direct its evolution.
2. Background and Assumptions
From the first discussions of TOEFL 2000 at COE meetings, all those involved have eagerly anticipated possible solutions to questions about the new test. What would it look like? Would it test the "four skills"? Would it provide different versions for students in different subject areas? What would the item formats look like? How could technology be used? How could score meanings at different levels be defined? These questions have consistently directed discussion to two fundamental questions: What is the intended use of TOEFL 2000? What is TOEFL 2000 intended to measure?
The COE addressed the first question by drafting the tentative statement of intended use for TOEFL 2000 (see Section 1). This statement, in turn, became a guiding assumption for considering the second question. Discussions of the second question have resulted in a framework for defining communicative language proficiency in academic contexts, called the "COE Model." The COE Model explained in this paper is a descriptive sketch that reflects discussions at several meetings, as well as additional minor elaborations by individual COE members. (Appendix A contains more details about this process.) The Model attempts to summarize existing research and current assumptions by researchers in cognitive psychology, applied linguistics, and language testing. The COE does not view the Model described in this paper as a definitive or final version of the framework that will prove most useful for test development and validation. Instead, it represents an interpretation of communicative language use in a form that should facilitate future discussion.
knowledge of grammatical well-formedness; (3) strategic competence, referring to the strategies one uses to compensate for imperfect language knowledge or other limiting factors (such as fatigue, distraction, and inattention); and (4) discourse competence, comprising knowledge of the connections among utterances in a text to form a meaningful whole. A useful extension of this work for language testing is Bachman's (1990) description of a more specific model of language ability, which hypothesizes how Canale and Swain's four competencies work together in language use and which expresses an explicit relationship between "context" and the competencies. The COE Model, presented in Section 3, follows directly from the Hymes, Canale and Swain, and Bachman conceptions of language.
3. COE Model
The schematic diagram in Figure 1 identifies significant variables that affect language use (both comprehension and production) in academic contexts. This model distinguishes the context (above the line) from the individual language user (below the line). The context (3.1, in the nonshaded area above the line) includes those elements of language use external to the language user, many of which are observable to others in the act of communication (e.g., the setting in which communication takes place and the language that the individual contributes to that setting). Below the line are the individual's capacities (3.2, internal operations) which work in concert to interpret and produce language in context. We will describe the model and how it works by beginning with the features of the context that we believe call on specific capacities defined within the internal operations.
[Figure 1. The Working Model of COE Use in an Academic Context. Above the line, Context (3.1): Situation (3.1.1), comprising Setting (3.1.1.1), Participants (3.1.1.2), Task (3.1.1.3), Text (3.1.1.4; key, act sequence, norms of interaction and interpretation, instrumentalities, genre), and Topic (3.1.1.5); and Performance (3.1.2). Below the line, Internal Operations (3.2), including Language Competence.]
3.1 Context
Throughout applied linguists' discussion of communicative competence, the notion of context has been emphasized as an essential element for understanding language and language ability. Reflecting this perspective, the COE Model specifies that all language processing is initiated in some way by the context. The individual is in a given situation and must use language to communicate, whether the communication is due to a conversational partner, a task assigned by another person, a need to respond to a remote event or topic, or a need to inform or entertain oneself. The primary assumption is that all language use is, at least remotely, based on a need to communicate (even if with oneself; cf. Crystal, 1987).
The context in which communication takes place is crucial throughout language development. Native speakers of a language develop their communicative competence through participation in the social and cultural life of their family, friends, teachers, and neighbors. Speakers develop the ability, referred to here as communicative competence, to communicate appropriately and grammatically correctly in a variety of situations. This ability is complex, consisting of many interacting components or abilities. These interrelated abilities are activated by various features of the environment surrounding a language user. The users, whether interpreting discourse through reading and listening, or expressing themselves through writing or speaking, are engaged in an ongoing and dynamic process of assessing relevant information available in the environment or of negotiating the meanings expressed.
Given the crucial role of context in communication, in defining communicative language proficiency we must address these questions: What do we mean by the word "context"? What features of context are relevant to language use?
"Context" refers to the environment of a text. Both concrete and abstract aspects of context are relevant to communicative competence.
Concrete aspects of context include the physical setting, the specific place where communication occurs, and those observable features that represent a "concrete" sense of context. Abstract aspects of context refer to such features as the status and roles of the participants (e.g., the instructor and student), knowledge that the participants share, the verbal and nonverbal actions of the participants (e.g., listening to a lecture, writing answers to a quiz, carrying out an experiment), and the effects of the verbal actions, or the changes they bring about as a result of a participant having said a particular thing (e.g., a certain step being taken in an experiment, clearer understanding as a result of an instructor's answer to a student's question). Features of the concrete context (such as test tubes and beakers in a chemistry lab, an outline on a blackboard, or the chairs that have been moved into a circle for a class discussion) also may be part of the abstract context, but only if they directly influence the activity the participants are involved in. These features of a speech event (both the abstract and the relevant concrete features) are referred to as the "context of situation." This abstract sense of context is associated with a Firthian approach to linguistics (Firth, 1957; Halliday, 1978; Halliday & Hasan, 1989; see also Malinowski, 1923).
The abstract sense of context is important because the performance of a language user may not necessarily be tied to a physical setting. A context consists of more than the observable. Moreover, the physical setting of a situation is not always relevant to communication. For example, a student's performance may be represented in a letter of complaint to a car-rental agency, although the physical setting may be a lecture in an auditorium where all other students are taking notes. Here, the product and
its effectiveness as a letter of complaint are independent of the physical setting. A further example of the possible irrelevance of physical setting is an exchange between a professor and student about an aspect of the day's lecture. The exchange can take place anywhere: in a cafe, on the street, over the phone, or via electronic mail. Here the precise physical setting may be irrelevant to the participants' communication. More relevant may be their status vis-à-vis one another and the goals each establishes for communication. The abstract sense of context covers the possible independence of the physical setting from performance.
Because of the importance of context in communicative language proficiency, the COE Model identifies specific features of context that allow us to define context and, subsequently, to analyze specific academic contexts of interest to TOEFL 2000 users. The features in the model are based primarily on those identified by Hymes (1972): (1) setting; (2) participants; (3) ends; (4) act sequence; (5) key; (6) instrumentalities; (7) norms of interaction and interpretation; and (8) genre. These eight categories (which can be remembered with the mnemonic "SPEAKING") remain the most useful analysis of context and have been elaborated only slightly (Saville-Troike, 1989; Kramsch, 1993). The use of the features is illustrated and discussed in Subsection 3.1.1.
In the COE Model, the context of interest is the academic context. Academic contexts can be seen as of two types: (1) those relating to university life; and (2) those of scholarship/the classroom. Those of university life are comparable in many respects to situations of daily life off as well as on campus. For example, students meet and converse with others, establish and maintain relationships, and get and give information.
One salient feature about the use of language on campus may be the use of vocabulary generally associated with campus and student life (for example, dorm living, registering for classes, dropping a class, flunking an exam, or getting an "A"). This vocabulary marks the interaction as belonging to the campus context. Other features (such as a familiar and friendly tone between students) also may mark the discourse in this way, but it is the use of this vocabulary that is most salient.
The other kind of academic context, the classroom/scholarship context, is marked in a variety of forms, as the contrasting examples of a lecture and a faculty office appointment show. These two examples do not, of course, represent all situations in which students find themselves, especially since the two illustrations primarily involve oral language. A great deal of student use of language and language ability is involved in the interpretation and expression of meaning through written texts. Furthermore, classrooms and faculty offices are not the only settings for linguistic interaction, nor are listening and notetaking. Neither is a request for help and assurance on an assignment the only norm of interaction and interpretation in which students and instructors engage. Because academic contexts differ from one another in important ways, the COE Model specifies a set of features which are important for defining "context."
3.1.1. Situation
In the "Context" section of the model, the left-hand side is labeled "situation." Situation is defined here as including those aspects of the academic situation that are likely to influence academic language use: "setting," "participants," "task," "text," and "topic." The situations "lecture" and "office appointment" are used to illustrate these features.
3.1.1.1. Setting
Setting describes the physical location where communication takes place, where participants are located. The setting for the lecture is typically a classroom or lecture hall; the lecturer delivers the lecture in front of the audience, who may be seated in rows of chairs or at desks. The lecturer may use any of a variety of visual aids (blackboard, overhead projector, or slides). The time devoted to the lecture may be more or less than the class period. The office appointment takes place in a room in an office building or complex. The room usually has a desk and at least two chairs, bookshelves, books, and other standard faculty office items. The instructor is seated at or behind the desk; the student may be seated beside or facing the instructor. They may be looking together at a textbook or piece of paper with an assignment or quiz on it.
3.1.1.2. Participants
Participants are the individuals involved in the language event. In academic contexts, participants are generally some combination of instructors (professors or teaching assistants) and students (either graduate or undergraduate). Each participant is associated with institutional status and role characteristics. Moreover, these institutionally defined characteristics may be colored by personal features such as age, gender, level of experience, nationality, and familiarity with the other participant(s).
3.1.1.3. Task
A task is a piece of work or an activity with a specified goal (see Long and Crookes, 1992). The definition of "task" in most applied linguistics work refers to getting something done, although some applied linguists are working to refine definitions of "task" (e.g., Duff, 1993; Skehan, 1992; Bachman & Palmer, 1996). The goal, or "ends" in Hymes' terminology, of the lecture is to transmit/receive information on a range of points to be used by the students for a future assignment. The goal of the office appointment is to provide/obtain individual attention that will help the student to understand material needed to write a paper, present an oral report, or take a test.
3.1.1.4. Text
The term "text" refers to the type of language used to complete a task. A task might be completed, for example, through a formal or an informal conversation, a written or orally presented story, or an interview or debate. Text types (e.g., engineering reports, letters of complaint, question-answer exchange sequences, academic advising sequences) can be analyzed using the following of Hymes' features:
Key. The key, or tone, of the lecture is likely to be scholarly, serious, and formal, and perhaps even humorous at times. The consultation may be less formal and scholarly, but is likely to be mostly serious. It also may have a sympathetic tone if the student is concerned and/or upset (for example, about performance in the class). This feature is often referred to in the literature as "register" or "style" (Halliday, 1978; Joos, 1962).
Act Sequence. This refers to the form and content of the speech event. Here the lecture and office hour differ; the lecture is likely to contain a large number of facts, illustrations, and examples, as well as the general points being made. The office exchange may also contain facts, but they are likely to be ones repeated from the lecture; new illustrations, however, may be given. In addition, more questions are likely to be raised in the office exchange.
Norms of Interaction and Interpretation. These refer to the rules for language use that apply in this particular event and the information about the speech community and its culture that is necessary for the participants to understand the event. Norms for the lecture in North American culture are likely to be polite listening on the part of the audience, with note taking and perhaps some hands raised for questions. The lecture is generally one-way, from instructor to student. In the office, the participants generally take turns speaking, and either the student or the instructor may initiate the exchange. Standards set by the surrounding speech community determine what is and is not appropriate and acceptable behavior for the particular event.
Instrumentalities. Code and channel are the considerations here. The language of the lecture and the office visit will be oral in channel but may differ in code. The lecture may be delivered in a more formal language or dialect of the community, while the office appointment may be accomplished in the less formal code.
Genre. The genres represented by each of the example speech events are "lecture" and "consultation." Other genres include editorials, scientific abstracts, book reports, business letters, and talk show interviews.
3.1.1.5. Topic
Topic refers to the specific content information that is being addressed by various participants, and to various tasks and texts in the situation. Different topics impact a student's performance on many types of tasks, reflecting different levels of linguistic competence and language-processing abilities. For example, a student who is asked to critique a text, and who has relatively little knowledge of the topic, may rely more heavily on linguistic and textual strategies to compensate for weaker topical knowledge.
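The situation features described in Subsections 3.1.1.1 through 3.1.1.5 can be read as a simple record structure. The following is only an illustrative sketch in Python, not part of the COE Model itself: the class names, field names, and the helper `differing_features` are hypothetical, the field values paraphrase the lecture and office-appointment examples above, and the topic value is a labeled placeholder because the paper does not name a topic for either example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Text:
    """Text features from Subsection 3.1.1.4 (after Hymes)."""
    key: str
    act_sequence: str
    norms: str
    instrumentalities: str
    genre: str


@dataclass(frozen=True)
class Situation:
    """The five situation features from Subsection 3.1.1."""
    setting: str
    participants: tuple
    task: str
    text: Text
    topic: str


def differing_features(a, b):
    """Return the names of the situation features on which a and b differ."""
    return [f.name for f in fields(Situation)
            if getattr(a, f.name) != getattr(b, f.name)]


# Hypothetical placeholder: the paper names no topic for either example.
TOPIC = "course material (placeholder)"

lecture = Situation(
    setting="classroom or lecture hall",
    participants=("instructor", "students"),
    task="transmit/receive information for a future assignment",
    text=Text(
        key="scholarly, serious, formal",
        act_sequence="facts, illustrations, examples, general points",
        norms="one-way; polite listening and note taking",
        instrumentalities="oral channel; more formal code",
        genre="lecture",
    ),
    topic=TOPIC,
)

office_appointment = Situation(
    setting="faculty office",
    participants=("instructor", "student"),
    task="provide/obtain individual attention on course material",
    text=Text(
        key="less formal, mostly serious, possibly sympathetic",
        act_sequence="facts repeated from the lecture, new illustrations, questions",
        norms="turn taking; either participant may initiate",
        instrumentalities="oral channel; less formal code",
        genre="consultation",
    ),
    topic=TOPIC,
)

print(differing_features(lecture, office_appointment))
```

Describing two contexts with the same set of fields makes it straightforward to see which features distinguish them, which is the kind of comparison the lecture and office-appointment examples are meant to illustrate.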
3.1.2. Performance
The other element in Figure 1 that is part of "Context" is labeled "Performance," or linguistic and behavioral output. This is the contribution that the language user makes to the context. This contribution may be verbal and in the form of a text (writing an essay or asking a question) or nonverbal (turning to a designated page or following along on a map). The broken line between situation and performance indicates that, while these two elements of context can be analyzed separately, they are interrelated notions. Performance occurs within a situation; a situation can be described in part by linguistic and nonlinguistic performance, or behavior.
3.1.3. Conclusion
This understanding of context is important in constructing a theoretical foundation for TOEFL 2000 because of applied linguists' view of the role of context in defining language use. Each feature of context described previously is important in understanding why language performance is as it is and why a particular text or discourse takes the form it does, has the intent it has, and performs one function and not another. Each aspect of context also illustrates ways in which sentences out of context are not typical of language use. Meaning in language use is derived from the complex of features that describe a situation. The whole of a discourse is more than the sum of the parts. It is insufficient to look at the parts of the discourse and to decode the meaning of the words and the syntax in order to determine the meaning. The meaning is dependent on each sense of context; meaning can be determined from the physical setting and from the relationship of the participants to the situation, to one another, and to the task and text relevant to the situation.
section, we define each of these parts in greater detail and discuss unresolved issues associated with specific components.
be carried out within working memory. Working memory, in this model, is represented by the entire internal-processing unit. This follows the assumption that working memory is situated within long-term memory, involving processing mechanisms, metacognitive processes (available throughout verbal working memory space), and resources activated from long-term memory networks.
The term "verbal working memory" was chosen on the basis of arguments given by Barsalou (1992), who argues that the traditional autonomous multistore model of short-term memory encounters a number of problems in explaining language-processing results (Barsalou, 1992: pp. 92-115). Language processing (as a limited-capacity activity) is more likely to be constrained in activating information and procedures by the limitations of the central processor operating within long-term memory than by the limits of a separate processing component called short-term memory (cf. Cowan, 1993; Kintsch, 1993; Shiffrin, 1993). Alternatively, the preference of working memory lies with its primary emphasis on activation rather than on retrieval and storage, its preference for coordinating both storage and computation, and its preference for allowing parallel processes rather than a purely serial processing (Anderson, 1990; Harrington & Sawyer, 1992; Kintsch, 1993; Just & Carpenter, 1992). Just and Carpenter (1992) explained the requirements of working memory as follows:
A somewhat more modern view of working memory takes into account not just the storage of items for later retrieval, but also the storage of partial results in complex sequential computations, such as language comprehension. The storage requirements at the lexical level during comprehension are intuitively obvious.... But storage demands also occur at several other levels of processing.
The comprehender must also store the theme of the text, the representation of the situation to which it refers, ~ e major propositions from preceding sentences, and a running, multilevel representation of the sentence that is currently being read (Kintsch & van Dijk, 1978; van Dijk & Kintsch, 1983). 'naus, language comprehension is an excellent example of a task that demands extensive storage of partial and final products in ~ e service of complex information processing. Most recent conceptions of working memory extend its function beyond storage to encompass the actual computations themselves .... 'naese processes, in combination with the storage resources, constitute working memory for language .... We present a computational theory in which both storage and processing are fueled by the same commodity: activation. In this framework, capacity can be expressed as the maximum amount of activation available in working memory to support either of ~ e two functions. In our theory, each representational element has an associated activation level. An element can represent a word, phrase, proposition, grammatical structure, flaematic structure, object in the external world, and so on~ The use of ~ e activation level construct here is similar to its widespread use in other cognitive models, both symbolic (e.g., Anderson, 1983) and connectionist (e.g., McClelland & Rumelhart, 1986). During comprehension, information becomes activated by being encoded from written or spoken text, generated by a computation, or retrieved from long-term memory. As long as an element's activation level is above some minimum threshold value, that element is considered part of working memory; it is available to be operated on by various processes (Just & Carpenter, 1992:121-122). 12
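The activation account quoted above can be illustrated with a small sketch. It is not Just and Carpenter's implementation: the class name `Element`, the functions `apply_capacity` and `in_working_memory`, the numeric values, and in particular the rule that all activations are scaled down proportionally when their total exceeds capacity are assumptions made here for illustration. The sketch only shows how a single capacity limit and a minimum threshold together could determine which elements remain available in working memory.

```python
from dataclasses import dataclass


@dataclass
class Element:
    label: str         # e.g. a word, proposition, or thematic structure
    activation: float  # current activation level (arbitrary units)


def apply_capacity(elements, capacity):
    """Return copies of the elements, scaled down if total activation exceeds capacity.

    Proportional scaling is an assumption of this sketch, not a claim
    about the model described in the text.
    """
    total = sum(e.activation for e in elements)
    scale = 1.0 if total <= capacity else capacity / total
    return [Element(e.label, e.activation * scale) for e in elements]


def in_working_memory(elements, threshold):
    """Labels of the elements whose activation is above the minimum threshold."""
    return [e.label for e in elements if e.activation > threshold]


# Hypothetical values: three elements competing for activation.
elements = [
    Element("word", 0.9),
    Element("proposition", 0.6),
    Element("theme", 0.3),
]

ample = in_working_memory(apply_capacity(elements, capacity=2.0), threshold=0.25)
limited = in_working_memory(apply_capacity(elements, capacity=1.2), threshold=0.25)
print(ample)    # all three elements stay above threshold
print(limited)  # the weakest element falls below threshold
```

With ample capacity every element stays above threshold; when capacity is reduced, the least activated element drops out, which is one way to picture how limited capacity could constrain both storage and processing at once.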
Metacognitive processing includes strategic processes that are directed by goal setting, problem solving, and multiple (and sometimes conflicting) informational sources. For example, the need to adjust speech production to conform to a superior's new expectations requires a balancing of elements from sociolinguistic and discourse competence, along with various outcome scenarios that are potentially available from world knowledge. This situation will require the individual to direct strategic attention to the speech output to carefully monitor the context and make adjustments. Metacognitive processing will also include the strategies associated with strategic competence in Canale and Swain's (1980) framework (i.e., processes that enhance the message and repair perceived miscommunication). Metacognitive processing is typically seen as requiring extensive demands on working memory capacity. The more complex the task (or the more unfamiliar the topic, the more difficult the vocabulary, the more unusual the setting, the more anxiety-provoking the context), the more demands are placed on metacognitive processing in working memory.
Further issues associated with metacognitive strategies include the debatable value of distinctions such as cognitive strategies versus metacognitive strategies and strategies versus skills. These distinctions may not be useful to maintain in any strict sense and are not assumed by the Model. Moreover, following Baker (1991) and Paris, Wasik, and Turner (1991), the distinction between cognitive strategies and metacognitive strategies is argued to be variable by topic, task, and individual. For example, the need to read five pages of Chomsky's latest article will impose severe demands on an individual's processing; many processes that might otherwise be on-line (such as proposition integration with new vocabulary) will require directed attention and problem-solving routines as part of comprehension. An inverse example is that of summarization. This ability is typically nominated as a metacognitive process, yet an individual's regular updating of the plot to a mystery novel does not require the directed attentional processing that Chomsky's article would. Thus, what might be a metacognitive strategy in one situation will only invoke minimal on-line processing demands (a procedural routine) in another situation. It is therefore difficult to specify a universal set of skills versus strategies or a set of cognitive versus metacognitive strategies.
On-line processing refers to the basic skilled processing that (for native speakers) does not require extensive attentional resources, such as word recognition, initial parsing, and nondemanding processing related to propositional formation and integration into a text model. It also reflects those aspects of mental model processing that are not "directed" for any particular purpose or goal. Thus, on-line processing represents not only potentially encapsulated activities, but also those activities not placing serious demands on metacognitive processing or attentional resources. Generally, this view of on-line processing conforms with the sketch of working memory noted by Just and Carpenter in Subsection 3.2.2.
In much the same way that a task may not overwhelm resources for a native speaker's on-line processing, the advanced learner of a second language may be sufficiently skilled and have efficient processing routines so on-line processing works well. For less skilled second-language learners, such as those taking the TOEFL 2000 test, the on-line processing may not be sufficiently skilled to process the information without great resource demands (which are also dependent on topic, tasks, internal goal-setting, etc.). For these individuals, aspects of on-line processing will not be much different from the resource-intensive strategic processing typical in metacognitive processing. Thus, one major source of L2 test-taking variation may well be the limits of on-line processing in working memory (again a prediction of Just and Carpenter's Capacity Theory).
Grammatical competence includes phonological/orthographic, morphological, lexical, structural, and semantic knowledge. It includes knowledge of possible structures, word orders, and words. The specific grammatical knowledge required in a given context depends on the grammatical features that the language user must comprehend and produce to accomplish the goals he or she sets.
Many issues remain concerning how best to represent grammatical knowledge, but the most difficult aspect of what is defined here as grammatical knowledge is the nature of the lexicon. The model includes lexical knowledge as a part of grammatical knowledge, even though the lexicon most likely contains more than formal linguistic features. The problem is that the lexicon's relation to any other language-processing component is not simple or straightforward. While everyone can agree that a lexicon is necessary, it is not entirely clear where it should be located, what it should encompass, and how it should interact with other processing components. For example, it is not clear to what extent the lexicon is linked with knowledge of the world: to what extent is knowledge of the world simply knowledge of the terms and concepts primarily stored in the lexicon itself (cf. Paivio, 1986)? From this, many other questions arise. To what extent is procedural knowledge linked to the lexicon (perhaps as generic script entries)? To what extent are schemas and knowledge frames represented in the lexicon as some set of generic defaults for declarative knowledge concepts? To what extent does the lexicon obviate the need for an independent syntactic processing component? To what extent are sociolinguistic knowledge and discourse structural knowledge keyed to terms and concepts of the lexicon? To what extent are intentions, purposes, and plans keyed to lexical concepts and terms? To what extent is the L2 lexicon distinct from the L1 lexicon? All of these questions point to the undefined nature of the lexicon in relation to other processing components and the need for additional work in this area.
Discourse competence refers to the language user's knowledge of how language is sequenced and how it is organized above the syntactic level. This component includes knowledge of exchange sequences in interaction, genre and register markers, coherence markers and coherence relations, topic development, links between informational units, and the structuring of informational flow. Discourse competence also includes knowledge of genre structure to account for the fact that people recognize whole genre forms in many instances. As with grammatical competence, the specific discourse knowledge needed will depend on the features of the context (particularly the features of texts defined in Subsection 3.1.1.4).
Sociolinguistic competence includes knowledge of language functions and language variation. Functions include, for example, knowledge of language for greeting, convincing, apologizing, criticizing, and complaining. In any given setting, the language user will need to know some combination of functions to participate. This component of language knowledge is activated by goal setting (Subsection 3.2.1) since the functions follow directly from goals, attitudes, and purposes. Functional knowledge, in turn, activates the specific linguistic knowledge needed to produce or interpret the relevant functions. For example, a student who disturbs a small class by entering late must perceive the situation as one that requires an apology so his or her "goal-setting" component can set the goal of apologizing. To actually apologize, however, the student's functional knowledge must include how to make an apology in English (which specific words and syntactic patterns to use, as well as how much of an excuse to provide and how much detail to include).
Knowledge of language variation consists of knowledge of dialect diversity (e.g., regional differences such as midwestern versus southern), of naturalness (e.g., archaic forms and vocabulary versus contemporary colloquial speech), of cultural references (e.g., "to meet one's Waterloo"), and of figures of speech (e.g., "to have been around the block"), as well as knowledge of numerous configurations of register variation. Register variation is defined as knowledge of the language appropriate for the following contextual situations: (1) one or many in the intended audience; (2) familiar or distant relationship among participants; (3) informal or formal occasions; (4) subordinate or superordinate relation to participant(s); (5) general or topical content; and (6) relative background knowledge of participants. Each dimension of register variation defines an element of context that influences language use; therefore, knowledge of the language associated with the combinations of dimensions is an important component of language competence. For example, the student entering the class late would choose different language to express apology in a small class than in a large class. The student's language would be different in a class composed entirely of friends than in a class of strangers. It would be different if the instructor were there than if the instructor were absent. Our understanding that language varies across these dimensions of register is the result of empirical research in sociolinguistics, but the nature of native speakers' linguistic variations across the contexts of interest remains an important research area. The Model states that the types of knowledge defined in these three major subcomponents of language competence (grammatical, discourse, and sociolinguistic) are, in combination, the major components of language necessary for communicative language use in context.
The answers to questions about how each of these general areas of language competence can be specified for the academic contexts of interest to TOEFL 2000 test users await further research. Also of interest would be developmental definitions of each of these areas of language knowledge. For example, is it generally the case that learners know how to use greetings before they learn to complain? A third issue involves the question of the level of sociolinguistic knowledge a learner must obtain to be able to work effectively in academic contexts.
completion, the monitoring either rejects the output and tries again through another processing cycle or is satisfied with the match to internal goal setting and ends the iterative cycle for that particular task or subtask. The monitoring could also respond to a nonmatch with frustration and end the processing cycle, even though the output does not match the goal setting. The comparison of goals with output, or "monitoring," is not discussed extensively in cognitive psychology and psycholinguistics but is important in applied linguistics (cf. Buck, 1991; Krashen, 1985; Morrison & Low, 1983; Pawley & Syder, 1983; Schmidt, 1992), and composition (Bereiter & Scardamalia, 1987; Hayes et al., 1987). Monitoring is an essential process in language use and in specific task performance. In the sense discussed here, the "monitor" is basic to language processing. This description should not be confused with Krashen's (see, for example, Krashen, 1985) use of the term in his discussion of his input theory.
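The monitoring cycle described above (compare output with the goal, then accept, retry, or give up) can be sketched as a simple loop. This is only an illustration, not part of the COE Model: the function `monitor`, its parameters, the `max_cycles` cutoff standing in for "frustration," and the example drafts are all hypothetical.

```python
def monitor(process, matches_goal, max_cycles=3):
    """Run processing cycles until the output matches the goal or patience runs out.

    process(cycle) produces a candidate output for the given cycle number.
    matches_goal(output) reports whether the output satisfies the internal goal.
    max_cycles is a stand-in (an assumption of this sketch) for giving up in frustration.
    Returns (output, status), where status is "matched" or "abandoned".
    """
    output = None
    for cycle in range(1, max_cycles + 1):
        output = process(cycle)
        if matches_goal(output):
            return output, "matched"   # satisfied: end the iterative cycle
        # otherwise the output is rejected and another cycle is attempted
    return output, "abandoned"         # nonmatch, but processing stops anyway


# Hypothetical example: successive drafts of an apology for entering class late.
drafts = ["I am late", "Sorry I am late"]


def produce(cycle):
    return drafts[cycle - 1]


def is_apology(text):
    return "sorry" in text.lower()


patient = monitor(produce, is_apology, max_cycles=2)
frustrated = monitor(produce, is_apology, max_cycles=1)
print(patient)
print(frustrated)
```

The second call shows the case the text mentions in which processing ends even though the output does not match the goal setting.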
individual visually perceives the text and engages in word recognition. For aural language, lexical activation is keyed by auditory perception and, perhaps, other cues in the context. Written language production typically is initiated through goal setting and initial activation of plans. Spoken language production will, at times, be initiated by goal setting and planning, although spoken language will also often be initiated by relatively automatic response patterns. Beginning with a discussion of reading comprehension, the skills and processes the Model assumes will be outlined. As a reader begins to read, and the first word of a sentence is activated for working memory, the semantic and syntactic information from the word is used to begin parsing the incoming sentence. The additional incoming words are accessed and combined in terms of general parsing principles, relying on semantic and syntactic information attached to each word and, at some point, pragmatic and contextual information. The word and the growing syntactic structure are also interpreted as a propositional structure, representing the meaning of the sentence. The proposition is integrated as the reader reaches the end of the sentence. This proposition is then "sent" to be incorporated in a text model (within working memory), which synthesizes the incoming proposition with an existing (or created) propositional network. At the same time that the new propositional structure is being integrated into the text model, the words from the next sentence are being activated and assembled in a new parsing representation. Meanwhile, the proposition being integrated into the text model probably will require one or more bridging inferences to assist the coherent and thematic restructuring of the text model (van Dijk & Kintsch, 1983; Kintsch, 1993; Perfetti, 1993; Singer, 1993; cf. Graesser & Kreuz, 1993).
Thus, by the end of the proposition construction and integration, inferential processes are being used to fit the proposition into the text model. The text model (as a summary of the information in the text) and necessary bridging inferences will be constrained to represent consistently nominated information more strongly, as well as information which has been marked in one of several ways as thematic. While the text model is being constructed and reconstructed, an interpretive model of the text is also being constructed. This mental model, or situation model (Barsalou, 1992; van Dijk & Kintsch, 1983; Fincher-Kiefer, 1993; Kintsch, 1988; Singer, 1990) represents the reader's interpretation of the text (beyond the comprehension of the text model). As an interpretation of the text information, it will include additional processing information, such as explanations for the information, evaluations of the information, connections to other sources of information, emotional responses, adequacy of information assessments, and appropriate purpose assessment (Graesser & Kreuz, 1993). This interpretation may not be a complete or accurate representation of the text, but it is the reader's individual interpretation. While this latter stage of comprehension and interpretation is going on, the other processes of word recognition, parsing, semantic interpretation, and text-model building continue. Speaking and writing must be explained somewhat differently because they follow partially developed plans that reflect an internal text model; in the COE Model, this process would be associated with the goal-setting component. The initial procedures for lexical activation are different for speaking because they are internally driven, instead of driven by an external language source. Speaking is also likely to involve a different set of demands on the processing output in terms of a heightened monitoring, if nothing else.
Writing processes combine the internally driven activation of information with the more reflective interpretive analysis used to create a more elaborate mental model. While both speaking and reading usually require relatively fast on-line processing in terms of lexical access and parsing, writing almost always demands more reflective operation of both. That is, writing, as it improves, requires the penetration of automatic production with concerns for appropriate word and structure choices, as well as concerns for organization, reader expectations, writing purposes, emotional signaling, and attitude to task and topic (Bereiter & Scardamalia, 1987). It is possible to see the Bereiter and Scardamalia models of knowledge telling and knowledge transforming in writing processes as types of inverses of the text model and mental model created by reading processes. While the text model for the reader and knowledge telling for the writer represent relatively automatic processing and thus operate similarly, the more reflective and problem-solving counterparts move in different directions, at least in goal setting and output matching. The processing difficulty for the writer is that the text produced, to the extent that it is slanted to a writer's particular purpose or reflects attitudes, emotions, or evaluations, must be tempered because of its potentially public nature. In contrast, readers, in constructing their mental models, can be much more aggressive in their interpretations, evaluations, and emotions toward a text and not have to be judged. Other important differences exist between reading and writing, as between writing and speaking, but a discussion of these would take us beyond the scope of the present report.
Recognizing the differing demands that each language skill makes for language processing, the goal of the TOEFL 2000 model, particularly the internal operations, is to take into account these differences and construct a simple model that can be shaped to language processing in any of the four skill areas. When we look at academic contexts, we see that skills work together to accomplish goals. Two brief examples in the next subsection illustrate this integration.
As the student begins to read, he or she will be trying to understand in general what the article is about and to jot down some specifics that may be important to include in the class report. As the student reads, lexical access processing will activate linguistic knowledge and world knowledge. Both of these sets of knowledge will already have been activated to some extent during preliminary goal setting. The linguistic knowledge activated here will include the specific morphological, lexical, structural, and semantic knowledge. Discourse knowledge will include knowledge of how Newsweek articles typically are organized and knowledge of how local coherence relations are established. Sociolinguistic knowledge will include the knowledge of the dialect and cultural referents in the article and knowledge of the text type of the article in terms of the dimensions of register variation. As the on-line and metacognitive processing continues, new world knowledge and linguistic knowledge will be cycled into the processing component. Thus, the three subcomponents are seen as operating in tandem, without presupposing a strictly linear set of operations. At certain points, the individual will want to monitor the progress of the processing and the match between the potential output and the internal goal setting (completion of the article). When the individual gets tired of processing, or when the results develop toward completion, the internal-processing output will match the results to the goal setting. If the match is satisfactory, the results will be sent to the production output or (as in this example) will be stored for later retrieval. The individual may also stop the processing, because the internal goal setting is not a strong model for comparison or because the individual is incapable of making a good match and stops trying. The second example also occurs in a classroom setting. The text type is a question/response sequence.
The task is to respond to the teacher's question on the basis of lecture notes from the previous week. The internal goal setting is based on previous experiences with this routine, academic training, and the likelihood of retrieving the relevant information. The specific goal will be either to respond appropriately and to be recognized as knowing past information, or to respond in a sufficiently vague manner so as to satisfy the teacher and not betray total ignorance. The goal setting begins to activate whatever world and linguistic knowledge can be recalled from the past week's lectures. This information is to be combined with whatever world and linguistic knowledge can be usefully inferred from the teacher's question. To formulate a response, the activated world knowledge and linguistic knowledge are processed. The language knowledge will include consideration of the functional purpose of the response and appropriate register information. Because the response time will be relatively rapid, a fair amount of the response processing will depend on set routines for assembling an answer. Delays due to metacognitive strategy feedback may lead the responder to produce filler sounds and phrases until some linguistically well-formed response is sent to the internal-processing output. This then may or may not be matched to the goal setting component. The response is then generated as performance, which others in the context can observe.
3.4. Conclusion
TOEFL 2000 discussions have so far focused on the Model itself: what components are essential for describing language use, how to express them effectively, and how to translate our current understanding of skills to applied linguists' perspectives on communication. The ultimate success of these efforts will depend on their usefulness in test development and validation. Anticipating the need to examine the Model in light of its intended uses, the following two sections speculate on some of the model's implications for test development and validation.
1  S: Can you explain why it's doing this? The ray entering the block
2     refracts away from the normal to the surface.
3  T: What was your question?
4  S: I s- we see why it does that but why -- I mean we see that it
5     does that, but why does it do that?
6  T: Why -- why does it bend toward the normal?
7  S: Yeah.
8  T: Uh, that's because of the difference of the index of
9     refraction...
10 S: OK
11 T: ...between air and...
12 S: Is that the same thing for why this does that away from it?
13 T: Yes, that's the same reason but, depending on from where to where
14    the beam goes, there's a change in angle...
15 S: OK
16 T: ...direction of the angle
17    (writes the response down as the answer to the question
18    on the lab report)
19 S: All right.
20 T: So to bend toward the normal to the surface, does the index of
21    refraction should be greater than one, or smaller than one?
22    (pause)
23 S: What'd you say?
24 T: To bend to the normal to the surface, um is the index of
25    refraction of this plastic...should...
26 S: Is greater, isn't it? Greater than one?
27 T: Should it greater than one? Is that right or not? (clatter from object dropped on neighboring table)
28 S: Huh?
29 T: Yes, that's right?
30 S: OK
(Searls, 1991, pp. 45-46)
The researchers return from their fieldwork with plenty of classroom data. Examining all the data the researchers have collected, the test developers frequently see the type of conversation illustrated here. The discussion consists of questions and answers centered around a lab report that must be completed. For the purpose of analysis, completion of the lab report would be the "task." When the test developers and researchers look through a lot of conversations in labs centered around completing a lab report, they see that sometimes a Teacher's Assistant (TA) is present (as in this example). Sometimes the dialogue is between lab partners, and the difference in participants changes the nature of the discourse. A lab partner, the test developers see in other data, never uses the long, thought-provoking questions that the TA attempts in this example (lines 20-25). Much of the language in the data refers to the lab equipment present in the setting, a physics lab, which the researchers have documented in great detail. The text is oral, consisting of short Q-A turns between two participants of unequal status and knowledge. The researcher and test developer could do a more thorough discourse analysis of the text, citing the expressions and syntax used and the functions and sociolinguistic characteristics of the exchanges. This is the type of data that would be used for hypothesizing the language abilities required for performance in science labs.
Goal Setting: The student needs to understand the instructor as he explains the goal of setting up the experiment and completing the lab report. Understanding requires language abilities to read and listen to instructions. The student must also set a goal of producing written answers to questions in the lab report.
Language Processing: Metacognitive processes keep the student working toward the large goal, constructing subgoals as needed during the course of the conversation. On-line processes retrieve specific aspects of language knowledge as needed to accomplish goals, which include both production (asking questions and writing responses in the lab report) and comprehension (listening to the TA and reading the lab questions).
Language Competence: Linguistic competence includes knowledge of the following: syntax of questions (both yes/no and why), statements in the present tense, a limited range of morphological forms associated with present, concrete language, and phrases used in response to questions. Vocabulary knowledge includes knowledge of simple and concrete words (e.g., explain, enter, block, difference, thing, reason), as well as technical words directly associated with the materials and the basic concepts of light refraction (e.g., ray, index, refraction).
Discourse Competence: This includes knowledge of question/answer turn taking (including interruption sequences), associated cohesive devices, and knowledge of language referring to the physical setting. Discourse competence also includes knowledge of the genre, "lab report."
Sociolinguistic Competence: This includes knowledge of the language associated with a variety of spoken language functions used in questioning. The student has to know standard English (for reading the lab materials) and a nonnative variety of English to communicate with the international TA. To engage in the conversation with the TA, he has to know the language used for one-to-one conversation on a specific topic with a specialist on the topic with whom he has a distant and subordinate relationship, in a somewhat formal setting. Also relevant is the fact that the TA is "primary knower" (Berry, 1987), meaning that he knows the answers to the student's questions. The language used to complete the lab report would require different sociolinguistic knowledge.
4.1.3. Construct Relevant Item/Task Formats
Having defined what should be measured, the test developer can construct a test task (or tasks) to measure it. For example, using the fieldwork from the science lab to look at the tasks used in context, the test developer can get ideas for test tasks, even though the resulting test will not be exactly like having the learner perform in the context. Again, one would want to consult a large set of science lab data to look for potential item/task formats, but for the purpose of the example, the item/task format "filling in a lab report" can be used. From the abilities analyzed on the basis of the transcript, the test developer would recognize the need to test the student's ability to use language to understand a goal. A good item/task would require the test taker to activate metacognitive processes to update goals and use on-line processing efficiently to retrieve the necessary language knowledge for both production and comprehension. Moreover, the student should have to call on the linguistic knowledge identified: questions and answers with simple present, concrete forms, knowledge of turn taking, and language referring to objects in the physical context. On the basis of this analysis, a simple lab report format can be designed, one that visuals of an experiment would support. However, how can this format be extended to test the oral interaction ability so important to this task? Test item/tasks may have difficulty assessing all of the abilities included in a construct definition. At this point, one has to make compromises, while noting where and what kind of compromises these are. If it is not possible to produce a format requiring the test taker to complete her lab report through oral interaction with a TA, the test developer might consider other interactive formats through which the test taker could gain information (e.g., a computer program that would allow the student to query a database).
The reason for hypothesizing abilities required in the academic context of interest (as illustrated in Subsection 4.1.2) is that potential constraints and compromises of the test can be recognized and understood. These constraints and compromises will influence the abilities we can actually test in relation to the abilities we want to measure and are therefore a relevant object of validity inquiry.
of the responses on the actual report. If the report has seven blanks that must be completed with short answers, it might most simply be scored by accepting only the completely correct answers. However, one purpose of the explicit construct definition should be to create a more meaningful scoring method. For example, if the test developer had test takers gain information to include in the lab report by querying a database, the ability to engage in this process should be assessed, in addition to the final product. The scoring rubric should distinguish the student who asked a lot of good questions (but never the right one) from the student who was unable to form a question at all. Both students would end up with a "product" score of "0," but their "process" scores would be different. The current COE Model provides little guidance on this important issue. Moreover, the fieldwork the Model suggests would only reveal the nature of the academic context, describing the setting, participants, task, text, and topic. These aspects of the context will affect what it means to be successful, but the Model does not provide guidelines on how to analyze success (or levels of success) in a given situation. This issue obviously requires more work to attempt a principled method of evaluating performance (cf. Subsection 4.2.4). The data in the example of the student in the lab did include the language of a student who was successfully filling in his lab report. To do it, he was extracting information from the TA, despite the TA's desire to make him think about what he was doing. To obtain that information successfully, he had to use the interactive oral/aural ability defined previously. For the student in this setting, the analysis of the abilities required to successfully complete the lab report should help in evaluating the abilities important in this context.
This type of construct definition provides some guidance about what might be scored; however, much work remains to be done on the question of what should be scored and how to define levels of abilities in academic contexts. This is only one of the test development issues the Model raises.
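The distinction between "product" and "process" scores in the database-query example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Query` record, the `score_item` function, and the cap of 3 on the process score are all hypothetical and are not part of the COE Model, which, as noted above, gives little guidance on scoring.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Query:
    """One database query attempted by a test taker (hypothetical record)."""
    well_formed: bool       # the question was linguistically interpretable
    retrieved_answer: bool  # the question returned the needed information


def score_item(queries: List[Query]) -> Tuple[int, int]:
    """Return (product, process) scores for one blank on the lab report.

    product: 1 if any query retrieved the needed information, else 0.
    process: number of well-formed queries, capped at 3 (an arbitrary cap).
    """
    product = 1 if any(q.retrieved_answer for q in queries) else 0
    process = min(3, sum(1 for q in queries if q.well_formed))
    return product, process


# Student A asked several good questions but never the right one.
student_a = [Query(True, False), Query(True, False), Query(True, False)]
# Student B was unable to form an interpretable question at all.
student_b = [Query(False, False)]

print(score_item(student_a))  # (0, 3)
print(score_item(student_b))  # (0, 0)
```

Both students receive a product score of 0, but only the process score separates them. How the process levels should actually be defined remains the open question raised in the text.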
Because test situations are inherently different from the contexts about which we want to infer test takers' ability, students' performance on a test is likely to offer a distorted picture of the ability they would use in "authentic" contexts. The issue is, then, how we can use the picture of ability obtained by test performance to make inferences about abilities in other contexts. In order for tests to be used appropriately, it is the responsibility of test developers to demonstrate, and test users to consider, evidence concerning the meaning of test scores (i.e., construct validity evidence). To investigate construct validity, it is necessary to hypothesize the construct that the test is intended to measure. Developing a test whose validity can be justified is the primary objective of TOEFL 2000; therefore, the COE has devoted its time to articulating a model to be used for construct definition, which is necessary for construct validation. We will elaborate on this point in Subsection 5.1.
required for performance in that situation. In theoretical terms, this is a useful conceptualization for defining language ability. When it comes to empirical research (the fieldwork mentioned in Subsection 4.1) and test development, however, it is apparent that the "situation" will require an operational definition. To take the example of the science lab, when does the situation begin and end? The goal of the task was identified as completing the lab report, but was that the only relevant goal in that lab situation? Would the TA agree on the goal identified for that session? The science lab situation consisted of more than one text: the oral text of the dialogue between TA and student and the written language of the lab report. How many texts can a situation contain? How many tasks? Should we attempt to define situations by the numbers of other elements they contain? These questions point to the need for theoretically informed empirical work whose objective is to construct an operational definition of "situation." Such research would investigate the nature of academic situations and their associated language. It would be similar to, and informed by, research that investigates reading or writing in academic contexts. It would differ from such research by not starting with predefined categories of "reading" and "writing," but would instead begin without preconceived units into which data would be placed. It would use multiple perspectives, including those of instructors and students, to help obtain a realistic account of the data. In these senses, such research would draw on ethnographic methods. The objectives, however, would have to include constructing a definition of situation that test development staff could use for additional research on academic situations and for guidance in selecting task/item formats.
4.3. Conclusion
The four steps in test development (Subsection 4.1) and the four issues raised (Subsection 4.2) are intended to provide a starting point for discussions about implications of the Model for test development. Much remains to be said and researched as the practical concerns of test development are viewed from the perspective of a model reflecting theory in applied linguistics. This discussion and research will, in turn, help in modifying, specifying, and better understanding subsequent versions of the Model. As the test development perspective will highlight some aspects of the Model, validity research will also make requirements of and provide contributions to the Model's evolution. In the next section, we initiate discussion of the Model's role in validity research.
[Figure: Validity matrix. Columns: Evidence, Consequences. Rows: Interpretation, Use. Cell entries recoverable from the source: construct validity, value implications, social consequences.]
Messick's definition offers a coherent perspective for conceptualizing validity. To move from perspective to research, however, it is necessary to identify specific types of evidence and consequences that will allow us to investigate TOEFL 2000 interpretation and use. In addition, we need to suggest research methods for identifying the relevant evidence and consequences. Some researchers have begun to explore a range of criteria and approaches for validity inquiry (e.g., Linn, Baker, & Dunbar, 1991). The TOEFL 2000 research program will eventually be in a good position to contribute to these explorations. We begin by considering Messick's suggestions for validation research. This report does not attempt to cover these issues comprehensively, but only to outline some of the specific types of justifications that TOEFL 2000 researchers may investigate and to speculate how the COE Model might apply to each.
developmentally is an important unresolved issue in language testing (e.g., Canale, 1988; Brindley, 1991; Pienemann, Johnson, & Brindley, 1988). With respect to understanding learners' problem-solving processes, the COE Model, with an explicit "goal-setting" and "verbal-processing component," can be used to guide research. The question will be: To what extent do the strategies/processes required for TOEFL 2000 fall within the definition of communicative language proficiency in academic contexts? This question becomes important when we define language proficiency as consisting of strategies/processes, as the COE Model does. The COE Model suggests that the language strategies required for success in academic contexts are an important part of the proficiency to be measured. Moreover, the Model suggests that the strategies required for performance depend on the situation. Because the test situation is not the same as the academic situation, investigation of the strategies required in both becomes an important validity question.
which a measure relates more highly to different methods for assessing the same construct than it does to measures of different constructs assessed by the same method" (p. 46). Results from this kind of study in language testing have provided validity data for particular tests by identifying the influence of test methods (Stevenson, 1981; Bachman & Palmer, 1982). To design such a study, the researcher must begin by defining the trait that is to be measured by multiple methods. How does the COE Model pertain to the design of this research? The COE Model provides a means for developing a definition of language ability in a given setting, but can we call "language ability in a given setting" a trait? In the strict sense of the term, and in the sense used by multi-trait, multi-method researchers, no. In the strict sense, a trait is defined independently from the method used to measure it; in the multi-trait, multi-method research design, trait effects on performance are good, and method effects on performance are bad. The COE Model, in contrast, defines "language ability in a given setting." In other words, setting (i.e., method) effects are not bad; they are expected. The COE Model reflects an interactionalist (rather than trait) perspective (Messick, 1989, p. 15; Zuroff, 1986) of construct definition, attributing performance to three sources: (1) the context; (2) the capacities of the individual; and (3) an interaction between the two. Trait-oriented correlational research on TOEFL 2000 should therefore take care to account for the interactionalist orientation of the COE Model. The second type of correlational research investigates nomothetic span, "the network of relationships of a test to other measures" (Embretson, 1983, p. 180). This type of study requires the researcher to hypothesize strengths of correlations (based on distances in the nomothetic span of constructs) with other measures.
Fundamental to this type of research is a construct definition of what is measured by the test under investigation and the hypothesized strengths of relationships expected between that construct and the others in the study. The COE Model, which includes multiple components, would make this type of research interesting to consider. Theoretically guided discussion of this type of research (i.e., asking which constructs we expect the abilities we define for TOEFL to relate to and to what degree) may help inform evolution of the Model.
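The multi-trait, multi-method criterion quoted earlier in this subsection (a measure should relate more highly to a different method assessing the same construct than to a different construct assessed by the same method) can be illustrated with a small correlation check. The sketch below is written in Python purely for illustration; the three score lists are invented, and the variable names are hypothetical rather than drawn from any TOEFL data.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented scores for six test takers (illustration only).
listening_interview = [1, 2, 3, 4, 5, 6]        # construct A, method 1
listening_multiple_choice = [2, 1, 4, 3, 6, 5]  # construct A, method 2
reading_multiple_choice = [3, 5, 1, 6, 2, 4]    # construct B, method 2

# Same construct measured by different methods.
convergent = pearson(listening_interview, listening_multiple_choice)
# Different constructs measured by the same method.
discriminant = pearson(listening_multiple_choice, reading_multiple_choice)

print(f"same construct, different methods: {convergent:.2f}")
print(f"different constructs, same method: {discriminant:.2f}")
print("criterion met:", convergent > discriminant)
```

Under the interactionalist orientation described above, a sizable same-method correlation would not automatically count against a test, because setting effects are expected; the comparison is one piece of evidence to interpret, not a pass/fail rule.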
Not reflected in the Model, but a subject of discussion by the COE, is the need for experimental research to focus on different subject populations who would take TOEFL 2000. In particular, native speakers should perform very well, and there should be no significant difference between the performance of graduate and undergraduate students if the use of the same test for both groups is to be justified. Planning and conducting a variety of experimental research will help to improve both the COE Model and TOEFL 2000.
beliefs. Nominations for a "tradition" consistent with applied linguists' perspectives include "naturalistic-ethical," emphasizing test methods requiring natural language use in context and the test developer's social responsibility (Canale, 1988) and "communicative-psychometric," emphasizing the need for linguists and psychometricians to work together to develop and evaluate communicative language tests (Bachman, 1990). Beyond a test's value-laden foundations, however, we can investigate the values that a test expresses to those it affects. A test can be thought of as a social event that conveys messages about applied linguists' views of language to test takers, instructors, applied linguists, academics in other disciplines, and other members of societies (Canale, 1987). One such message that the COE has discussed is the value TOEFL 2000 might communicate about the privileged status of particular varieties of English. This concern revisits and adds a validity-relevant dimension to the difficult issue of the relative nature of correctness, discussed in Subsection 4.2.4. Language that is correct in one context may be considered nonstandard in North America. Varieties of English exist in numerous English-speaking contexts, such as India and Singapore, and speakers of these other Englishes will be among those taking the TOEFL test. Moreover, the future language-use situations of the test takers and the purposes of attending North American universities are also relevant concerns. Many students will spend their university lives and future professional lives interacting with the international community of scholars and professionals, not all of whom are speakers of American English. The issue underlying these situations is whether Englishes that do not conform to American or Canadian English standards can be defined as acceptable.
Can the English norms that the TOEFL test assumes represent an international English rather than exclusively a North American variety, thereby not penalizing educated speakers of English whose competence is not a perfect match to educated American varieties? TOEFL 2000's answer to this question will display values of test developers toward different varieties of English.
5.3. Conclusion
As we begin to look at the implications of applied linguists' views of communicative language proficiency for validity justifications, we can foresee some of the dilemmas Moss (1992) points out with respect to performance assessment across disciplines:
Performance assessments present a number of validity problems not easily handled with traditional approaches and criteria for validity research. These assessments typically present students substantial latitude in interpreting, responding to, and perhaps designing tasks; they result in fewer independent responses, each of which is complex, reflecting integration of multiple skills and knowledge; and they require expert judgement for evaluation. Consequently, meeting criteria related to such validity issues as reliability, generalizability, and comparability of assessments, at least as they are typically defined and operationalized, becomes problematic (Moss, 1992, p. 230).
As we suggested previously, the COE expects such problems to be encountered as validity issues for TOEFL 2000 continue to be explored. However, we also believe that TOEFL 2000 provides a unique and valuable opportunity in applied measurement for realizing ideals for validity inquiry, as well as for pioneering efforts to establish alternative and practical validity criteria for the profession.
Any psychologist, psycholinguist, educational measurement researcher, or applied linguist will have questions and suggestions for improving the COE Model. The needs of test development clearly point to areas that require additional work, some of which we point out in Subsection 4.2. In addition, planning and conducting theoretically motivated construct validity research will provide input for the Model, as pointed out in Subsection 5.1. As the Model continues to evolve, it will be important to keep in mind its purposes:
Informing test development (see Section 4)
Supporting content analyses of item/task formats (see Subsection 5.1.1)
Guiding and interpreting empirical validity research (see Subsections 5.1.2 through 5.1.5)
Informing inquiry pertaining to utility, values, and consequences of testing (see Subsection 5.2)
References
Abraham, R., & Chapelle, C. (1992). The meaning of cloze test scores: An item difficulty perspective. Modern Language Journal, 76(4), 468-479.
Alderson, J. C. (1993). The relationship between grammar and reading in an EAP test battery. In D. Douglas & C. Chapelle (Eds.), A new decade of language testing research (pp. 203-219). Arlington, VA: TESOL Publications.
Alderson, J. C., & Wall, D. (1993). Does washback exist? Applied Linguistics, 14, 115-129.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Anderson, J. R. (1990). Cognitive psychology and its implications, 3rd ed. New York: W. H. Freeman and Co.
Anderson, J. R. (1993). Problem solving and learning. American Psychologist, 48, 35-44.
Anderson, N. J., Bachman, L., Perkins, K., & Cohen, A. (1991). An exploratory study into the construct validity of a reading comprehension test: Triangulation of data sources. Language Testing, 8(1), 41-66.
Bachman, L. F. (1982). The trait structure of cloze test scores. TESOL Quarterly, 16(1), 61-70.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (1982). The construct validation of some components of communicative competence. TESOL Quarterly, 16, 449-465.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
Baker, L. (1991). Metacognition, reading, and science education. In C. Santa & D. Alvermann (Eds.), Science learning: Processes and applications (pp. 2-13). Newark, DE: International Reading Association.
Barsalou, L. (1992). Cognitive psychology. Hillsdale, NJ: Lawrence Erlbaum Associates.
Bereiter, C. (1990). Aspects of an educational learning theory. Review of Educational Research, 60, 603-624.
Bereiter, C., & Scardamalia, M. (1987). The psychology of written composition. Hillsdale, NJ: Lawrence Erlbaum Associates.
Berns, M. (1990). Contexts of competence: Social and cultural considerations in communicative language teaching. New York: Plenum Press.
Berry, M. (1987). Is teacher an unanalyzed concept? In M. A. K. Halliday & R. P. Fawcett (Eds.), New developments in systemic linguistics. Volume 1: Theory and description (pp. 41-63). London: Pinter Publishers.
Brindley, G. (1991). Defining language ability: the criteria for criteria. In Anivan, S. (Ed.), Current developments in language testing (pp. 139-164). Singapore: SEAMEO Regional Language Center.
Buck, G. (1991). The testing of listening comprehension: An introspective study. Language Testing, 8(1), 67-91.
Campbell, R., & Wales, R. (1970). The study of language acquisition. In J. Lyons (Ed.), New horizons in linguistics (pp. 242-260). London: Penguin.
Canale, M. (1983). From communicative competence to communicative language pedagogy. In J. Richards & R. Schmidt (Eds.), Language and communication (pp. 2-27). London: Longman.
Canale, M. (1987). Language assessment: The method is the message. In D. Tannen & J. E. Alatis (Eds.), The interdependence of theory, data, and application (pp. 249-262). Washington, DC: Georgetown University Press.
Canale, M. (1988). The measurement of communicative competence. Annual Review of Applied Linguistics, 8, 67-84.
Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1, 1-47.
Carroll, J. B. (1989). Intellectual abilities and aptitudes. In A. Lesgold & R. Glaser (Eds.), Foundations for a psychology of education (pp. 137-197). Hillsdale, NJ: Lawrence Erlbaum.
Carver, R. (1992). Effect of prediction activities, prior knowledge, and text type on amount comprehended: Using Rauding theory to critique schema theory research. Reading Research Quarterly, 27, 164-174.
Chapelle, C. A. (1994). Is a C-test valid for L2 vocabulary research? Second Language Research, 10(2), 157-187.
Chapelle, C., & Douglas, D. (1993). Foundations and directions for a new decade of language testing research. In D. Douglas & C. Chapelle (Eds.), A new decade of language testing research (pp. 1-22). Alexandria, VA: TESOL Publications.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Cohen, A. (1984). On taking language tests: What the students report. Language Testing, 1(1), 70-81.
Cohen, A. (1993). The role of instructions in testing summarizing ability. In D. Douglas & C. Chapelle (Eds.), A new decade of language testing research (pp. 132-160). Alexandria, VA: TESOL Publications.
Cohen, A. (forthcoming). Strategies and processes in test taking and SLA. In L. Bachman & A. Cohen (Eds.), Interfaces between second language acquisition and language testing research.
Cowan, N. (1993). Activation, attention, and short-term memory. Memory and Cognition, 21, 162-167.
Crookes, G., & Schmidt, R. (1991). Motivation: Reopening the research agenda. Language Learning, 41, 469-512.
Crystal, D. (1987). The Cambridge encyclopedia of language. New York: Cambridge University Press.
Douglas, D. (forthcoming). Testing methods in context-based second language acquisition research. In L. Bachman & A. Cohen (Eds.), Interfaces between second language acquisition and language testing research.
Duff, P. (1993). Tasks and interlanguage performance: An SLA perspective. In G. Crookes & S. M. Gass (Eds.), Tasks and language learning: Integrating theory and practice (pp. 57-95). Philadelphia: Multilingual Matters.
Dweck, C. (1989). Motivation. In A. Lesgold & R. Glaser (Eds.), Foundations for a psychology of education (pp. 87-136). Hillsdale, NJ: Lawrence Erlbaum Associates.
Embretson, S. (1983). Construct validity: Construct representation versus nomothetic span. Psychological Bulletin, 93(1), 179-197.
Faerch, C., & Kasper, G. (Eds.). (1983). Strategies in interlanguage communication. London: Longman.
Feldmann, U., & Stemmer, B. (1987). Thin___ aloud a___ retrospective da___ in C-te___ taking: Diffe___ languages - diff___ learners - sa___ approaches? In C. Faerch & G. Kasper (Eds.), Introspection in second language research (pp. 251-267). Philadelphia, PA: Multilingual Matters.
Fincher-Kiefer, R. (1993). The role of predictive inferences in situation model construction. Discourse Processes, 16, 99-124.
Firth, J. R. (1957). Papers in linguistics 1934-1951. London: Oxford University Press.
Flower, L., & Hayes, J. (1980). The dynamics of composing: Making plans and juggling constraints. In L. Gregg & E. Steinberg (Eds.), Cognitive processes in writing (pp. 31-50). Hillsdale, NJ: Lawrence Erlbaum Associates.
Flower, L., & Hayes, J. (1981). The pregnant pause: An inquiry into the nature of planning. Research in the Teaching of English, 15, 229-244.
Gardner, R. C., & MacIntyre, P. (1992). On the measurement of affective variables in second language learning. Language Learning, 43, 157-194.
Graesser, A., & Kreuz, R. (1993). A theory of inference generation during text comprehension. Discourse Processes, 16, 145-160.
Grotjahn, R. (1986). Test validation and cognitive psychology: Some methodological considerations. Language Testing, 3(2), 159-185.
Grotjahn, R. (1987). On the methodological basis of introspective methods. In C. Faerch & G. Kasper (Eds.), Introspection in second language research (pp. 54-81). Clevedon, Avon: Multilingual Matters.
Guthrie, J. (1988). Locating information in documents: Examination of a cognitive model. Reading Research Quarterly, 23, 178-199.
Guthrie, J., Britten, T., & Barker, K. G. (1991). Roles of document structure, cognitive strategy, and awareness in searching for information. Reading Research Quarterly, 26, 300-324.
Habermas, J. (1970). Toward a theory of communicative competence. Inquiry, 13, 360-375.
Halliday, M. A. K. (1978). Language as social semiotic: The social interpretation of language and meaning. London: Edward Arnold.
Halliday, M. A. K., & Ruquiya, H. (1989). Language, context and text: Aspects of language in a social semiotic perspective. Oxford: Oxford University Press.
Harrington, M., & Sawyer, M. (1992). L2 working memory capacity and L2 reading skill. Studies in Second Language Acquisition, 14, 25-38.
Hayes, J., Flower, L., Schriver, K. A., Stratman, J. F., & Carey, L. (1987). Cognitive processes in revision. In S. Rosenberg (Ed.), Advances in applied psycholinguistics, Volume 2 (pp. 176-240). New York: Cambridge University Press.
Hidi, S. (1990). Interest and its contribution as a mental resource for learning. Review of Educational Research, 60, 549-571.
Hudson, T. (1993). Testing the specificity of ESP reading skills. In D. Douglas & C. Chapelle (Eds.), A new decade of language testing research (pp. 58-82). Alexandria, VA: TESOL Publications.
Hymes, D. (1971). Competence and performance in linguistic theory. In R. Huxley & E. Ingram (Eds.), Language acquisition: Models and methods. London: Academic Press.
Hymes, D. (1972). Towards communicative competence. Philadelphia: Pennsylvania University Press.
Kramsch, C. (1993). Context and culture in language teaching. Oxford: Oxford University Press.
Joos, M. (1962). The five clocks. Bloomington, Indiana: Indiana University Research Center in Anthropology, Folklore, and Linguistics.
Just, M., & Carpenter, P. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99, 122-149.
Just, M. A., & Carpenter, P. A. (1987). The psychology of reading and language comprehension. Boston, MA: Allyn & Bacon.
Kachru, B. B. (Ed.). (1992). The other tongue: English across cultures. Second revised edition. Urbana, IL: University of Illinois Press.
Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95, 163-182.
Kintsch, W. (1993). Information accretion and reduction in text processing: Inferences. Discourse Processes, 16, 193-202.
Kintsch, W., & van Dijk, T. (1978). Toward a model of text comprehension and reproduction. Psychological Review, 85, 363-394.
Krashen, S. (1985). The input hypothesis. New York: Longman.
Lantolf, J. P., & Frawley, W. (1988). Proficiency: Understanding the construct. Studies in Second Language Acquisition, 10, 181-195.
Larsen-Freeman, D., & Long, M. (1991). An introduction to second language acquisition research. London: Longman.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, Nov., 15-21.
Long, M., & Crookes, G. (1992). Three approaches to task-based syllabus design. TESOL Quarterly, 26, 27-56.
Lowenberg, P. H. (1992). Testing English as a world language: Issues in assessing nonnative proficiency. In B. B. Kachru (Ed.), The other tongue: English across cultures (2nd ed., pp. 108-121). Urbana, IL: University of Illinois Press.
MacIntyre, P., & Gardner, R. C. (1991). Language anxiety: Its relationship to other anxieties and to processing in native and second languages. Language Learning, 41, 513-534.
Malinowski, B. (1923). The problem of meaning in primitive languages. In C. Ogden & I. A. Richards (Eds.), The meaning of meaning (pp. 296-336). London: Trubner and Co.
Mathewson, G. (1994). Model of attitude influence upon reading and learning to read. In R. Ruddell, M. Ruddell, & H. Singer (Eds.), Theoretical models and processes of reading, 4th ed. (pp. 1131-1161). Newark, DE: International Reading Association.
McClelland, J. L., & Rumelhart, D. E. (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition, Volume 1. Cambridge, MA: MIT Press.
McKenna, M. (1994). Toward a model of reading attitude acquisition. In E. Cramer & M. Castle (Eds.), Fostering the love of reading (pp. 18-39). Newark, DE: International Reading Association.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.) (pp. 13-103). NY: Macmillan Publishing Co.
Morrison, D. M., & Low, G. (1983). Monitoring and the second language learner. In J. Richards & R. Schmidt (Eds.), Language and communication (pp. 228-250). New York: Longman.
Moss, P. A. (1992). Shifting conceptions of validity in educational measurement: Implications for performance assessment. Review of Educational Research, 62(3), 229-258.
Munby, J. (1978). Communicative syllabus design. Cambridge: Cambridge University Press.
Myers, M., & Paris, S. (1978). Children's metacognitive knowledge about reading. Journal of Educational Psychology, 70, 680-690.
Nelson, C. L. (1985). My language, your culture: Whose communicative competence? World Englishes, 4(2), 243-50.
Oakhill, J., & Garnham, A. (1988). Becoming a skilled reader. New York: Basil Blackwell.
Ortony, A., Clore, G., & Collins, A. (1988). The cognitive structure of emotion. New York: Cambridge University Press.
Paivio, A. (1986). Mental representations. New York: Oxford University Press.
Paris, S., Lipson, M., & Wixson, K. (1983). Becoming a strategic reader. Contemporary Educational Psychology, 8, 293-336.
Paris, S., Wasik, B., & Turner, J. (1991). The development of strategic readers. In R. Barr, et al. (Eds.), Handbook of reading research, Volume 2 (pp. 609-640). New York: Longman.
Pawley, A. (1992). Formulaic speech. In B. Bright, et al. (Eds.), Oxford international encyclopedia of linguistics, Volume 2 (pp. 22-25). New York: Oxford University Press.
Pawley, A., & Syder, F. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. Richards & R. Schmidt (Eds.), Language and communication (pp. 191-226). New York: Longman.
Perfetti, C. (1989). There are generalized abilities and one of them is reading. In L. Resnick (Ed.), Knowing, learning and instruction: Essays in honor of Robert Glaser (pp. 307-334). Hillsdale, NJ: Lawrence Erlbaum Associates.
Perfetti, C. (1991). Representations and awareness in the acquisition of reading competence. In L. Rieben & C. Perfetti (Eds.), Learning to read: Basic research and its implications (pp. 33-44). Hillsdale, NJ: Lawrence Erlbaum Associates.
Perfetti, C. (1993). Why inferences might be restricted. Discourse Processes, 16, 181-192.
Perfetti, C., & McCutchen, D. (1987). Schooled language competence: Linguistic abilities in reading and writing. In S. Rosenberg (Ed.), Advances in applied psycholinguistics, Volume 2 (pp. 105-141). New York: Cambridge University Press.
Perkins, K., & Linnville, S. (1987). A construct definition study of a standardized ESL vocabulary test. Language Testing, 4(2), 125-141.
Pienemann, M., Johnson, M., & Brindley, G. (1988). Constructing an acquisition-based procedure for second language assessment. Studies in Second Language Acquisition, 10, 217-243.
Rayner, K., Garrod, S., & Perfetti, C. (1992). Discourse influences during parsing are delayed. Cognition, 45, 109-139.
Rayner, K., & Pollatsek, A. (1989). The psychology of reading. Englewood Cliffs, NJ: Prentice Hall.
Reed, J., & Schallert, D. (1993). The nature of involvement in academic discourse tasks. Journal of Educational Psychology, 85, 253-266.
Renninger, K., Hidi, S., & Krapp, A. (Eds.). (1992). The role of interest in learning and development. Hillsdale, NJ: Lawrence Erlbaum Associates.
Sadoski, M., Goetz, E., & Fritz, J. (1993). Impact of concreteness on comprehensibility, interest, and memory for text: Implications for dual coding theory and text design. Journal of Educational Psychology, 85, 291-304.
Sadoski, M., Paivio, A., & Goetz, E. (1991). A critique of schema theory in reading and a dual coding alternative. Reading Research Quarterly, 26, 463-484.
Savignon, S. (1983). Communicative competence: Theory and classroom practice. Reading, MA: Addison-Wesley.
Saville-Troike, M. (1989). The ethnography of communication: An introduction (2nd ed.). New York: Basil Blackwell.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics, 11, 219-258.
Schmidt, R. (1992). Psychological mechanisms underlying second language fluency. Studies in Second Language Acquisition, 14, 357-385.
Schmidt, R. (1993). Awareness and second language acquisition. In W. Grabe, et al. (Eds.), Annual Review of Applied Linguistics, 13. Issues in second language teaching and learning (pp. 206-226). New York: Cambridge University Press.
Searls, J. (1991). The role of questions in the physics laboratory classes of two nonnative speaking teaching assistants. Unpublished master's thesis, Iowa State University, Department of English. Ames: Iowa State University.
Shallice, T. (1988). From neuropsychology to mental structure. New York: Cambridge University Press.
Shohamy, E. (1993). A collaborative/diagnostic feedback model for testing foreign languages. In D. Douglas & C. Chapelle (Eds.), A new decade of language testing research. Alexandria, VA: TESOL Publications.
Shiffrin, R. (1993). Short-term memory: A brief commentary. Memory and Cognition, 21, 193-197.
Singer, M. (1990). Psychology of language. Hillsdale, NJ: Lawrence Erlbaum Associates.
Singer, M. (1993). Global inferences of text situations. Discourse Processes, 16, 161-168.
Skehan, P. (1992). Second language acquisition strategies and task-based learning. Thames Valley University Working Papers in English Language Teaching, 1, 178-208.
Smith, L. E., & Nelson, C. L. (1985). International intelligibility of English: Directions and resources. World Englishes, 4(3), 333-342.
Spolsky, B. (1978). Introduction: Linguists and language testers. In B. Spolsky (Ed.), Advances in language testing research: Approaches to language testing 2 (pp. v-x). Washington, DC: Center for Applied Linguistics. Spolsky, B. (1985). The limits of authenticity in language testing. Language Testing, 2(1), 31-40. Stanovich, K. (1990). Concepts in developmental theories of reading skill: Cognitive resources, automaticity, and modularity. Developmental Review, 1O, 72-100. Stanovich, K. (1991). Changing models of reading and reading acquisition. In L. Rieben & C. Perfetti (Eds.), Learning to read: Basic research and its implications (pp. 19-31). HiUsdale, NJ: Lawrence Erlbaum Associates. Stanovich, K. (1992). The psychology of reading: Evolutionary and revolutionary developments. In W. Grabe, et al. (Eds.), Annual review of Applied Linguistics, 12. Literacy (pp. 3-30). New York: Cambridge University Press. Stevenson, D. K. (1981). Beyond faith and face validity: 'Ihe multitrait-multimethod matrix and the convergent and discriminant validity of oral proficiency tests. In A. S. Palmer, P. J. M. Groot, & G. A. Trosper (Eds.), The construct validation of tests of communicative competence (pp. 37-61). Washington, DC: TESOL Publications Stillings, N. A., Feinstein, M. H., Garfield, J. L., Rissland, E. L., Rosenbattm, D. A., Weisler, S. E., & Baker-Ward, L. (1987). Cognitive science: An introduction. Cambridge, MA: MIT Press. van Dijk, T., & Kintsch, W. (1983). Strategies of discourse comprehension. New York: Academic Press. Walczyk, J., & Royer, J. (1993). Does activation of a script facilitate lexical access? Contemporary Educational Psycllology, 17, 301-311. Weaver, C., & Kintsch, W. (1991). Expository text. In R. Barr, et al. (Eds.),Handbook ofreading research, Volume 2 (pp. 230-245). New York: Longman~ Whitney, P., Ritchie, B., & Clark, M. (1991). Working-memory capacity and the use of elaborative inferences in text comprehension. Discourse Processes, 14, 133-146. 
Zuroff, D. C. (1986). Was Gordon Allport a trait theorist? Journal of Personality and Social Psychology, 51(5), 993-1000.
Appendix A
Chronological Development of the TOEFL 2000 Model at COE Meetings
1. April 1990, San Juan, Puerto Rico
At the spring 1990 Committee of Examiners (COE) meeting, the topic of planning for TOEFL 2000 was raised. The first serious discussion of this topic with the COE was a part of the agenda.
2. January 1991, San Diego
At the first TOEFL 2000 meeting, we began the process with ETS test-development staff presenting position papers on the current status and future directions for each section of the TOEFL test. At this meeting, we recognized the need for a statement of purpose for the TOEFL 2000 test and a definition of what the test would intend to measure. We wanted that definition to be consistent with current applied linguistics theory, so we began talking about how Canale and Swain's work might inform the definition.
3. January 1992, Los Angeles
This meeting began by developing a Statement of Purpose (see Section 1, page 1), which we reaffirmed in Quebec. We continued to discuss how we could define communicative language proficiency in a way that would be useful for test development and validation. We examined Bachman's model and tried to draw out its implications by using it to redefine the familiar four skills. We began to consider how language functions and forms could be listed for each of the four language skills as a starting point, even though we recognized that the test was likely to evolve into some variation on integrated skills modules. On the basis of a second day's discussion, we decided to define "context" in the model as texts, tasks, and settings and attempted to understand what grammatical, sociolinguistic, and discourse knowledge would mean in contexts associated with each of the four language skills. We agreed that strategic competence, as it is defined in much of the applied linguistics literature, was not what we wanted; so, instead of adopting Bachman's strategic competence, we discussed the need to have some procedural component that included basic notions of processing.
4. May 1992, Quebec
During this COE meeting, we revised the model categories and elements within categories. We also developed the schematic model that appears in the (May 1992) ETS draft document. At this meeting, we made many useful decisions about the development of the model:
We agreed that the academic setting did not mean the test would include any type of language that might be spoken on a campus; instead, it reflected language that would be relevant and needed for students to succeed at a university.
We recognized the need to distinguish the components of academic language that were observable in the academic context from those processes/components that were internal to the language user (Academic Contexts versus Internal Student Operations). The Academic Context included the Situation (task, text, setting) and the Production Output.
The Internal Operations had to be keyed to some kind of language user intentions, which then translated the situation into internal verbal processing in Verbal Working Memory. The Internal-Processing Output had to be monitored in some way to match the internal output with the language user's intentions to his or her satisfaction. When this match was made sufficiently, the Internal Output became the Production Output observable in the Academic Context. The major verbal processing done by the language user was within some complex language component that included access to world knowledge, some type of verbal-processing component, and subcomponents of language knowledge (linguistic competence, sociolinguistic competence, discourse competence).
The committee agreed that the model being sketched was simply a descriptive schematic model. Specific questions, such as how Verbal Working Memory relates to Internal Goal Setting (if the latter also requires access to the language component), what exists outside verbal working memory in the shaded box, and how world knowledge fits in, were issues that could be considered later. This first model was not an attempt at theory building based on an extensive review of the cognitive psychology research literature. The language competence component was to include linguistic, sociolinguistic, and discourse competence, although its specification was not elaborated far beyond what was discussed in the LA meeting in terms of language skills and language uses. However, some of these concepts were discussed and better clarified.
From these discussions, and some additional notes sent to ETS staff by Carol Chapelle and Bill Grabe, ETS staff put together the Working Model of Communicative Language Use in an Academic Context (draft May 1992; see Appendix C). This ETS model included (1) a schematic diagram; and (2) elements of the model broken down into each of the four skill areas for ease of discussion.
5.
At the May Policy Council meeting two weeks after Quebec, ETS staff presented the May 1992 draft of the working model for feedback from the Policy Council. This draft model was accepted as a tentative draft model to be used as a foundation document for further discussion.
6. October 1992, Princeton
At a joint COE/Research Committee meeting, the May 1992 draft documents were discussed in some detail. In addition, a large group of ETS officers and staff joined in the discussion and assured both committees that this project was a high priority and would receive support.
7. May 1993, Sedona
ETS produced a TOEFL 2000 document that outlined planning background, issues, and next steps. The COE reviewed this document for a day and made a number of recommendations for changing the document (and the priorities for test development). Specifically, the committee recommended that the test-planning documents be organized from a validity-driven standpoint. Computer and technology issues should not be ignored but should be treated in a separate document so that technology is not seen as the driving design. Other recommendations included position papers parallel to the one being developed for the Test of Spoken English on speaking, each of which would cover a language skill (reading, writing, listening). These papers would cover current research, and particularly research related to assessing communicative competence through each skill (and across skills). The committee also recommended that a prose description of the model of communicative language use be drafted as a foundation document for future planning. Finally, the committee recommended that the reports and documents use a standardized set of linguistic terminology. This standardization will assist future users of the documents and clarify misconceptions among groups involved in the project. It was agreed that an overview of the model of communicative language use would be the place to present a set of terms and their definitions; this set could then be revised and used in further documents.
Appendix B
communicative competence - the ability to use language to express oneself in context. Hymes is responsible for this term and its original definition. Canale and Swain worked on specifying a definition of communicative competence that would be useful for second language teaching and testing.
communicative language ability - the ability to use language to express oneself in context. This term evolved with Bachman's model (which developed Canale & Swain's) to avoid the confusion associated with the multiple meanings of "competence."
communicative language proficiency - the term used to indicate what TOEFL 2000 is intended to measure. This term evolved as a combination of "proficiency testing" (in contrast to placement, achievement, or diagnostic), "communicative competence" (Hymes, 1972), and "communicative language ability" (Bachman, 1990). The purpose of the COE Model is to elaborate what is meant by this term.
construct definition - a theoretical description of the capacity that a test is supposed to measure. A construct definition is formulated on the basis of judgments of experts in the field, judgments that may be informed by a variety of evidence.
construct validity evidence - judgmental and empirical evidence pertaining to what a test measures. This evidence is interpreted relative to a construct definition.
content evidence - judgmental analyses of the knowledge and processes measured by test items/tasks.
discourse analysis - the study of oral or written texts focusing on the elements (e.g., vocabulary and syntactic patterns) contributing to the text, its overall structure, and its context.
discourse competence - knowledge of how language is sequenced and connected appropriately above the sentence level in terms of coherence, information flow, and cohesion.
functional approach to language - an approach to the study of language that focuses on the use (or functions) of language in context rather than on linguistic forms.
social consequences - the effects, impact, or washback of tests on the perceptions and practices of those affected. Applied linguists view social consequences as an important aspect of validity.
sociolinguistic competence - knowledge of how sociological phenomena govern linguistic choices and knowledge of a variety of linguistic options appropriate for a range of sociological situations.
strategic competence - the strategies or processes used to put language knowledge to work in context. Strategies include assessing the context, setting goals, etc. Strategic competence was defined by Canale and Swain as a part of communicative competence and included by Bachman as a part of communicative language ability. In the COE Model, the functions performed by strategic competence are represented by "goal setting" and the "language-processing component."
text model - the reader/listener's representation of what a text is about.
validity - justification for test interpretation and use. Justifications include construct validity evidence, evidence about relevance and utility, value implications, and social consequences.
value implications - the academic and social values that underlie testing practices and that are conveyed through testing practices.
Appendix C
STATEMENT OF PURPOSE
TOEFL 2000 is a measure of communicative language proficiency in English and focuses on academic language and the language of university life. It is designed to be used as one criterion in decision-making for undergraduate and graduate admissions. It is assumed that independent validations will be carried out for other uses of the test.
LISTENING COMPREHENSION
Settings
Lecture hall; Classroom; Laboratory; Extra-instructional settings (library, health clinic, student union, professor's office, museums); Interactive media library
READING COMPREHENSION
Settings
Lecture hall; Classroom; Laboratory; Extra-instructional settings (library, health clinic, student union, professor's office, museums); With support/outside resources (dictionaries, references); Computer/word processor; Pencil-and-paper/manual
SPEAKING
Settings
Lecture hall; Classroom; Laboratory; Extra-instructional settings (library, health clinic, student union, professor's office, museums); Interactive media library
WRITING
Settings
Lecture hall; Classroom; Laboratory; Extra-instructional settings (library, health clinic, student union, professor's office, museums); With support/outside resources (dictionaries, references); Computer/word processor; Pencil-and-paper/manual
LISTENING COMPREHENSION
Text Types
Informal conversations; Formal discussions (having a predetermined purpose); Interviews; Impromptu monologue or speech; Lectures (presented from outline or notes but not from written text); Lectures and academic papers (read from written text); Debates; Newscasts (read from written text); Formal commentary (read from prepared text); Orally administered instructions; Narration (read from prepared text); Story-telling (without a written text); Poetry or literary pieces; Scripted dialogues (stage or film, radio plays)
READING COMPREHENSION
Text Types
Textbooks; Research reports; Summaries; Book reports, reviews; Proposals; Appeals, petitions; Recommendations; Lab reports; Theses, abstracts, dissertations; Journals; Charts, graphs, maps; Literary texts (fiction, poetry, autobiography); Newspapers; Manuals; Memoranda; Technical and business texts; Case studies; Personal communications (letters, correspondence, professor's written comments); Notes/outlines; Teacher's comments; Classroom readings (assignment topics, instructions, lecture outlines, test questions, syllabi, course policy)
SPEAKING
Text Types
Informal conversations; Formal discussions (having a predetermined purpose); Interviews; Impromptu monologue or speech; Lectures (presented from outline or notes but not from written text); Lectures and academic papers (read from written text); Debates; Newscasts (read from written text); Formal commentary (read from prepared text); Orally administered instructions; Narration (read from prepared text); Story-telling (without a written text); Poetry or literary pieces; Scripted dialogues (stage or film, radio plays)
WRITING
Text Types
Essays; Essay test questions; Term papers, research papers, report papers; Project reports; Case studies; Lab reports; Theses, abstracts, dissertations; Summaries; Notes/outlines; Book reports/reviews; Letters; Proposals; Recommendations; Appeals/petitions
LISTENING COMPREHENSION
Tasks
Identification of aspects of the code; Orientation (tuning in, preparing to process message); Comprehension of main idea or gist; Comprehension of details; Full comprehension (main idea plus all details); Replication (focuses on fidelity of replication); Extrapolation; Critical analysis; Inference
READING COMPREHENSION
Tasks
Identification (specific details, recognition, discrimination, focus on the code); Orientation (author's attitude); Comprehension of main idea; Comprehension of details; Full comprehension (main idea plus all details); Replication (focuses on fidelity of replication); Extrapolation; Critical analysis; Inference
SPEAKING
Tasks
Explain/inform/narrate; Persuade; Critique; Synthesize; Describe; Summarize; Demonstrate knowledge; Support opinion; Hypothesize; Give directions; Transcode from charts, graphs, maps, or other text types
WRITING
Tasks
Explain/inform/narrate; Persuade; Critique; Synthesize; Describe; Summarize; Demonstrate knowledge; Support opinion; Hypothesize; Give directions; Transcode from charts, graphs, maps, or other text types; Write creatively
LISTENING COMPREHENSION
Procedural Competence
Predicting; Modifying/revising predictions based on new input; Attending to content words; Tolerating ambiguity; Guessing words from context; Checking/indicating comprehension through turn-taking; Judging relative importance of information; Using extralinguistic cues (illustrations, charts, etc.)
READING COMPREHENSION
Procedural Competence (enhancing or compensatory)
Skimming; Scanning; Guessing words from context; Predicting; Adjusting reading speed; Re-reading (recognizing misreading); Recognizing literal vs. nonliteral meaning; Selective reading (skipping parts); Judging relative importance of information; Using extralinguistic cues (illustrations, charts, etc.); Rephrasing, paraphrasing during the reading process
SPEAKING
Procedural Competence
Circumlocute; Avoid, skip difficult language; Elaborate; Revise; Organize; Exemplify; Use resources/quote; Copy/imitate/reproduce; Paraphrase/rephrase; Use visual/graphic supports
WRITING
Procedural Competence
Circumlocute; Avoid, skip difficult language; Elaborate; Revise; Organize; Exemplify; Use resources/quote; Copy/imitate/reproduce; Paraphrase/rephrase; Use visual/graphic supports; Edit/proofread
LISTENING COMPREHENSION
Sociolinguistic Competence
Register Variation
Understand/recognize variations in language with respect to: The number of listeners in the intended audience; Familiar or distant relationship between speaker and audience; Informal or formal requirements; Subordinate or superordinate relationships; General or topical content; Lay person or specialist as intended audience
Recognize paralinguistic cues
Fulfill turn-taking requirements in conversational speech
SPEAKING
Sociolinguistic Competence
Register Variation
Produce appropriate language with respect to: One or many in intended audience; Familiar or distant relationship between speaker and audience; Informal or formal requirements; Subordinate or superordinate relationships; General or topical content; Lay person or specialist as audience
Fulfill turn-taking requirements in conversational speech
WRITING
Sociolinguistic Competence
Register Variation
Produce appropriate language with respect to: One or many in intended audience; Familiar or distant relationship between speaker and audience; Informal or formal requirements; Subordinate or superordinate relationships; General or topical content; Lay person or specialist readers
LISTENING COMPREHENSION
Linguistic Competence
Recognize phonological features of spoken language; Discriminate among forms or structures; Recognize word order pattern, syntactic patterns and devices, lexical/semantic relations, variations in meaning
READING COMPREHENSION
Linguistic Competence
Recognize orthographical features of written language; Discriminate among forms or structures; Recognize word order pattern, syntactic patterns and devices, lexical/semantic relations, variations in meaning
SPEAKING
Linguistic Competence
Use appropriate pronunciation, intonation, and stress; Combine forms and structures; Use appropriate word order; Use appropriate forms of words; Use appropriate syntactic patterns and devices
WRITING
Linguistic Competence
Use distinctive features of the language; Combine forms and structures; Use appropriate word order; Use appropriate forms of words; Use appropriate syntactic patterns and devices
LISTENING COMPREHENSION
Discourse Competence
Understand streams of speech; Recognize thought groups (prosodic patterns); Infer links between events, situations, ideas; Recognize genre markings; Recognize coherence relationships; Recognize cohesive devices; Follow topic development; Analyze tone of discourse; Recognize conclusion from parts
READING COMPREHENSION
Discourse Competence
Infer links between events (situations, ideas, causes, effects); Recognize genre markings (features of formal discourse); Recognize coherence relationships; Recognize cohesive devices; Follow a topic of the discourse; Analyze tone from the various parts; Recognize the parts leading to the whole; Recognize conclusion from parts; Draw conclusions (using multiple bits of information)
SPEAKING
Discourse Competence
Link situations and ideas; Use multiple pieces of information to support conclusions; Use appropriate genre markings (features of formal discourse); Produce coherent speech; Develop a topic; Use appropriate tone; Use appropriate cohesive devices
WRITING
Discourse Competence
Link situations and ideas; Use multiple pieces of information to support conclusions; Use appropriate genre markings (features of formal discourse); Establish coherence; Develop a topic; Create appropriate tone; Use appropriate cohesive devices
Oral/Aural
Informal conversation; Formal discussion; Interviews/talk show; Impromptu speech; Extemporaneous speech [prepared but no notes]; Lectures [prepared notes, not read]; Academic paper [read]; Newscasts [prepared text]; Oral editorial; Oral instructions; Narration; Story-telling [no text]; Debates; Recitation of poetry or literature; Stage or film productions; Radio plays [scripted]
Reading/Writing
Textbooks; Research reports; Journals; Charts, graphs, tables, figures; Newspapers; Literary texts (fiction, poetry, autobiography); Manuals; Memoranda; Technical/business texts; Case studies; Personal communication (letters, professor's comments); Notes/outlines; Teacher comments; Classroom reading (assignment topics, syllabi, instructions, lecture outlines, test questions, blackboard notes)
On-line Processing
Visual/Aural sensory input; Lexical Access; Propositional integration; Text modelling; Mental model interpretation
Metacognitive Processing
Skimming; Scanning; Predicting; Adjusting processing speed; Re-reading; Tolerating ambiguity; Summarizing; Paraphrasing; (Exemplifying); Awareness of organization; Use of extra-linguistic cues; Selective processing (skipping, listening to two conversations, communicating in phrases); Judging relative importance of information; Recognize literal and non-literal meanings; Using resources (text, quotation); Editing [production only]; Circumlocuting [production only]; Elaborating [production only]; Revising [production only]; Copying/imitating/reproducing [production only]
Alternative to a separate list of Tasks for each skill, separating tasks into reception tasks versus production tasks:
Listening/reading
Identifying aspects of the code; Comprehension of main idea; Comprehension of detail; Full comprehension; Replication; Extrapolation; Critical analysis; Drawing inferences
Speaking/writing
Explain/inform/narrate; Persuade; Critique; Synthesize; Describe; Summarize; Demonstrate knowledge; Support opinion; Hypothesize; Give direction
LISTENING COMPREHENSION
Functions
Understand/Recognize: Description; Narration; Paraphrase; Directions; Definition; Explanation; Orders; Opinions; Summary; Predictions; Hypothetical language; Persuasive language; Comparison/contrast; Cause/effect relationships; Agreement and disagreement; Criticism; Approval/disapproval; Expression of feelings/moods; Suggestions/Recommendations; Advice; Complaints; Requests
READING COMPREHENSION
Functions
Understand/Recognize: Description; Narration; Paraphrase; Directions; Definition; Explanation; Orders; Opinions; Summary; Predictions; Hypothetical language; Persuasive language; Comparison/contrast; Cause/effect relationships; Agreement and disagreement; Criticism; Approval/disapproval; Expression of feelings/moods; Suggestions/Recommendations; Advice; Complaints; Requests
SPEAKING
Functions
Describe; Narrate; Inform; Paraphrase; Give directions; Define; Explain; Give orders; Give/support opinion; Summarize; Predict; Use hypothetical language; Persuade; Compare/contrast; Cause/effect; Disagree/agree; Criticize; Approve/disapprove; Express feelings/moods; Suggest/recommend; Advise; Complain; Request; Give feedback; Elicit; Invite/include others; Negotiate; Convince; Interrupt; Apologize; Sympathize/console; Compliment; Congratulate; Make introductions; Make "small talk"; Express greetings/farewells
WRITING
Functions
Describe; Narrate; Inform; Paraphrase; Give directions; Define; Explain; Give orders; Give/support opinion; Summarize; Predict; Use hypothetical language; Persuade; Compare/contrast; Cause/effect; Disagree/agree; Criticize; Approve/disapprove; Express feelings/moods; Suggest/recommend; Advise; Complain; Request
FIGURE 3 Working Model of Communicative Language Use in an Academic Context, April 1992
[Schematic diagram not reproduced. Recoverable labels: ACADEMIC CONTEXT (observable situation); OUTPUT; INTERNAL GOAL-SETTING; WORLD KNOWLEDGE; discourse; sociolinguistic.]