
Assessing Writing 9 (2004) 27-55

Integrating reading and writing in a competency test for non-native speakers of English
Sara Cushing Weigle
Department of Applied Linguistics and ESL, Georgia State University, P.O. Box 4099, MSC 4C1250, Atlanta, GA 30302-4099, USA

Abstract

This paper reports on a test that is being used to fulfill a university writing examination requirement for non-native speakers of English. The test, which requires students to read two passages, answer short-answer comprehension and synthesis questions, and write an argument essay on a topic related to the passages, replaces a test that was based on a choice among several short prompts. In the paper, three aspects of test validation are considered. First, a comparison between the new test and the old test shows that rater reliability and consistency across test topics are substantially higher on the new test. Second, a review of transcripts of students who failed the new test three or more times reveals that these students are also having difficulty in English and History courses. Finally, an interview with a test preparation course instructor reveals that the new test is having positive washback: that is, the course that prepares students for the new test focuses on skills that are more easily transferable to other academic courses than the course that prepared students for the old test. The costs of preparing a test of this nature are addressed, and the argument is put forward that if students are required to pass an examination as a condition of graduation, the institution must be willing to provide sufficient resources to test students fairly.
© 2004 Elsevier Inc. All rights reserved.
Keywords: Writing assessment; Reading; Competency testing; Nonnative writing; Reliability; Validity

Tel.: +1-404-651-3223. E-mail address: sweigle@gsu.edu (S.C. Weigle).

1075-2935/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.asw.2004.01.002


1. Introduction

Since the 1970s, many colleges and universities in the United States have had a formal writing proficiency requirement for all undergraduate students. White (1994) provides both a history and a critique of the movement to certify writing proficiency, noting that this movement came about as a response both to the expansion of postsecondary education to groups who had previously been excluded and to complaints by employers that college graduates did not write well enough for their job requirements. White outlines a number of ways in which institutions have met the challenge of ensuring that their graduates can write well, from requiring students to pass a certain number of writing-intensive courses to instituting a portfolio requirement. Perhaps the least satisfactory course of action, in White's view, is a standardized essay examination administered at the campus or system level. Such tests may take the form of an exit examination following first-year composition courses, while others are "rising junior" examinations, taken before students begin upper division courses.

In critiquing the campus- or system-wide writing proficiency test, White provides two main arguments, both of which have frequently been stated in the literature on writing assessment. First, no single test is reliable enough to be depended upon for making important decisions about students' lives. Reliability is a particular concern for essay examinations on at least three counts: they rely on human ratings and are thus subject to rater bias, they consist of a single item (from a psychometric perspective), and the topic of the test introduces error variance that is difficult to measure and overcome. This is particularly true for timed impromptu essay tests, where students are asked to write on a topic that they have not seen prior to the test. Second, in practice, essay tests have a tendency to be divorced from writing instruction, thus having a deleterious effect on both assessment and instruction. The test becomes a hurdle for students to overcome rather than a true indicator of a student's accomplishments as a writer, and teachers, feeling pressure for their students to pass the test, reduce writing instruction to a focus on the formal aspects of passing essays, such as the familiar five-paragraph essay format. Similar points are made by Hamp-Lyons (1991), Hamp-Lyons and Kroll (1996), and others.

Despite almost unanimous agreement among both writing specialists and psychometricians that an essay examination is not appropriate as a single criterion for high-stakes decisions such as graduation, such examinations still exist at many institutions. A prominent example is the University System of Georgia, which requires all undergraduate students to pass a Regents Test in writing (as well as a test in reading) as a condition of graduation (Burk & Fitzpatrick, 1982). Unlike most institutions that have examination requirements, students cannot demonstrate proficiency in reading and writing in alternative ways, such as by passing upper division reading- or writing-intensive courses. While the Regents requirement is unyielding for native speakers of English, there is, fortunately, some flexibility when it comes to testing students who self-identify as non-native speakers (NNSs), whether they be resident or international


students. All native speakers must take the same test, while individual campuses have the option of designing their own assessments for NNSs, as long as this assessment includes an essay. At Georgia State University, the assessment procedure for NNSs was for many years an adaptation of the regular Regents Test, described in the following section. Faculty involved in scoring the test for NNSs and in preparing students for the test found the test problematic, however, and when a decision was made in 2000 to move the assessment of NNSs away from the campus testing center to the Department of Applied Linguistics and ESL, an opportunity was created to improve the testing procedures for NNSs. This paper describes the new test and the steps that have been taken to validate it. In the paper I argue that a test of writing that incorporates reading can provide more valid information about students' ability to write at the college level than a prompt-based essay test.

2. Background

As noted above, serious doubts have been raised over the years about the appropriateness of requiring students to pass an essay test as a single criterion for high-stakes decisions. The arguments against timed impromptu essay tests center around issues of reliability, authenticity, and validity. In terms of reliability, the two main sources of error variance in scoring essays are raters and tasks, as noted above. Error variance due to raters can be reduced through the use of multiple raters and appropriate training, but error variance due to the sampling of writing tasks can be a more serious problem, particularly if examinees write on a single task chosen by the examiner (Breland, 1996). Having students write on multiple tasks can help mitigate this problem, but this solution can be costly in terms of administration time and resources for scoring. As for authenticity, several scholars have pointed out that writing an essay on a previously unseen topic is an inauthentic task as far as academic writing is concerned, as students rarely if ever face this task in any of their content courses. Finally, in terms of validity, the "snapshot" approach to writing (Hamp-Lyons & Kroll, 1996) is very limited in its ability to assess the full range of a writer's ability, as it does not provide opportunities for students to express themselves in more than a single genre for a single purpose. Furthermore, questions have been raised about the reductive nature of holistic scoring (Charney, 1984; Huot, 1990) because it reduces the complex, multifaceted nature of reading to a single number.

For NNSs, timed impromptu essays may be even more problematic, particularly when their essays are scored along with NS essays. Research suggests that raters who have not had training or experience with the writing of second-language users may give undue attention to sentence-level errors (Sweedler-Brown, 1985) and that faculty from different disciplines may use different criteria for judging NNS writing (Brown, 1991; Mendelsohn & Cumming, 1987; Santos, 1988). Furthermore, using a single scale to rate essays is particularly problematic for NNSs, who may develop different aspects of writing skill at different rates (Hamp-Lyons, 1991).


A number of studies have documented the difficulties that many ESL students have in passing mandated writing proficiency examinations. Ruetten (1994) reported that 62% of NNSs failed a writing test given at the end of the second-semester freshman composition course at the University of New Orleans, compared to 30% of NSs. Johns (1991) presented a case study of an otherwise successful NNS student who was unable to pass the writing proficiency examination at San Diego State University. This student, a biochemistry major, had passed all his general education courses and had a high grade point average, but his efforts to obtain his degree were hindered by his inability to pass the mandated writing examination. At Georgia State University, Byrd and Nelson (1995) reported a pass rate of 49% on the Alternate Regents Essay Test among first-time test takers in 1991. Four years later, 8% of the students who had taken the test still had not passed it. Byrd and Nelson found that a few of these students matched the profile of Johns's (1991) case study: students in the sciences with above-average grades who nonetheless have difficulty passing the test. For the majority of repeat failures, however, the failure to pass the test was part of an overall pattern of "academic struggle and failure" (p. 280); these students generally had low grades, including several withdrawals from courses or failing grades.

An alternative to a prompt-based impromptu writing test is a test that integrates reading with writing by having examinees read and respond to one or more source texts. While source-based writing tests cannot overcome many of the problems inherent in writing tests, they do have some advantages over prompt-based tests. First, a great deal of research on academic writing highlights the fact that academic writing is rarely done in isolation, but is virtually always done in response to source texts (Cumming, Kantor, Powers, Santos, & Taylor, 2000; Hale et al., 1996; Hamp-Lyons & Kroll, 1996; Horowitz, 1991; Leki & Carson, 1997; Weigle, 2002). That is, students in content courses (with the exception of composition) are rarely if ever asked to write essays based solely on their background knowledge; before they write on a given topic they are expected to read, discuss, and think critically about that topic. Another argument for using a source text as a basis for writing is that it provides a common information source for all test takers, putting them on a more equal footing in terms of the amount of background knowledge needed to respond to a writing task. Furthermore, a source text can serve to activate the writer's knowledge or schemata around a topic, helping them generate ideas for their writing. Some research suggests that several short readings on a related topic may be even more effective than a single longer reading by providing several different ways of approaching the topic (Smith et al., 1985).

Hamp-Lyons and Kroll (1996) argue for the view that academic writing should be viewed in the context of academic literacy: in order to be part of the "academic conversation" (p. 226), students must respond to texts that have been written and invite responses to their own texts. This view of academic literacy suggests that tests should include some integration of reading with writing, for example, by providing a source text for writers to respond to. A similar point is made by Leki


and Carson (1997), who note that writing in content courses is virtually always "text-responsible," meaning that students are held responsible for conveying information from source texts accurately.

The use of source texts is becoming more common in writing tests at the university level, both for native and non-native speakers. For example, in taking the Subject A examination, required of incoming freshmen throughout the University of California system, students read a 700-1000 word passage and write an essay based on that passage in which they provide "reasoned, concrete, and developed presentations of their points of view" (http://www.ucop.edu/sas/sub-a/requirement.html, retrieved October 25, 2003). An undergraduate writing assessment used at the University of Michigan (Feak & Dobson, 1996) provides several quotations on a particular topic. Students read and take notes on the quotations, and then write an essay based on these quotations. The Canadian English Language Assessment (Jennings, Fox, Graves, & Shohamy, 1999) integrates listening, reading, and writing, as test takers write an essay that incorporates information from a lecture and readings. The Test of English as a Foreign Language (TOEFL) will include integrated tasks in its new form, to be used operationally beginning in 2005 (http://www.toefl.org/nextgenerationcbt.html, retrieved November 2, 2003).

3. Description of Regents Test and Alternate Regents Test

The Regents testing requirement in the University System of Georgia has been in effect since 1972. Students must take the Regents Reading and Writing Tests as soon as they have completed 30 semester hours; if they have not passed both parts of the test by the time they have completed 45 hours, they must enroll in the appropriate remedial course every semester until they have passed both the reading and the writing tests (http://www.gsu.edu/wwwrtp/semester.htm, retrieved October 15, 2003).

The writing test consists of a 60-minute timed impromptu essay on one of four short prompts chosen from a bank of several hundred approved prompts, which students may access on-line (http://www.gsu.edu/wwwrtp/topics.htm, retrieved October 15, 2003). The prompts are of two types: those that require students to draw on their own experience (e.g., "Has credit buying affected your way of life? Explain.") and those that ask students to discuss current events or issues (e.g., "Discuss the most important characteristics an elected official should have."). The prompts are solicited and reviewed by a statewide Testing Subcommittee to meet certain criteria; for example, prompts should not contain difficult vocabulary, require specialized knowledge, involve highly controversial or emotional subjects, or encourage students to identify their institutions in their essays. For each administration, test forms containing four prompts are assembled by program staff. In principle, each form of the test should contain at least one prompt that requires general knowledge of current issues and one that can be answered


on the basis of personal experience (http://www.gsu.edu/wwwrtp/essadev.htm, retrieved October 15, 2003). As this description indicates, the prompts are chosen with a view towards providing students a variety of topics to choose from. However, the prompts are not pilot tested before being used operationally, nor does the Regents Testing Program publish data on which prompts are chosen most frequently or how students perform on individual prompts.

The Regents Essay Test is rated on a four-point holistic scale, with any score of 2 or above considered a passing score. Each test is rated by three raters; the reported score is the rating on which at least two out of the three raters agree, or the middle rating if no two raters agree (see the brief sketch following the list of modifications below). The tests from all institutions are scored together at several scoring sites each semester to ensure that similar criteria are used across the university system.

As mentioned above, while the same statewide test is required for native speakers throughout the university system, each campus has some leeway in testing non-native speakers. Until Fall 2001, the alternative test for NNSs at Georgia State University was administered by the campus testing center and scored by representatives from AL/ESL and English. The Alternate Regents Test was similar to the regular Regents Essay Test with the following modifications:

• Students were allowed 90 minutes rather than 60 minutes to complete their essays.
• Dictionaries were allowed during the last 30 minutes of the test.
• A small number of prompts were used, which had been screened for their appropriateness for ESL students.
• Raters included faculty from English and ESL rather than English only.
• Only the score points 1 (fail) and 2 (pass) were used in rating.
• Essays for training raters came from a pool of essays written by NNS, rather than the published training samples for native speakers.
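
As a brief illustration of the reported-score rule described above, the following hypothetical sketch (not the Regents Testing Program's actual scoring code; the function names are invented) shows that the majority-or-middle rule reduces to taking the median of the three ratings:

    # Hypothetical sketch of the reported-score rule for the Regents Essay Test:
    # three holistic ratings on a 1-4 scale; the reported score is the rating on
    # which at least two raters agree, or the middle rating if all three differ.
    # Both cases reduce to the median of the three ratings.

    def reported_score(r1, r2, r3):
        return sorted([r1, r2, r3])[1]      # median; equals the majority value when two agree

    def passes(r1, r2, r3):
        return reported_score(r1, r2, r3) >= 2   # any score of 2 or above is passing

    print(reported_score(2, 2, 3), passes(2, 2, 3))  # 2 True
    print(reported_score(1, 2, 4), passes(1, 1, 3))  # 2 False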

Despite these modifications, numerous problems remained with the test. In addition to the general concerns about the appropriateness of the test format for NNSs discussed above, the faculty from the English and AL/ESL departments who scored the test over a period of years felt that many of the prompts were problematic for various reasons. As an example, one prompt reads "Do you like shopping? Why or why not?" Raters felt that this prompt was not only non-academic in nature but also tended to elicit poor essays, as students trying to fit the five-paragraph essay format struggled to come up with three different reasons why they liked or did not like shopping. In addition, raters and teachers of ESL sections of the test preparation course were concerned about the scoring procedures, as the test was scored strictly on a pass/fail basis. Students had no way of knowing whether their failure on the test was due to linguistic or rhetorical problems, and thus the test results were uninformative for diagnostic purposes.


4. Description of GSTEP Regents

In 1999-2000, two inter-related events prompted a change in Regents testing procedures for NNSs. For a variety of reasons, the decision was made to move all NNS Regents testing at GSU from the campus testing center to the Department of Applied Linguistics & English as a Second Language (AL/ESL). Previously, AL/ESL had only been in charge of placement testing for incoming international students and of providing an alternative for NNS students to the Regents Reading Test, but not the Essay Test. At the same time, AL/ESL was in the process of revising the English proficiency test required of all international students on campus. The test, called the Georgia State Test of English Proficiency (GSTEP), had been developed in-house during the 1980s, with a series of minor revisions in the mid-1990s. The reading/grammar portion of the test consisted of four short (300-word) reading passages with 10 multiple-choice items each, 40 single-sentence grammar items, and a 30-minute timed impromptu essay, using retired prompts from the Test of Written English (ETS, 1989).1 Two forms of the reading/grammar portion had been in use for nearly 20 years, while six forms of the writing test (each with two possible topics) had been in use for about 10 years. The reading section of the test was administered separately every semester for NNSs who wanted to use it as an alternative to the Regents Reading Test.

The test revision was motivated by several factors. First, the test had remained essentially the same for many years, while the institution had changed considerably and the test was being used for many more purposes. For example, the TOEFL requirement for undergraduate students had been raised from 525 to 550, so that the highest level on the GSTEP no longer corresponded with the admissions TOEFL criterion. Furthermore, increased numbers of international students were coming to campus for graduate study; the Department of AL/ESL had responded to this influx by introducing several credit-bearing courses for these students, and faculty felt that the GSTEP did not provide enough information to place students accurately into these graduate-level courses. There were also obvious concerns about test security, given the small number of forms and the length of time the test had been in use. Finally, the test did not reflect current thinking about the nature of language proficiency or the best way to test it.

With the decision that AL/ESL was to provide assessment of writing as well as reading for Regents testing purposes, an opportunity arose to create a test that would be a more valid and useful measure of students' ability to write authentic academic prose, as well as to find a way to give better diagnostic information to students who failed the test. AL/ESL had also begun teaching test preparation courses for students who failed the Regents Test, so the new test was designed with a view towards positive washback: that is, it was hoped that preparation for a
1 The GSTEP also contains a listening section and an oral interview, which are not relevant to the use of the test for Regents testing purposes and thus will not be discussed further in this paper.


more academically oriented test would give students skills that would more easily transfer to their other courses.

5. Test format

The new reading/writing sections of the GSTEP were developed by a committee headed by an assessment specialist in consultation with other AL/ESL faculty, ESL teachers, and a working group consisting of faculty members from other disciplines within the university. The format developed for the new test was designed to reflect more accurately the kinds of reading and writing that students need to do in their academic work. In particular, the committee felt it was important for students to be able to read for a specific purpose, obtain and synthesize information from texts, and use source materials to support an argument in writing. The new test has three sections.

5.1. Multiple-choice reading

This section of the test includes three reading passages of approximately 500 words with 10 items each. The reading passages are taken from introductory textbooks or other sources that college students would be likely to encounter (e.g., general interest magazines). Test items are of four types: reading for gist or author's purpose, reading for information/details, making inferences, and understanding specific vocabulary. Since this portion of the test is not used for testing writing, it will not be discussed further in this paper.

5.2. Integrated reading/writing

In this section of the test, students read two argumentative passages of approximately 300-350 words each. These passages are related to a single theme (e.g., the use of computers in education, globalization) but take different perspectives on the theme. Students read the two passages and answer eight open-ended questions about the passages: three questions dealing with the main ideas and supporting details of each passage separately, and two questions asking students to compare or contrast the ideas presented in the passages. A sample test form is found in Appendix A. Note that, for reasons of test security, this is a practice form, not an operational form.

There are three reasons for including short-answer comprehension questions in this section of the test. First, we wanted to test reading in a format other than multiple-choice, as questions have been raised about the validity of multiple-choice reading tests, especially for non-native speakers of English (Bernhardt, 1991; Gantzer, 1996; Nelson & Byrd, 1998). Second, short-answer test questions are an authentic form of writing in many academic courses (Carson et al., 1992; Hale et al., 1996) in that content-area faculty frequently include this format of question


on their tests. Finally, we wanted to provide a task that would help students prepare for the essay portion of the test by having them identify the main points in each passage and compare the points of view presented in the two passages.

For this section of the test, student responses are graded holistically on two scales: content and language (see Figs. 1 and 2). A unique feature of these scales is that they relate to the short-answer items as a whole rather than as individual items; that is, raters are asked to judge whether a student's responses overall indicate various degrees of comprehension of the passages (in the case of content) or language proficiency (in the case of language). The content scores are used in conjunction with scores on the multiple-choice questions as a measure of reading proficiency, and the language scores contribute to the assessment of writing skills, including the ability to paraphrase.

Fig. 1. GSTEP short-answer scoring rubric: content.

Following the short-answer section of the test, students write an argument essay in which they must take a position on the topic discussed in the readings, using specific information from at least one of the two passages (see Appendix A). The essays are graded on an analytic scale comprising four aspects of writing, each worth 10 points: content, organization, language: range and complexity, and language: accuracy (see Appendix B). The content and organization scores are added together to make a single Rhetoric score (maximum = 20 points), and the two language scores are combined to make a single Language score (maximum = 20 points).

Fig. 2. GSTEP short-answer scoring rubric: language.


Essays are read by two raters, with a third rater adjudicating if the scores on Rhetoric or Language differ by more than three points (when the essays are graded for placement purposes) or when the two ratings fall on either side of the decision point (when the test is used for Regents testing purposes). The reported scores are the averages of the two ratings that are closest together.

Two forms of the reading/writing test were developed for use in the 2001/2 academic year, with a third form introduced the following year. For security reasons, the specific content of the test cannot be disclosed here, but the general topics of the forms are (1) the use of computers in education; (2) the potential effects of overpopulation on the world's resources; and (3) the advantages and disadvantages of globalization.

6. Score reporting

Scores on all subtests of the GSTEP (including those that are not discussed in this paper) are reported as band scores from 1 to 7 (see Appendix C for a description of how the band scores are interpreted). For writing, students receive two band scores. The Rhetoric score is based on the content and organization scales, as mentioned above, while the Language score is the average of the language scores from the essay and from the short-answer section (the content scores on the short-answer section are factored into the Reading score but not the Writing score). For Regents testing purposes, a band score of 6 (minimum raw score 13) is required in Rhetoric and a band score of 5 is required in Language. The minimum scores for band 5 in Language are 10 for the language score on the essay and 2.5 for the language score on the short-answer section. These minimum scores were arrived at through standard-setting meetings with representatives of the English and AL/ESL departments who had expertise in scoring the regular Regents Essay Test and the Alternate Regents Essay Test. The standard-setting group felt strongly that the standards for Language should not be as high as those for Rhetoric, since the main goal of the Regents testing program is to ensure that students are able to organize and develop an essay appropriately, not to test sentence-level second-language proficiency.

7. Research questions

As Messick (1989) states, the process of test validation is on-going, involving an accumulation of evidence supporting the use of a test from a variety of perspectives. In this paper I report on three areas that have been investigated to support the use of the Regents GSTEP to fulfill the Regents writing requirement for NNSs. These areas/questions are:

1. How does the new test compare to the old test in terms of (a) overall pass rates and (b) reliability across prompts and raters?


2. Do students who repeatedly fail the new test have profiles similar to those of students who repeatedly failed the old test?
3. How has the test preparation course changed in response to the new test?
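
As background for the first question, the sketch below illustrates how the scoring and decision rules described in Sections 5 and 6 combine into a pass/fail result on the Regents GSTEP. It is a simplified illustration under stated assumptions, not the operational scoring program: the function and field names are invented, and the full band conversion table in Appendix C is not reproduced, only the Regents cutoffs given above.

    # Illustrative sketch (not the operational scoring program) of the GSTEP
    # Regents decision rules described in Sections 5 and 6. Field names are
    # invented; each rating holds the four analytic scores, each out of 10.

    def composites(rating):
        rhetoric = rating["content"] + rating["organization"]        # max 20
        language = rating["accuracy"] + rating["range_complexity"]   # max 20
        return rhetoric, language

    def reported(scores):
        """Average of the two closest ratings. A third rating is assumed to have
        been added already when the first two fell on opposite sides of the
        decision point (13 for Rhetoric, 10 for essay Language)."""
        if len(scores) == 2:
            return sum(scores) / 2
        ordered = sorted(scores)
        lo, hi = min(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])
        return (lo + hi) / 2

    def gstep_regents_decision(ratings, sa_language):
        """ratings: two (or three) raters' analytic scores; sa_language: the
        short-answer language score (its rubric scale is not reproduced here)."""
        rhetoric = reported([composites(r)[0] for r in ratings])
        essay_lang = reported([composites(r)[1] for r in ratings])
        # Band 6 in Rhetoric requires a raw score of at least 13; band 5 in
        # Language requires at least 10 on the essay language scales and at
        # least 2.5 on the short-answer language scale.
        passed = rhetoric >= 13 and essay_lang >= 10 and sa_language >= 2.5
        return rhetoric, essay_lang, passed

    rater1 = {"content": 7, "organization": 7, "accuracy": 6, "range_complexity": 5}
    rater2 = {"content": 8, "organization": 6, "accuracy": 5, "range_complexity": 6}
    print(gstep_regents_decision([rater1, rater2], sa_language=3.0))  # (14.0, 11.0, True)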

8. Procedures

Data from two years of the Alternate Regents Test (1998-2000) and two years of the GSTEP Regents were compiled for the study. Because of the differences between the two tests, different data were available for each test. For the Alternate Regents Test, the data included the specific form given to each student (1 through 13), the prompt chosen by each student, the ratings of each of three raters for each administration, and the final result (pass or fail). Since each essay was rated three times and only two score points were used (1 = fail, 2 = pass), the possible scores range from 3 (all fails) to 6 (all passes); scores of 3 or 4 were failing, and 5 or 6 were passing. For the Regents GSTEP, the data included the test form, the reported scores for the two composite essay scales (Rhetoric and Language) and for the Short Answer Language section, and the individual ratings on each scale for the first two raters and any third raters. For students who failed the test, data were compiled showing whether they had failed on the basis of rhetoric, language, or both. The data for individual ratings were available for all test administrations except the second administration from Summer 2003.

In addition, students who had taken the Regents GSTEP three times were identified, and their transcripts were examined for the following data: native language; major; grades in courses in English and History (representing reading- and writing-intensive courses); and number of courses with grades of D, W (withdraw), WF (withdraw failing), or F. Finally, to investigate washback (that is, the effect of changes in the test on the test preparation course), the sole instructor in AL/ESL who had taught Regents writing courses both before and after the change in test format was interviewed regarding the changes made to the course following implementation of the new format.
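
As a minimal sketch of how the outcome variables described above can be derived from the compiled records, the following code assumes, for illustration only, that each Alternate Regents record is a tuple of its three ratings; the actual data files and field names are not shown in the paper, and the helper names are invented.

    # Minimal sketch of deriving Alternate Regents outcomes and the rater-agreement
    # figures reported in Section 9 from the compiled records. Each record is
    # assumed, for illustration, to be a tuple of three ratings (1 = fail, 2 = pass).

    def alternate_regents_result(ratings):
        total = sum(ratings)                      # possible totals range from 3 to 6
        return "pass" if total >= 5 else "fail"   # 3-4 = fail, 5-6 = pass

    def agreement_rates(records):
        """Proportion of essays on which raters 1 and 2 agree, and on which all
        three raters agree (the two columns reported in Table 3)."""
        n = len(records)
        first_two = sum(r[0] == r[1] for r in records) / n
        all_three = sum(r[0] == r[1] == r[2] for r in records) / n
        return first_two, all_three

    sample = [(2, 2, 2), (1, 2, 2), (1, 1, 2), (2, 2, 1)]
    print([alternate_regents_result(r) for r in sample])  # ['pass', 'pass', 'fail', 'pass']
    print(agreement_rates(sample))                        # (0.75, 0.25)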

9. Results

9.1. Comparison of Regents GSTEP with former Alternate Regents Essay Test

As discussed previously, the alternative to the Regents Essay Test that had been available to NNSs prior to 2001 was a timed impromptu essay on a choice among four prompts. At every administration, up to 13 different forms were used, so that, for example, if 130 students were tested at one time, approximately 10 students would be given any one form. One important question is how comparable the


scores on the new test are to those on the old test. In particular, given that raters of the Alternate Regents Test had the possibility of rating essays on 40 or more different prompts at any given administration, it is important to look at the scores with respect to the different prompts. To investigate the pass rates over two years, data were compiled on 13 forms that were administered during the academic years 1998-2000. Table 1 summarizes the data for these 13 forms, including the number of students who were given each form, the number selecting each of the four prompts on the form (these are listed from least popular to most popular within each form), and the overall pass rate for the form.

As the table shows, between 23 and 40 students were given each form of the test during these two years. Within every form, there was at least one prompt that was rarely if ever chosen (0-2 students). Form 7 is remarkable in this regard: three of the four prompts were rarely chosen, with the fourth prompt, which had to do with whether it is an advantage or a disadvantage to have a job during college, being chosen by 25 out of 28 students. While the sample size for each form is not large, it does seem clear that some topics were much more attractive to students than others, and that some forms contained a greater number of attractive topics than others. In terms of the pass rates, these ranged from 0.60 (Form 9) to 0.92 (Form 7), with an overall pass rate of 74%. Again, the sample sizes for each form are somewhat small, but the data do suggest that differences among the forms themselves are a source of construct-irrelevant variance; that is, scores seemed to be at least partly a function of the particular form a student happened to be given rather than being strictly based on their writing abilities.

Table 2 presents data from the new Regents GSTEP from Fall 2001 through Summer 2003. In contrast to the old Alternate Regents Test, in which several forms of the test were administered simultaneously, a single form of the test was administered on each date. In general, the intention is to administer a different form on each test date when the test is offered twice during the semester; however, this protocol was not followed in Fall 2002 or Summer 2003. The table presents average scores on the three scales (Rhetoric, Language, and Short Answer Language), along with pass rates for each administration.

As the table shows, the pass rates overall are much higher on the Regents GSTEP than on the old Alternate Regents Test, with pass rates at around 90% except for the first semester that the test was given. Furthermore, there is no consistent pattern of scores across topics. In fact, within semesters, scores were most different from each other when the same topic was used on both test dates (Fall 2002 and Summer 2003). After Fall 2001, the most consistent trend in the data is for the scores on the second test date in any given semester to be lower than those of the first date. These results could be due either to differences in scoring from one test date to the next or to differences in the ability levels of the students who choose to take the test earlier or later. In Fall and Spring the test dates were several days apart and ratings were done on two separate occasions; however, in Summer 2003 the administrations were on consecutive days and all the ratings were done at the same

Table 1
Prompt selection and pass rates on 13 forms of the Alternate Regents Essay Test (1998-2000)

Form    N     Number choosing each of the four prompts   Number failing   Number passing   Percent pass
1       24    2, 3, 8, 11                                 7                17               0.71
2       24    2, 4, 5, 13                                 8                16               0.67
3       40    2, 5, 16, 17                                8                32               0.80
4       26    2, 4, 9, 11                                 6                20               0.77
5       30    2, 5, 10, 13                                8                22               0.73
6       29    3, 5, 10, 11                                5                24               0.83
7       28    0, 1, 2, 25                                 4                24               0.92
8       24    2, 2, 8, 12                                 6                18               0.75
9       25    0, 5, 9, 11                                 10               15               0.60
10      27    2, 4, 4, 17                                 7                20               0.74
11      34    2, 8, 12, 12                                12               22               0.65
12      23    1, 5, 5, 12                                 6                17               0.74
13      28    0, 3, 11, 14                                7                21               0.75
Total   362                                               94               268              0.74

Note: Prompts are listed from least to most frequently chosen within each form; 1.00 = 100%.


Table 2
Summary of results from the GSTEP Regents, Fall 2001-Summer 2003

Semester      Administered form   N     Rhetoric   Language   SAL    Pass (%)    Fail-R (%)   Fail-L or B (%)
Fall 2001     1 Globalization     18    13.67      13.86      3.39   9 (0.50)    5 (0.28)     4 (0.22)
Fall 2001     2 Computers         102   13.71      13.40      3.51   70 (0.69)   26 (0.25)    6 (0.06)
Spring 2002   1 Computers         52    15.88      15.63      4.00   47 (0.90)   4 (0.08)     1 (0.02)
Spring 2002   2 Globalization     72    15.71      15.47      3.88   65 (0.90)   5 (0.07)     2 (0.03)
Summer 2002   1 Computers         65    15.49      15.02      3.82   61 (0.94)   2 (0.03)     2 (0.03)
Fall 2002     1 Globalization     45    16.53      16.56      3.83   43 (0.96)   2 (0.04)
Fall 2002     2 Globalization     58    15.52      15.91      3.86   51 (0.88)   3 (0.05)     4 (0.07)
Spring 2003   1 Globalization     23    15.74      14.78      3.83   21 (0.91)   1 (0.04)     1 (0.04)
Spring 2003   2 Population        40    14.86      15.10      3.84   36 (0.90)   4 (0.10)
Summer 2003   1 Population        28    16.25      16.18      4.32   28 (1.00)
Summer 2003   2 Population        32    15.09      15.16      4.06   27 (0.84)   2 (0.06)     3 (0.10)

Note: 1.00 = 100%. SAL = Short Answer Language.

time. This would suggest that the latter interpretation is more plausible: that weaker students may postpone the test when possible. In any case, the data do not show any consistent pattern of bias due to test topic; there is as much variation within topics as across topics.

The table also shows that scores from Fall 2001 were noticeably lower than those in subsequent semesters. One possible reason for this result is that the test was new and students and teachers were unfamiliar with the test format. After the first semester, more materials describing the test format and the scoring rubric in detail were disseminated to students and instructors; this may account for the increase in overall scores following the first semester.

In terms of reasons for students failing, the table shows that students were more likely to fail on the basis of rhetoric than on the basis of language. The majority of students who failed the test did so on the basis of rhetoric, particularly in the first year of the test. Language was an additional factor in failures in some cases, but for the most part language in and of itself was not the primary reason for failure. Only four students passed on the Rhetoric scale but failed on the Language scale; because of the small number of these cases they are not shown in the table.

In addition to consistency of scoring across prompts, another important aspect of reliability is inter-rater reliability, or the degree to which raters agree on their scores. This is frequently reported as a reliability coefficient; however, in practical terms, the percentage of agreement between two raters is sometimes a more useful indicator of reliability. For the Alternate Regents Test, three raters read all essays and scored them either 1 (fail) or 2 (pass). Table 3 shows the percentage of essays on which the first two raters agreed and the percentage on which all three raters agreed, by test form. As the table shows, the overall agreement rate between the first and second raters was 79%, with rates varying by form from 58% to 88%. The

Table 3
Rater agreement by test form, Alternate Regents

Form    N     Pass (%)   Percent agreement (raters 1 and 2)   Percent agreement (all raters)
1       24    0.71       0.88                                  0.71
2       24    0.67       0.83                                  0.54
3       40    0.80       0.85                                  0.75
4       26    0.77       0.58                                  0.50
5       30    0.73       0.77                                  0.67
6       29    0.83       0.79                                  0.59
7       28    0.92       0.82                                  0.79
8       24    0.75       0.83                                  0.75
9       25    0.60       0.64                                  0.56
10      27    0.74       0.78                                  0.63
11      34    0.65       0.79                                  0.62
12      23    0.74       0.82                                  0.74
13      28    0.75       0.82                                  0.64
Total   362   0.74       0.79                                  0.65

Note: 1.00 = 100%.

rate of agreement among all three raters is somewhat lower, at only 65% overall. By comparison, Table 4 presents two aspects of inter-rater agreement on the Regents GSTEP essay. As noted earlier, when the GSTEP is used for placement purposes (rather than Regents testing), a third rater is used when scores on either subscale (Rhetoric or Language) differ by three or more points. For Regents testing
Table 4
Percentage agreement between first and second raters, GSTEP Regents essay

                                            Score differences < 3 points    Both scores agree on pass/fail decision
Semester      Form              N     Pass   Rhetoric    Language            Rhetoric    Language
Fall 2001     1 Globalization   18    0.50   0.78        0.95                0.89        0.94
Fall 2001     2 Computers       102   0.69   0.86        0.83                0.89        0.96
Spring 2002   1 Computers       52    0.90   0.87        0.88                0.98        0.98
Spring 2002   2 Globalization   72    0.90   0.99        0.97                0.97        0.97
Summer 2002   1 Computers       65    0.94   0.97        0.97                1.00        1.00
Fall 2002     1 Globalization   45    0.96   0.98        0.98                0.96        1.00
Fall 2002     2 Globalization   58    0.88   0.97        0.97                0.93        0.97
Spring 2003   1 Globalization   23    0.91   1.00        1.00                1.00        1.00
Spring 2003   2 Population      40    0.90   1.00        0.98                1.00        1.00
Summer 2003   1 Population      28    1.00   1.00        1.00                1.00        1.00
Summer 2003   2 Population*     32    0.84   n/a         n/a                 n/a         n/a

* Individual rating data not available for this administration. Note: 1.00 = 100%.


purposes, since the decision is pass/fail, a third rater is only used when the raters' scores differ across the decision point (13 points for Rhetoric, 10 for Language). The table shows the agreement rates on the essays using both these criteria. From the table it can be seen that overall the agreement rates on the GSTEP essay are quite high: over 95% for nearly every administration after the first semester.

There are several possible reasons for the higher agreement rates on the GSTEP Regents. The sheer number of different prompts on the Alternate Regents Test compared to the GSTEP Regents Test is probably a factor: the essays used in rater training for the Alternate Regents represent only a fraction of the total number of prompts, and raters do not have the opportunity to build consensus on the qualities of passing essays on each and every prompt. For the GSTEP Regents Test, there are training essays for each of the three forms of the test, and raters are normed using the specific form that they will be rating. The fact that the essay content is based on the readings rather than on students' personal experience also makes the essays more directly comparable to each other, which may lead to greater reliability. Furthermore, the rating scale for the Alternate Regents consists of only two score points and, as a holistic score, does not allow raters to consider different aspects of writing separately. An essay that is strong rhetorically but somewhat weak linguistically might thus receive a passing score from a rater who is more oriented towards rhetorical factors and a failing score from a rater who focuses more on sentence-level concerns. The scales for the Regents GSTEP have a wider range of possible scores and are composites of two scores (content and organization for the Rhetoric scale, and accuracy and range/complexity for the Language scale), which tends to increase the reliability of ratings (Bachman & Palmer, 1996).

9.2. Failing students

As noted earlier, Byrd and Nelson (1995) found that most of the students who repeatedly failed the Alternate Regents writing test were students who showed a pattern of failure in their university courses, not generally successful students whose only difficulty was passing the test. In that study, the records of students who had not passed the test after four years were examined. In the current study, only two years of data on the test are available, so it is not possible to replicate Byrd and Nelson's study exactly. However, data from those students who failed the GSTEP Regents three or more times from 2001 to 2003 can be examined for their similarities to Byrd and Nelson's data.

During the two years covered by the study, nine students out of 476 took the test three or more times. Of these, four still had not passed the test after their third attempt. The records of these students were examined to see whether they fit the profiles of Johns's (1991) student or the students who failed the Regents Test as described in Byrd and Nelson (1995). The results are summarized in Table 5. The following data were compiled for these students: (a) native language; (b) major; (c) GSTEP Regents scores. As the table shows, all four students failed on both rhetoric and language at least once, and all but one failed on both sections two or three times.

Table 5
Profiles of students who have failed the GSTEP Regents three or more times

                                                   GSTEP Regents results        Courses in English and History
ID   L1           Major              Entered GSU   Both   R only   L only       Number   A or B   C   D or F   Withdrawal   Total withdrawals
1    Vietnamese   Computer Science   1998          2      0        1            5        0        0   2        3            7
2    Korean       Mathematics        1998          3      0        0            4        1        2   1        0            8
3    Vietnamese   Finance            1999          1      2        0            4        2        1   1        0            7
4    Vietnamese   Risk Management    2001          3      0        0            1        1        0   0        0            1

Note: "Both," "R only," and "L only" give the number of times the student failed on both scales, on Rhetoric only, or on Language only.

In order to look at performance in writing-intensive courses, students' records in History and English courses were examined. The table shows how many History and English courses each student took and, of these, how many received grades of A or B, C, or D or F, and how many times students withdrew from the course without receiving a grade. In addition, the table shows how many total withdrawals before the semester midpoint each student has.

As the table shows, all four students are Asian and have majors that are heavily mathematics oriented. Students 1 through 3 in particular seem to fit the profile of students identified by Byrd and Nelson (1995): they received failing or minimally passing grades in half or more of their English and History courses and show a pattern of frequent withdrawals from courses. Student 3 is a particularly interesting case, as this student spent three years in GSU's Intensive English Program before beginning university courses. The student failed the structure/composition courses at the two highest levels of the IEP before eventually passing them, indicating perhaps that despite intensive language training the student lacked the motivation, strategies, or ability to do college-level work, at least in some subject areas. As for student 4, this student had attempted only one English course in his first two years at GSU, concentrating instead on mathematics courses. All four students have fairly low language proficiency as measured by the test, and this may be a factor in their avoidance of, or low performance in, writing-intensive courses. This finding is similar to the results presented by Byrd and Nelson (1995).

9.3. Washback

Students who fail the Regents Essay Test are required to register for (and pay for) a remedial writing course every semester until they can pass the test. Since the goal of the course is to prepare students for the test, the format of the test clearly has implications for how the course is taught. Two years after the GSTEP reading/writing test replaced the former Alternate Regents Essay Test at GSU, I interviewed an instructor who had taught the course both before and after the change. Table 6 shows a comparison of the skills focus of each course, according to the instructor.

As the table shows, some of the skills that were taught in the earlier version of the course are still an emphasis (practice essay writing, introductions and conclusions, paragraph unity and cohesion, self-editing). However, the emphasis has shifted away from the five-paragraph essay format and from a focus on choosing among prompts, towards a focus on argumentation and the appropriate use of sources in support of an argument. While this is just one instructor's perspective on the course, and different instructors teaching the course have a certain amount of leeway in constructing their own syllabi, it does indicate that the course content has changed in response to the change in test format, and in the predicted direction: towards a focus on skills that are arguably more relevant to other academic courses.

Table 6
Topics taught in Regents preparation course for NNS, in rank order of importance

Topic                                                                        Alternate Regents   Regents GSTEP
Five-paragraph essay format                                                  1                   11
Practice essay writing                                                       2                   2
Introductions and conclusions                                                3                   5
Self-editing                                                                 4                   4
Strategies for choosing among prompts                                        5                   n/a
Paragraph unity and cohesion                                                 6                   1
Deconstructing essay prompts                                                 7                   10
Academic vocabulary                                                          n/a                 9
Critical reading to identify author's argument and purpose                   n/a                 7
Paraphrasing and summarizing                                                 n/a                 8
Strategies for selecting information from sources in supporting an argument  n/a                 6
Structuring an argument                                                      n/a                 3

10. Discussion

Based on the data from the first two years of the test, we are optimistic that the test is serving a useful purpose. The pass rate on the test is sufficiently high that students do not seem to be unjustly required to take our writing courses if they do not actually need them. Furthermore, the pass rates are much more consistent than those of the test it replaced, where the data indicated a great deal of variability in scores due to specific topics or test forms. The scoring on the GSTEP Regents demonstrates very high consistency, indicating that there is very little score variance that can be attributed to the rating process.

Given the results presented in this paper, one might ask whether the standards set for the GSTEP Regents are too low, since on average 90% of students pass the test at any given administration, compared to 75% statewide who pass the regular Regents Essay Test. The published guidelines for Regents raters suggest that an appropriate failure rate is between 15% and 40%, and that raters who give failing scores to more or fewer students are not rating according to the Regents standards. While it is important to make sure that raters are using similar standards, there is the potential for these guidelines to encourage raters to fail students simply in order to keep their percentages in line with those of other raters. In fact, raters who have scored the Regents Test have told me privately that they have done so on occasion for fear of seeming unduly generous in their scores. There is no such pressure to fail students on the GSTEP Regents, so one might turn the question around and ask whether the regular Regents Essay Test has an artificially high failure rate. To my knowledge this question has not been investigated.

The few students who fail the GSTEP Regents consistently have low language proficiency, which seems to be affecting their progress towards their degrees, as evidenced by low grades in writing-intensive courses and frequent withdrawals from courses. Byrd and Nelson (1995) recommend that students who fit this profile


receive intensive language instruction; however, at least in the case of one student (student 3), intensive English courses did not sufficiently prepare the student for academic work. Currently, students who complete the IEP with passing grades do not need to provide additional evidence of language proficiency for university admission. While one student's poor performance should not trigger a change in policy, it may be a worthwhile endeavor for the IEP to monitor its graduates' progress in their university courses, particularly in cases where students require two or three attempts to pass their IEP courses.

One of the most positive aspects of the new test is that the connection between assessment and writing instruction is clearer. The test seems to be having the desired positive washback effect, as the GSTEP Regents preparation course focuses more on critical thinking and text analysis skills and less on strategies for deciphering and choosing writing prompts, at least in one instructor's interpretation. These skills should be useful to students in all of their academic courses (although whether the course is in fact useful to students is an empirical question worthy of further research).

It is important to note that there are a number of challenges to be met in continuing to develop and administer a test of this nature. First, it is difficult to ensure that test forms are equivalent. Determining whether the reading levels of the text passages are equivalent is a difficult proposition, since text difficulty can be affected by many linguistic and extra-linguistic factors (Fulcher, 1997). The short-answer questions themselves are similar across test forms, but it is difficult to gauge whether, for example, it is easier to contrast the authors' viewpoints in one set of passages than in another set. The use of a common rating scale can mitigate some of the issues with text equivalence, since scoring is not done on individual items but across a whole set of items, and the criteria are the same across test forms, but that in itself is not a guarantee that a student who happens to get one form of the test would get the same score on a different form.

Related to this issue is the question of topics and the fact that students are not given a choice of topic. This is a thorny issue that test developers and researchers have been dealing with for many years: is it more fair to provide several options that students can choose from, so that students can select a topic that will show off their best writing, or to ensure consistency of scoring by asking all students to write on the same topic? In a timed impromptu essay test, it is easy to provide a list of short prompts and to allow students to choose, as this can be done relatively quickly. However, as the results from this study show, giving students a choice of impromptu topics can introduce unwanted variance and can reduce the reliability of the test. Once the decision is made to integrate reading with writing, one loses the option of giving choices. It is then particularly important for test developers to ensure that topics are equivalent. There are several steps that can be taken to minimize differences due to test forms. At a minimum, careful specifications for choosing reading passages and constructing test tasks should be written (see Lynch & Davidson, 2001, for a thorough discussion of specification writing) and prompts should be pilot tested on a sample of equivalent test takers before being used


operationally. Ideally, two test forms would be equated by having the same students write on both forms and comparing their results, but random assignment of test takers to different forms is usually a more practical method of form equating.

The need to produce and equate new forms on a regular basis requires a substantial commitment of resources. A test of this nature requires a dedicated testing staff to find appropriate reading passages, write items, arrange for pretesting and pilot testing, administer and score the test, and communicate test results to various audiences. For pilot testing new versions, a group of test takers must be found who are equivalent in many respects to the actual test takers and can be convinced to take the test. Often participants in pilot testing must be compensated for their time as a motivation for them to participate. In our case we were able to pilot test two forms of the test at a similar institution in a different part of the country by asking ESL teachers to give the test to their students at the beginning of a term as a diagnostic. For one form, the pilot testing revealed problems with one of the reading passages that were only uncovered when the tests were rated. Before the test could be given operationally, another passage had to be found and the test re-piloted. This experience highlights the importance of pilot testing before a test is used operationally.

Despite these costs, it is important to continue developing new forms of the test for test security purposes and to keep the test up-to-date. In times of economic uncertainty, test developers and researchers need to make their voices heard to ensure that appropriate resources are committed to testing endeavors. The question of whether writing competency examinations should be required of university students is still open to debate. However, as long as they are required, we owe it to our students to make sure that the tests are valid and fair.

Acknowledgments

The test development project reported on in this paper was supported by an Instructional Improvement Grant from Georgia State University. I am grateful to Tom Brezinsky for his assistance with organizing and analyzing the data for the study, and to Diane Belcher and Liz Hamp-Lyons for their insightful comments on earlier drafts of this manuscript.

Appendix A. Sample test form

Directions: Read the following two passages, which argue two different sides of the same issue. After you finish reading, you will write the answers to eight questions based on the reading passages. In this section you will be graded on content and language: what you say and how you say it. You will have 45 minutes.


A.1. A. Excerpt from an essay published by the BioDemocracy and Organic Consumers Association

Genetic engineering is a radical new technology, one that breaks down fundamental genetic barriers not only between species, but between humans, animals, and plants. By combining the genes of dissimilar and unrelated species, permanently altering their genetic codes, new organisms are created that will pass the genetic changes on to their offspring. Scientists are now inserting animal and even human genes into plants or animals, creating unimagined transgenic life forms. For the first time in history, human beings are becoming the architects of life. Bio-engineers will be creating tens of thousands of novel organisms over the next few years. The prospect is frightening.

Genetic engineering poses unprecedented ethical and social concerns, as well as serious challenges to the environment, human health, animal welfare, and the future of agriculture. The following is just a sampling of concerns. Genetically engineered organisms that escape or are released from the laboratory could cause a great deal of harm to the environment. Genetically engineered biological pollutants have the potential to be even more destructive than chemical pollutants. Because they are alive, genetically engineered products are inherently more unpredictable than chemical products: they can reproduce, migrate, and mutate. Once released, it will be virtually impossible to recall genetically engineered organisms back to the laboratory.

Since scientists will never be able to ensure a 100 percent success rate, gene-splicing will likely result in unanticipated outcomes and dangerous surprises. Researchers recently found that genetically altering plants to resist viruses can cause the viruses to mutate into new, more virulent forms, or forms that can attack other plant species. Furthermore, genetically altered plants could produce toxins and other substances that might harm birds and other animals.

Eventually, within the next few decades, agriculture will move off the soil and into biosynthetic industrial factories controlled by chemical and biotech companies. Never again will people know the joy of eating naturally produced, fresh foods. Hundreds of millions of farmers and other workers worldwide will lose their livelihoods. The hope of creating a humane, sustainable agricultural system will be destroyed.

A.2. B. Excerpt from an essay entitled "Kill the Frankenstein Myth" by Robert W. Tracinski, a senior writer for the Ayn Rand Institute in Marina del Rey, California

In reality, anti-biotech environmental activists' claims against genetically modified foods are based not in science, but in a superstitious fear of science and technology. It is revealing that environmental activists have chosen to smear genetically modified foods with the term "frankenfood," invoking Frankenstein, the classic horror story of a mad scientist who tampers with nature's secrets and unleashes a rampaging monster.

But this Frankenstein myth, and its theme of the dangers of science, has been thoroughly refuted in the nearly 200 years since it was first published. Science and technology have improved human life in countless ways, from the steam engine to the pasteurization of milk, from electrical power to antibiotics. And genetically modified foods are just the latest step in this march of progress.

Farmers have long modified the genetic makeup of their crops and livestock through selective breeding: choosing to breed the prize bull, for example, or planting seeds from the highest yielding stalks of wheat. But genetic engineering has made this process much easier and faster. For example, one popular variety of genetically engineered corn contains a gene taken from a bacteria; that gene produces a chemical toxic to caterpillars, giving the corn an inbuilt defense against harmful insects.

This new technology is already providing farmers with crops that bear higher yields, grow in drier climates, require fewer pesticides, and so on. The result has been bigger harvests and lower costs for American farmers. And scientists have also begun engineering plants that grow better under difficult conditions, such as drought, promising a new green revolution for the Third World. Genetically modified foods are not merely safe; they are an enormous advance, and we should be applauding the heroes of science who invented them.

Directions: Answer the following questions based on the readings. There are several possible ways to answer each question. You should use your own words as much as possible; you will not find the answers written word for word in the readings. Your answers will be graded on content and accurate use of English. Be sure to write complete sentences.

NOTE: Your answer will be marked down if it contains fewer than 10 words or if your answer consists mainly of words taken directly from the readings.

NOTE: On the actual test there is more room to write responses!

1. What is the main argument in passage A?
2. What are some of the most important effects of genetic engineering, according to passage A?
3. According to passage A, how do biological pollutants differ from chemical pollutants?
4. What is the main argument in passage B?
5. What are some of the most important effects of genetic engineering, according to passage B?
6. Why are genetically modified foods sometimes called "Frankenfoods," according to passage B?
7. How do the opinions of the two authors compare with regard to the use of genetic engineering to make plants resistant to viruses or bacteria?

8. How might the author of passage B respond to passage A's prediction that "never again will people know the joy of eating naturally produced, fresh foods"?

A.2.1. Part 2: Writing an argumentative essay

Directions: Write a well-organized academic essay on the topic below. Your essay will be graded on content, organization, and appropriate use of English. You may refer to the reading passages while you are writing.

ESSAY TOPIC: Some people believe that genetically modified plants are dangerous to our health and to the environment. Others believe that genetic engineering is an important tool in feeding the world's population. Which position do you support? Use specific information from at least one of the two articles to support your ideas.

You may use this space for notes. Write your essay on the lined paper. You will have 45 minutes to write your essay.

Appendix B. GSTEP essay scoring rubric

The rubric covers two rhetoric categories (Content and Organization) and two language categories (Accuracy and Range and complexity).

Band 9–10
Content: The treatment of the assignment completely fulfills the task expectations and the topic is addressed thoroughly. Fully developed evidence for generalizations and supporting ideas/arguments is provided in a relevant and credible way. Uses ideas from source text well to support thesis.
Organization: Clear and appropriate organizational plan. Includes effective introduction and conclusion. Connections between and within paragraphs are made through effective and varied use of transitions and other cohesive devices.
Language (accuracy): The essay is clearly written with few errors; errors do not interfere with comprehension. Consistently accurate word forms and verb tenses. Word choices are accurate and appropriate.
Language (range and complexity): Uses a variety of sentence types accurately. Uses a wide range of academic vocabulary. Source text language is used sparingly and accurately incorporated into writer's own words.

Band 7–8
Content: The treatment of the assignment fulfills the task expectations competently and the topic is addressed clearly. Evidence for generalizations and supporting ideas/arguments is provided in a relevant and credible way. Ideas from source text used to support thesis.
Organization: Clear organizational plan. Satisfactory introduction and conclusion. Satisfactory connections between and within paragraphs using transitions and other cohesive devices.
Language (accuracy): The essay is clearly written but contains some errors that do not interfere with comprehension. The essay may contain some errors in word choice, word form, verb tenses, and complementation.
Language (range and complexity): The essay uses a variety of sentence types. Good range of vocabulary used with at most a few lapses in register. Some language from the source text may be present but is generally well incorporated into writer's own words.

Band 5–6
Content: The treatment of the assignment minimally fulfills the task expectations; some aspects of the task may be slighted. Some relevant and credible evidence for generalizations and supporting ideas/arguments is provided. Ideas from source texts are included but may not be explicitly acknowledged as such.
Organization: Adequate but simplistic organizational plan. Introduction and conclusion present but may be brief. Connections between and within paragraphs occasionally missing.
Language (accuracy): Is generally comprehensible but contains some errors that may distract the reader; at most a few errors interfere with comprehension. The essay may contain several errors in word choice, word form, verb tenses, and complementation.
Language (range and complexity): Somewhat limited range of sentence types; may avoid complex structures. Somewhat limited range of vocabulary. May include extensive language from source text(s) with an attempt to incorporate text language with own language.

Band 3–4
Content: The treatment of the assignment only partially fulfills the task expectations and the topic is not always addressed clearly. Evidence for generalizations and supporting ideas/arguments is insufficient and/or irrelevant. May not include ideas from source text, or may consist primarily of ideas from source text without integration with writer's ideas.
Organization: Organizational plan hard to follow. Introduction and conclusion may be missing or inadequate. Connections between and within paragraphs frequently missing.
Language (accuracy): Contains many errors; some errors may interfere with comprehension. Includes many errors in word choices, word forms, verb tenses, and complementation.
Language (range and complexity): Uses a limited number of sentence types. Vocabulary limited. Extensive use of source text language with little integration with writer's words.

Band 1–2
Content: The treatment of the assignment fails to fulfill the task expectations and the paper lacks focus and development. Evidence for generalizations and supporting ideas/arguments is insufficient and/or irrelevant.
Organization: No apparent organizational plan. Introduction and conclusion missing or clearly inappropriate. Few connections made between and within paragraphs.
Language (accuracy): Contains numerous errors that interfere with comprehension. Includes many errors in word choices, word forms, verb tenses, and complementation.
Language (range and complexity): Does not vary sentence types sufficiently. Uses simple and repetitive vocabulary that may not be appropriate for academic writing. May rely almost exclusively on source text language.

Appendix C. GSTEP band score interpretations

Band 7
General interpretation: Excellent user of English.
Undergraduate: No ESL.
Graduate: No ESL.

Band 6
General interpretation: Very good user of English. May benefit from advanced ESL courses in programs with intensive writing and/or speaking demands.
Undergraduate: No ESL.
Graduate: One or more ESL courses recommended.

Band 5
General interpretation: Good user of English. May benefit from ESL courses concurrent with regular courses.
Undergraduate: One or more IEP courses recommended.
Graduate: Two or more ESL courses recommended.

Band 4
General interpretation: Fair user of English. Course program should consist primarily of ESL courses.
Undergraduate: Refer student to Intensive English Program.
Graduate: Refer student to Intensive English Program.

Band 3
General interpretation: Minimal user of English. Not ready for content courses in English.
Undergraduate: Refer student to Intensive English Program.
Graduate: Refer student to Intensive English Program.

Band 2
General interpretation: Very limited user of English. Not ready for content courses in English.
Undergraduate: Refer student to Intensive English Program.
Graduate: Refer student to Intensive English Program.

Band 1
General interpretation: Extremely limited user of English. Not ready for content courses in English.
Undergraduate: Refer student to Intensive English Program.
Graduate: Refer student to Intensive English Program.
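Since Appendix C is in effect a lookup table from band score to placement recommendation, it could be encoded directly in an advising or score-reporting script. The following sketch is purely illustrative and not part of the GSTEP program or its reporting software; the function name and data structure are hypothetical, and the general interpretations are abbreviated to their first sentence.

```python
# Illustrative sketch only: the Appendix C band interpretations as a lookup
# table mapping band -> (general interpretation, undergraduate, graduate).
IEP = "Refer student to Intensive English Program."

RECOMMENDATIONS = {
    7: ("Excellent user of English.", "No ESL.", "No ESL."),
    6: ("Very good user of English.", "No ESL.",
        "One or more ESL courses recommended."),
    5: ("Good user of English.", "One or more IEP courses recommended.",
        "Two or more ESL courses recommended."),
    4: ("Fair user of English.", IEP, IEP),
    3: ("Minimal user of English.", IEP, IEP),
    2: ("Very limited user of English.", IEP, IEP),
    1: ("Extremely limited user of English.", IEP, IEP),
}

def place_student(band: int, graduate: bool) -> str:
    """Return the interpretation and recommendation for a GSTEP band score."""
    interpretation, undergrad, grad = RECOMMENDATIONS[band]
    return f"Band {band}: {interpretation} {grad if graduate else undergrad}"

# Example: an undergraduate scoring in band 5.
print(place_student(5, graduate=False))
```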

