
An objective measure for evaluating EFL compositions Farhady & Farzanehnejad

An Objective Measure For Evaluating EFL Compositions*


Hossein Farhady
Iran University of Science and Technology
Ahmad Reza Farzanehnejad
Tehran University

Abstract

Since 1933, many scholars have devised various measures for the
objective evaluation of the compositions written in first or second
languages. Among these measures, the ones proposed by Hunt (1965) are
of classic value. However, neither Hunt nor others have considered writing
as a process which involves the logical chaining of one sentence to the
other within the context of a paragraph.

Since mature writers produce more organized pieces of writing, i.e.,
coherent paragraphs and compositions, and since cohesion is mainly
achieved through the use of transitional words and expressions, in the
present study, an attempt was made to devise an objective measure for
evaluating the writing ability when it is considered a thinking process. This
new measure is called the Measure of Cohesion. The results of the study
indicated that such a measure can best be computed in the following way:

Measure of Cohesion (MC) = Number of Cohesive Devices / Number of Words

Moreover, the findings revealed that the proposed Measure of Cohesion
was superior to the previously-devised indexes in terms of validity.

Introduction

Writing is one of the most difficult skills to test because of its complex nature. Many
scholars including Harris (1969), Farhady (1980), McDonough (1985), Sako (1972),
and Wilkinson (1989), assert that there are many elements to be considered in
measuring the writing ability. These elements include form, content, grammar,
vocabulary, mechanics (including spelling and punctuation), handwriting, accuracy,
style, diction, relevance, originality, elaboration, layout, coherence, cohesion, unity,
organization, and logic.

Meanwhile, the main point of argument regarding testing the writing ability has been,
and still is, the ways through which this skill can be evaluated. Some scholars
advocate compositions and essay tasks (Oller, 1979; Oller & Perkins, 1978; Carroll,
1980; Jacobs, et al. 1981; Carroll & Hall, 1985; Heaton, 1988); others, including
Dunlop (1969), support objective tests of writing; and still others believe that a
combination of the two would be the ideal practice (Godshalk, et al. 1966; Harris,
1969; Ackerman & Smith, 1988).


Regarding composition-type tests, three major methods of scoring have been
introduced, employed, and advocated: holistic marking (Cooper, 1977; Rivers &
Temperley, 1978; Raimes, 1983; Carroll & Hall, 1985; Kammeen, 1989), analytic
marking (Harris, 1969; Madsen, 1983; Heaton, 1988), and frequency-count marking
(Jacobs, et al. 1981; Brown & Bailey, 1984; Hendrickson, 1984; Wilkinson, 1989).

Among these three methods of evaluating compositions, frequency-count
marking is the most objective one. In fact, it is exactly what researchers have long
been investigating. That is, they have been trying to find a way to assess the writing
ability as objectively as possible.

In this respect, Labrant (1933) can be considered the first who developed objective
measures for the evaluation of writing tasks. She studied sentence length, clause
length, and subordinate ratio (the ratio of subordinate clauses to all clauses both
subordinate and main) and concluded that subordination ratio was an appropriate
index of maturity because it increased significantly with maturity in school children’s
writing. In other words, she found that of the total clauses written, older children
wrote more and more subordinate clauses.

Following Labrant, in a study conducted at three different levels, Hunt (1965)
proposed five ratio-based measures. The measures are as follows:

1. Clause length = Mean words per clause (WPC)
2. T-unit length = Mean words per T-unit (WPT)
3. Subordination ratio = Mean clauses per T-unit (CPT)
4. Sentence length = Mean words per sentence (WPS)
5. Main clause coordination = Mean T-units per sentence (TPS)
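Given raw counts of words, clauses, T-units, and sentences, Hunt's five indexes reduce to simple ratios. The sketch below computes them in Python from made-up illustrative counts (not data from the study):

```python
def hunt_indexes(c):
    """Hunt's (1965) five ratio-based measures, computed from raw counts."""
    return {
        "WPC": c["words"] / c["clauses"],      # clause length
        "WPT": c["words"] / c["t_units"],      # T-unit length
        "CPT": c["clauses"] / c["t_units"],    # subordination ratio
        "WPS": c["words"] / c["sentences"],    # sentence length
        "TPS": c["t_units"] / c["sentences"],  # main clause coordination
    }

# Hypothetical counts for one composition (illustrative only).
counts = {"words": 240, "clauses": 30, "t_units": 20, "sentences": 16}
print(hunt_indexes(counts))
# {'WPC': 8.0, 'WPT': 12.0, 'CPT': 1.5, 'WPS': 15.0, 'TPS': 1.25}
```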

At this point, the terms clause and T-unit, as employed by Hunt, need to be clarified.
According to Hunt (1970), “A clause is any expression containing a subject or
coordinated subjects and a finite predicate or coordinated predicates” (p.4).

A T-unit, however, is defined as “... one main clause plus any subordinate clause or
non-clausal structure that is attached to or embedded in it” (ibid, p. 4). Therefore, any
simple or complex sentence would be a T-unit; however, any compound or
compound-complex sentence would consist of two or more T-units.
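To make the distinction concrete, the hand-annotated examples below (our own sentences, segmented by eye following Hunt's definitions, not taken from the study's data) show how sentence types map to T-unit counts:

```python
# Our own example sentences, annotated with T-unit counts by hand
# following Hunt's (1970) definitions.
t_unit_counts = {
    # Simple sentence: one T-unit.
    "The students wrote essays.": 1,
    # Complex sentence (main + subordinate clause): still one T-unit.
    "The students wrote essays because the teacher asked them to.": 1,
    # Compound sentence (two coordinated main clauses): two T-units.
    "The students wrote essays, and the teacher graded them.": 2,
}

for sentence, n in t_unit_counts.items():
    print(f"{n} T-unit(s): {sentence}")
```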

Based on the findings of his research, Hunt concluded that the T-unit length was the
best measure among the indexes he proposed. Further, the results of the same study
revealed that the second best index was the clause length, the third best was the
subordination ratio, and the last two were the sentence length and the main clause
coordination indexes, respectively.

Although the above-mentioned findings are valuable, they have not considered
writing as a thinking process. Therefore, an alternative measure which would assess
writing ability more accurately is needed, especially when writing is considered as a
process which involves the logical chaining of one sentence to another within a
context. This implies that a superior writer has a good command of putting sentences
and paragraphs together in a coherent way using cohesive devices, particularly
transitional words and expressions. Hence, the main concern of the present study was
to construct a measure for assessing writing ability as a process of logical chaining.
The measure proposed in this study is called the Measure of Cohesion (MC hereafter).
Different possible indexes of MC may be computed in the following ways:

MC1 = Number of Cohesive Devices / Number of Words

MC2 = Number of Cohesive Devices / Number of Clauses

MC3 = Number of Cohesive Devices / Number of T-units

MC4 = Number of Cohesive Devices / Number of Error-Free T-units

It should be noted that the term error-free T-unit in MC4 refers to a T-unit that
contains no errors (Scott & Tucker, 1974; Larsen-Freeman & Strom, 1977). Further, in all
these measures, only those cohesive devices which are used correctly will be counted.
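As a sketch, the four candidate indexes can be computed from the raw counts as follows (Python, with hypothetical counts; per the study's scoring rule, `devices` would hold only the correctly used cohesive devices):

```python
def cohesion_measures(devices, words, clauses, t_units, error_free_t_units):
    """The four candidate Measures of Cohesion; `devices` is the number
    of correctly used cohesive devices."""
    return {
        "MC1": devices / words,
        "MC2": devices / clauses,
        "MC3": devices / t_units,
        "MC4": devices / error_free_t_units,
    }

# Hypothetical counts for one composition (illustrative only).
mc = cohesion_measures(devices=12, words=240, clauses=30,
                       t_units=20, error_free_t_units=15)
print(mc)
# {'MC1': 0.05, 'MC2': 0.4, 'MC3': 0.6, 'MC4': 0.8}
```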

Method
This study was designed to answer the following questions:

1. Is there any significant difference between the Measure of Cohesion index for the
objective evaluation of EFL writing tasks and the indexes proposed by Hunt in terms
of validity and practicality?

2. What is the most appropriate technique to compute such a measure?

1. Subjects

A total of 90 Iranian university students were randomly selected from Tehran,
Shahid Beheshti, Allame Tabatabaee, Teacher Education, Tarbiat Modarres, and Azad
universities. The subjects were male and female juniors and seniors majoring in
English.

2. Instrumentation

The subjects were asked to write a composition of at least 200 words on the given
topics within 50 minutes.

For validation purposes, the subjects were also required to perform on the Michigan
Test of English Language Proficiency (henceforth MELAB) within 60 minutes. The
time allocations were pre-tested to be sufficient for most students to produce an
adequate sample of their writing ability and to perform on the MELAB.


3. Procedures

Under strict testing conditions, the subjects wrote the compositions and performed on
the MELAB. To ensure anonymity, they were asked not to write their names, but to
write a code both on the composition papers and on the answer sheets in order to (a)
give them assurance that their performance would not interfere with their formal
academic records, and (b) match the pairs of scores after scoring the test papers.

Regarding the subjects’ performance on the MELAB, the evaluation was quite
objective because any item of the test had only one correct response. However, for the
evaluation of compositions, two different methods were utilized: an objective method
and a holistic one.

As for the objective evaluation, the compositions were first segmented into
clauses, T-units, and error-free T-units. Then, the number of words, clauses,
T-units, error-free T-units, and cohesive devices were counted. These counts were
then used in computing the measures of cohesion proposed in this study. It should
be noted that, prior to counting the cohesive devices in each composition, a fairly
exhaustive list of transitional words and expressions was prepared. As for the
holistic evaluation, the compositions were rated by three experienced EFL
instructors. The raters attended training sessions in which examples of various
types of errors committed by the subjects were carefully examined. These sessions
were held to ensure consistent grading among the raters.
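A minimal sketch of such a frequency count (the transition list below is a tiny illustrative subset, not the study's fairly exhaustive one):

```python
import re

# Tiny illustrative subset of transitional words/expressions; the study
# prepared a fairly exhaustive list before counting.
TRANSITIONS = ["however", "therefore", "moreover", "in addition",
               "for example", "on the other hand"]

def count_cohesive_devices(text):
    """Count whole-word, case-insensitive occurrences of listed transitions.
    Note: this counts all occurrences; the study counted only those used
    correctly, which requires human judgment."""
    text = text.lower()
    return sum(len(re.findall(r"\b" + re.escape(t) + r"\b", text))
               for t in TRANSITIONS)

sample = ("Writing is difficult. However, practice helps. "
          "Therefore, students should write every day.")
print(count_cohesive_devices(sample))  # 2
```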

4. Analysis

In order to answer the research questions, the following analyses were carried out:

1. A correlational analysis was utilized to compute the inter-rater reliability of
composition ratings and also to estimate the degree of relationship between the
holistic evaluation (i.e., the subjects’ composition scores given by the judges) and the
objective evaluation (i.e., the subjects’ performance on the MELAB). Further, the
newly-computed measures, i.e., cohesive measures, were also validated against the
criterion measure.

2. A two-way analysis of variance (ANOVA) was conducted to investigate the effect
of the subjects’ sex and university on the study measures.

3. A factor analysis was conducted to investigate the underlying structures of the
measures used.

Results
The results of the analyses mentioned above are presented below.

1. The first analysis was conducted to compute the inter-rater reliability of the
composition scores assigned by the three raters and to estimate the extent to which the
holistic and the objective evaluations were in agreement.


The reliability of the holistic measurement, using the average correlations among the
three raters, was calculated to be .77. The correlation coefficients between the scores
given by the raters and the subjects’ total performance on the MELAB are presented
in Table 1.

The correlation matrix reveals that the average agreement between the judges, on the
one hand, and the total performance on the standard measure, on the other hand, is
.73. This moderate correlation is an indication of the relationship between writing
ability and language proficiency even though the MELAB did not have a writing
component. If it had, a higher correlation coefficient might have been obtained.
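As a sketch, inter-rater reliability taken as the average of the pairwise Pearson correlations can be computed in pure Python (the ratings below are invented for five compositions; the study's actual scores are not reproduced here):

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Invented holistic ratings of five compositions by three raters.
r1 = [12, 15, 9, 18, 14]
r2 = [11, 16, 10, 17, 13]
r3 = [13, 14, 9, 19, 15]

pairs = [(r1, r2), (r1, r3), (r2, r3)]
inter_rater = sum(pearson(a, b) for a, b in pairs) / len(pairs)
print(round(inter_rater, 2))
```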

Table 1. The Correlation Matrix for the Scores Given by the Three Raters and the MELAB

          Rater 1   Rater 2   Rater 3   MELAB
Rater 1      *        .74       .79      .77
Rater 2               *         .79      .58
Rater 3                         *        .73
MELAB                                     *

2. Any good measure should have three characteristics, i.e., validity, reliability, and
practicality. Regarding reliability, all the measures employed in this study are equally
acceptable because all of them are count measures, and the reliability of any count
measure is almost perfect.

Concerning practicality, these measures are also equally practical because the number
of computations used to calculate any one of them is the same: counting two elements
of the compositions and dividing one by the other. Since there is no difference among
these measures as far as reliability and practicality are concerned, it is logical to
conclude that the best measure will be the one which enjoys the highest degree of
validity, i.e., the highest correlation with the criterion measure. Hence, the newly-
computed measures, as well as the previously-devised ones, were correlated with the
MELAB. These measures were the clause length (CL), the T-unit length (TL), the
subordination ratio (SR), as well as MC1, MC2, MC3, and MC4. The correlation
coefficients are given in Table 2.

As the table shows, the measures proposed in this study, except MC4, enjoy higher
correlations with the standard measure than the previously-devised ones do.
Furthermore, the first Measure of Cohesion is, in some cases, as good as, and in most
cases, superior to other indexes, depending on the way through which it is computed.

The correlation matrix also reveals that MC1 shows the highest correlation with the
standard measure, and therefore, enjoys the highest amount of empirical validity. This
finding is the answer to the second research question. That is, the best way to compute
the Measure of Cohesion is to count the number of cohesive devices and divide it
by the number of words.
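Given each index's correlation with the MELAB (the criterion measure), selecting the most valid index is a one-line comparison; a sketch:

```python
# Correlations of each index with the MELAB (from Table 2, first row).
validity = {"CL": .01, "TL": .03, "SR": .12,
            "MC1": .33, "MC2": .28, "MC3": .31, "MC4": .11}

# The best measure is the one with the highest criterion correlation.
best = max(validity, key=validity.get)
print(best, validity[best])  # MC1 0.33
```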


Table 2. The Correlation Matrix for the Study Measures

        MELAB    CL     TL     SR    MC1    MC2    MC3    MC4
MELAB     *     .01    .03    .12    .33    .28    .31    .11
CL               *     .86    .10    .02    .44    .44    .49
TL                      *     .59    .07    .30    .47    .49
SR                             *     .06    .04    .27    .19
MC1                                   *     .87    .82    .45
MC2                                          *     .94    .59
MC3                                                 *     .63
MC4                                                        *

3. The next analysis was a two-way ANOVA to investigate the relationship between
the subjects’ language proficiency, their sex, and their university. The results of the
analysis did not reveal any significant differences between the groups at the .05 level
of significance. In other words, there was no relationship between the subjects’ sex,
university, and their performance on the MELAB.

4. The final analysis was a factor analysis in order to investigate the underlying
structures of the measures used in this study. The results of the factor analysis,
presented in Table 3, indicate that the variables which loaded on factor 1 were
grammar, vocabulary, reading comprehension, total scores on MELAB, and the scores
assigned by the three raters through holistic evaluation. This factor may be called the
general language proficiency factor, because all the variables loading on this factor
are measures of language proficiency.

There are heavy loadings on factor 2 from words, clauses, T-units, error-free T-units,
and cohesive devices. This factor may be called the count measure factor.

Moreover, factor 3 is loaded by MC1, MC2, MC3, and MC4. These abbreviations
stand for the different ways through which the measures of cohesion were computed.
These measures were devised under the assumption that the more mature the writers,
the more cohesive devices they would utilize to express themselves in an organized
way. Therefore, factor 3 may be called the logical organization factor.

There are heavy loadings on factor 4 from clause length and T-unit length. This factor
may be labeled as the syntactic maturity factor (Hunt, 1965). And finally, on factor
5, only subordination ratio has loaded. Since this is the only variable loading on this
factor, it may be called the subordination ratio factor.


Table 3. Varimax Rotated Factor Loadings of the Study Measures

                       F1      F2      F3      F4      F5
Grammar               .791
Vocabulary            .808
Reading Comp.         .751
MELAB                 .931
Words                         .942
Clauses                       .958
T-units                       .943
Error-Free T-units            .736
Cohesive Devices              .728
Rater 1               .859
Rater 2               .709
Rater 3               .835
Clause Length                                 .941
T-unit Length                                 .853
Subordination Ratio                                   .928
MC1                                   .931
MC2                                   .925
MC3                                   .895
MC4                                   .619

Note: Loadings less than .40 are not presented in the table.

Conclusions and Implications

In this study, a novel objective measure for evaluating EFL compositions was
presented in the belief that the previously-devised indexes might not be as practical
for assessing EFL students’ writing ability as they are for scoring the writing ability of
native speakers or ESL students. Moreover, the measures proposed before did not
consider writing as a task which involves the ability to establish a logical relationship
among the sentences and paragraphs.

The findings of this study suggest that the proposed Measure of Cohesion (MC) is
superior to Hunt’s measures in terms of validity. This, of course, does not mean that
Hunt’s measures are not applicable to EFL situations; however, there is no research-
based evidence to support such a point.

Further, the results indicate that the subjective evaluation of compositions enjoys
higher correlations with the standard measures of language proficiency than the
objective measures do. However, the practicality of objective measures may justify
their use.

The findings of this study have certain implications for TEFL practitioners. First,
material developers may incorporate activities which would help students to enhance
their ability to utilize cohesive devices, especially transitional words and expressions

304
An objective measure for evaluating EFL compositions Farhady & Farzanehnejad

in order to produce coherent and organized pieces of writing. Second, EFL teachers
can help their students to master the appropriate use of such devices to improve their
writing ability. And finally, test constructors should develop items which assess the
testees’ ability in utilizing such devices as an indication of their language mastery.

* This is a revised version of the paper published in the Iranian Journal of Applied Linguistics (1996),
1(1).


Bibliography
Ackerman, T.A. & P.L. Smith (1988). A comparison of the information provided by
essay, multiple-choice, and free-response writing tests. Applied Psychological
Measurement, 12 (2), 117-128.

Brown, J.D. & K.M. Bailey (1984). A categorical instrument for scoring second
language writing skills. Language Learning, 34 (4), 21-43.

Carroll, B.J. (1980). Testing communicative performance. Oxford: Pergamon Press
Ltd.

Carroll, B.J. & P.J. Hall (1985). Make your own language tests: A practical guide to
writing language performance tests. Oxford: Pergamon Press Ltd.

Cooper, C.R. (1977). Holistic evaluation of writing. In C.R. Cooper & L. Odell (eds.),
Evaluating writing: Describing, measuring, judging (pp. 3-31). Urbana, IL:
National Council of Teachers of English.

Davies, A. (1990). Principles of language testing. Oxford: Basil Blackwell Ltd.

Dunlop, I. (1969). Tests of writing ability in English as a foreign language. ELT
Journal, 24 (1), 54-59.

Farhady, H. (1980). Justification, development, and validation of functional language
testing. Unpublished PhD dissertation, University of California, Los Angeles.

Godshalk, F.I., Swineford, F., & Coffman, W.E. (1966). The measurement of writing
ability. ETS Research Monograph No. 6. Princeton: College Entrance
Examination Board.

Harris, D.P. (1969). Testing English as a second language. New York: McGraw-Hill.

Heaton, J.B. (1988). Classroom testing. London: Longman Group Ltd.

Hendrickson, J.M. (1984). The treatment of error in written work. In S. McKay (ed.),
Composing in a second language (pp. 145-159). Rowley, Mass.: Newbury
House Publishers, Inc.

Henning, G. (1987). A guide to language testing: Development, evaluation, research.
Cambridge, Mass.: Newbury House Publishers, Inc.

Hunt, K.W. (1965). Grammatical structures written at three grade levels. Urbana,
IL: National Council of Teachers of English, Research Report No. 3.

_____ (1970). Syntactic maturity in schoolchildren and adults. Monographs of the
Society for Research in Child Development, 35 (1), (Serial No. 134).


Jacobs, H.L., Zinkgraf, S.A., Wormuth, D.R., Hartfiel, V.F., & Hughey, J.B. (1981).
Testing ESL composition: A practical approach. Rowley, Mass.: Newbury
House Publishers, Inc.

Kammeen, P.T. (1989). Syntactic skill and ESL writing quality. In A. Freedman, I.
Pringle, & J. Yalden (eds.), Learning to write: First language/second language
(pp.162-170). New York: Longman, Inc.

Labrant, L. (1933). A study of certain language developments of children in grades
4-12 inclusive. Genetic Psychology Monographs, 14, 387-491.

Larsen-Freeman, D. & V. Strom (1977). The construction of a second language
acquisition index of development. Language Learning, 27 (1), 123-134.

Madsen, H.S. (1983). Techniques in testing. Oxford: OUP.

McDonough, S. (1985). Academic writing practice. ELT Journal, 39 (4), 244-247.

Oller, J.W. Jr. (1979). Language tests at school. London: Longman Group Ltd.

Oller, J.W. Jr. & K. Perkins (1978). Language in education: Testing the tests. Rowley,
Mass.: Newbury House Publishers, Inc.

Raimes, A. (1983). Techniques in teaching writing. Oxford: OUP.

Rivers, W.M. & M.S. Temperley (1978). A practical guide to the teaching of English
as a second or foreign language. New York: Oxford University Press.

Sako, S. (1972). Writing proficiency and achievement tests. In K. Croft (ed.), Readings
on English as a second language (pp. 310-324). Mass.: Winthrop Publishers,
Inc.

Scott, M. & G. Tucker (1974). Error analysis and English-language strategies of Arab
students. Language Learning, 24 (1), 69-97.

Weir, C.J. (1990). Communicative language testing. New York: Prentice-Hall
International Ltd.

Wilkinson, A. (1989). Assessing language development: The Crediton Project. In A.
Freedman, I. Pringle, & J. Yalden (eds.), Learning to write: First language/second
language (pp. 67-7). New York: Longman, Inc.

