Psychology in the Schools, Vol. 36(4), 1999
© 1999 John Wiley & Sons, Inc. CCC 0033-3085/99/040359-12

CHILDREN'S WRITING ABILITY: THE IMPACT OF THE PICTORIAL STIMULUS

Jason C. Cole and Jennifer S. McLeod
California School of Professional Psychology, San Diego

A previous study by Cole, Muenz, Ouchi, Kaufman, and Kaufman (1997b) demonstrated that the type of stimulus used to elicit a written response had a marked impact on items measuring organization, unity, coherence, and so forth. However, that study had a notable limitation: the sample was comprised only of older adolescents and adults. The current study sought to replicate the Cole et al. study with a more appropriate sample of 29 randomly selected middle school students; the mean grade was 6.76 (SD = 0.74) and the mean age was 12.35 years (SD = 0.90). Participants were asked to write two stories, one based on a line drawing from the Peabody Individual Achievement Test–Revised (PIAT-R) Written Expression subtest, and one based on the stimulus criteria of Hooper et al. (1994). Stories were scored on ten items that measured writing mechanics and ten items that measured thematic, organizational writing. As predicted by the Cole et al. results, items measuring writing mechanics did not differ between the two prompt types, whereas items measuring thematic, organizational writing were significantly higher for stories written to the Hooper et al.–style prompt. © 1999 John Wiley & Sons, Inc.

Written expression assessment has long been plagued by a lack of theoretical development. Hooper et al. (1994) noted that the measurement of written expression has posed numerous practical and theoretical problems for past authors of written expression assessment batteries. The lack of a theoretical basis for written expression assessment, and the subsequent problems noted by Hooper et al., are devastating given Chapman's (1990) assertion that a student's ability to write is crucially involved in his or her academic, vocational, social, and personal life.
Hooper et al. (1994) provided an extensive review and critique of the literature on written expression assessment. These authors also provided their own suggestions regarding the best way to assess written expression. One of their suggestions was to carefully develop the stimulus used to elicit the written response. This is also reflected by the Diagnostic and Statistical Manual of Mental Disorders–Fourth Edition (DSM-IV; American Psychiatric Association, 1994), which asserts that the proper diagnosis of the disorder of written expression should be conducted with the most appropriate stimuli for eliciting writing samples. Hooper et al. recommended using a color photograph rather than a line drawing, and recommended that the photograph contain (p. 388): (a) at least two characters, with perhaps one potential protagonist and one potential antagonist, although this is not absolutely necessary, depending on the situation depicted; (b) a depiction of some kind of "interesting" or "novel" scene or event (e.g., children lost in a cave); and (c) some kind of potential conflict between the antagonist and protagonist in which a main character must resolve a problem using some goal-directed sequence of events.
According to Hooper et al., the aforementioned criteria should elicit a response that is more cohesive and goal-directed than responses to stimuli without such criteria.
Cole, Muenz, Ouchi, Kaufman, and Kaufman (1997b) tested Hooper et al.'s (1994) aforementioned stimulus hypotheses. Initially, Cole et al. developed a color photographic stimulus that they believed met all of Hooper et al.'s criteria. This stimulus, called the "cliff" prompt, depicts two men on a set of rocky cliffs on the beach. One man, in need of help, is reaching to the other while two onlookers observe from below. After developing the cliff prompt, Cole et al. scored written responses from 50 participants. Each participant wrote a total of two stories: one story used the cliff prompt as a stimulus, whereas the other story used a line drawing from the Peabody Individual Achievement Test–Revised (PIAT-R; Markwardt, 1989). Each story was scored using 10 items that assessed written mechanics (capitalization, punctuation, and so forth) and 10 items that assessed structure (unity of ideas, fluidity of story, and so forth). Cole et al. believed that mechanics items would not exhibit a difference between the photographic and line drawing stimuli, and that structural items would exhibit a difference between the prompts.

The authors sincerely thank the world's best mentors, Alan and Nadeen Kaufman, as well as the administration at Yorba Linda Middle School, American Guidance Service, and Tracy A. Muenz. Special thanks to Tom R. Smith, Nusheen Cole, and James and Sherry McLeod.
Correspondence to: Jason C. Cole, 16340 Compton Palms Drive, Tampa, FL 33647. E-mail: JCole@parinc.com.
The results of the Cole et al. (1997b) study were as predicted by its authors. Yet, despite Hooper et al.'s (1994) validated stimulus criteria, authors of written expression measures have failed to implement those criteria. Notably, an extensive review of measures of written expression by Cole, Haley, and Muenz (1997a) found that none of the 12 measures reviewed met all of Hooper et al.'s stimulus criteria. Although the second and third revisions of the Test of Written Language (TOWL-2, Hammill & Larsen, 1988; TOWL-3, Hammill & Larsen, 1996) met all of Hooper et al.'s content criteria, these tests did not contain photographic stimuli. On the other hand, two tests of written expression reviewed by Cole, Haley, and Muenz did contain photographic stimuli: Grill and Kirwin's (1992) Written Language Assessment and Johnson and Hubly's (1979) Written Expression Test. However, Johnson and Hubly's test met only one other of Hooper et al.'s stimulus criteria, and the measure by Grill and Kirwin met no others. In summary, Cole, Haley, and Muenz found a strong need for better developed stimuli in written expression assessment batteries.
Although the findings from Cole et al. (1997b) were important, the methodology had a notable limitation: the participant pool had an average educational level of 16.14 years, and the mean age was 26.48 years. These participants provided limited generalizability to the individuals typically tested for a disorder of written expression. The goal of this study, therefore, was to confirm the findings of Cole et al.'s (1997b) study in a younger sample, expanding the generalizability from older participants to the more commonly tested younger participants. The findings from the Cole et al. (1997b) study, combined with the review by Cole, Haley, and Muenz (1997a), suggest that written expression assessment does not currently optimize a participant's written response. A confirmation of Cole et al.'s (1997b) findings for the intended age range of most written expression measures would suggest that such measures do not optimally assess a child's writing ability.
The authors of this study hypothesized that items measuring thematic content (organization, story fluidity, and so forth) would be significantly higher for stories written to the cliff prompt than for stories written to a line drawing. On the other hand, the current authors hypothesized that items measuring mechanical content (punctuation, capitalization, and so forth) would not differ between the prompts. Consistent with the findings from the Cole et al. (1997b) study, the current authors hypothesized that the magnitude of effect for the difference in thematic items between the prompts would be large.

Method

Participants
A total of 29 participants, all of whom attended a middle school in Orange County, California, comprised the current sample. Participants were randomly selected from a pool of more than 200 students who returned parental consent forms. Total subject acquisition was limited by time constraints of the academic year. The participants ranged in age from 11 to 14 years, with a mean of 12.35 and an SD of 0.90. The grade range was 6 to 8, with a mean of 6.76 and an SD of 0.74. The sample included 22 females and 7 males; the ethnic composition was 79.31% Caucasian, 10.35% mixed descent, 6.90% Hispanic, and 3.45% Asian-American.
All but six of the participants were in general education. Two participants were in classes for gifted students, while four were students with learning disabilities. The percentages of these students were similar to population estimates for the respective categories. According to recent educational census data for public schools (U.S. Department of Education–National Center for Educational Statistics, 1996, p. 66), approximately 5% of the California school-age population is gifted and talented. This study had a slightly higher percentage of participants in gifted classes: 6.90%. According to the DSM-IV (American Psychiatric Association, 1994, p. 47), upwards of 10% of the population have a learning disability. Similarly, data from the U.S. Department of Education (U.S. Department of Education–National Center for Educational Statistics, 1996, p. 65) indicate that 9.1% of the U.S. school-age population is learning impaired (a combination of speech impaired, learning disabled, and mentally retarded students). This study contained a slightly higher percentage of participants with learning disabilities: 13.79%.

Procedure
Participants in this study were part of a national norming project conducted by American Guidance Service in the spring of 1996. As such, the participants were individually administered a battery of psychoeducational tests. Included in the standard battery were the standardized administration of the PIAT-R Written Expression subtest, Level II (Markwardt, 1989), the Kaufman Test of Educational Achievement–Brief Form (Kaufman & Kaufman, 1985a), the Oral and Written Language Scales (Carrow-Woolfolk, 1996), and the Visual–Auditory Learning subtest from the Woodcock Reading Mastery Tests–Revised (Woodcock, 1987). The line drawing from the PIAT-R was randomly determined to be presented as either the first or the penultimate test in the battery. The cliff prompt was always administered last so as to avoid contamination of the standardized battery. A battery was completed in one sitting, allowing the child to take a small break if needed.
The PIAT-R subtest has two different stimuli that are considered to be alternate forms (Markwardt, 1989). According to Cole, Haley, and Muenz (1997a), the only stimulus criterion proposed by Hooper et al. (1994) met by either of these prompts was that each contained two or more characters. A total of 10 participants were administered the "box" prompt from the PIAT-R, whereas 19 participants were administered the "money in the street" prompt. This division between the two prompts was a predetermined constraint from the test publisher conducting the national norming project. However, the PIAT-R manual notes that the PIAT-R written expression stimuli should produce equivalent responses from children. The "cliff" prompt, taken from the Cole et al. (1997b) study, was administered at the end of the standard normative battery. Administration procedures from the PIAT-R Written Expression subtest, Level II, were used for the administration of the cliff prompt. Although a mixed sequence of administration for the prompts would have been desirable, the authors felt that maintaining the cohesiveness of the normative study was essential. Moreover, similar studies by Cole et al. (1997b) and Ouchi, Cole, Muenz, Kaufman, and Kaufman (1996) found that order of stimulus presentation did not have a significant effect; both studies analyzed writing elicited by the same stimuli used in the current study.
Twenty items from the PIAT-R and Wechsler Individual Achievement Test (WIAT; Psychological Corporation, 1992) written expression scoring systems were utilized to assess writing ability. Ten of the items (labeled thematic items) assessed unity, organization, and development of ideas, while the other ten items (labeled mechanics items) assessed grammar, punctuation, legibility of handwriting, and so forth. Categorization of the aforementioned item typologies was completed by a panel of achievement-testing experts. At this time, validity of the categories beyond that of content validity has not been explored. Some items in the aforementioned scoring system were modified by Cole et al. (1997b) to enhance clarity, and these modifications were retained in the current study. They were made during the rater training process, as suggested by Hartmann and Wood (1982). After initial training, the raters discussed unresolved scoring ambiguities and created enhancements to assist in clarity, similar to what is recommended by Gelfand and Hartmann (1975). A summary of item descriptions is presented in Table 1. Stories were independently evaluated by two raters (the test administrators), each of whom had scored over 100 written responses on both the PIAT-R and WIAT scoring systems.
Items were categorized as thematic or mechanical on an item-by-item basis. Some of the items may appear suitable for either category, such as item 16 from the PIAT-R; yet the criteria for rating a participant on this item are based upon the physical separation of paragraphs more than the quality of content within a paragraph. Item 13 from the PIAT-R and item 2 from the WIAT are similar in their generic descriptions. However, these items differ greatly in the level of writing they assess and, as noted by Muenz, Ouchi, and Cole (1999), they obtained markedly different psychometric properties. The criteria from the PIAT-R evaluate a student's ability to avoid tangents, whereas the WIAT item is more critical, analyzing variations in story fluidity, cohesiveness, and development. Finally, it should be noted that spelling was not contained within the mechanics list of items. Neither the PIAT-R nor the WIAT scoring system contains items assessing a student's spelling ability. In fact, most measures of achievement evaluate spelling independently of other academic tests, including the Kaufman Test of Educational Achievement (Kaufman & Kaufman, 1985b) and the Woodcock-Johnson Psycho-Educational Battery–Revised (Woodcock & Johnson, 1989).

Table 1
Thematic and Mechanics Scoring Criteria

Thematic items
1. Sentences are related to each other in content (PIAT-R item 13).
2. The composition has an identifiable ending or conclusion (PIAT-R item 18).
3. The composition tells a story and does not just describe the picture prompt (PIAT-R item 19).
4. The central characters or objects in the picture prompt are given a more prominent role in the composition than the rest of the scene is given (PIAT-R item 20).
5. The composition involves some interaction among the characters (PIAT-R item 21).
6. The composition refers to events that happened prior to the scene depicted in the picture prompt (PIAT-R item 22).
7. The composition refers to events that happened after the scene depicted in the picture prompt (PIAT-R item 23).
8. Thoroughness of ideas and story development (WIAT item 1).
9. Organization, unity, and coherence (WIAT item 2).
10. Holistic assessment including development, cohesion, mechanics, and fluidity (WIAT item 7).

Mechanics items
1. Letters and words are legible (PIAT-R item 1).
2. Run-on sentences are avoided (PIAT-R item 6).
3. Pronouns are used correctly (PIAT-R item 7).
4. Sentences begin with capital letters (PIAT-R item 8).
5. Sentences end with correct punctuation marks (PIAT-R item 9).
6. The composition is properly placed on the page, with some use of margins and spacing between words (PIAT-R item 11).
7. The general meaning of sentences is understandable (PIAT-R item 14).
8. Paragraphs are identifiable and each paragraph conveys a single thought (PIAT-R item 16).
9. Accuracy of grammar and word usage (WIAT item 5).
10. Accuracy and variability of capitalization and punctuation (WIAT item 6).

Note. From "The Impact of the Pictorial Stimulus on the Written Expression Output of Adolescents and Adults," by J. C. Cole, T. A. Muenz, B. Y. Ouchi, N. L. Kaufman, and A. S. Kaufman, 1997, Psychology in the Schools, 34, p. 3. Copyright 1997 by John Wiley & Sons, Inc. Reprinted with permission.

Data Analysis
Two repeated-measures ANOVAs were conducted with type of stimulus (line drawing or photograph) as the repeated independent variable. One ANOVA used the raw sum score (averaged between raters) for mechanics items as the dependent variable, and the second used the raw sum score (averaged between raters) for thematic items. The ANOVA assumptions of normality and homogeneity of variance were also assessed. Sphericity was not corrected, as it is not applicable when there are only two levels of a repeated variable (Keppel, 1991, p. 351). Independence, the remaining ANOVA assumption (Keppel, 1991), was controlled by random selection. Because a total of two repeated-measures analyses were conducted, a Bonferroni correction was applied to the standard alpha level of .05: the alpha level used for the aforementioned ANOVAs was .025.
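For readers wishing to reproduce this style of analysis, the sketch below runs the two repeated-measures ANOVAs with a Bonferroni-corrected alpha. It is a minimal illustration, not the authors' original computation: the data file, column names, and use of statsmodels' AnovaRM are assumptions.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one row per participant x stimulus, with
# averaged-rater sum scores for each item type (file and columns assumed).
df = pd.read_csv("writing_scores.csv")  # columns: subject, stimulus, mechanics, thematic

ALPHA = 0.05 / 2  # Bonferroni correction for the two planned ANOVAs -> .025

for dv in ("mechanics", "thematic"):
    # Two-level within-subject factor, so sphericity is not at issue (Keppel, 1991).
    result = AnovaRM(df, depvar=dv, subject="subject", within=["stimulus"]).fit()
    f = result.anova_table["F Value"].iloc[0]
    p = result.anova_table["Pr > F"].iloc[0]
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{dv}: F = {f:.2f}, p = {p:.4f} ({verdict} at alpha = {ALPHA})")
```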
Socioeconomic status, gender, and ethnicity were each assessed for their impact on the raw sum scores of thematic and mechanics items. Three sets of 2 (demographic variable) × 2 (stimulus) mixed-design ANOVAs were conducted to assess the effect of specific demographic variables; stimulus was, again, the repeated measure. Socioeconomic status was determined by the educational level of a child's parents (if two different educational levels were provided, the higher was used). This variable was split into two categories: those with less than two years of college, and those with at least two years of college. Ethnicity was also divided into two groups: Caucasian and non-Caucasian (including those of mixed descent). Gender differences were also assessed in a dichotomous manner. All of these analyses were evaluated at an alpha level of .01. Although this alpha level is less stringent than a typical Bonferroni correction would recommend, the authors felt that a Type I error would be less costly than a Type II error in this instance, given the power constraints of the sample size. Although it would have been beneficial to assess all demographic variables in a single multifactorial ANOVA, not enough participants were assessed to provide adequate power. However, previous research has shown that demographic variables such as these do not influence the type of writing assessed in this study (e.g., Cole et al., 1997b).
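A mixed-design (between × within) ANOVA of the kind described above could be scripted as follows. This is a sketch under stated assumptions, not the authors' analysis: the pingouin package, the data file, and the column names are all assumptions.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format data with a dichotomous between-subjects column
# (e.g., gender) alongside the repeated stimulus factor.
df = pd.read_csv("writing_scores.csv")  # columns: subject, stimulus, gender, thematic

ALPHA = 0.01  # the liberal alpha the authors chose given power constraints

# 2 (gender) x 2 (stimulus) mixed ANOVA on the thematic sum score.
aov = pg.mixed_anova(data=df, dv="thematic", within="stimulus",
                     subject="subject", between="gender")
print(aov[["Source", "DF1", "DF2", "F", "p-unc"]])
print("Significant at .01:", aov.loc[aov["p-unc"] < ALPHA, "Source"].tolist())
```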
The authors of this study believed that assessing interrater reliability was pertinent for an unambiguous understanding of the results. Previous studies in written expression assessment have shown what Anastasi (1988) would consider to be poor reliability coefficients. Specifically, studies have typically reported average reliability coefficients between 0.60 and 0.65 (Benes, 1992; Benton, 1992; Grill & Kirwin, 1992; Hammill & Larsen, 1984; Markwardt, 1989; Noyce, 1985; Woodcock, 1986). Furthermore, previous studies using methodology similar to this study's have also produced poor interrater reliability estimates (Cole et al., 1997b; Ouchi et al., 1996).
The raters for this study, also its authors, had each scored over 100 written expression protocols prior to this study. Anastasi (1988) noted that reliability coefficients should be at least 0.70. Therefore, the authors believed that an average interrater reliability of at least 0.70 was both necessary and obtainable for the unambiguous interpretation of this study's findings. Interrater reliability was assessed in four separate analyses: one for each prompt (line drawing or photograph) by each item type (thematic or mechanics). As a total of four interrater reliability analyses were conducted, the standard alpha was reduced with a Bonferroni correction to 0.0125, rounded to 0.01.
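The four reliability analyses amount to Pearson correlations between the two raters' scores, with the average computed through Fisher's z transformation (as reported later in the Results). A minimal sketch follows; the randomly generated arrays merely stand in for the real rater scores, and all names are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

def mean_correlation(rs):
    """Average Pearson correlations via Fisher's z transformation."""
    z = np.arctanh(np.asarray(rs, dtype=float))  # r -> z
    return np.tanh(z.mean())                     # mean z -> back to r

def fake_pair(n=29):
    """Stand-in for the 29 sum scores from each of the two raters in one cell."""
    a = rng.normal(18, 3.5, n)
    return a, a + rng.normal(0, 2.0, n)  # second rater agrees imperfectly

cells = {name: fake_pair() for name in
         ("photo/mechanics", "photo/thematic", "line/mechanics", "line/thematic")}

ALPHA = 0.05 / 4  # Bonferroni over the four analyses (~.0125)

rs = []
for name, (r1, r2) in cells.items():
    r, p = pearsonr(r1, r2)
    rs.append(r)
    print(f"{name}: r = {r:.2f}, p = {p:.4f}, significant = {p < ALPHA}")

print(f"mean r (Fisher z) = {mean_correlation(rs):.2f}")
```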

Results
Table 2 presents means and standard deviations for each item type (mechanics and thematic) by each stimulus type (line drawing and photograph). Table 3 presents normality data for the aforementioned groups.
Table 2
Statistics for Mechanics and Thematic Items for Each Stimulus

                    Mechanics           Thematic
Stimulus            M       SD          M       SD
Line drawing        36.07   5.99        34.17   10.01
Photograph          35.52   6.35        40.52   7.89

Keppel (1991) has noted that normality estimates should not exceed a z score of 3. None of the skewness or kurtosis estimates exceeded a score of 3 and, therefore, normality was adequate for the ANOVAs on the mechanics and thematic items. Homogeneity of variance was the only other assumption that needed to be assessed. Again, Keppel recommends that homogeneity of variance estimates, calculated using Hartley's Fmax, should not exceed 3. For the mechanics items, the variance was 35.88 for the line drawing and 40.32 for the photograph; Fmax equaled 1.22 for these groups. Similarly, the thematic line drawing group had a variance of 100.20, whereas the photograph group had a variance of 62.25: Fmax = 1.61. In summary, homogeneity of variance and normality estimates were acceptable, and unambiguous interpretation of the single-factor ANOVAs for the mechanics and thematic items was possible.
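Both assumption checks are easy to script. The sketch below computes skewness and kurtosis z scores (statistic divided by its standard error, as described in the Table 3 note) and Hartley's Fmax (ratio of largest to smallest group variance). The illustrative data and the large-sample standard-error formulas are assumptions, not the authors' code.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def normality_z(x):
    """Skewness and (excess) kurtosis divided by their approximate standard errors."""
    n = len(x)
    se_skew = np.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2.0 * se_skew * np.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew(x) / se_skew, kurtosis(x) / se_kurt

def hartley_fmax(*groups):
    """Ratio of largest to smallest group variance (Keppel's criterion: < 3)."""
    variances = [np.var(g, ddof=1) for g in groups]
    return max(variances) / min(variances)

# Illustrative data standing in for the 29 thematic sum scores per stimulus.
rng = np.random.default_rng(1)
line, photo = rng.normal(34, 10, 29), rng.normal(40, 8, 29)

zs, zk = normality_z(line)
print(f"line drawing: skew z = {zs:.2f}, kurtosis z = {zk:.2f} (should be < 3)")
print(f"Fmax = {hartley_fmax(line, photo):.2f} (should be < 3)")
```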
Tables 4 and 5 present the ANOVAs for the mechanics and thematic items, respectively. Each of these tables presents the repeated-measures ANOVA comparing the stimulus types, as well as each of the demographic variables combined with stimulus type. The hypotheses for this study were supported. The mechanics items, as shown at the top of Table 4, did not differ significantly across stimulus type, F(1, 28) = 0.38, p > .025. Thematic items, as shown at the top of Table 5, differed significantly across stimulus type, F(1, 28) = 25.53, p < .025. Whereas the F and p values of a statistical analysis only allow one to ascertain whether the null hypothesis is false (Vogt, 1993), the effect size allows one to interpret "the degree to which the null hypothesis is false" (Cohen, 1988, pp. 9–10). Keppel (1991) recommends using omega squared (ω²) as an effect size estimate for significant ANOVA results. The effect size for the thematic items was ω² = 0.45. Hence, 45 percent of the variance in scores for thematic items was explained by the difference between the stimuli.
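The article does not show the formula used for ω². One textbook estimate for a two-level repeated factor, where F(1, n − 1) equals the squared paired t, is shown below with this study's numbers; treating this as the authors' exact computation is an assumption, though it agrees with the reported value within rounding of the published F.

```latex
\hat{\omega}^{2} \;=\; \frac{F - 1}{F + n - 1}
\;=\; \frac{25.53 - 1}{25.53 + 29 - 1}
\;=\; \frac{24.53}{53.53}
\;\approx\; 0.46
```

Here n = 29 is the number of participants (pairs of stories), so the estimate is consistent with the reported ω² = 0.45.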
Tables 4 and 5 also present ANOVA results for the demographic variables (socioeconomic status, gender, and ethnicity), including the interaction between stimulus type and each respective demographic variable. None of the main effects for the demographic variables were significant for either item type.

Table 3
Normality Statistics for Mechanics and Thematic ANOVAs

Items            Skewness z    Kurtosis z
Line drawing
  Mechanics      1.21          0.31
  Thematic       0.91          1.14
Photograph
  Mechanics      1.18          0.94
  Thematic       2.56          0.51

Note. Skewness and kurtosis z scores were calculated by dividing the respective statistic by its standard error.
Table 4
Results of ANOVAs Using Mechanics Items as the Dependent Variable

Source                     df Effect   MS Effect   df Error   MS Error   F      p

Mechanics ANOVA
  Stimulus                 1           4.41        28         11.59      0.38   .54
Mechanics × SES ANOVA
  SES                      1           62.98       27         64.72      0.97   .33
  Stimulus                 1           0.48        27         11.91      0.04   .84
  SES × Stimulus           1           2.96        27         11.91      0.25   .62
Mechanics × Gender ANOVA
  Gender                   1           41.93       27         65.50      0.64   .43
  Stimulus                 1           0.75        27         11.90      0.06   .80
  Gender × Stimulus        1           3.24        27         11.90      0.27   .61
Mechanics × Ethnicity ANOVA
  Ethnicity                1           113.96      27         62.84      1.81   .19
  Stimulus                 1           3.14        27         11.98      0.26   .61
  Ethnicity × Stimulus     1           1.21        27         11.98      0.10   .75

Further, none of the interactions between demographic variables and stimulus type were significant for either item type. One should be cautious when comparing the different ANOVAs to each other, as they were not orthogonal analyses.
Interrater reliability estimates are provided in Table 6, along with the means, standard deviations, and ranges for each rater for each item-type by stimulus-type combination. Four interrater reliability analyses were conducted, one for each item type by stimulus type. Reliability correlations ranged from 0.67 (mechanics × photograph) to 0.81 (mechanics × line drawing), and all correlations were significant (p < .01). The mean correlation, calculated through a Fisher's z transformation, equaled 0.76 (p < .001). Moreover, the mean correlation and all but one of the individual correlations attained a magnitude of at least 0.70. Interrater reliability for this study was therefore adequate and should not hinder unambiguous interpretation of the results. The descriptive statistics shown in Table 6 suggest that the interrater reliability estimates are representative of the agreement between the raters, given the raters' similar means and standard deviations.

Table 5
Results of ANOVAs Using Thematic Items as the Dependent Variable

Source                     df Effect   MS Effect   df Error   MS Error   F       p

Thematic ANOVA
  Stimulus                 1           583.72      28         22.87      25.53   <.01
Thematic × SES ANOVA
  SES                      1           87.53       27         141.54     0.62    .44
  Stimulus                 1           346.36      27         23.61      14.67   <.01
  SES × Stimulus           1           2.70        27         23.61      0.11    .74
Thematic × Gender ANOVA
  Gender                   1           127.70      27         140.05     0.91    .35
  Stimulus                 1           627.79      27         21.06      29.81   <.01
  Gender × Stimulus        1           71.65       27         21.06      3.40    .08
Thematic × Ethnicity ANOVA
  Ethnicity                1           15.58       27         144.20     0.11    .75
  Stimulus                 1           553.42      27         23.71      23.34   <.01
  Ethnicity × Stimulus     1           0.11        27         23.71      <0.01   .95
Table 6
Descriptive Statistics and Correlations for Each Rater by Each Prompt

                 Mechanics items                      Thematic items
Stimulus         n    M      SD    Range   r          n    M      SD    Range   r

Photograph
  Rater 1        29   18.14  3.46  9–24    .67*       29   20.59  3.83  12–25   .77*
  Rater 2        29   17.38  3.49  10–24              29   19.93  4.56  9–26
Line drawing
  Rater 1        29   17.97  3.31  11–24   .81*       29   15.66  5.41  6–26    .76*
  Rater 2        29   18.10  2.99  12–23              29   18.52  5.28  6–26

Note. Correlations (r) are interrater reliability estimates between the pair of raters for each respective item-type by stimulus-type set of scores. The mean correlation between the raters, across item type and stimulus type, was r = .76, p < .001.
*p < .001.


Discussion
This study's hypotheses were confirmed. A significant difference between the photograph and line-drawing stimuli was found for the thematic items, and no significant difference between the stimuli was found for the mechanics items. For the thematic items, participants received higher scores for written responses to the photographic stimulus. This study's results closely resembled the results found by Cole et al. (1997b), as well as the hypotheses of Hooper et al. (1994). Although this study, like the Cole et al. study, found a significant difference for thematic-type items, the specific cause of this difference is not clear. As noted in Cole et al. (p. 6), "The effect could be primarily due to any of the following: the use of a photograph instead of a line drawing, the depiction of the interesting scene, the presentation of a clear-cut conflict situation, the need to use a sequence of actions to resolve the conflict, or some interaction among these factors."
One noteworthy difference between this study's findings and Cole et al.'s (1997b) findings was the increase in the magnitude of effect for the analysis of thematic items. Cole et al. found an eta squared of 0.216, whereas this study found an omega squared of 0.450 (eta squared was 0.477). The effect size for the stimulus difference on thematic items thus more than doubled for children relative to adults. Hence, appropriate stimulus criteria appear to be even more important for children than for older adolescents and adults.
A student's ability to properly use mechanics in written expression was not found to be impacted by the stimulus used to elicit the response. However, given the poor reliability typically associated with what has been referred to as direct assessment of written expression (Breland & Gaynor, 1979), it would be more prudent to assess a student's mechanics ability with an indirect measure of written expression. For example, reliabilities for the PIAT-R Written Expression subtest, Level II, range from 0.58 to 0.67 (median interrater reliabilities; see Kaufman, 1990, p. 627), and reliabilities for the WIAT range from 0.79 to 0.89 (these appear to be spuriously inflated due to large heterogeneity of age; see Cole, Haley, & Muenz, 1997a). Certain indirect measures of written expression thoroughly assess mechanics ability and concomitantly possess desirable psychometric properties; for example, see the Dictation and Proofing subtests from the Woodcock-Johnson Psycho-Educational Battery–Revised: Achievement (Woodcock & Johnson, 1989). Therefore, when an examiner is solely concerned with assessing a student's mechanics ability, the examiner would benefit most from using an indirect measure of mechanics-type written expression assessment.
Thematic items, on the other hand, necessitate a more direct assessment. This study demonstrated that the selection of the stimulus is exceptionally important in measuring thematic writing ability. Given that thematic-type items assess story content and cohesiveness, direct assessment of thematic writing ability is critical. Berninger et al. (1994) showed that writing ability at the word level does not predict writing ability at the sentence or paragraph level, nor does sentence-level writing ability predict paragraph-level writing ability. Cole et al. (1997a) have, therefore, suggested that direct written expression should be evaluated with a multiparagraph written response. Auchter and Hatch (1990) and Breland and Gaynor (1979) have also noted that the direct method of written expression assessment is a more valid assessment format.
The ramifications of stimulus selection for assessing thematic writing ability are even more noteworthy when one considers previous studies on thematic writing ability. Expert writers are likely to demonstrate organizational skills, fluid transitions, goal-directed writing, and a good understanding of the writing goal (Bereiter, 1980; Burtis, Bereiter, Scardamalia, & Tetroe, 1983; Fitzgerald, 1987; Flower & Hayes, 1981; Halliday & Hasan, 1976; Hayes & Flower, 1986; Hooper et al., 1994; McCutchen & Perfetti, 1982; Scardamalia & Bereiter, 1986; Sommers, 1980). Poor writers, in contrast, are apt to exhibit the following negative characteristics in their text: poorly organized writing at both the sentence and paragraph levels, a greater likelihood of producing less interesting stories, and a decreased probability of changing spelling, grammar, or the content of the text to improve their communication of ideas (Anderson, 1989; Englert, 1990; Englert, Raphael, Anderson, Gregg, & Anthony, 1989; Englert & Thomas, 1987; Graham & Harris, 1987, 1989; Graham, Harris, & MacArthur, 1990; Graham, Harris, MacArthur, & Schwartz, 1991; Hooper et al., 1994; Wong, Wong, & Blakenship, 1989). In summary, the aforementioned research indicates that both expert and poor writers are characterized by their thematic writing ability. A measure of written expression that does not maximize a student's opportunity to demonstrate thematic writing ability may, thus, be invalid.
Assessment measures of written expression contain a wide variety of stimuli to which the writer may respond. This study focused on the prompts from the PIAT-R. We have claimed that the stimuli from the PIAT-R meet only one of the criteria suggested by Hooper et al. (1994). It could be argued that the stimulus labeled "Money in the Street" is both novel and interesting. However, as noted by Cole, Haley, and Muenz (1997a), a measure of written expression that allows one to select from a variety of stimuli must consistently meet the criteria across all stimuli in order for the stimuli to be classified as viably meeting Hooper et al.'s criteria. It should also be noted that the stimuli examined in this study are not necessarily representative of stimuli from other measures of written expression. For example, stimuli from the TOWL-2 and TOWL-3 meet nearly all of the criteria set forth by Hooper et al.; color photography was the only criterion not met. Still, the aforementioned review by Cole, Haley, and Muenz found that, ultimately, none of the twelve measures they reviewed met all of Hooper et al.'s criteria. This sweeping statement must be interpreted with caution: as of this time, there has been no research exploring the differential impact of each of Hooper et al.'s stimulus criteria. This area of research should be explored further.
Although the results of this study indicate a clear and large effect for the impact of the stimulus on thematic items, specific concerns regarding this study must be addressed. As previously mentioned, this study addressed only the totality of Hooper et al.'s (1994) stimulus criteria. Future research should attempt to assess the differential effects of each criterion proposed by Hooper et al. A recently conducted study, however, has determined that a different Hooper-type stimulus produced results similar to those contained herein (McLeod, Boyer, Ouchi, Cole, & Callaghan, in press). Another possible source of secondary variance was the raters' knowledge of this study's hypotheses (for example, see Rosenthal et al., 1964). Whereas previous research has found that rater knowledge of a study's hypothesis is not detrimental (Kent, O'Leary, Diament, & Dietz, 1974), a study is currently underway to determine the impact of rater knowledge of the hypotheses on the current study's results. Finally, further research on the impact of the stimulus in written expression should employ a more diverse sample. For example, the discriminative validity of the stimulus impact could be explored by examining differences between children with and without disorders of written expression. It should be noted, however, that this study's small sample size was not detrimental to its results, as adequate power was obtained (Cohen, 1988).
Written expression assessment is fraught with many problems; poor interrater reliability, a lack of theoretical development, method-of-assessment ambiguities (direct versus indirect), and item content validity are a few examples. Given the relationship between thematic writing ability and the classification of expert and poor writers, combined with the impact stimulus selection has on thematic writing, this study provides important data for improving written expression assessment. The authors of this study encourage developers of written expression assessment tools not only to create stimuli with a theoretical grounding (such as Hooper et al.'s, 1994, stimulus criteria), but also to develop scoring systems that allow for objective evaluation of a writer's mechanical and thematic writing abilities. Moreover, written expression assessment batteries should contain at least a direct assessment portion (preferably both direct and indirect), as well as sound psychometric properties. Although the aforementioned goals are not easily attainable, they are necessary for the proper assessment of written expression.
School psychologists and other administrators of written expression examinations should also consider the impact of the stimulus on the written output of children. Hooper et al. (1994) have recommended the specific criteria mentioned throughout this article. Yet, in order to obtain a full understanding of a child's writing abilities, the child should be exposed to multiple writing tasks. Whereas the Hooper et al. method has been shown to produce rich writing samples from children, test examiners may also be interested in how a child writes to other types of prompts, such as oral, fantasy, or black-and-white depictions.
References
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.
Anderson, P.L. (1989). Productivity, syntax, and ideation in the written expression of remedial and achieving readers. Journal of Reading, Writing, and Learning Disabilities International, 4, 115–124.
Auchter, J.C., & Hatch, M. (1990). Evaluating multiple writing literacies through multiple choice testing and direct assessment. Paper presented at the 1990 Annual Conference of the National Testing Network on Writing, New York, NY.
Benes, K.M. (1992). Peabody Individual Achievement Test, Revised. In J.J. Kramer & J.C. Conoley (Eds.), The eleventh mental measurements yearbook (pp. 649–652). Lincoln, NE: Buros Institute of Mental Measurements.
Benton, S.L. (1992). Test of Written Language, 2. In J.J. Kramer & J.C. Conoley (Eds.), The eleventh mental measurements yearbook (pp. 979–981). Lincoln, NE: Buros Institute of Mental Measurements.
Bereiter, C. (1980). Development in writing. Hillsdale, NJ: Lawrence Erlbaum Associates.
Berninger, V.W., Mizokawa, D.T., Bragg, R., & Cartwright, A.C. (1994). Intraindividual differences in levels of written language. Reading and Writing Quarterly: Overcoming Learning Difficulties, 10, 259–275.
Breland, H.M., & Gaynor, J.L. (1979). A comparison of direct and indirect assessment of writing skills. Journal of Educational Measurement, 16, 119–128.
Burtis, P., Bereiter, C., Scardamalia, M., & Tetroe, J. (1983). The development of planning in writing. Chichester, England: John Wiley.
Carrow-Woolfolk, E. (1996). Oral and Written Language Scales. Circle Pines, MN: American Guidance Service.
Chapman, C. (1990). Authentic writing assessment, R–88–062003 (Vol. EDO-TM-90-4). Washington, DC: American Institutes for Research.
Cohen, J. (1988). Statistical power analyses for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.
Cole, J.C., Haley, K.A., & Muenz, T.A. (1997a). Written expression reviewed. Research in the Schools, 4(1), 17–34.
Cole, J.C., Muenz, T.A., Ouchi, B.Y., Kaufman, N.L., & Kaufman, A.S. (1997b). The impact of the pictorial stimulus on the written expression output. Psychology in the Schools, 34(1), 1–9.
Englert, C.S. (1990). Unraveling the mysteries of writing through strategy instruction. New York: Springer-Verlag.
Englert, C.S., Raphael, T.E., Anderson, L.M., Gregg, S.L., & Anthony, H.M. (1989). Exposition: Reading, writing, and the metacognitive knowledge of learning disabled students. Learning Disabilities Research, 5, 5–24.
Englert, C.S., & Thomas, C.C. (1987). Sensitivity to text structure in reading and writing: A comparison of learning disabled and non-learning disabled students. Learning Disabilities Quarterly, 10, 93–105.
Fitzgerald, J. (1987). Research on revision in writing. Review of Educational Research, 57, 481–506.
Flower, L., & Hayes, J. (1981). A cognitive process theory of writing. College Composition and Communication, 32, 365–387.
Graham, S., & Harris, K.R. (1987). Improving composition skills of inefficient learners with self-instructional strategy training. Topics in Language Disorders, 7, 66–77.
Graham, S., & Harris, K.R. (1989). A components analysis of cognitive strategy training: Effects on learning disabled students' compositions and self-efficacy. Journal of Educational Psychology, 81, 353–361.
Graham, S., Harris, K.R., & MacArthur, C.A. (1990). Learning disabled and normally achieving students' knowledge of the writing process.
Graham, S., Harris, K.R., MacArthur, C.A., & Schwartz, S. (1991). Writing and writing instruction for students with learning disabilities: Review of a research program. Learning Disabilities Quarterly, 14, 89–114.
Grill, J.J., & Kirwin, M.M. (1992). Written Language Assessment. In D.J. Keyser & R.C. Sweetland (Eds.), Test critiques (Vol. 9, pp. 676–684). Kansas City, MO: Test Corporation of America.
Halliday, M.A.K., & Hasan, R. (1976). Cohesion in English. London, England: Longman Group.
Hammill, D.D., & Larsen, S.C. (1984). The Test of Written Language. In D.J. Keyser & R.C. Sweetland (Eds.), Test critiques (Vol. 1, pp. 688–711). Kansas City, MO: Test Corporation of America.
Hammill, D.D., & Larsen, S.C. (1988). The Test of Written Language–Second Edition. Austin, TX: PRO-ED.
Hammill, D.D., & Larsen, S.C. (1996). The Test of Written Language–Third Edition. Austin, TX: PRO-ED.
Hayes, J.R., & Flower, L.S. (1986). Writing research and the writer. American Psychologist, 41, 1106–1113.
Hooper, S.R., Montgomery, J., Swartz, C., Reed, M.S., Sandler, A.D., Levine, M.D., Watson, T.E., & Wasileski, T. (1994). Measurement of written language expression. In G.R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 375–417). Baltimore, MD: Paul H. Brookes.
Kaufman, A.S. (1990). Assessing adolescent and adult intelligence. Boston, MA: Allyn and Bacon.
Kaufman, A.S., & Kaufman, N.L. (1985a). Kaufman Test of Educational Achievement, Brief Form. Circle Pines, MN: American Guidance Service.
Kaufman, A.S., & Kaufman, N.L. (1985b). Kaufman Test of Educational Achievement, Comprehensive Form. Circle Pines, MN: American Guidance Service.
Kent, R.N., O'Leary, K.D., Diament, C., & Dietz, A. (1974). Expectation biases in observational evaluation of therapeutic change. Journal of Consulting and Clinical Psychology, 42, 774–780.
Keppel, G. (1991). Design and analysis: A researcher's handbook (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Markwardt, F.C., Jr. (1989). Peabody Individual Achievement Test, Revised. Circle Pines, MN: American Guidance Service.
McCutchen, D., & Perfetti, C. (1982). Coherence and connectedness in the development of discourse production. Text, 2, 113–119.
McLeod, J.S., Boyer, K.M., Ouchi, B.Y., Cole, J.C., & Callaghan, R.L. (in press). Validating the effect of the stimulus in written expression. Psychological Reports.
Muenz, T.A., Ouchi, B.Y., & Cole, J.C. (1999). Item analysis of written expression scoring systems from the PIAT-R and WIAT. Psychology in the Schools, 36, 31–40.
Noyce, R. (1985). Woodcock Language Proficiency Battery. In J.J. Kramer & J.C. Conoley (Eds.), The ninth mental measurements yearbook (pp. 1765–1766). Lincoln, NE: Buros Institute of Mental Measurements.
Ouchi, B.Y., Cole, J.C., Muenz, T.A., Kaufman, A.S., & Kaufman, N.L. (1996). Interrater reliability of the written expression subtest of the Peabody Individual Achievement Test, Revised: An adolescent and adult sample. Psychological Reports, 79, 1239–1247.
Psychological Corporation. (1992). Wechsler Individual Achievement Test. San Antonio, TX: Psychological Corporation.
Rosenthal, R., Friedman, C.J., Johnson, C.A., Fode, K.L., Shill, T.R., White, C.R., & Vikan-Kline, L.L. (1964). Variables affecting experimenter bias in a group situation. Genetic Psychology Monographs, 70, 271–296.
Scardamalia, M., & Bereiter, C. (1986). Research on written composition (3rd ed.). New York: Macmillan.
Sommers, N. (1980). Revision strategies of student writers and experienced adult writers. College Composition and Communication, 31, 378–388.
U.S. Department of Education, National Center for Educational Statistics. (1996). Digest of education statistics (NCES 96–133). Washington, DC: Author.
Vogt, W.P. (1993). Dictionary of statistics and methodology. Newbury Park, CA: Sage.
Wong, B., Wong, R., & Blakenship, J. (1989). Cognitive and metacognitive aspects of learning disabled adolescents' composition problems. Learning Disabilities Quarterly, 15, 145–152.
Woodcock, R.W. (1986). Woodcock Language Proficiency Battery, English form. In D.J. Keyser & R.C. Sweetland (Eds.), Test critiques (Vol. 3, pp. 726–735). Kansas City, MO: Test Corporation of America.
Woodcock, R.W. (1987). Woodcock Reading Mastery Tests, Revised. Circle Pines, MN: American Guidance Service.
Woodcock, R.W., & Johnson, M.B. (1989). Woodcock-Johnson Psycho-Educational Battery, Revised. Chicago, IL: Riverside Publishing.