
Context-sensitive regularities in English vowel spelling

MARK ARONOFF and ERIC KOCH

Department of Linguistics, The University at Stony Brook, Stony Brook, New York, USA

ABSTRACT. The predictive value of rime spellings in English was compared directly to other types of regularities beyond the level of the single letter. A computer-assisted analysis of a list of twenty-four thousand written words, each paired with its corresponding pronunciation, reveals that only a small number of rime spellings are highly regular in their pronunciations. The conventional division of vowel letter pronunciations into short and long in closed and open written syllables is the most reliable key to English pronunciation. Our findings support the notion that English spelling is based at least in part on syllable structure. In addition, prefixes and suffixes provide very reliable clues to pronunciation, which suggests that their regularity should be exploited in the teaching of reading.

KEY WORDS: Decoding, Linguistic regularity, Pronunciation, Rhyme, Rime, Spelling, Syllable

Types of linguistic regularities in English spelling

Most linguists have long believed that the optimal writing system is an alphabet in which each letter stands for one speech sound and vice versa. In fact, no commonly used alphabet is phonemically optimal in this way, although a few come close, like those of Italian and Polish. Of all alphabetic writing systems, though, that of English is farthest from this ideal: no English letter corresponds without exception to one sound, with some letters corresponding to almost twenty sounds. In addition, the system of correspondences is not confined to individual letters but goes from one extreme to the other, from single letters to whole words. Written English words sometimes can be sounded out one letter at a time. So, up is /ʌp/ because (u) = /ʌ/ in a closed syllable (a notion that we discuss in detail below) and (p) = /p/. (In Table 1 we have given all the phonetic symbols that we use in this paper.) Sometimes,
Table 1. Phonetic symbols and their pronunciations

Phonetic symbol   Pronunciation   Phonetic symbol   Pronunciation
a                 hot             j                 yet
aj                ripe            ɔ                 more
æ                 hat             ow                note
e                 bare            uw                rude
ej                fate            U                 put

Reading and Writing: An Interdisciplinary Journal 8: 251-265 (June 1996) © 1996 Kluwer Academic Publishers. Printed in the Netherlands.


however, an entire word's pronunciation has to be memorized, because the pronunciation makes little or no sense in terms of one or more of the word's individual letters. Thus, women is the only word in English in which (o) is pronounced /I/. The observation that some words can be sounded out completely letter by letter while others are idiosyncratic led early on to the long-standing and still unresolved struggle among reading teachers between proponents of the approaches to reading that are based on these types of words: phonics, based on regular letter-to-sound matches, and look-say, based on irregular words (Huey 1908). Modern teaching practice, though, recognizes that both sides are right. A person who learns to read and write must therefore master both the regular correspondences that are stressed in phonics and the irregular words that are stressed in look-say.

It is an oversimplification, however, to conclude that there are just two ways to read. There are, in fact, several types of intermediate cases between phonically regular words like fad and idiosyncratic words like women. In order to understand these intermediate types, we may use the distinction between context-free and context-sensitive generalizations. Context-free generalizations apply regardless of what surrounds the element that the generalization is stated on, while context-sensitive generalizations depend on adjacent elements; they only apply in a certain context. Absolute regular letter-to-sound correspondences, in which a given letter always stands for a given sound, are context free, because they do not depend on what surrounds the letter in question. There are very few of these in English. The pronunciation of the letter (f) is one candidate: if we exclude the one word of, (f) is always pronounced /f/ in English, regardless of the letters that surround it.
The letter (z) is similarly almost always pronounced /z/, regardless of context, and (d) is pronounced /d/, except for the suffix (ed), which is sometimes predictably pronounced /t/. English spelling is frequently criticized because it has so few context-free letter pronunciations, but English spelling is still not so hopeless as some reformers would have us believe, because it has many more context-sensitive spelling regularities, where the pronunciation of a particular letter or sequence of letters is regular when that letter or letter sequence occurs in a certain context. The exact nature of these regularities is the subject of our research. In particular, we are interested in the contribution of individual written syllable rimes, a notion that we define below. Although previous researchers from Venezky (1970) to Treiman (1992) have explored context-sensitive regularities in English spelling, none have explicitly framed their discussion in terms of this parameter. Note, as a first example of the value of context, that it is quite common for a sequence of letters to be pronounced regularly in a certain way, though the individual letters of that sequence are not regular in their pronunciation when looked at individually. The members of the sequence thus provide context for each other. Some common sequences of consonant letters that are highly regular are (tch), which is always pronounced as in match; (ck), which


is always pronounced as in luck; and (dg), which is always pronounced as in judge.

English vowel letters are especially irregular in their pronunciation if we look at them only out of context. Unlike consonant letters, no English vowel letter approaches context-free letter-to-sound correspondence. The letter (o) alone corresponds to seventeen different sounds (Venezky 1970). However, once we allow ourselves to look at contexts - sequences of vowel and consonant letters - then greater regularity emerges. For starters, sequences of vowel letters without consonant letters, vowel digraphs, are much more regular in their pronunciation than are individual vowel letters. So, while (o) has seventeen different pronunciations and (a) has ten, (oa) has only three: /ow/ (as in loaf), /a/, and /owej/. But the second occurs only in the words broad and abroad, while the last occurs only in one word, oasis. In the great majority of cases, (oa) is pronounced /ow/. In Tables 2 and 3 we list vowel digraphs as well as vowel digraphs followed by (r) along with their most common pronunciations.

Sequences consisting of a vowel letter followed by one or more consonant letters are also often quite regular. So, if we were to demand regularity at the level of the individual letter, then we would say that tight is pronounced

Table 2. The pronunciation of vowel digraphs

Vowel digraph   Phonetic value   Vowel digraph   Phonetic value
ae              ij               oe              ow
ai              ej               oi              oj
au              ɔ                oo              uw
aw              ɔ                ou              aw
ay              ej               ow              ow
ea              ij               oy              oj
ee              ij               ue              juw
ei              ej               ui              juw
eo              ij               uo              uw
eu              juw              uy              aj
ew              juw              ye              aj
ey              ij               eau             ow
ie              ij               eye             aj
oa              ow               ieu             uw

Table 3. The pronunciation of vowel digraph + r

Vowel digraph + r   Phonetic value   Vowel digraph + r   Phonetic value
aer                 er               eir                 ir
air                 er               eor                 ir
aur                 ɔr               eur                 jur
ear                 ir               oor                 ur
eer                 ir               our                 ar


irregularly, but tight, light, sight, fight and (except for eight) all of the over one hundred words in English which end in (ight) are pronounced with /ajt/, which is to say that the letter sequence (Cight) has a regular pronunciation, or that the letter (i) is regularly pronounced as /aj/ in the context of being followed by (ght). Interestingly, a regular letter sequence like this one consisting of a vowel followed by one or more consonants is not linguistically arbitrary, but rather corresponds to a well-known unit: the syllable rime. We will therefore call a sequence of a vowel letter followed by one or more consonant letters that corresponds to a syllable rime a rime spelling. By contrast, sequences that consist of a vowel letter preceded by one or more consonant letters, which do not correspond to a linguistically important subpart of a syllable, do not have any special role in English spelling.

Other linguistic units besides the rime are important in understanding English spelling. Thus, one major morphological characteristic of English spelling that sets it apart from most other spelling systems is the fact that it systematically distinguishes homonyms.3 Thus, we have examples like pair, pare and pear, which are spelled differently, though all of them are pronounced /per/. In the case of homonyms, different spellings are used to distinguish lexical items that have the same pronunciation. Similarly, an affix will sometimes have a constant spelling even though it is pronounced quite differently in different contexts. For example, the spelling of the regular plural marker for nouns is almost always (s) and that of the regular past tense marker always (ed), even though the pronunciation of each varies quite systematically. The plural marker (s) may be pronounced /s/ (as in cats), /z/ (as in hills), or /Iz/ (as in horses), depending on the last sound of the word that it is attached to, but it is almost always spelled (s), regardless of its pronunciation.
Similarly, the past tense marker (ed) is pronounced /t/ (stopped), /d/ (stubbed), or /Id/ (butted), again depending on the last sound of the word it attaches to, yet it is always spelled (ed). There is no good reason why one affix ((ed)) is always spelled with (e), regardless of whether the (e) is pronounced, while the other one ((s)) is usually not spelled with (e), but nonetheless, the spelling of these individual morphemes is very regular.

The regularities in the pronunciation of written English words are thus found at several well-understood linguistic levels beyond the individual letter/sound correspondence that linguists expect to find in a simple ideal alphabet: lexical (whole words), morphological (individual affixes), vowel letter digraphs (vowel letter sequences), and rimes (vowel-consonant sequences). This system of simultaneous regularities is both what makes English spelling so difficult to understand and what makes it interesting to linguists.

Long and short vowels in relation to syllable rimes

Traditional spelling instruction already depends on the notion of syllable rimes, but only indirectly and by another name: the classification of vowel letter


pronunciations as long and short. Old English had pairs of truly long and short vowel sounds that were identical except that one took up more time than the other. By the Middle English period, long and short vowel sounds occurred largely in separate environments: short vowel sounds mostly in closed syllables or closed rimes, in which the vowel sound is followed by one or two consonant sounds in the same syllable; long vowel sounds largely in open syllables or open rimes, in which there is no closing consonant sound at the end of the syllable, so that the syllable is said to be open. We will use the abbreviation CV for open syllables and CVC for closed syllables.4 In the period that divides Middle English from Modern English (1400-1600), the phonetic value of the long vowel sounds was altered quite dramatically in the Great Vowel Shift, but their spelling did not change, resulting in the spelling system that we have today, in which the same letter stands for two quite distinct vowel sounds, which we call long and short, although they are only historically related to pairs of long and short vowel sounds and are otherwise quite different from one another phonetically. These are distributed more or less as their precursors were: short pronunciations in closed syllables and long pronunciations in open syllables. This system is laid out in Table 4, which also gives the pronunciations of vowel letters in two other types of syllables that we discuss in more detail below. Henceforth, we will refer to the system of long and short pronunciation of vowels in open and closed syllables as well as the system of vowel digraphs and vowel digraphs followed by (r) (laid out in Tables 2 and 3) as syllable (rime)-type analysis, since it depends on the type of syllable rime that a vowel letter occurs in, without reference to individual consonants.

One other sound change determined the spelling of historically long and short vowel sounds: the loss of unstressed final /e/.
In a word like tale, the final (e) was originally fully pronounced and the word contained two syllables. The first syllable was open and so the vowel sound in that syllable was long. At about the time of Chaucer, word-final unstressed /e/ was dropped. The consonant sound that preceded this vowel sound then became the closing consonant sound of the preceding syllable, so that the word tale was now pronounced as a single closed syllable, CVC. One might have expected the
Table 4. Default pronunciations of single vowel letters by syllable type

Letter   Open syllable   Closed syllable   Vowel + r# syllable   VCe# syllable
a                                                                ej
e        ij                                                      ij
i        aj                                                      aj
o        ow                                                      ow
u        juw                                                     juw
y        aj                                                      aj


vowel sound of this closed syllable to become short, in accordance with the general pattern of short vowel sounds in closed syllables and long vowel sounds in open syllables, but it didn't. Furthermore, the (e), while it was no longer pronounced, remained in the writing system as what we now call silent e, a kind of virtual vowel: although it is not pronounced, the (e) still serves as a placeholder, so that the (l) of tale can be said to form a purely written CV syllable with (e), and the first syllable is still effectively open orthographically, though not in its spoken form. Silent (e) thus functions in Modern English as a way of indicating preceding orthographically open syllables and long vowel sounds, even in cases where the spoken word consists of a closed syllable. This allows the writing system to preserve the pattern of so-called long vowels in open (written) syllables and so-called short vowel sounds in closed (written) syllables. We refer to this word-final silent (e) spelling pattern as VCe#.

The opposite of silent (e) is consonant doubling. In the same way that silent (e) is used to indicate that the vowel sound of the previous written syllable is long, consonant doubling is used to indicate that it is short. There are no true phonetically double consonants in spoken English, except when the first word of a compound ends in a certain consonant sound and the second word begins in the same one (e.g., crabbait or fiallike). Otherwise, when we find a double consonant letter, its purpose is usually to ensure that the preceding syllable is orthographically closed, so that the vowel sound will be short. Consider the words hop and hopping. We double the (p) when we add the (ing) to hop in hopping, because without the double consonant letter, we would have hoping with an orthographically open first syllable, and the vowel sound would be long. Doubling the (p) ensures that the first syllable will remain closed orthographically so that the vowel sound will be short.
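The interplay of silent (e) and consonant doubling in the hop, hopping, hoping example can be illustrated with a minimal Python sketch. The function name `add_ing` and the rule as coded are assumptions made for illustration; the sketch is meant only for one-syllable bases and is not part of the analysis reported in this paper.

```python
VOWELS = set("aeiou")

def add_ing(word):
    """Attach (ing) to a one-syllable base (hypothetical helper).

    A VCe# base drops its silent (e), leaving the first written syllable
    open (hope -> hoping). A base ending in one vowel letter plus one
    consonant letter doubles that consonant, keeping the first written
    syllable closed (hop -> hopping).
    """
    # Silent (e) after a consonant letter: drop it before (ing).
    if word.endswith("e") and len(word) > 2 and word[-2] not in VOWELS:
        return word[:-1] + "ing"
    # Single vowel letter + single final consonant letter: double the consonant.
    if (len(word) >= 2
            and word[-1] not in VOWELS and word[-1] not in "wxy"
            and word[-2] in VOWELS
            and (len(word) < 3 or word[-3] not in VOWELS)):
        return word + word[-1] + "ing"
    # Vowel digraphs and consonant clusters need no adjustment.
    return word + "ing"

examples = {w: add_ing(w) for w in ("hop", "hope", "read", "help")}
```

Here `examples` maps hop to hopping and hope to hoping, while read and help simply take (ing), since a vowel digraph or a final consonant cluster already fixes the syllable type.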
The distinction between open and closed syllable rimes is therefore pervasive in English spelling. By and large, a vowel sound in an orthographically closed syllable rime is short, while a vowel sound in an orthographically open syllable rime is long. The system thus depends on syllable structure and in particular on the distinction between orthographically open and closed syllable rimes. But this pattern of orthographically open and closed syllable rimes is quite abstract and sometimes difficult to grasp. It is also distant from phonetic reality, since it depends on written syllables that are distinct from the actual spoken syllables that a child can learn to perceive with minimal training.

Rime spelling

As we noted above, traditional spelling instruction emphasized only the two extremes of individual letters and whole words. Other levels of regularity were largely ignored by educators. Current spelling instruction programs take advantage of letter/sound, whole-word and morphological regularities together. The question that provoked this study is whether individual rime spellings


like the (ight) of light show their own special regularity, distinct from the general patterns of long and short vowel pronunciations that we find in open and closed written syllables. Since individual rime spellings are much more concrete than the open and closed syllable rime types that determine the distribution of long and short pronunciations of vowel letters, using individual rimes might be beneficial in teaching. But the first question that has to be answered is whether these individual rime spellings play a role in the system.

The primary purpose of our study was, therefore, to investigate the predictive value of individual rime spellings compared directly to the general pattern of long and short vowel letter pronunciations in open and closed written syllable rimes, as well as compared to the other types of regularities that we have identified beyond the level of the single letter (morphemes and whole words). If rime spellings are indeed predictive on their own, then it makes sense to add them to the repertoire of teaching tools.

We began this study because we had noticed that certain rime spellings do indeed have predictive value. Compare the vowel digraph (oo) and the rime spelling (ook). (oo) is normally pronounced as /uw/ (e.g., boot, broom, cool), but (ook) is nearly always pronounced as /U/: book, cook, hook, shook, look, nook, snook, rook, brook, crook, forsook, and took are consistent with the /U/ pronunciation, while only spook has the predicted /uw/ pronunciation. It would seem much easier for a child to learn one fact ((ook) at the end of a syllable is pronounced /Uk/) than to have to memorize separately each word that ends in (ook).

METHOD

In order to investigate the differential impact of the types of linguistic regularity in English spelling that we have identified, we undertook a computer-assisted analysis of the spelling patterns found in a list of twenty-four thousand written words, each paired with its corresponding pronunciation. The list contained both very common simple words such as bat and less frequent complex words such as abnegation. We programmed a computer so that each written word on this list could be transformed automatically through a series of steps from spelling into as much pronunciation transcription as possible, using the four types of spelling regularity that we have identified: whole-word, morphological, rime, and syllable rime-type. Since the consonant letters of English do not present as great a problem to learners, only the vowel letters were examined. The amount of transformation from written word to pronunciation naturally depends upon the impact that each type of regularity has. For each type of spelling regularity, the number of affected syllables and the number of affected words in the entire list were recorded. Most importantly, the hypothetical pronunciation that was arrived at by using each type of regularity individually was then compared against the actual pronunciation and an accuracy percentage for each type of regularity was established. After


the types were examined individually, the four types were examined collectively and an accuracy percentage was established for all four types when used together. The priority order for the types used in succession was whole-word, morphological, rime and syllable rime-type. This order is based upon the projected size of the domain that each type of regularity should affect. Thus, whole-word analysis replaces an entire word with pronunciation whereas syllable rime-type analysis only replaces one syllable of a word at a time.

Written syllables and rime spellings

Both the traditional method of distinguishing between long and short vowel sounds and the rime-based method depend on being able to break a written word into written syllables. Most people are quite adept at breaking spoken words up into syllables and young children generally find it much easier to manipulate spoken syllables than to deal with individual speech sounds, as first shown by Savin (1972). However, this ability to break spoken words up into syllables, although useful for developing language awareness, cannot be translated directly into reading, for the simple but often overlooked reason that a person cannot divide a word into spoken syllables without first knowing how the entire word sounds. A child who is looking at a written word on paper and is trying to read it does not know how it sounds. Indeed, figuring out how the word sounds is precisely what we are trying to teach the child to do. The ability to break a spoken word up into syllables is therefore not directly transferable at first to the act of reading an unfamiliar word, although it is useful in spelling unfamiliar words, which involves the inverse task of turning spoken words into written words (Goswami and Bryant 1990). If we are trying to measure the value of written syllables and rimes in decoding, then we must not look at spoken syllables but rather written syllables.
If we can break a written word up into written syllables without relying on how the word sounds, and if the pronunciation of these written syllables is regular, then it may be useful to use written syllable structure directly in teaching children to read. But a method of written syllable division that depends on spoken words, although it might have some indirect value, is not what we are after. Stanback (1991, 1992), in her pioneering study of individual rime spellings, employs such a speech-dependent method of syllable division, based on Kenyon (1934). Kenyon's rules, however, rely on the spoken form of a word, as well as its written form. For example, decade is divided syllabically by Kenyon as [dec]σ[ade]σ, but parade is divided as [pa]σ[rade]σ. One begins with a closed syllable and the other with an open syllable, even though the two words have exactly the same written structure: CVCVCe#. Closer inspection of Kenyon's rules for syllable division explains the discrepancy: decade and parade differ in their spoken form; decade is stressed on the first vowel, while parade is stressed on the second vowel and the first is reduced. This difference in stress, which is not detectable from spelling, determines the difference in syllable structure.


Since Kenyon's rules require knowledge of stress within a spoken word, as well as finding particular speech sounds within a word, they do not meet our needs. A child who did not know how to pronounce a particular written word could not apply Kenyon's rules to that word, whereas if the child knew how to pronounce the word, then she or he could apply Kenyon's rules, but would have no need to learn the pronunciation. In dividing a written word up into open and closed written syllables to which we could apply our analysis, we therefore employed an algorithm that depended only on the written distinction between consonant letters and vowel letters: If a vowel letter is followed by at least one consonant letter at the end of a word or at least two consonant letters anywhere within the word (allowing for digraphs like (th) to count as one letter), the written syllable containing that vowel is defined as closed; in all other cases, the written syllable is defined as open.6

Following Stanback, we do not limit the categorization of the written syllable to open and closed. A sequence of vowel letters or a vowel letter followed by (w) or (y) is classified as a vowel digraph syllable and a vowel followed by (r) is classified as a vowel-r syllable. A minor syllable type is consonant -le or -re, in which a vowel letter is followed by a single consonant letter, which is in turn followed by (le) or (re) (as in ruble or fibre). Syllables of this type pattern like open syllables, in that the pronunciation of the vowel letter is normally long. The category silent e, whose name implies knowledge of silent letters in words, has been reinterpreted as one in which words end in a (VCe) letter pattern. This gives us the six types of written syllables shown in Table 5, which are essentially identical to Stanback's.
Table 5. Written syllable types with examples

Syllable type           Example (using the letter o)
open                    hello
closed                  spot
vowel digraph           road
vowel + r               for
VCe#                    note
consonant -le or -re    noble, ogre
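The spelling-only classification of written syllables can be sketched in Python as follows. This is an illustrative reading of the rules stated above, not the program used in the study: the function names, the small set of consonant digraphs, the restriction to the vowel letters a, e, i, o, u, and the treatment of (r) as forming a vowel + r syllable only when it is word-final or followed by a consonant letter are all assumptions of the sketch.

```python
VOWELS = set("aeiou")
# Assumed set of consonant digraphs that count as one letter.
CONSONANT_DIGRAPHS = {"th", "sh", "ch", "ph"}

def letter_units(word):
    """Split a word into letters, merging consonant digraphs into one unit."""
    units, i = [], 0
    while i < len(word):
        if word[i:i + 2] in CONSONANT_DIGRAPHS:
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units

def classify_syllables(word):
    """Return (vowel spelling, written syllable type) pairs, using spelling only."""
    units = letter_units(word.lower())
    n, i, out = len(units), 0, []
    while i < n:
        vowel = units[i]
        if vowel not in VOWELS:
            i += 1
            continue
        rest = units[i + 1:]
        # k = number of consonant units directly after the vowel letter
        k = 0
        while k < len(rest) and rest[k] not in VOWELS:
            k += 1
        if rest and (rest[0] in VOWELS or rest[0] in ("w", "y")):
            out.append((vowel + rest[0], "vowel digraph"))
            i += 2
        elif k == 1 and rest[1:] == ["e"]:
            out.append((vowel, "VCe#"))
            i = n  # the final (e) is only a placeholder
        elif k == 2 and rest[1] in ("l", "r") and rest[2:] == ["e"]:
            out.append((vowel, "consonant -le or -re"))
            i = n
        elif rest and rest[0] == "r" and (len(rest) == 1 or rest[1] not in VOWELS):
            out.append((vowel, "vowel + r"))
            i += 1
        elif k >= 2 or (k >= 1 and k == len(rest)):
            out.append((vowel, "closed"))
            i += 1
        else:
            out.append((vowel, "open"))
            i += 1
    return out

table5 = {w: classify_syllables(w) for w in ("hello", "spot", "road", "for", "note", "noble", "ogre")}
```

Under these assumptions the sketch assigns the examples of Table 5 to their listed types; for instance, hello yields a closed syllable with (e) followed by an open syllable with (o). Longer vowel sequences such as (eau) and the letter (y) as a vowel are deliberately left out.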

ANALYSIS

In order to permit us to perform the analysis, the spellings in our list of twenty-four thousand written words and their corresponding pronunciations were first altered so that all consonant letters were treated simply as vowel separators. The written words were then broken down into syllable types, using the simple algorithm discussed above. So, for example, the written word and pronunciation of the entry for candidate would undergo the following transformation:

written word: candidate → [closed with (a)]σ[open with (i)]σ[(a-e#)]σ
spoken word: /kændIdejt/ → æ-I-ej


In this example, (a) in a closed written syllable corresponds to /æ/, (i) in an open written syllable corresponds to /I/ and (a) before a word-final (e) corresponds to /ej/.

The predictive value of rime spellings

By analyzing every written syllable of all twenty-four thousand written words, a correlation is established between the written syllable types and their actual pronunciation. Table 6 shows, for example, how (a) in a closed written syllable is pronounced (from highest percentage to lowest).
Table 6. The pronunciation of the letter a in closed syllables

Percentage   Pronunciation
61.40%       /æ/
25.90%       /ə/
5.00%        /a/
0.02%        /aw/

Since /æ/ is the most common (modal) pronunciation for (a) in a closed written syllable, it is the default pronunciation of (a) in a closed written syllable (the pronunciation used in the absence of information about which consonant letter follows (a) in a closed written syllable). To assess the predictive value of individual rime spellings in determining pronunciation, all non-modal pronunciations with a frequency of five percent or higher for a given vowel letter in closed written syllables are examined and compared against the default. We then look for rime-spelling patterns within each non-modal pronunciation for each vowel letter. Altogether, hundreds of candidate rime spelling pronunciations can be found using this method. However, because our goal is simultaneously scientific and pedagogical (we want to know both how the system works and how learners can better take advantage of this system), we focus on instances where the rime-spelling pronunciation is both distinct from the modal pronunciation and is valid within the particular rime spelling for at least half the cases. This permits us to cast the rime-spelling pronunciation as a rule within its domain.

After discarding all rime spellings that are either identical to the modal value for the vowel letter or inconsistent by these criteria, only twenty-seven independently regular rime spellings remain. However, even many of these are suspect. For example, the (oot) rime spelling is pronounced 60.7% of the time as /U/ and 39.3% of the time as /uw/, which is the modal pronunciation for (oot); however, except for soot, every instance in which (oot) is pronounced as /U/ is either a derivative of foot (e.g., footsie) or a compound word in which the word foot is included (e.g., footpad). In addition to the word foot, the following words create similar illusory rime spellings: child, mild, wild, post, most, come, some, work, wood, good, hood, and give. Additionally, certain commonly


occurring morphemes have irregular pronunciations which should not be attributed to rime spelling pronunciation: -some, -ceive, -hood, -ous, -age, -ard, -ive. Furthermore, it seems that -age, -ate, -ard, -ive are not irregular pronunciations per se, but the results of vowel reduction in unstressed environments.

After eliminating inconsistent rime spellings and the rime spellings that can be explained due to one or two common words, the rime spellings shown in Table 7 remain. As there are approximately twenty-four thousand words in the list examined, the six hundred eighteen words in Table 7 account for only 2.6% of them. This contrasts with Stanback's conclusion, which is based on spoken syllables and does not factor out the contribution of individual high-frequency words and default pronunciations of vowel letters.
Table 7. Consistent rime spellings

Rime    Example     Consistency (%)   Total consistent
al+C    talk        96                124 words
ind*    find        II                61 words
igh     high        100               39 words
ight    light       100               108 words
oll     troll       77                17 words
igue    fatigue     100               2 words
ique    antique     100               10 words
ogue    epilogue    75                16 words
ead     bread       88                114 words
ook     hook        98                60 words
owl     fowl        72                14 words
own                 95                53 words
                                      Total = 618 words

* ind is recalculated so that all compound words containing wind are counted as one word; otherwise, ind would be 44.3% consistent.
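The selection criteria described above (a non-modal pronunciation with a frequency of at least five percent for the vowel spelling, which holds for at least half the cases within a rime spelling) can be sketched in Python. The function `regular_rimes`, its parameter names and the toy observations are hypothetical and serve only to illustrate the filter; they are not the word list or program used in the study.

```python
from collections import Counter, defaultdict

def regular_rimes(observations, min_share=0.05, min_consistency=0.5):
    """Find rime spellings whose dominant pronunciation differs from the default.

    observations: iterable of (vowel spelling, rime spelling, pronunciation).
    Returns {rime: (pronunciation, consistency within the rime)}.
    """
    by_vowel = defaultdict(Counter)
    by_rime = defaultdict(Counter)
    for vowel, rime, pron in observations:
        by_vowel[vowel][pron] += 1
        by_rime[(vowel, rime)][pron] += 1

    result = {}
    for (vowel, rime), counts in by_rime.items():
        vowel_counts = by_vowel[vowel]
        modal = vowel_counts.most_common(1)[0][0]        # default pronunciation
        pron, hits = counts.most_common(1)[0]            # dominant value in this rime
        share = vowel_counts[pron] / sum(vowel_counts.values())
        consistency = hits / sum(counts.values())
        if pron != modal and share >= min_share and consistency >= min_consistency:
            result[rime] = (pron, consistency)
    return result

# Toy sample only: boot, broom, cool, moon, food, spook, book, cook, hook, look.
TOY = [
    ("oo", "oot", "uw"), ("oo", "oom", "uw"), ("oo", "ool", "uw"),
    ("oo", "oon", "uw"), ("oo", "ood", "uw"), ("oo", "ook", "uw"),
    ("oo", "ook", "U"), ("oo", "ook", "U"), ("oo", "ook", "U"), ("oo", "ook", "U"),
]
found = regular_rimes(TOY)
```

On this toy sample only (ook) survives the filter, with /U/ in four of its five words; the further step of discounting rimes that are carried by a single frequent word such as foot would have to be added separately.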

Whole word spelling of high-frequency words

In the most extreme case of an ineffective alphabetic spelling system, one would have to memorize the pronunciation of every word and could derive no phonetic value from the letters. While this is certainly not the case where English consonant letters are concerned, English vowel letters seem extremely unpredictable: In the list of twenty-four thousand words, (a), (e), (i) and (o) are pronounced more than a dozen different ways each; (u) is pronounced ten different ways and (y) is pronounced six different ways. This does not mean, however, that the distribution of each pronunciation is equal. As is well known, high frequency words are more likely to have irregular pronunciations. They are also the elements with which compound words are most often formed. In order to assess the actual impact of high-frequency words, a list of those written words which both have irregular pronunciations and appear


within the first 1,068 most frequent written words (according to Francis & Kučera 1982: 465-476) was compiled. Next, these irregular, high-frequency written words were searched for within the twenty-four-thousand-word list as either words standing on their own or parts of words. Once encountered, these words were replaced with their actual pronunciation. A total of 338 words out of the list of 1,068 (31.7%) were found to have irregular pronunciations. Through search and replace, 3,127 syllables out of 64,847 total syllables in the word list (4.8%) were changed. This involved syllables in 1,933 words out of the 23,854 total words (8.1%). By definition, the pronunciation was accurate at a level of 100.0% (as the words were replaced with their actual pronunciation). Thus, while memorizing whole words is by definition a perfect guide to pronunciation, its overall impact seems to be small, even for the most frequent words. Since irregularity is closely related to frequency, enlarging the word list to include a greater number of less frequent words should only decrease the percentage of irregular pronunciation. We conclude that irregular whole words are less important in English spelling than has sometimes been thought.
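The search-and-replace step described above can be sketched in Python as follows. The two dictionary entries, their transcriptions, the slash notation for replaced material and the four-word list are illustrative assumptions, not the study's data or program.

```python
import re

# Toy stand-in for the list of irregular high-frequency words (illustrative only).
IRREGULAR = {"women": "wImIn", "give": "gIv"}

def replace_irregular(word, irregular):
    """Replace every irregular word found inside `word` with its pronunciation.

    Replaced stretches are wrapped in slashes. A single regular-expression
    pass is used so that inserted pronunciations are never searched again.
    """
    alternatives = sorted(map(re.escape, irregular), key=len, reverse=True)
    pattern = re.compile("|".join(alternatives))
    return pattern.sub(lambda m: "/" + irregular[m.group(0)] + "/", word)

words = ["women", "forgive", "giveaway", "bat"]
converted = [replace_irregular(w, IRREGULAR) for w in words]
# Number of words in which at least one stretch was replaced.
affected = sum(1 for before, after in zip(words, converted) if before != after)
```

In this toy run three of the four words are affected, whether the irregular word stands on its own (women) or occurs as part of a longer word (forgive, giveaway); the percentages reported above are the corresponding counts over the full list.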
Morpheme spelling

The English language contains a great number of Latinate prefixes and suffixes. We sought in our analysis to gauge the regularity of the spelling of these affixes. Because the word list is in alphabetical order, examining the effect of prefixes was quite easy; however, in order to examine the suffixes, the word list had to be arranged in reverse alphabetical order. After this was done, a comprehensive list of prefixes and suffixes was compiled. The word list was then broken down into several smaller lists based solely upon the initial letters for prefixes or the final letters for suffixes. Thus, any word which began with (ab) was treated as a word which began with the ab- prefix, despite the existence of words which begin with (ab) but not with the ab- prefix (e.g., abacus). Still, the prefixes and suffixes were more consistently pronounced in 116 out of 138 cases (84.1%) than if syllable-type analysis had been used. When written prefixes and suffixes were replaced with pronunciation, they yielded an accuracy of 84.7%. The accuracy was disproportionately slanted towards the suffixes, which were 92.0% accurate, over the prefixes, which were only 64.4% accurate.

Not only was the accuracy of the affixes quite high, but the overall scope of the affixes was also very impressive: 20,960 out of 64,847 syllables (32.3%) in the entire list and 14,522 out of 23,854 words (60.9%) were affected by this crude method of morphological analysis. Again, the suffixes were more impressive in their scope than the prefixes: 15,421 of the affected syllables (73.6%) were affected by suffixes while only 5,539 syllables (26.4%) were affected by prefixes. The consistent pronunciation of both prefixes and suffixes seems to contribute greatly to the readability of English; with a 92.0% accuracy and a scope which is 23.8% of the entire word list, suffixes seem especially important tools in reading English.
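The crude, purely letter-based affix matching described above, including the reverse alphabetical ordering that groups suffixes together, can be sketched in Python. The affix inventories below are tiny hypothetical samples (only ab- and -ous are mentioned in this paper; -tion is added for illustration), and the function names are assumptions of the sketch.

```python
# Toy affix inventories; the study used a comprehensive list.
PREFIXES = ("ab",)
SUFFIXES = ("tion", "ous")

def reverse_alphabetical(words):
    """Sort words by their spelling read right to left, so shared endings are adjacent."""
    return sorted(words, key=lambda w: w[::-1])

def crude_affixes(word, prefixes=PREFIXES, suffixes=SUFFIXES):
    """Match affixes by initial and final letters only, with no morphological check."""
    prefix = next((p for p in prefixes if word.startswith(p)), None)
    suffix = next((s for s in suffixes if word.endswith(s)), None)
    return prefix, suffix

WORDS = ["abnegation", "abacus", "famous", "nation", "bat"]
ordered = reverse_alphabetical(WORDS)
matches = {w: crude_affixes(w) for w in WORDS}
```

Because the match looks at letters alone, abacus is treated as beginning with ab- exactly as abnegation is, which is the kind of false positive acknowledged above; sorting by reversed spelling places abnegation next to nation.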

Syllable-type analysis

The criteria used in identifying written syllable types are easy to apply; together they yield sixty-two types, each with its own pronunciation value, as shown in Tables 2 through 4 (28 digraph syllable types; 10 digraph +r syllable types; and 24 single vowel syllable types). The clear advantage to the syllable-type method of analysis is that all of the vowel letters in the 23,854 word list were encompassed by this one method alone. When compared to actual pronunciation, syllable-type analysis produced 42,079 correct syllables out of 64,854 (64.9%).
All four methods of analysis in succession

Instead of returning to the original word list after each method of analysis, the methods were applied in succession, starting with the method that affected the largest units and proceeding in order of the diminishing size and specificity of the affected units. Whole-word analysis of frequent words was done first; morphological analysis was applied to the results of the whole-word analysis; riming analysis was applied to the combined results of the whole-word and morphological analyses; and syllable-type analysis was applied to the results of the three others. The first interesting observation is that, after the whole-word and morphological analyses were completed, individual rime spelling only affected 469 syllables (0.7% of all syllables) as opposed to the 2,254 syllables (3.5% of all syllables) that were affected when individual rime spelling analysis was done on its own. Morphological analysis still accounted for 32.3% of all syllables and whole-word analysis of frequent words still accounted for 4.8% of all syllables. After morphological, whole-word and rime spelling analyses were done, syllable-type analysis transformed the remaining 40,289 syllables (62.1%). The aggregate of all four analytical methods generated 43,455 correct vowel pronunciations out of 64,845 total vowels or vowel digraphs (70.5%). This is not a large improvement (8.6% in relative terms) over using the syllable-type method of analysis by itself.
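The ordering described above, in which each method handles only what the earlier methods left untouched, can be sketched as a simple cascade. The predicates and toy syllables below are hypothetical stand-ins for the real analyses; only the four per-method syllable counts are taken from the text, and the last lines merely recompute their shares.

```python
def apply_in_succession(syllables, methods):
    """Label each syllable with the first method, in order, that claims it."""
    return [
        next((name for name, claims in methods if claims(syl)), None)
        for syl in syllables
    ]

# Hypothetical toy predicates, ordered from largest unit to smallest.
methods = [
    ("whole-word", lambda s: s in {"one", "said"}),
    ("morphological", lambda s: s.startswith("ab") or s.endswith("tion")),
    ("rime", lambda s: s.endswith("ind")),
    ("syllable-type", lambda s: True),  # fallback: covers whatever remains
]
labels = apply_in_succession(["one", "tion", "kind", "cat", "ab"], methods)

# Syllable counts per method as reported in the text.
reported = {"whole-word": 3127, "morphological": 20960,
            "rime": 469, "syllable-type": 40289}
total = sum(reported.values())
shares = {name: round(100 * n / total, 1) for name, n in reported.items()}
```

The four reported counts sum to 64,845, and the recomputed shares (4.8%, 32.3%, 0.7%, and 62.1%) agree with the percentages given above.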

CONCLUSION

We set out to find out whether individual rime spellings in English are regular in their pronunciation by performing a computational analysis of a list of 23,854 written words paired with their pronunciations. This analysis revealed that only a small number of individual rime spellings are indeed regular in their pronunciations, where regular is defined as pronounced more than half the time in a way different from the otherwise expected modal value of the vowel in syllables of that general type. However, the analysis showed that analysis by written syllable type is the most reliable key to English pronunciation. This suggests that methods of reading instruction that depend on this method of analysis reflect well the basic structure of English spelling. But we must emphasize that the dichotomy of long and short vowel letter pronunciations (the traditional analogue of what we are calling syllable type) is based on the division of words into written syllables, a fact which is often overlooked. Our findings thus support the general notion that English spelling is based at least in part on syllable structure (Treiman 1992). In addition, our analysis showed that prefixes and suffixes provide reliable clues to pronunciation and suggests that their regularity should be exploited in the teaching of reading.

ACKNOWLEDGEMENTS

This work was supported by a grant from the Spencer Foundation to The Research Foundation of the State University of New York. Thanks to Frank Anshen, Judith Klavans, and Mark Liberman for assistance. Thanks to Rebecca Treiman for comments on an earlier version.

NOTES

1. We distinguish letters, written words, and sounds as follows: we write letters or sequences of letters within angled brackets; we italicize written words; and we put sounds between the slash marks that linguists traditionally use for phonemes. So, the written word love ends in (e) and corresponds to the spoken word /lav/. We use the symbol # to stand for the beginning or end of a word. For example, words ending in (e) are represented by means of the formula (e)#.
2. The only exceptions are compounds: blackguard (in which (ck) is silent) and words like handgun and headgear, where the (dg) sequence spans two members of the compound.
3. French is the only other language whose writing system so systematically distinguishes homonyms (Aronoff 1994).
4. In this abbreviation, C stands for one or more consonants.
5. The original database contained standard British pronunciation. We have modified this in the direction of American English.
6. This algorithm will lead to anomalies in a small number of cases involving (ph) and (th). These are usually digraphs, but sometimes, as in shepherd or rathole, they are not. There are too few of the latter type to disturb our analysis. Similarly, we treat all sequences of vowel letters as digraphs, but again, there are cases like rearm, where they are not pronounced as such.
7. We set the threshold at half for several reasons. First, from a pedagogical point of view, if we were to count lesser generalizations, we would be faced with disjunctive statements like the following: in most of the words of the form (V1C1), (V1) corresponds to /V1/, but in a smaller fraction of such words, (V1) corresponds to /V2/. In some cases, there would be more than two correspondences; furthermore, we would have to select a particular threshold value for these lesser generalizations. The pedagogical difficulties are obvious: learners would be asked to master several patterns for a given rime spelling.
For example, Stanback shows that (o) in (oth) is pronounced thirteen times as /o/ in her data; however, in the same data it is pronounced fourteen times as /a/, the modal pronunciation for (o) in closed syllables, and three times as /ow/. The putative rime spelling pronunciation in this example would thus be beneficial less than thirty percent of the time. If we wanted to apply this observation about the pronunciation of (oth), we would have to tell learners that there are two common ways to pronounce this sequence. By contrast, the fifty percent threshold focuses on those rime spelling pronunciations that are true most of the time. Our second rationale is scientific: we want to know how English spelling works, in particular whether individual rime spellings are part of the system. By setting the threshold so high, we greatly reduce the risk of finding something that is not really there. Pedagogy and science reinforce each other in this case, which is a rare occurrence.
8. This method does not constitute real morphological analysis, but it is not realistic to expect the average reader, especially a child, to have enough conscious comprehension of English morphology to be able to parse words into their latinate prefixes and suffixes, so that this crude method may be more realistic for our purposes than a more sophisticated one. The method can in fact only diminish the results of our analysis, because it catches in its net a fairly large number of invalid cases.

REFERENCES

Aronoff, M. (1994). Spelling as culture. In: W. C. Watt (ed.), Writing systems and cognition (pp. 67-86). Dordrecht: Kluwer Academic Publishers.
Francis, W. N. & Kučera, H. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin.
Goswami, U. & Bryant, P. (1990). Phonological skills and learning to read. Hillsdale, NJ: Erlbaum.
Huey, E. B. (1908/1968). The psychology and pedagogy of reading. Cambridge, MA: MIT Press.
Kenyon, J. S. (1934). Rules for the syllabic division of words in writing or print. In: Webster's New International Dictionary, 2nd ed. (pp. lviii-lix). Springfield, MA: Merriam-Webster.
Savin, H. B. (1972). What the child knows about speech when he begins to read. In: J. F. Kavanagh & I. Mattingly (eds.), Language by eye and by ear (pp. 319-328). Cambridge, MA: MIT Press.
Stanback, M. L. (1991). Syllabic and rime patterns for teaching reading: Analysis of a frequency-based vocabulary of 17,602 words. Dissertation. Teachers College, Columbia University.
Stanback, M. L. (1992). Syllable and rime patterns for teaching reading: Analysis of a frequency-based vocabulary of 17,602 words. Annals of Dyslexia 42: 196-221.
Treiman, R. (1992). The role of intrasyllabic units in learning to read and spell. In: P. Gough, L. Ehri & R. Treiman (eds.), Reading acquisition (pp. 65-106). Hillsdale, NJ: Erlbaum.
Venezky, R. L. (1970). The structure of English orthography. The Hague: Mouton.

Address for correspondence: Dr Mark Aronoff, Department of Linguistics, State University of New York at Stony Brook, Stony Brook, NY 11794-4376, USA Phone: (516) 632 7777; Fax: (516) 632 9789; E-mail: mark.aronoff@sunysb.edu
