Fundamentals of Formulaic Language: An Introduction

ALSO AVAILABLE FROM BLOOMSBURY

Formulaic Language and Second Language Speech Fluency, David Wood
Language in Education, Rita Elaine Silver and Soe Marlar Lwin
Linguistics: An Introduction, Second Edition, William B. McGregor
Perspectives on Formulaic Language, edited by David Wood
Research Methods in Applied Linguistics, edited by Brian Paltridge
and Aek Phakiti
Why Do Linguistics?, Fiona English and Tim Marr
DAVID WOOD
Bloomsbury Academic
An imprint of Bloomsbury Publishing Plc
LONDON • OXFORD • NEW YORK • NEW DELHI • SYDNEY
50 Bedford Square
London
WC1B 3DP
UK
1385 Broadway
New York
NY 10018
USA
www.bloomsbury.com
BLOOMSBURY and the Diana logo are trademarks of Bloomsbury Publishing Plc
First published 2015
© David Wood, 2015
David Wood has asserted his right under the Copyright, Designs and Patents Act,
1988, to be identified as the Author of this work.
All rights reserved. No part of this publication may be reproduced or
transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage or retrieval
system, without prior permission in writing from the publishers.
No responsibility for loss caused to any individual or organization acting on or
refraining from action as a result of the material in this publication can be accepted by
Bloomsbury or the author.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
ISBN: HB: 978-0-5671-8641-6
PB: 978-0-5672-7898-2
ePDF: 978-0-5673-3217-2
ePub: 978-0-5672-7777-0
Library of Congress Cataloging-in-Publication Data
Wood, David (David Claude), 1957–
Fundamentals of formulaic language : an introduction / David Wood.
pages cm
Includes bibliographical references and index.
ISBN 978-0-567-18641-6 (hb) ISBN 978-0-567-33217-2 (epdf)
ISBN 978-0-567-27777-0 (epub) 1. Linguistic analysis (Linguistics) 2. Linguistic models.
3. Discourse analysis. 4. Language acquisition. 5. Applied linguistics.
6. Psycholinguistics. I. Title.
P126.W66 2015
410–dc23
2015014502
Typeset by Integra Software Services Pvt. Ltd.
Contents
Preface
1 Formulaic Language Research in a Historical Perspective: Across Decades and Continents
2 Identifying Formulaic Language: Frequency, Psychological Representation, and Judgment
3 Categories of Formulaic Language: Labels and Characteristics
4 Mental Processing of Formulaic Language: Holistic and Automatized
5 Formulaic Language and Acquisition: First and Second Language
6 Formulaic Language and Spoken Language: Fluency and Pragmatic Competence
7 Formulaic Language and Written Language: Academic Discourse in Focus
8 Lexical Bundles: Corpora, Frequency, and Functions
9 Formulaic Language and Language Teaching: Research and Practice
10 Current and Future Directions in Formulaic Language Research: Gaps and Pathways
References
Index
Preface
My first encounters with formulaic language date back to the mid-1990s
during my time as a teacher of English as a Second Language (ESL)
and English for Academic Purposes (EAP) at a large university. I became
particularly intrigued with the teaching of spoken language and the challenges
presented for second language learners by the real-time, ephemeral nature
of speech. In looking around for resources and background knowledge
to help, I found myself encountering the term fluency very often in the
literature. I began to look for the underlying psycholinguistic mechanisms
and the research on the nature of fluency and found some reference to
the role of formulaic language. Some of the papers I read alluded to the
notion that formulaic language might play some role in facilitating fluent
speech, or that formulaic language might be a fundamental aspect of spoken
communication in several ways. This elusive phenomenon came to haunt my
dreams for some years to come, as I soon thereafter embarked on doctoral
studies with a goal of attempting to measure or examine the relationship
between formulaic language and fluent speech.
I am still fascinated by the study of formulaic language, and have since
examined it from several other perspectives, including pedagogical and corpus-based.
I have seen my students in graduate programs become interested in
formulaic language too and seen them set out to study formulaic language
from various perspectives: in academic writing, in the speech of autistic
children, in textbooks, in the discourse of official meetings, and more. I have
taught seminars on the topic and supervised a range of master of arts and
doctoral projects. Throughout all of this, I have seen students struggle with
the sheer volume and range of literature. The multidisciplinary nature of the
field means that they need to quickly grasp concepts from areas as diverse as
psycholinguistics, vocabulary research, and discourse analysis, to name but a
few. To add to the burden, it became painfully clear early on that much of the
written work in the area is not particularly reader-friendly, especially for those
new to the field. The combination of complex concepts, diverse sources, and
opaque prose has made the establishment of a foundation in this area an
uphill climb indeed.
This has led me to take on the task of creating an overview of the area
for newcomers. The present volume is meant to be a resource for new
researchers first and foremost, but may also be a reference source for
established scholars. The content in this book is not mine; it is a distillation
of the work of many others from across the decades. It is taken from a wide
range of sources, including original research reports, review and summative
state-of-the-art papers, edited collections, and so on.
The content of this book is not meant to be a complete presentation of
every bit of research conducted to date; it is meant instead to be a start,
a place for readers to get a sense of what exists, and then to go further
beyond this book as needed. Some parts of the book go more deeply into the
literature than others, partly due to space constraints, partly due to my own
perceptions of what needs to be foregrounded.
I encourage those who use this book as a teaching resource to bear that in
mind and to point out to students what more is needed in any area. The book
ranges widely, stops to scrutinize, and at the same time dances across many
areas of study. This is inevitable for an overview like this, crafted by a single
author. Please feel free to pick, choose, adapt, adjust, dismiss, embrace, or
elaborate on anything you find herein.
I hope this book will be a support for teachers and students in this area. I
fervently wish for you to be inspired to take some risks in research, to push
some boundaries in the area, and to go ahead and create new knowledge.
I must give thanks to those whose work was so useful in creating this
book:
To my students Randy Appel, Ridha Ben Rejeb, Lina Al Hassan, Alisa
Zavialova, Olga Makinina, Lin Chen, and Joelle Doucet. Special thanks
to Joshua Romancio for editing assistance.
And others from whom I have been inspired.
Ottawa
February, 2015
1
Formulaic Language Research in a Historical Perspective
Across Decades and Continents
Some years ago I had two interesting experiences with students of English
as a second language, which sparked in me an early interest in the formulaic
nature of language. A clever student whose first language was Spanish came
to me during a break in class and asked, "Teacher, what means festival?"
I was a bit puzzled by the question, since the recent lessons had had no
content related to this, but I manfully attempted to explain the word as best as
I could. It was the student's turn to be puzzled, as he tried to fit my definitions
with what he was trying to understand. After some struggle, he interrupted
me to ask, "Why do you say this word at the start?" It slowly dawned on
me that what he had heard as "festival" was in fact "first of all," which I did indeed
tend to use at the start of lessons and to give instructions. It surprised me
that he would interpret a three-word sequence as a single word, but I put it
down to some confusion about phonology and a general lack of vocabulary
knowledge. Another incident occurred with a very active and alert student
from Cambodia, who arrived in my beginner class quite late in the course,
with limited to no English proficiency whatsoever. She bravely plunged into
the job of becoming a member of the group and learning what she could. Her
English output after several lessons began with "I no stan," whenever anything
was addressed to her in English or if she had to participate in anything by
speaking. Later, this was modified to "I don no stan," and later still it became a
closer replication of the sequence "I don't understand." I noticed this at the time
as an amusing example of a student mustering one resource to cope with a
really challenging situation. It also appeared odd to me that she had taken a
three-word sequence and interpreted it as a single word. Later, in hindsight,
this story and the "festival" incident came to represent to me the power of
formulaic language and a glimpse of the process of language acquisition and
the importance of formulaic language in it.
To begin any discussion of formulaic language, it is important to lay
some foundations, establish the terminology that is used to refer to it, and
look at a definition or definitions. Formulaic sequence is generally used to
refer to one such item, formulaic language is the uncountable noun referring
to these items as a collective, and phraseology is a term often used to refer to
the study of formulaic language. As we will see later, phraseology also does
double duty as a specific term for a particular type of analysis of formulaic
language.
These days, formulaic language is a language phenomenon that is quite well
known among researchers and students in linguistics and applied linguistics.
Articles with a focus on formulaic language are appearing in an expanding
range of journals at an ever more frequent rate, papers on topics related to
formulaic language are being presented at congresses around the world, and
graduate theses on this theme are appearing everywhere. It is remarkable
that all of this is occurring in the absence of a journal devoted to formulaic
language, and with only one widely attended recurring conference, that of
the Formulaic Language Research Network (FLaRN), which has been held at
various locations in Europe every second year since 2004. The Yearbook of
Phraseology, a creation of Europhras, a European organization dedicated to
the study of phraseology, and published by Mouton de Gruyter, is the only
periodical publication currently in existence that concerns itself with formulaic
language research. The real source of information about formulaic language
has been a range of books, both edited collections and monographs, published
over the past fifteen to twenty years: Sinclair's (1991) Corpus, Concordance,
Collocation, which was a landmark; Nattinger and DeCarrico's (1992) Lexical Phrases
and Language Teaching; Cowie's (1998) Phraseology: Theory, Analysis,
and Applications; Wray's (2002) Formulaic Language and the Lexicon and
Formulaic Language: Pushing the Boundaries (2008); Allerton, Nesselhauf, and
Skandera's (2004) Phraseological Units: Basic Concepts and Their Applications;
Schmitt's (2004) Formulaic Sequences: Acquisition, Processing and Use;
Granger and Meunier's (2008) Phraseology: An Interdisciplinary Perspective;
and Wood's (2010b) Perspectives on Formulaic Language. This list is by no
means exhaustive, but gives a taste of the range and quantity of formulaic
language-focused work that exists on library shelves around the world. So what
is the attraction to formulaic language despite the relative lack of a coherent
set of venues in which to present or find research and information about it?
It has become apparent over the years that formulaic language is, despite its
marginalization in classic generative linguistic theory, a fundamental aspect
of language and communication. Formulaic language is as essential to us as
words or grammar. Virtually every aspect of communication and language is
linked to it: pragmatics; discourse; fluency in speech and writing; first and
second language acquisition; and cognitive processing of language.
So what is formulaic language? What do the following word sequences
have in common?
Good morning
Look up
On the other hand
Don't let him take you for a ride
By and large
Haste makes waste
At top speed
Speed limit
Camera speed
Computer desk
In the case of
Up to date
We can certainly identify these sequences as fairly common in English,
but they do have a certain common element which is a bit elusive at first
glance. Some of them appear to have a specific meaning or function unto
themselves, as if a single word: for example, good morning, on the other
hand, computer desk. Some of them seem to be preferred ways of expressing
or identifying something, be it a concrete or an abstract thing: for example,
haste makes waste, at top speed, in the case of. Some of them look a bit
mysterious at close examination, and we wonder how they came to be used
as they are: how did we ever come to agree on the use of strange items
such as by and large or look up? The general consensus on a definition of
formulaic language seems to be that the items will:
1 Be multiword
2 Have a single meaning or function
3 Be prefabricated or stored and retrieved mentally as if a single word
The third criterion is still under quite a bit of scrutiny, particularly by
psycholinguists, and Chapters 4 and 5 in this book will introduce you to that
research.
The research on formulaic language has generally tended to fall into
two broad categories in terms of research methodology. The phraseological
methods are top-down methods that look to classify formulaic sequences using
certain criteria such as their semantic or syntactic composition. Phraseologists
will tend to examine lists of selected formulaic sequences in order to do this,
or they will examine texts and isolate formulaic sequences using certain
criteria such as syntactic or semantic characteristics. On the other hand, the
distributional methods of research into formulaic language are more bottom-up
in nature. These types of research use corpus analysis and frequency
cutoffs to identify formulaic sequences, often classifying them according to
discourse function.
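To make the distributional approach concrete, the following is a minimal sketch of a frequency cutoff applied to recurring word strings. The function name, the four toy utterances, the three-word length, and the cutoff of two occurrences are all assumptions chosen for illustration; they are not taken from any particular study, and real corpus work uses far larger samples and normalized thresholds.

```python
from collections import Counter

def frequent_ngrams(utterances, n=3, min_count=2):
    """Count n-word strings within each utterance and keep those
    that occur at least min_count times across the sample."""
    counts = Counter()
    for utt in utterances:
        tokens = utt.lower().split()
        # zip over n shifted copies of the token list yields each n-gram
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return {" ".join(gram): c for gram, c in counts.items() if c >= min_count}

# Toy sample (hypothetical data, for illustration only)
utterances = [
    "first of all open your books",
    "first of all listen to me",
    "on the other hand you can wait",
    "on the other hand we can start",
]

candidates = frequent_ngrams(utterances, n=3, min_count=2)
for gram, count in sorted(candidates.items()):
    print(gram, count)
```

Note that a purely frequency-based pass returns fragments such as "on the other" alongside complete items such as "first of all," which is one reason such output is usually examined further before sequences are treated as formulaic.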
Not terribly long ago there was little research being conducted on formulaic
language and the research that did exist was anything but mainstream or
recognized. The actual turning point was probably around 1970, when some
structural linguists actually began to pay some attention to formulaic language.
This helped to mainstream the research to a certain extent, as the work on
this aspect of language had, up to that point, been conducted by people from
a diverse range of areas of interest, from literary scholars, anthropologists,
and educational psychologists to neurologists and experimental psychologists,
to language teaching methodologists and lexicographers. Linguists began to
establish their own schools of inquiry during the 1970s, and the 1980s through
to the present have seen a remarkable expansion of effort.
Early research
Lacking the technology to perform extensive corpus research, and hampered
by the scattered nature of linguistic knowledge, researchers before the 1970s
paid scant attention to formulaic language. However, there were pockets of
work being conducted in diverse fields outside of linguistics proper.
Collocation researchers
Early work in the area of collocations was initiated by Firth (1951, 1957) in the
1950s, although the actual term itself had been around much longer. Firth's
basic definition of collocation was the co-occurrence of words in proximity,
with several possible types of variation. One type is the habitual collocation,
in which words occur together quite frequently. Firth gives
silly ass, a popular colloquial pejorative label at that time, as an example
of a habitual collocation. Another type of collocation discussed by Firth is
the idiosyncratic collocation, a co-occurrence of words that relatively rarely
happens and yet has a function. Firth points to some word combinations
from literature as examples of this, such as sleek supple soul from a poem
by Swinburne (see Nesselhauf, 2004 for more). Firth further complicates
matters by sometimes referring to noncontiguous words as collocations,
such as dark and night occurring in a sentence separated by other words. It is
rather unclear from Firth's work how distantly separated words can be before
the collocational bond is broken.
This overall approach to collocations was, of course, later developed by
other researchers. These included Halliday, Mitchell and Greenbaum, Sinclair,
and Kjellmer. Halliday extended and refined the definition to specify that a
collocation is a function of the frequency of a word appearing in a certain
lexical context as compared to its frequency in language as a whole.
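The comparison described here, frequency in a given lexical context against frequency in the language as a whole, can be sketched as a simple ratio. This is only an illustrative reading of the idea, not Halliday's own formula: the function name, the window size, and the toy token list are assumptions, and the dark/night pair is borrowed from the Firth example above.

```python
def collocation_ratio(tokens, node, collocate, window=1):
    """Relative frequency of `collocate` within `window` words of `node`,
    divided by its relative frequency in the whole token list.
    Values above 1 suggest the word is more common near the node
    than in the sample overall."""
    context = []
    for i, tok in enumerate(tokens):
        if tok == node:
            context.extend(tokens[max(0, i - window):i])      # words before
            context.extend(tokens[i + 1:i + 1 + window])      # words after
    if not context or collocate not in tokens:
        return 0.0
    in_context = context.count(collocate) / len(context)
    overall = tokens.count(collocate) / len(tokens)
    return in_context / overall

# Toy sample (hypothetical data, for illustration only)
tokens = ("dark night fell and the dark night passed "
          "then a bright day came").split()

ratio = collocation_ratio(tokens, "dark", "night", window=1)
print(round(ratio, 2))
```

Widening the `window` parameter is one way to explore the question raised above about how far apart two words can be before the collocational bond is considered broken.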
Research traditions in formulaic language

Pawley (2007) outlines eight research traditions that laid the groundwork for
much of what came later.
Literary scholars working on epic sung poetry

One of the first to turn attention to formulaic language was Parry (1928,
1930, 1932), in the 1920s and 1930s. He examined formulaic language in
the works of Homer, and later turned attention to the South Slavic tradition
of public performance of epic poems. The performers and composers of
these very lengthy poems were not literate, and Parry and Lord noted that
formulas in such poems were often serving two functions simultaneously:
they allowed the performance to be more fluent, rhythmic, and smooth,
while at the same time allowing for some creative variations. Many formulas
in the epic poems showed a degree of lexical substitution, allowing them
to represent a particular meaning with a different number of syllables and
fitting with a variety of metrical patterns. For a comprehensive look at this
body of work, see Lord (1960).
Anthropologists and folklorists

Given the nature of the early examinations of formulaic language, focused
on epic poetry and oral traditions in various cultures, it is not surprising to
find that anthropologists were among those to pay attention to this language
phenomenon. Their work covered a range of types of spoken language from
everyday speech to magical incantations to child language play. Hymes
(1962) was a trailblazer in the application of anthropological research
methods to work in linguistics. His work on what he called the ethnography
of speaking focused on performance routines and recurrent patterns in
everyday speech, which stimulated research from linguistic anthropologists.
Prior to this, however, others investigated particular spoken genres and
found evidence of formulaic language playing a strong role. For example,
Malinowski, in 1935, noted that the Trobriand islanders used fixed formulaic
language in their magical incantations, designed to control invisible spirits.
Similarly, Opie and Opie (1959) showed that the game chants and sayings
and rhymes of six- to ten-year-old children were packed with formulaic
language.
Philosophers and sociologists

During the 1960s, the study of everyday communication grew to focus on
the use of routine utterances to accomplish speech acts. The work of Goffman (1971) and
the ethnomethodologists led to the emergence of conversation analysis
as a discipline within linguistics. Goffman pointed out the fact that many
conversational moves are accomplished by means of conventional word
usage, an early type of formulaic language research. In addition, Austin (1962)
and Searle (1968) focused on speech acts and discourse functions and the
ways in which these take the form of expressions or formulas.
Neurologists and neuropsychologists

Work on brain structure dating back to Broca in the 1860s showed that the
left hemisphere of the brain is key to spoken expression. Certain types of
aphasia resulting from damage to Broca's area of the left hemisphere reduced
or eliminated the ability to use propositional speech, but left the sufferers with
the ability to use familiar expressions.
Learning psychologists
Goldman-Eisler in the late 1960s, with her book Psycholinguistics: Experiments
in Spontaneous Speech (1968), was among the first to discover that fluent
speech in particular is characterized by patterns of temporal variables such
as pause phenomena and length of runs of speech. This was the start of a
tradition of research, with a psycholinguistic flavor, into speech fluency. The
researchers discovered that patterns of fluent speech hint at a possible role for
formulaic language, as it has become apparent that fluent speakers have many
more automatized chunks of language to use while speaking spontaneously.
This helps them to save mental effort so as to conceptualize and formulate
the next stretch of discourse while maintaining a certain pace and
rhythm of speech. It has been discovered that skillfully blending formulaic
sequences with newly assembled strings of words is probably an important
factor in the ability of proficient speakers to produce the longer runs between
pauses that characterize fluency.
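One of the temporal variables mentioned above, the length of runs between pauses, can be computed from a timed transcript. The sketch below assumes a hypothetical input format (syllable count of a stretch of speech paired with the silence that follows it) and an assumed pause threshold of 0.25 seconds; studies differ on where that threshold is set.

```python
def mean_length_of_runs(segments, min_pause=0.25):
    """segments: list of (syllable_count, following_pause_seconds).
    A run ends whenever the following pause reaches min_pause.
    Returns the mean number of syllables per run."""
    runs, current = [], 0
    for syllables, pause in segments:
        current += syllables
        if pause >= min_pause:
            runs.append(current)
            current = 0
    if current:  # speech left over after the last counted pause
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0

# Toy sample (hypothetical data, for illustration only)
segments = [(4, 0.1), (3, 0.6), (5, 0.0), (2, 0.05), (1, 0.9), (6, 0.3)]

print(mean_length_of_runs(segments))                 # threshold 0.25 s
print(mean_length_of_runs(segments, min_pause=0.7))  # stricter threshold
```

Raising the threshold merges short hesitations into longer runs, so reported run lengths are only comparable across studies that use the same pause criterion.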
Certainly, the nature of fluent speech is distinctive. As Chafe (1980) notes,
speech is produced in bursts, in which vocalization is broken up by pauses at
junctures of meaning and syntax. This type of skill requires an ability to juggle
plans that may compete for mental attention and cause a sort of traffic jam
of speech conceptualization, formulation, and utterance. When such a jam
occurs, one produces speech which is disfluent, marked by slow speed,
pauses at mid-clause, sentence, or phrase, and brief, incomplete or simplified
runs between pauses. Rehbein (1987, p. 104) proposes
that fluency in a second language requires the capability of handling
routinized complex speaking plans. The plans to which he refers need to
be stored in long-term memory so as to be easily retrieved and produced
as speech. To complicate matters further, one must generate new words
and constructions to encode novel elements while producing the
automatized sequences. See Chapter 6 in this book as well as Wood (2010a)
for an extensive review of this research.
Grammarians
Early grammarians hinted at the importance of formulaic language:
Jespersen (1924) examined the phenomena of free and fixed expressions,
and the structure of idioms was a focus of the works of Chafe (1968)
and Fraser (1970). Work on phrasal dictionaries by Hornby, Gatenby, and
Wakefield (1942) and Palmer (1938) influenced how phrasal units were
handled in later works. Meanwhile, in Eastern Europe phraseology was
taking off as a legitimate area of study in its own right. Researchers including
Amosova (1963), Mel'čuk (1988), and Vinogradov (1947) compiled lists of
idioms and collocations and classified them. The resulting categories include
pure idioms, expressions with literal meanings totally divorced from their
idiomatic meanings, for example, chew the fat and beat around the bush.
Figurative idioms are expressions in which the figurative meaning is an
obvious derivation from the literal meaning, for example, hold water or steal
one's heart. Restricted collocations include word combinations in which the
interpretation of one word is dependent on its relationship with the other, for
example, pay a visit or meet one's needs.
Since the 1970s
The 1970s represent a turning point in formulaic language research, with a
number of linguists pursuing research in the area, and some major areas of
research came to be defined. Lexicographers began to assemble information
about multiword chunks; research on speech acts and pragmatics grew. A
major event was a course taught by Charles and Lily Wong Fillmore at
the 1977 Linguistic Institute; in addition, Krashen and Scarcella (1978) published
a review article and Coulmas (1981) an edited collection of papers. Pawley
and Syder in 1983 published a landmark paper pointing out that formulaic
language is likely key to second language fluency and nativelike selection:
the tendency we have to use routine ways of expressing things, despite the
supposed infinite potential of language. For example, we say "How are you?"
rather than creative alternatives such as "What is the nature of your current
well-being?" The reasons for this, according to Pawley and Syder, have to do
with processing restrictions and the probability that we acquire language in
chunks and we store and retrieve word strings often as wholes from long-term
memory to fit the meanings and functions that arise in communication.
In 1991, Sinclair posited the idiom principle and the open choice principle. The
idiom principle is a somewhat similar idea: most texts are largely composed of
multiword expressions that constitute single choices in the mental lexicon. Many new areas of focus
arose over the years.
Oral formulaic genres

A fascinating area of inquiry that has yielded some remarkable information
is the study of oral genres of production in traditional societies and in
specific areas of communication. Balkan epic poetry, the storytelling
traditions of peoples of New Guinea (Rumsey, 2001), and the rapid-fire
speech routines of auctioneers (Kuiper, 1996) are so formulaic that almost
every utterance is a formula. Kuiper and collaborators isolated some
distinct features of auctioneering language: strict discourse structure rules
of topics and sequencing (for example, in stock auctions [Kuiper & Haggo,
1984], first is the description of the lot, second is the search for a first
bid, third is a call for the bids, and fourth is the sale); a high concentration
of formulas; special grammatical rules for formulas; prosodic and musical
patterns; exceptional fluency, speed, and very few pauses within clauses.
Similar research by Pawley (1991) into cricket match commentary identified
it as a formulaic genre.
Identification
Issues began to develop around the actual identification of formulaic language
in texts and discourses. Some cases are clear such as true idioms, phrasal
verbs, nominal compounds, and so on (see Chapters 2 and 3 for detailed
discussions), but many gray areas persisted. For example, discontinuous
expressions are hard to identify, as fillable slots and two-part expressions
tend to blend into surrounding text, for example, not only ... but also. Pawley
(1986) elaborated a list of twenty-seven diagnostics, and Moon (1998), among
others, also presents lists of diagnostic criteria.
Wray (2002) laid out a set of criteria for determining if multiword
combinations might be prefabricated. Structure or form of the sequence is
one such criterion, and it is often the case that strings begin with conjunctions,
articles, pronouns, prepositions, or discourse markers (p. 31). Compositionality
or internal structure of strings is also important, as Wray observes that the
string is no longer obliged to be grammatically regular or semantically logical
(p. 33). Fixedness, or the tendency for prefabricated sequences to be of
invariable form, is another such criterion, although Wray does allow that a large
subset of formulaic sequences often have fillable slots (p. 34). Other criteria
relate to phonological or prosodic aspects of the articulation of a sequence,
including intonation contour and speed of articulation, and fluency criteria
such as lack of internal pausing (p. 35). For spoken language in particular,
an important point of Wray's to bear in mind is that it may simply be that
identification cannot be based on a single criterion, but rather needs to draw
on a suite of features (p. 43).
Somewhat later, Wray (2008) came to emphasize that the processing
of formulaic sequences as wholes likely results from the ways acquisition
processes operate with respect to input. She notes that much language
input in first language acquisition is left unanalyzed unless necessary, a
phenomenon she terms needs only analysis, or NOA (p. 17). If there is a
strong form-meaning link with a particular string, for example, How do you
do, as a standard greeting among previously unacquainted adults, with no
variation, then the string will remain unanalyzed. Over the course of first
language acquisition, acquirers may note some variation in such strings,
such as lexical insertion (e.g., Have you seen my boots/shirt/watch?, or I'd
like a Coke/cheeseburger/3-month plan), but analysis will likely stop at the
recognition of the existence of a fillable slot and the possible word types
that may fill the slot. For adult second language learners, this process
may be much less frequent or slower, since the tendency of adults and
language programs is to analyze second language input for patterns, not
to mention the fact that second language learners receive greatly reduced
input compared to children in a first language.
Some researchers have linked formulaic sequences to the lexicogrammar
(Tucker, 2005) and to systemic models of functional grammar (Butler, 2003).
These researchers have noted that formulaic sequences have a place in
models of language that prioritize the lexicogrammar and levels of structure
related to speech act realizations. They acknowledge the role of formulaic
sequences in integrating extraclausal or partially clausal expressions into
functional grammars of discourse.
One of the best checklists to aid in identifying formulaic language is that of
Wray and Namba (2003) (see Chapter 2 for details).
Classification
There are many categories of formulaic language, including collocations,
idioms, phrasal verbs, lexical phrases, lexical bundles, and so on (see Chapter
3 for a detailed discussion). For formulaic sequences with pragmatic functions,
Pawley (2007) outlines seven identifying criteria:
1 Segmental phonology
2 Music, that is, intonation, rhythm, and stress of production
3 Grammatical category
4 Grammatical structure
5 Idiomaticity constraints
6 Literal meaning pragmatic function
7 Accompanying body language
Nattinger and DeCarrico (1992) identified a subset of formulaic language
with pragmatically specialized functions and meanings, which they labeled as
lexical phrases. They classified the phrases into two large categories: strings
of specific lexical items and generalized frames. The former are mostly
unitary lexical strings and may or may not be canonical in the grammar,
while the latter consist of category symbols and specific lexical items. In
addition, four criteria further classify the phrases: length and grammatical
status; canonical or noncanonical shape; variability or fixedness; whether
it is a continuous, unbroken string of words or discontinuous, allowing
lexical insertions (pp. 37, 38). Nattinger and DeCarrico also identify four
large categories of lexical phrases that display aspects of the four criteria:
polywords, which operate as single words, allowing no variability or lexical
insertions, and including two-word collocations (e.g., for the most part,
so far so good); institutionalized expressions, which are sentence-length,
invariable, and mostly continuous (e.g., a watched pot never boils,
nice meeting you, long time no see); phrasal constraints, which allow
variations of lexical and phrase categories, and are mostly continuous
(e.g., a ___ ago, the ___er the ___er); sentence builders, which allow
construction of full sentences, with fillable slots (e.g., I think that X, not
only X but Y) (pp. 38–45).
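Phrasal constraints and sentence builders combine fixed words with fillable slots, which makes them harder to spot than fully fixed strings. As a rough sketch, three of the frames cited above can be written as regular expressions in which the slot is a wildcard. The pattern dictionary and function name are assumptions made for illustration, and the patterns are deliberately crude: they will also match strings that are not instances of the frame.

```python
import re

# Each frame from the text paired with a hypothetical regular expression;
# \w+ stands in for a one-word slot, .+? for a slot of any length.
FRAMES = {
    "a ___ ago": r"\ba \w+ ago\b",
    "the ___er the ___er": r"\bthe \w+er the \w+er\b",
    "not only X but Y": r"\bnot only\b.+?\bbut\b",
}

def find_frames(text):
    """Return, for each frame, the stretches of text that fit it."""
    lowered = text.lower()
    found = {}
    for label, pattern in FRAMES.items():
        matches = [m.group(0) for m in re.finditer(pattern, lowered)]
        if matches:
            found[label] = matches
    return found

# Toy sample (hypothetical data, for illustration only)
sample = ("I saw her a year ago. The sooner the better, she said: "
          "not only tired but hungry.")

for label, matches in find_frames(sample).items():
    print(label, "->", matches)
```

Because the slot is matched by form alone, a human check of the matches is still needed to decide which of them are genuine lexical phrases.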
A more recent descriptive scheme for formulaic sequences is that of
Wray and Perkins (2000), in which they focus on semantic and syntactic
irregularities of the sequences. A vital aspect of formulaic sequences,
according to Wray and Perkins, is their semantic irregularity. They are not
composed semantically, but are holistic items, like idioms and metaphors.
Another key element of formulaic sequences is their syntactic irregularity,
which is manifest in two qualities: a restriction on manipulation (for example,
one cannot pluralize "beat around the bush," passivize "face the music," or
say "you slept a wink" or "feeding you up"); and the fact that in formulaic
language normal restrictions are flouted, as in sequences that contain an
intransitive verb + direct object (for example, "go the whole hog") or other
gross violations of syntactic laws like "by and large."
Prevalence
Researchers have worked to identify what proportion of discourse in a
given genre or register is in fact formulaic. For example, Altenberg (1998),
examining the London-Lund Corpus, found that over 80 percent of words are in
formulaic sequences. A well-known and often-cited figure comes from the study
by Erman and Warren (2000), which found that formulaic sequences made up
52 to 58 percent of the texts in a corpus.
Speech production and comprehension

A certain amount of research has focused on the links between fluent speech
production and formulaic language. Wood (2006, 2009a, 2009b, 2010a) has
examined the role of formulaic language in the fluent speech of second
language learners, finding that increased use of formulaic language
appears to facilitate improvements in speech fluency. Wood (2006) notes that
learners appear to use formulaic sequences to facilitate speech fluency by
relying on one sequence, repeating a particular sequence, stringing together
multiple sequences, and using them as self-talk or rhetorical structuring
devices.
One of the earliest observations about the nature of spoken discourse was
that of Pawley and Syder (1983), who described the way clauses tend to be
chained. They noted that everyday fluent conversational speech is composed
largely of strings of more or less independent clauses, without much
grammatical integration. For example, subordination is only present in limited
amounts in spontaneous speech (Pawley & Syder, 1983, pp. 202-204). Pawley
and Syder presented an analysis of two types of native-speaker production to
illustrate how fluency relates to clause-chaining. One speaker, George Davies,
produced speech in which fluent units were separate clauses:
/we had a /fan tastic time
[slows]
(1.1)
/there/were/ all kinds of re/lations /there/
[accel]
[slows ]
/I dun/no where they/all come /from/
[accel]
[slows]
I didn't know/alf o them
[accel]
(0.9)
and ahthe kids/sat on the floor
(0.2)
(1.5)
and ol/ Uncle Bert/he/ah
o/course /he was the life and soul of the party
[accel]
[slows]
/Uncle /Bert ad a /black bottle
[accel]
[slows] (1.5)
an ahed t/tell a/few stories
(0.2)
[accel] [slows]
an ed/take a /sip out of the/black bottle
[accel]
[slows]
n the/more sips he /took /outa / that bottle
[accel]
(1.0)
the worse the /stories got
(1.6)
(Pawley & Syder, 1983, p. 203)
Another speaker, Q., produced comparatively nonfluent speech, in a PhD
dissertation oral defense:
and it/seems to be
[accel]
if a /word is/fairly/high on the frequency /list/
[slow]
[accel]
I /haven't /made /any count
[accel]
but/justim/pression isticallyum
[slow]
umthe /chances are
that you get acom /pound
[slow]
ora /notherphono /logically deviantform
[slow]
with ah/which is al/ready in other /words
[accel]
[slow]
/which is /fairly frequently the /same/phono /logical
[accel]
[slows]
shape
(Pawley & Syder, 1983, p. 201)
It is apparent that Q is planning only a few words at a time, unlike Davies. The
context and the content of the discourse are novel for him, and it is obvious
that he struggles with formulating, conceptualizing, and articulating, due
to the considerable stress of the experience. To make matters even more
arduous, Q tries to use a clause-integrating strategy, in which each new
clause depends to some extent on the structure of the previous one. An
example is his false start or reformulation of the final clause in the sample,
beginning with "with," repaired to begin with "which." The genre, register,
and relative lack of interactivity of the speech production leave Q little choice
but to use this style of speech. On the other hand, Davies is speaking more
spontaneously and comfortably, and his speech shows clause-chaining of
independent clauses linked by conjunctions such as "and," in most cases.
According to Pawley and Syder, this style is most effective in narrative
speech:
With the chaining style, a speaker can maintain grammatical and
semantic continuity because his clauses can be planned more or less
independently, and each major semantic unit, being only a single clause,
can be encoded and uttered without internal breaks ... we may speak,
then, of a "one clause at a time facility" as an essential constituent
of communicative competence in English: the speaker must be able
regularly to encode whole clauses in their full lexical detail, in a single
encoding operation and so avoid the need for mid-clause hesitations.
(Pawley & Syder, 1983, pp. 203, 204)
There is a clue here about one prime function of formulaic language in
spoken communication. The tendency to chain clauses in conversational and
narrative speech in English implies that a speaker should be able to encode
whole clauses and avoid hesitations in mid-clause. If we look at the temporal
variables most often associated with fluency, we find that they include
pauses at clause junctures and a certain length of speech runs between such
pauses. The means by which a speaker is able to maintain this pattern of
pausing has to do with the recall of most clauses as more or less intact,
and automatically chained. In other words, much of everyday speech is
formulaic. In fact, Pawley and Syder (1983, p. 205) suggest that memorized
chunks form a high proportion of the speech of everyday conversation. The
benefits of this are obvious: if speech need not be formulated and articulated
word-for-word, a speaker's attention is freed to focus on rhythm, variety, combining
memorized chunks, or producing creative connections of lexical strings and
single words.
Acquisition
One early researcher in the area of formulaic language in child first language
acquisition is Lily Wong Fillmore (1976), who examined the language
development of six-year-old children. A later work by Peters (1983) elaborated
on children's use of strategies to extract formulaic sequences from input and
retain them while at the same time breaking them down to build grammar
and lexical competence. Later, Wray and Perkins (2000) identified four stages
for children's use of formulaic language in first language acquisition: a purely
holistic strategy whereby they extract multiword sequences from input
without analysis; an analytic stage in which grammar and lexical knowledge are
acquired; fusion of sequences and use of processing shortcuts; and a balance
that favors holistic processing except where circumstances require analytic
processing.
The evidence for a role of formulaic language in adult second language
acquisition is less clear than that for children. Adults tend to take an analytic
approach to language learning and only under certain circumstances will they
acquire multiword sequences holistically.
Yorio (1980) was one of the early investigators of adult language
development and formulaic sequences. In an examination of studies of
instructed adult learners' writing, he found that, unlike children, adult learners
do not appear to use formulaic language to any great extent and that when
they do, they seem not to use it to develop overall language knowledge. Instead,
they appear to use it more as a production strategy, to save effort and
attention in spontaneous communication.
Schmidt (1983) conducted a well-known case study of the English
language development of a Japanese adult in Hawaii and found that
formulaic sequences were an essential aspect of language gain. The
participant used a large and growing number and range of formulaic
sequences as a communication strategy, at the same time seeming to be
fossilized and grammatically inept. Schmidt found that the research subject
resisted error correction and was able to develop his language ability and
acculturate through using formulaic sequences. There was no evidence of
the processes of segmentation and analysis that Peters (1983) found in child
language acquisition.
Ellis (1996) asserts that much of language acquisition is really acquisition
of memorized sequences, and that short-term repetition and rehearsal permit
the development of long-term language ability. Long-term storage of frequent
language sequences permits a learner to more easily use them for meaning
reference, and they can be accessed more automatically. This allows for more
fluent language use, as attention is freed for dealing with conceptualizing and
meaning.
Similarly, Bolander (1989), studying learners of Swedish as a second
language, found that formulaic sequences contributed to ease of learning
and use. The participants in the study consistently used prefabricated
language units that contained target language structures well in advance of
demonstrating that they had actually acquired the structures themselves.
It appears that adults in naturalistic L2 learning environments, like children,
tend to acquire and use formulaic sequences. However, the established
cognitive and learning styles of adults, their diverse acquisition contexts,
knowledge of L1, and other factors make for more variety in the route of
language acquisition generally, and with regard to use of formulaic sequences
specifically. Some adults may be more analytic and seek to infer rules from
chunked units or from pieces of input, while others, such as Schmidt's (1983)
subject, may rely heavily on acquired formulas and not attempt to break them
down or analyze them. Furthermore, degree of literacy and type and degree
of instruction may play a part.
Lexical bundles research

With the rise of corpus analysis tools and other technological aids to researching
formulaic language has come lexical bundle research (see Chapter 8 for a
detailed discussion).
In essence, lexical bundles are combinations of three or more words that
are identified in a corpus of natural language by means of corpus analysis
software programs. As well, lexical bundles occur across a range of texts
in a corpus, or, in the case of academic language, a range of disciplines.
Lexical bundles are quite frequently used in published academic writing such
as journal articles, and particular types of the bundles are characteristic of
particular disciplines (Cortes, Jones, & Stoller, 2002). It has become apparent
that acquisition and use of lexical bundles do not come naturally, but may
require focused instruction (Biber & Conrad, 1999).
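The two defining properties mentioned here, a minimum length of three words and occurrence across a range of texts, can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function names `ngrams` and `bundle_range` are hypothetical, tokenization is a naive whitespace split, the three miniature "texts" are invented, and the requirement that a bundle appear in all three texts is an arbitrary choice for this toy data rather than a threshold taken from the literature.

```python
from collections import defaultdict

def ngrams(tokens, n):
    """Return all contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bundle_range(texts, n=3):
    """Map each n-gram to the number of different texts it occurs in."""
    seen_in = defaultdict(set)
    for text_id, text in enumerate(texts):
        tokens = text.lower().split()  # naive tokenization (assumption)
        for gram in ngrams(tokens, n):
            seen_in[gram].add(text_id)
    return {gram: len(ids) for gram, ids in seen_in.items()}

# Three invented miniature "texts", for illustration only
texts = [
    "as a result of the change the rate fell",
    "the rate rose as a result of the policy",
    "this happened as a result of the delay",
]
ranges = bundle_range(texts, n=3)

# Keep only three-word strings found in every text (arbitrary toy threshold)
widespread = sorted(g for g, r in ranges.items() if r >= 3)
for gram in widespread:
    print(" ".join(gram))
```

In a real study the texts would number in the hundreds or thousands, and the range requirement would be combined with a frequency cutoff.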
An excellent piece of research in the area is a monograph written by Biber
(2006), which presents a comprehensive corpus-based analysis of university
language, containing a thorough examination of lexical bundles in textbooks.
Biber discovered that academic disciplines use lexical bundles differently,
with natural and social sciences using them more than the humanities.
Overall, the distribution of lexical bundles across functional categories in
Biber's study shows that referential bundles (making direct reference to
real or abstract entities or to textual content or their attributes) are the
most common. Stance bundles (expressing attitudes or assessments of
certainty) are the second most common type of function for lexical bundles
in the textbooks, whereas discourse organizers (reflecting relationships
between previous and subsequent discourse) are the least common.
Within the category of referential functions, it appears that quantity and
intangible framing subfunctions represent the largest categories.
In summary
From this short and dense overview of the research history of formulaic
language, some patterns and themes emerge. One image remains, however. It
still seems that we are working with something quite elusive about language.
Like the characters in the tale of the blind men and the elephant, we can only
feel for a certain aspect of the phenomenon at a time. Luckily, we can all pool
our impressions from these encounters with particular aspects and create a
fuller image through reading and researching over time.
A few of the many themes and patterns that the research shows are:
• Formulaic language is important in spoken and written language.
• Formulaic language is defined in certain ways.
• Formulaic language has been studied from a wide range of
research and disciplinary traditions.
• Formulaic language study has only been synthesized and pulled
together over the past two decades or so.
• There is still a wide range of questions about formulaic language.
Certainly, not all the questions have been answered yet in any particular
area. How do we know whether a formulaic sequence is stored and retrieved
as a whole in spoken language? Do the basic assumptions about formulaic
language in the processing of spoken language also apply to written language,
to any extent? How valuable is it to elaborate lists of categories of formulaic
language?
POINTS TO PONDER AND THINGS TO DO

1 Think back to your assumptions and images of formulaic
language before you read this chapter. How have these changed
now that you have read and thought about it?
2 From the descriptions in this chapter, can you draw a timeline
and a mind map of the history of research on formulaic language?
3 Which of the various areas of investigation over time seem to have
been the most powerful in helping us understand formulaic language?
Why?
4 Can you imagine any areas of study that have not been covered in the
research traditions described in this chapter?
5 Based on what we have read here, where do you feel the most
important areas of investigation are likely to be in the coming years?
6 If you were to begin a plan of research in this area, what would you
focus on?
7 Does the study of formulaic language challenge any common
assumptions about the nature of language and how it is produced?
8 Choose a particular area of focus in the history of formulaic language
research. Read the relevant sources. Write a short paper of five to
ten pages and share it with those who have focused on other areas.
If you compile these papers, you will have a fuller historical view of
what is presented in this chapter.
9 Based on what you have read here, imagine an area of investigation.
Can you envision a particular research method or methods to employ
in such an investigation?
10 Based on what you have read here, compose a short guide to formulaic
language for a particular type of language professional or student.
What are the implications of this area of research for students of
second languages? For parents and early childhood educators? For
writers and editors?
2
Identifying Formulaic Language: Frequency, Psychological Representation,
and Judgment
Having an idea of what formulaic language is, at least in definitions elaborated
by scholars, and understanding some major categories of formulaic
language takes us to a certain point in dealing with it. However, the proverbial
or formulaic elephant in the room is bound to make its presence felt sooner or
later: how can one identify formulaic sequences in texts, spoken and/or written?
To make this issue more concrete, take a look back at the first
two sentences in this chapter. Identify the formulaic sequences. It is not at
all easy, is it? A few strings jump out as being idiomatic in some way, for
example, "elephant in the room," "make its presence felt," and "sooner or later." But
can we be comfortable with this identification? What clues or features of the
word combinations helped us to make these decisions? What other formulaic
strings are lurking under the surface, invisible to untrained eyes, or accessible
only to digital corpus analysis tools? This particular concern is central to the
work of virtually all researchers in this field. After all, how can you present a
study of formulaic language from any source without a good indication of how
you isolate word strings which are formulaic? In the end, these are going to
be your units of analysis.
Take a look at the following list of word strings, taken from Nick Ellis (2012,
p. 27), and see if you can determine which are formulaic:
1 Put it in.
2 Put it in the fridge.
3 Polly put the kettle on.
4 Put the butter on the table.
5 Put that in your pipe and smoke it.
6 Put another nickel in the Nickelodeon.
7 Gabe cleared the music stands from the stage.
8 Why don't you kids ever clear the dishes from the table?
9 Boy, you gonna carry that weight, carry that weight a long time.
10 Dad's spilled Digestive crumbs all over the kitchen floor again, typical!
Ultimately, you may simply resort to remarking that some are formulaic,
some are not, and some are more formulaic than others. But how can you
even make those decisions? You could look at how often they are used in a
particular context, study the prosodic features of the string (see Chapter 6),
or look in a corpus.
It is encouraging to know that a variety of means of identifying formulaic
sequences have been developed. However, the processes are in many cases
more inexact than we might expect. Some means of empirical measurement-based
identification are discussed below, followed by a more detailed look at
criteria-based checklists, which rely on the decisions of judges as opposed to
measurement instruments.
Frequency and statistical measures

Formulaic sequences are generally recurrent. The lexical bundle approach, as
discussed in Chapter 8, uses this as the primary criterion. The general idea is
that a word string which is used often is likely formulaic. However, we would
agree that a string needs to be more than just frequent: it needs
to have a unitary meaning or function, and perhaps a particular way of being
mentally stored, retrieved, or produced as well.
Statistical identification of formulaic language in corpora is a foundation
of the frequency-based approach to formulaic language. In this approach to
identification of formulaic sequences, researchers set certain specifications
before scanning and analyzing a corpus. Generally, minimum lengths of word
combinations and minimum frequency cutoffs are determined, and then
the corpus is scanned and analyzed for word combinations that fit within
the parameters. Frequency cutoffs can range from 10 to 40 occurrences
per million words (e.g., Biber, Johansson, Leech, Conrad, & Finegan, 1999;
Simpson-Vlach & Ellis, 2010). This approach often yields word combinations
which are not complete structural units (Cortes, 2004), and are generally
labeled as lexical bundles (e.g., Biber et al., 1999), or multiword constructions
(Liu, 2012; Wood & Appel, 2014). Some researchers, using this set of criteria
as only part of a more complex identification protocol, simply use the term
formulaic sequences (e.g., Simpson-Vlach & Ellis, 2010).
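The procedure described above, in which a minimum length and a minimum frequency cutoff are set before the corpus is scanned, can be sketched roughly as follows. This is a minimal illustration, not the procedure of any particular study: the function name `extract_bundles` and its parameters are hypothetical, the toy corpus is invented, and because the toy corpus has only 28 words the cutoff is set artificially high so that the filter has a visible effect.

```python
from collections import Counter

def extract_bundles(tokens, n=4, min_per_million=10.0):
    """Return n-word strings whose normalized frequency meets the cutoff.

    Frequencies are expressed per million words so that corpora of
    different sizes can be compared against the same cutoff.
    """
    if not tokens:
        return {}
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    scale = 1_000_000 / len(tokens)  # converts a raw count to a per-million rate
    return {gram: c * scale for gram, c in counts.items()
            if c * scale >= min_per_million}

# Toy corpus (invented); real studies use hundreds of thousands of words or more
tokens = ("on the other hand it is clear that " * 3 + "it is not clear").split()

# The cutoff is inflated here only because the corpus is tiny
bundles = extract_bundles(tokens, n=4, min_per_million=100_000)
for gram, rate in sorted(bundles.items()):
    print(" ".join(gram), round(rate))
```

Note that the output includes strings such as "hand it is clear," which cross structural boundaries. This reflects the point made above that the approach often yields word combinations which are not complete structural units.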
This frequency-based method is most appropriate for large corpora
of hundreds of thousands of words, if not millions, taken from specific
registers of language and/or academic disciplines. It has many limitations
for use with small data sets, as the standard minimum cutoffs for frequency
established in the field may not be met in such cases. It would be difficult,
for example, to use only frequency as a criterion for identifying formulaicity
in a set of transcribed conversations on a range of topics. Some items which
we might consider formulaic might arise only once or twice in such a data
set. Another drawback of use of frequency-based analysis is that it does not
give any information about the psycholinguistic validity of the formulas. This
particular issue arose in a study by Schmitt, Grandage, and Adolphs (2004),
who identified formulas from a corpus and presented them to subjects in
spoken dictation tasks designed to overtax short-term memory capacity.
After an analysis of the participants' reconstructions of the dictations, it was
concluded that the holistic storage of the sequences, which were formulaic
according to frequency in the corpus, varied among participants (Schmitt,
Grandage, & Adolphs, 2004). A further limitation of using frequency alone
as a criterion for formulaicity is that additional steps are also required to
eliminate meaningless combinations of words for functional analyses of
formulaic language.
Sequences which are salient or readily recognizable as chunks, such as
"on the other hand" or "how do you do," may or may not be frequent in any
particular corpus or genre, but they do have coherence in that they represent
elements which usually stick together in this order and which always have
a particular meaning or function. This tendency for words to stick together
can be measured statistically using measures of association such as mutual
information (MI), which determines how likely the items are to appear together
compared to chance. MI has no particular statistical significance cutoff and is
most useful for purposes of comparison. A higher MI score would indicate
a higher likelihood of co-occurrence, and taken together with frequency
measures, can provide objective evidence of formulaicity. Other measures of
the relative stickiness of word strings are also used. In corpus linguistics,
for example, Gries (2008, 2012) uses the Fisher-Yates exact probability test to
help determine the degree of association between a word and a construction.
Studies are also triangulating data from various sources, such as corpus
measures of association together with eye tracking and response latency
data, procedures often referred to as psycholinguistic measures.
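To make the idea of MI more concrete, the sketch below computes it for a pair of adjacent words as the base-2 logarithm of the observed frequency of the pair divided by the frequency expected if the two words were combined by chance. This is one common formulation, offered as an assumption rather than as the exact calculation used in any study cited here; the function name `mutual_information` and the toy word list are invented for illustration.

```python
import math
from collections import Counter

def mutual_information(tokens, w1, w2):
    """Pointwise MI (base 2) for the adjacent word pair w1 w2.

    MI = log2(observed / expected), where expected is the number of times
    the pair would occur if the two words were independent:
    expected = f(w1) * f(w2) / N, with N the number of tokens.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    observed = bigrams[(w1, w2)]
    if observed == 0:
        return float("-inf")  # the pair never occurs
    expected = unigrams[w1] * unigrams[w2] / n
    return math.log2(observed / expected)

# Toy data (invented), far too small for real conclusions
tokens = "of course he was the life of the party of course".split()
mi_course = mutual_information(tokens, "of", "course")
mi_the = mutual_information(tokens, "of", "the")
print(mi_course > mi_the)
```

In this toy sample "of course" receives a higher score than "of the," because "course" appears only after "of" while "the" appears in other environments as well. As noted above, such scores are most useful for comparison rather than as absolute values.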
Often, researchers with small or quite specific corpora will refer to a large
general corpus such as the British National Corpus (BNC) or the Corpus
of Contemporary American English (COCA) for information about particular
word strings. For example, Wood and Namba (2013) identified formulaic
sequences of potential value for Japanese university students to perform
oral presentations. The sequences were all generated using native speaker/
proficient speaker intuition, and were then confirmed as formulaic with
reference to the spoken language subcorpus of the COCA at a frequency
cutoff of at least ten occurrences per million words and with an MI score of
at least 3.0 in the corpus (for an overview of MI see Schmitt, 2010). This
ensured that the sequences were frequent in spoken discourse and that
they were strings of items highly likely to stick together, two powerful
markers of formulaicity. Other researchers have used the hits generated
by online search engines such as Google to aid in determining what is
formulaic. Shei (2008) illustrated how no popularly available corpus seems
large enough to provide adequate instances of formulaic sequences for
close investigation. Shei proposes that researchers and teachers use the
Internet as a sort of vast corpus, employing a search engine like Google
to help identify and retrieve multiword units for linguistic research and
language teaching and learning. Simply Googling a particular word string
and examining the resulting hits can yield valuable information about its
frequency, form, variability, and functions.
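The kind of two-part check against a reference corpus described above (a frequency of at least ten per million words and an MI score of at least 3.0) might be sketched as follows. This is a hedged illustration only: the function name `meets_thresholds` is hypothetical, the extension of MI to strings of more than two words by multiplying the individual word probabilities is an assumption, and all counts in the example are invented rather than taken from the COCA or any other corpus.

```python
import math

def meets_thresholds(seq_count, word_counts, corpus_size,
                     min_per_million=10.0, min_mi=3.0):
    """Check a word string against a frequency cutoff and an MI cutoff.

    seq_count   -- raw count of the whole string in the reference corpus
    word_counts -- raw counts of each word in the string
    corpus_size -- total number of words in the reference corpus
    """
    if seq_count == 0:
        return False
    per_million = seq_count * 1_000_000 / corpus_size
    # Count expected if the words co-occurred only by chance (assumption:
    # chance probability is the product of the individual word probabilities)
    expected = corpus_size
    for count in word_counts:
        expected *= count / corpus_size
    mi = math.log2(seq_count / expected)
    return per_million >= min_per_million and mi >= min_mi

corpus_size = 100_000_000  # invented size for illustration

# Frequent and strongly associated: passes both tests
print(meets_thresholds(2500, [200_000, 50_000], corpus_size))
# Frequent, but made of very common words that co-occur no more than chance
print(meets_thresholds(2500, [5_000_000, 2_000_000], corpus_size))
# Strongly associated, but below ten occurrences per million
print(meets_thresholds(500, [200_000, 50_000], corpus_size))
```

The second and third cases show why the two measures are used together: each one screens out strings that the other would let through.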
Psycholinguistic measures
As seen in Chapter 4, a number of studies of formulaic language have been
carried out using measures of processing speed. Conklin and Schmitt (2012)
summarize a list of studies that have incorporated a variety of measurements,
including reaction times (e.g., Conklin & Schmitt, 2012), eye movement
(e.g., Underwood, Schmitt, & Galpin, 2004), and electrophysiological (ERP)
measures (e.g., Tremblay & Baayen, 2010).
These studies use eye tracking or response latencies involving reading.
While psycholinguistic measures are useful for determining which sequences
have been stored holistically by individual speakers, they provide us with
only a partial view of the use of a sequence. For example, they do not usually
help us to know how common a given sequence may be in actual use in the
community, and the formulaic sequences identified in these ways may include
rare, unusual, or one-off sequences that the speaker has tended to use for a
variety of idiosyncratic reasons.
Phonological characteristics
Another measure used to identify formulaic sequences in spoken language can
be phonological coherence, discussed at some length in Chapter 6. Formulaic
sequences tend to be uttered with particular prosodic features such as
alignment with pauses and intonation units, resistance to internal dysfluency,
no internal hesitations, fast speech rhythm, and stress placement restrictions
(see Lin, 2010, 2012, for a discussion). Some cautions are important here: as
with psycholinguistic methods, phonological coherence provides a limited or
partial sense of formulaicity. For one thing, phonological coherence is limited
to analysis of spoken language only. As well, it only relates to formulas used
by a particular speaker, and analysis is limited by the quality of the audio data
recorded.
Criteria checklists and native speaker intuition

If the measures of frequency, psycholinguistic processing, or acoustic
analysis taken individually fail to provide satisfactory results in most cases,
what is a researcher to do? One thing the researcher can do is use criteria
checklists that combine characteristics typically associated with formulaic
language. These work especially well with spoken language samples or
corpora.
Wray (2002) reviews approaches to the issue of what constitutes a
formulaic sequence and how to detect formulaic sequences in corpora. She
notes that use of corpus analysis computer software is one possible method
of identification, but presents some serious concerns:
It seems, on the surface, entirely reasonable to use computer searches
to identify common strings of words, and to establish a certain frequency
threshold as the criterion for calling a string formulaic ... (however)
problems regarding the procedures of frequency counts can be identified.
Firstly, corpora are probably unable to capture the true distribution of
certain kinds of formulaic sequences ... The second serious problem is that
the tools used in corpus analysis are no more able to help decide where the
boundaries between formulaic sequences fall than native speaker judges
are (pp. 25, 27, 28).
It seems that use of computer corpus analysis software has certain
limitations. For one thing, the specific nature of the type of speech elicited in
some types of research and the relatively small word counts which make up
some corpora mean that frequency alone cannot be a satisfactory criterion
for identifying formulaic sequences. Some formulaic sequences may be used
only once or used idiosyncratically in such a situation. Wray's second concern
is even more worthy of attention. Many formulaic sequences tend to blend into
the linguistic context in transcripts, and many are frames or have larger fillable
slots, which present real challenges for corpus analysis software. As well,
if the participants in a study are second language learners, many formulaic
sequences may be nonstandard or idiosyncratic. In the end, it appears that
the best compromise is to employ what Wray terms the application of
common sense (p. 28) in determining what is formulaic in corpora. This is
especially true for spoken corpora.
Native speaker judgment

We can examine second language performance to see how it conforms
to native speaker use of formulaic sequences; O'Donnell, Römer, and Ellis
(2012), for example, take this approach. Native speaker judgment is another possible
means of identifying formulaic sequences in a corpus. However, Wray (2002,
p. 23) identifies five weaknesses in this method:
1 It has to be restricted to smaller data sets.
2 Inconsistent judgment may occur due to fatigue or alterations in
judgment thresholds over time.
3 There may be variation between judges.
4 There may not be a single answer as to what to search for.
5 Application of intuition in such a way may occur at the expense of
knowledge we do not have at the surface level of awareness.
We have only to return to our two identification tasks at the beginning of this
chapter to see also how challenging it can be to start attempting to isolate
formulaic sequences from a text or a corpus without any guidelines. This is
where the idea of use of a checklist of specific criteria to guide judgments
comes into play. The procedure would then involve having judges study the
criteria which inform a checklist, and then go through a corpus to apply the
criteria to determine what is formulaic and what is not.
While some checklists have been developed for specific populations,
others are more general. Let's take a look at four checklists which are
well developed and have been used in various studies: an early checklist
elaborated by Coulmas (1979); a checklist used to identify formulaicity
in child language acquisition (Peters, 1983); a checklist used to identify
formulaicity in second language acquisition of speech fluency (Wood, 2006,
2009b, 2010a); a checklist applicable to a range of child and adult native or
nonnative speakers (Wray & Namba, 2003).
Early list of criteria: Coulmas (1979)

Coulmas (1979, p. 32) outlines conditions which need to be met if a word
sequence is to be considered formulaic. Two conditions, that the unit must
be at least two morphemes long and cohere phonologically, are identified
as necessary for formulaicity. Utterances which are formulaic, then, are
polymorphemic and produced without internal hesitation or pausing. Coulmas
also specifies that a formula may be more grammatically advanced than
surrounding language, exhibiting a level of syntactic and phonetic complexity
beyond the norm for the language produced by the learner. Other criteria laid
out by Coulmas for formulaic sequences are that they are typically shared within
a community, situationally dependent, and repeatedly used in the same form:
1 at least two morphemes long (i.e., two words)
2 coheres phonologically
3 individual elements are not used concurrently in the same form
separately or in other environments
4 grammatically advanced compared to other language
5 community-wide formula
6 idiosyncratic chunk
7 repeatedly used in the same form
8 situationally dependent
9 may be used inappropriately
Formulaicity in child first language speech: Peters (1983)

Similarly, Peters (1983), in an effort to elaborate criteria for identifying formulas
in child first language, focuses on:
1 phonological coherence
2 greater length and complexity than other output
3 nonproductive use of rules underlying a sequence
4 situational dependence
5 frequency and invariance in form
Gradience of formulaicity: Wray and Namba (2003)

A sophisticated checklist which can be used for a range of applications is that of
Wray and Namba (2003). Originally designed for use in assigning formulaicity
to utterances of bilingual children, the checklist is remarkably comprehensive.
It consists of eleven criteria, each of which would be applied to the researcher's
perception of formulaicity of a word string using a Likert scale of 1 to 5. This
cleverly deals with the issue of gradience or ranges of formulaicity:
1 By my judgment, there is something grammatically unusual about this
word string.
2 By my judgment, part or all of the word string lacks semantic
transparency.
3 By my judgment, this word string is associated with a specific
situation and/or register.
4 By my judgment, the word string as a whole performs a function in
communication or discourse other than, or in addition to, conveying
the meaning of the words themselves.
5 By my judgment, this precise formulation is the one most commonly
used by this speaker/writer when conveying this idea.
6 By my judgment, the speaker/writer has accompanied this word string
with an action, use of punctuation, or phonological pattern that gives
it special status as a unit, and/or is repeating something s/he has just
heard or read.
7 By my judgment, the speaker/writer, or someone else, has marked
this word string grammatically or lexically in a way that gives it special
status as a unit.
8 By my judgment, based on direct evidence or my intuition, there is a
greater-than-chance-level probability that the speaker/writer will have
encountered this precise formulation before, from other people.
9 By my judgment, although this word string is novel, it is a clear
derivation, deliberate or otherwise, of something that can be
demonstrated to be formulaic in its own right.
10 By my judgment, this word string is formulaic, but it has been
unintentionally applied inappropriately.
11 By my judgment, this word string contains linguistic material that is
too sophisticated, or not sophisticated enough, to match the speaker's
general grammatical and lexical competence.
These are the eleven diagnostic criteria for identification of formulaic
sequences (Wray & Namba, 2003, pp. 2932).
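For readers who wish to record such ratings electronically, the following is a minimal sketch of how one rater's scores for a single word string might be stored and summarized. It is an illustration only: Wray and Namba do not prescribe any way of combining the ratings, so the mean used here is an assumption, as are the function name summarize_ratings, the example scores, and the convention that 5 stands for strongly agree.

```python
from statistics import mean

N_CRITERIA = 11  # the checklist has eleven criteria


def summarize_ratings(ratings):
    """Summarize one rater's Likert ratings (1-5) for a single word string.

    `ratings` maps a criterion number (1-11) to a rating. Criteria the
    rater considers not applicable are simply left out of the mapping.
    Assumption: 5 means "strongly agree" and 1 means "strongly disagree".
    """
    if not ratings:
        raise ValueError("no criteria were rated")
    for criterion, score in ratings.items():
        if not 1 <= criterion <= N_CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if not 1 <= score <= 5:
            raise ValueError(f"rating out of range: {score}")
    return {
        "rated": len(ratings),             # how many criteria were applied
        "mean": mean(ratings.values()),    # assumed summary, not from the source
        "strongest": max(ratings, key=ratings.get),  # first top-rated criterion
    }


# Hypothetical ratings for one word string; criteria 4, 6, 7, 9, 10, and 11
# were judged not applicable and are omitted.
example = {1: 2, 2: 5, 3: 4, 5: 4, 8: 5}
summary = summarize_ratings(example)
print(summary)
```

A summary of this kind keeps the gradience of the ratings visible rather than reducing each string to a yes-or-no decision, but any cut-off applied to the mean would be the researcher's own choice.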
Native speaker judgment: Wood (2010a)

Wood (2010a) describes a study in which the speech of second language
learners of English was analyzed with a focus on the role of formulaic
language in facilitating fluency (see details in Chapter 6). The participants,
from three separate language backgrounds, retold the narratives from silent
film prompts six times in six months, and the resulting corpus of second
language speech was analyzed. Identifying formulaic sequences in the data
was a central concern in this study, and Wood takes pains to explain the
checklist and the procedures used. In the study, native speaker judgment
was used to determine what constitutes a formulaic sequence, and each of
Wray's (2002) concerns about native speaker judgment were addressed in
the procedures used:
1 It has to be restricted to smaller data sets: The small corpus accords
with Wray's first concern about native speaker judgment.
2 Inconsistent judgment may occur due to fatigue or alterations in
judgment thresholds over time: The concern with inconsistent
judgment was addressed by having judges individually listen to as well
as read the transcripts.
3 There may be variation between judges: Variation among judges
was addressed by having a discussion and benchmark identification
session before actual individual judging began. The samples used
for the benchmark session were not included in later judgment
processes, but were set aside as complete after the benchmark
session ended. In the benchmark session, two random transcripts
were analyzed individually and judges presented the formulaic
sequences they had marked.
4 There may not be a single answer as to what to search for: The idea
that there might not be a single answer as to what to search for was
at least partly addressed by having the judges read relevant literature
about formulaic sequences and study and apply a set of five criteria
drawn from that literature.
5 Application of intuition in such a way may occur at the expense of
knowledge we do not have at the surface level of awareness: As for
knowledge beyond the surface level of awareness of judges, all judges
read the most salient literature on criteria for identifying formulaic
sequences. In the benchmark sessions, the criteria taken from the
background literature were used as justification for selecting particular
items as formulaic sequences in the transcripts, and features of the
recorded speech such as speed and volume changes were also used
as guides.
Given the small and very specific corpus obtained, it was logical to avoid
complete reliance on frequency counts as would be required when using
computer corpus analysis. As noted earlier, some formulaic items might be
uttered only once or be highly idiosyncratic. As well, a researcher would need
to use a great deal of judgment in determining what is or is not actually a
formula after a list was compiled by means of corpus analysis software, since
word combinations are not necessarily formulaic just because they occur
together often. Recall that formulaic sequences need to have a more or less
unitary meaning or function, and/or be produced or comprehended more or
less as a whole. And finally, it is vital to remain aware that Wood's participants
were second language English speakers with a less than solid grasp of the
nuances of English phraseology.
The major reason for using native speaker judgment in Wood's study was
that the corpus consisted of spoken language, and listening to
speech and noting intonation and pause patterns cannot be done by
machine. In other words, human judgment was required if all the factors
relevant to formulaicity in speech were to be determined.
Judgment criteria
Five criteria were applied in deciding whether a sequence was a formula,
drawn from previous research on formulaic sequences. No particular criterion
or combination of criteria was deemed essential for a word combination
to be marked as formulaic; these were only guides:
1 Phonological coherence and reduction. In speech production
formulaic sequences may be uttered with phonological coherence
(Coulmas, 1979; Wray, 2002), with no internal pausing and a
continuous intonation contour. Phonological reduction may be present
as well, such as phonological fusion, reduction of syllables, and deletion
of schwa, all common features of the most high-frequency phrases
in English, but much less so in low-frequency or more constructed
utterances, according to Bybee (2002). Phonological reduction can
be taken as evidence that much of the production of fluent speech
proceeds by selecting prefabricated sequences of words (Bybee,
2002, p. 217).
2 The taxonomy used by Nattinger and DeCarrico (1992) (see a
description in Chapter 3). This includes syntactic strings such as

NP + Aux + VP (), collocations such as curry favor, and lexical
phrases such as how do you do?, all of which have pragmatic
functions () (p. 36). This taxonomy is not necessarily applicable
in every case; it was used as a guide to possible formulaicity. For
example, if a sequence matched other criteria and fit into a category in
this taxonomy, it might be marked as formulaic.
3 Greater length/complexity than other output. Examples would
include using I would like or I don't understand, while never using
would or negatives using do in other contexts. Judges were able to
see and hear the entire output of a particular participant to help in
applying this criterion.
4 Semantic irregularity, as in idioms and metaphors. Wray
and Perkins (2000, p. 5) note that formulaic sequences are

often composed holistically, like idioms and metaphors, and not
semantically. Examples of this were apparent in the background
literature for the judges, and many formulas readily match this
criterion.
5 Syntactic irregularity. Formulaic sequences tend to be syntactically
irregular. This criterion was readily applied to some sequences, but it

was important to check syntactically irregular sequences against other
criteria on this list.
(Wood, 2010a, pp. 111, 112).
Features of the recorded speech such as speed and volume changes were
also used as guides.
A sequence was marked as formulaic if two or all three of the judges
agreed. Idiosyncratic or nonnative-like sequences were accepted, the
idea being that various criteria were employed by the judges in making
determinations. Some types of productions, which still met all or most of
the criteria, were examples of several phenomena marginally relevant to
the study. For example, a sequence might have been stored and retrieved
by a participant as a whole, but in an inaccurate or misperceived form, for
example, what's happened instead of what happened, or thanks god instead
of thank god. The retell situation placed heavy demands on the communicative and
cognitive resources of the participants, as they were required to recall events
seen in the film while creating a running narrative, causing articulatory slips or
gaps and inaccuracies of some components of sequences. Because of these
realities of the spontaneous speech situation, it was decided that a sequence
could match the criteria and still be idiosyncratic, misperceived and stored
with errors, or misarticulated due to stress.
Judgment procedure
The expert judges were two graduate students in applied linguistics, and the
researcher himself. All had read Coulmas (1979), Nattinger and DeCarrico
(1992), Peters (1983), and Wray and Perkins (2000) prior to the judging
process. A benchmarking, preliminary discussion session was held in which
the judgment criteria and the procedure as a whole were clarified, and
two transcripts were jointly examined and coded by all three judges, in an
effort to standardize the overall approach to identification of formulas. Because
the speech samples were very specific narrative retells, the
formulas identified covered a wide range, from idioms (love your neighbor,
that's it, instead of) to two-word verbs (throw away, come back, let out, give
up, got mad, fall down), to repeated prepositional and participial phrases
(living in the same house, taking a bath, started fighting, out of the house,
at the moment, in the middle). The judges individually coded the rest of the
transcripts, following the time sequences of the speech samples, beginning
with sample number one for a given participant and continuing to sample
two and on through sample six for the same participant. After this, marked
items were accepted as formulaic if two or all three of the judges were
in agreement. In some cases, issues such as location of the boundaries
between formulas and the surrounding language, or judges' determination
that some items were possibly but not definitely formulaic, were decided by
the researcher.
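The two-out-of-three acceptance rule described here can be sketched in a few lines of code. This is an illustration under stated assumptions rather than a record of how the study was actually processed: the function name accepted_sequences and the example markings are hypothetical, and sequences are compared as exact strings, which sets aside the boundary disagreements that were decided by the researcher.

```python
from collections import Counter


def accepted_sequences(marks_by_judge, min_agree=2):
    """Return the sequences marked by at least `min_agree` judges.

    `marks_by_judge` holds one collection of marked sequences per judge.
    Sequences are matched as exact strings (an assumption), so two judges
    who disagree on where a formula begins or ends will not count as
    agreeing on it.
    """
    counts = Counter()
    for marks in marks_by_judge:
        counts.update(set(marks))  # each judge counts once per sequence
    return {seq for seq, n in counts.items() if n >= min_agree}


# Hypothetical markings by three judges for one transcript.
judges = [
    {"give up", "at the moment", "fall down"},
    {"give up", "at the moment"},
    {"give up", "in the middle"},
]
accepted = accepted_sequences(judges)
print(sorted(accepted))
```

In this example give up and at the moment are accepted, while fall down and in the middle, each marked by a single judge, are not.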
The four checklists: Coulmas, Peters, Wray and Namba, and Wood
All four checklists are designed so that none of the criteria are necessary
nor must they all be met for the purpose of identification. Wray and Namba
(2003) propose that each applicable criterion on their list should be rated on
a 5-point Likert scale from strongly agree to don't know, to strongly disagree,
where strongly disagree indicates the absence of a trait that sometimes
indicates it [formulaicity] (p. 26). All four checklists represent a departure
from the methods described above in that they place considerable importance
on native speaker intuition.
Wray and Namba's (2003) checklist is the most ambitious, having a
total of eleven criteria that address thirteen points. Peters (1983), on the
other hand, lists six criteria that address eight points, while Wood's (2010a)
checklist is based on five criteria. Taken together, the checklists show
remarkable agreement on the range of characteristics that may be indicative
of formulaicity; all of them make reference to phonological characteristics
and complexity. As seen elsewhere in this book, phonological markers of
formulaicity can include phonological coherence, reduction, or distinctive
phonological patterns, including phonological fusion, reduction of syllables,
or deletion of schwa (Wood, 2010a). Complexity refers to the fact that a
given sequence may be noticeably more advanced or less advanced than
the individual's typical nonformulaic language use in terms of syntactic and
morphological features.
Wood's (2010a) is the only checklist to specifically reference form, linked
to Nattinger and DeCarrico's (1992) taxonomy of lexical phrases. Frequency is
also considered by Wray and Namba (2003) and Peters (1983) to be a mark of
formulaicity, although frequency in this context refers to frequent use by the
speaker, not an arbitrary threshold for identification set in corpus research. As
already discussed, Wood (2010a) does not consider frequency in and of itself
to be a criterion for identification.
It is also interesting to note that both Peters (1983) and Wray and Namba
(2003) allow for the two social extremes of formulas: idiosyncratic uses and
community-wide uses. Wood (2010a) does not consider idiosyncratic
uses a mark of formulaicity, though they are accepted given the developing
competence of nonnative speakers participating in the study. Wray and
Namba's (2003) checklist features criteria that can be applied to either correct
or inappropriate forms. Wray and Namba's (2003) checklist takes into account
local repetitions, including reading, derivations, and functional uses as possible
indicators of formulaicity.
Clearly, all the checklists rely on native speaker intuition to classify word
combinations as formulaic. For a number of types of research in this area,
judgment checklists can help to overcome the limitations of frequency-based, psycholinguistic, or phonologically focused identification methods, and
provide a sort of aggregate measure of formulaicity. This is first and foremost
a useful means of identifying formulas in spoken language corpora, but, as
will be seen in Chapters 7 and 8, judgment can be useful in working with
written corpora too. The work of, for example, Simpson-Vlach and Ellis (2010)
into academic formulas uses judgment protocols having to do with salience
and teachability to help narrow down lists of frequency-derived word
combinations.
In summary
From this general overview of approaches and means and methods of
identifying formulaic sequences in corpora and texts, some interesting
possibilities come into sight. It is possible to determine formulaicity by using
frequency statistics, either from one's own corpus or from taking sequences
from one's own text or small corpus and checking their frequency or MI
in very large corpora such as the BNC or the COCA. It is even feasible
to use Internet search engines to guide decisions about formulaicity.
Psycholinguistic or acoustical features of a sequence and its processing
can also yield useful guidance about possible formulaicity. In working with
language data, expert or native speaker judgment about formulaicity may
be employed, measures well suited to smaller or quite specific data sets. In
these cases, a checklist of characteristics of the strings and their uses can
be a useful guide for judges. A few of the many themes and patterns which
the research shows are:
• Formulaic language is challenging to identify from texts, transcripts, and corpora.
• Formulaic language can be identified by various means.
• Formulaic language may best be identified by use of a combination of measures.
• Formulaic language can be identified by expert or native speaker judges using checklists as guides.
• Regardless of the measures used to determine formulaicity, absolute certainty is elusive.
Even if you use corpus frequency and MI statistics and acoustical features and
judges and checklists, you are likely to remain guarded about your decisions
about formulaicity. As the body of research grows, however, it is more and
more likely that new and more reliable or confidence-inspiring means of
determining formulaicity will emerge.
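As a reminder of what an MI statistic expresses, the sketch below computes mutual information for a two-word sequence in its standard pointwise form: the base-2 logarithm of the observed frequency of the pair divided by the frequency expected if the two words were independent. This is offered as an assumption-laden illustration only. The counts are invented, the function name mutual_information is hypothetical, and corpus interfaces such as those for the BNC or the COCA may compute their MI scores with adjustments (for example, for the search span) that are not modeled here.

```python
import math


def mutual_information(pair_freq, freq_w1, freq_w2, corpus_size):
    """Pointwise mutual information (base 2) of a two-word sequence.

    pair_freq   : how often the two words occur together
    freq_w1/w2  : how often each word occurs on its own
    corpus_size : total number of words in the corpus
    """
    if min(pair_freq, freq_w1, freq_w2, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    # Frequency of the pair expected by chance if the words were independent.
    expected = freq_w1 * freq_w2 / corpus_size
    return math.log2(pair_freq / expected)


# Invented counts for illustration, not taken from any real corpus.
mi = mutual_information(pair_freq=50, freq_w1=1000, freq_w2=2000,
                        corpus_size=1_000_000)
print(round(mi, 2))
```

A score of zero means the pair occurs exactly as often as chance would predict; higher scores indicate a stronger association, although, as the chapter stresses, a high score alone does not establish formulaicity.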

1 Before reading this chapter, how would you have approached the
task of determining what is formulaic in a given text?
2 Look at the list of sentences taken from Ellis (2012). Are there
particular items in the list which might be better matched to any of
the measures of formulaicity than others?
3 Take a word string which you think may be formulaic from a particular
text. Check its frequency in the BNC or COCA. Does the result help
confirm your intuitions?
4 For the word string you used in #3 above, check its MI in the BNC
or the COCA. Does this confirm your intuitions or does it change the
picture? What are the implications of the frequency and MI results
you have found?
5 Take a word string you think may be formulaic from a particular text.
Take another which you think may not be formulaic. Enter each
string into a Google search box. What do the resulting lists of hits
tell you about the strings in terms of their frequency and function or
meaning? Are your original intuitions confirmed or not?
6 Record some spoken language from the media or from real-life
communication. Can you apply some knowledge of the acoustic
characteristics of spoken formulaic sequences and find some
possibly formulaic sequences? What guides your decisions in this
exercise?
7 Survey the checklists of criteria described in this chapter. Choose
one of them and see if you can employ it to help you isolate some
formulaic sequences from a text or transcript. What are some of the
complications you experience in doing this?
8 Using a set of similar texts or transcripts in a group of three, attempt
the checklist and judging procedure described above as used in
Wood's (2010a) research. How consistent are your judgments? What
are some ways to make the judging process more consistent?
9 Can you think of a way to determine formulaicity not discussed in
this chapter?
10 What are the implications of the identification procedures for
researchers? For language teachers? For language testers and
assessors?
3
Categories of Formulaic Language: Labels and Characteristics
Over the history of formulaic language research the units of analysis have
been labeled in a wide range of ways. This is largely because researchers
were not all examining the exact same phenomenon, and were frequently
working in quite separate areas of linguistics and, as we saw in Chapter 1,
even in areas outside of linguistics in fields as diverse as social anthropology
and neurology. It took time for anyone to survey the existing research and
actually attempt to draw a picture of the phenomenon under examination and
sketch out the state of knowledge about it. In fact, it was not until Wray (1999)
took a step back and examined the growing body of research that the umbrella
term formulaic language/formulaic sequence came into widespread use. Since
then, that term has more or less gained and held traction in the literature.
We have seen special issues of journals devoted to formulaic language, for
example, in 2012 a special volume of Annual Review of Applied Linguistics.
We have seen particular academic symposia devoted to the field, for example,
the Symposium at University of Wisconsin Milwaukee in 2007. Wray herself
was instrumental in developing the Formulaic Language Research Network
(FLaRN), which has a social networking site on the Web with hundreds of
members, and has spawned a series of well-attended seminars over the years
at which researchers share their work.
It was Wray and Perkins (2000, p. 3) who noted that formulaic language at
that point had been labeled by as many as forty terms (Table 3.1):
Table 3.1 Terms used to refer to formulaic language

Amalgams
Automatic
Chunks
Cliches
Co-ordinate constructions
Collocations
Composites
Conventionalized forms
FEIs
Fixed expressions
Formulaic language
Formulaic speech
Formulas/formulae
Fossilized forms
Frozen phrases
Gambits
Gestalt
Holistic
Holophrases
Idiomatic
Idioms
Irregular
Lexical(ized) phrases
Lexicalized sentence stems
Multiword units
Noncompositional
Noncomputational
Nonproductive
Petrification
Praxons
Preassembled speech
Prefabricated routines and patterns
Ready-made expressions
Ready-made utterances
Rote
Routine formulae
Schemata
Semi-preconstructed phrases that constitute single choices
Sentence-builders
Stable and familiar expressions with specialized subsenses
Synthetic
Unanalyzed chunks of speech
As we can see from the list, the range is remarkable. In the years since
the publication of Wray and Perkins, new terms have been added, including
corpus technology-derived labels such as n-grams and concgrams. However,
a survey of the main categories in recent literature shows that there is
considerable common ground among researchers as regards exactly what
they are studying, but, at the same time, categories exist for valid reasons.
The main areas of focus which have emerged over the years are collocations,
idioms, lexical phrases, lexical bundles, metaphors, proverbs, phrasal verbs,
n-grams, concgrams, and compounds. If we examine each of these in turn, we
will end up with a strong sense of what is actually meant by formulaic language.
Collocations
The term collocation is a bit of a puzzler for many, because it appears to
simultaneously refer to a specific type of word combination and to all
multiword phenomena. There are many possible definitions of collocation, but,
in linguistics, they mostly boil down to the notion of a syntagmatic relationship
among words which co-occur. The syntagmatic relationship may be defined
quite generally, or it may be restricted to relationships which conform to
certain syntactic and/or semantic criteria. Over the years, two perspectives
on collocations have emerged from the literature: frequency-based and
phraseological (see Granger & Paquot, 2008, for an overview of these). The
frequency-based approach, with roots in the work of Firth (1951, 1957), is
concerned mostly with the statistical likelihood of words appearing together,
while the phraseological approach, with roots in Soviet phraseology, tends
much more toward restrictive descriptions of multiword units, with a narrower
view of what to label as a collocation. In addition to all this, after Firth's work in
the 1950s, work on other types of word combinations began to expand, with
researchers using the term collocation in more creative ways.
The influence of Firth, the pioneer

The frequency-based approach to dealing with collocations was basically
initiated by Firth in the 1950s, although the actual term itself had been
around much longer. Firth developed the concept of collocation as a
functional description of language in line with his overall theories of meaning
(1951, 1957). As touched on earlier in Chapter 1, Firth's definition of collocation
was essentially the co-occurrence of words in proximity to one another.
There are several types of variation: the habitual collocation, in which words
occur together quite frequently, exemplified by Firth's use of the pejorative
label, silly ass, as well as the idiosyncratic collocation, a co-occurrence of
words that occur relatively rarely, but retain a useful function. Firth's pointed
examples from literature, such as sleek supple soul from a poem by Swinburne
(see Nesselhauf, 2004 for more), help to further complicate the otherwise
straightforward definition given. Reference to noncontiguous words as
collocations, such as dark and night occurring in a sentence separated by other
words, may for the inexperienced deliver the final blow. Indeed, even for those
better versed, it is rather unclear from Firth's work how distantly separated
words can be before the collocational bond is broken.
This overall approach to collocations was later developed by Halliday,
Mitchell and Greenbaum, Sinclair, and Kjellmer. Halliday extended and refined
the definition to specify that a collocation is a function of the frequency of
a word appearing in a certain lexical context as compared to its frequency
in language as a whole. Mitchell and Greenbaum, working separately but
based on the Firthian tradition, refined the study of collocation by including
syntactic and semantic aspects in the descriptions. Sinclair (1991) worked
to refine things by focusing specifically on the issue of what span of words
to consider a collocation. Jones and Sinclair (1974) found that the span of
words which is optimal for a collocation is four words to the right or left of a
node, or core word. Kjellmer, Stubbs, and Altenberg took the computerized
methods espoused by Sinclair some steps further. Kjellmer worked on the
Dictionary of English Collocations (1984) defining a collocation as a continuous
and recurring sequence of two or more words which are grammatically well
formed. These efforts were the genesis of the computer-based, frequency-driven study of collocations.
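The notion of a span around a node word can be made concrete with a short sketch. The code below counts every word that falls within four words to the left or right of a chosen node, following the span reported by Jones and Sinclair (1974). It is a simplified illustration rather than a reconstruction of their method: the function name collocates is hypothetical, the text is invented, tokenization is a plain split on spaces, and no statistical test of association is applied.

```python
from collections import Counter


def collocates(tokens, node, span=4):
    """Count the words found within `span` tokens left or right of `node`.

    Every occurrence of the node opens its own window, so a word lying
    inside two overlapping windows is counted twice.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = tokens[max(0, i - span):i]      # up to `span` words before
        right = tokens[i + 1:i + 1 + span]     # up to `span` words after
        counts.update(left + right)
    return counts


# Invented example text; "night" is the node word.
text = "the night was dark and the dark night fell on the quiet town"
counts = collocates(text.split(), node="night")
print(counts.most_common(3))
```

Here dark is counted three times as a collocate of night even though the two words are adjacent only once, which illustrates how a span-based definition captures noncontiguous co-occurrence. The word town lies five words from the nearest night and so falls outside the span.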
Phraseological approaches to collocation research

Phraseologists have typically viewed collocations as combinations of words
whose relations are fixed or variable to varying degrees, and the meaning
of which is somewhat transparent (Nesselhauf, 2005). Cowie (1994) sees
word combinations as occurring along a scale from composites, combinations
below the sentence level with lexical or syntactic functions, to formulae,
often sentence-length and having pragmatic functions. Composites can be
fully opaque and/or invariable, as in pure idioms. Figurative idioms can have
both a literal and figurative meaning, and restricted collocations are those
in which at least one element is literal and the other figurative. Cowie
gives no restrictions on the number of words or the span of words in a
collocation.
The phraseological approach has roots in the 1940s–1960s as Russian
pioneers in the area such as Vinogradov (1947) and Amosova (1963) classified
phraseological units according to their semantic and pragmatic functions.
In his definition of phraseological units, Vinogradov (1977) identifies such
important characteristics as noncompositionality and nonsubstitutability,
classifying the units into four broad types according to the degree of
their opacity, structural fixedness, literal/figural meaning, and contextual
boundaries. A similar classification was elaborated by Amosova (1963), using
the term phrasemes instead of phraseological combinations and outlining
specific parameters within a word combination. The Russian phraseologists
were conscious of relationships between the components of collocations,
and identified that one of the words in a multiword unit might have a leading
position.
Igor Melcuk (1998) founded the Meaning-Text Theory, using semantics
and pragmatic functions of formulaic sequences as the basis of classification.
Melcuk (1998) observed that collocations are not free and noncompositional,
and that the specific relations between the words in a collocation cause it to be
perceived as a single unit of meaning. Melcuk pointed out that in a multiword
combination one of the components is leading, while another depends on
it. Both components combine and participate in the creation of meaning as
a whole unit. Melcuk (1998) attempted to classify collocations on the basis
of the relations between the components. He differentiated four types: (1)
collocations with light delexicalized verbs such as do a favor; (2) collocations
in which the meaning of the dependent word is clarified only through its relation
with the main word such as well-chilled beer; (3) collocations in which one of
the elements has a synonym, yet this synonym is impossible in a given word
combination such as strong (but not powerful) coffee; and (4) collocations in
which a dependent word embraces the meaning of the main word such as
artesian well or aquiline nose (p. 31).
Lexicography and collocations

Lexicographers and lexicographists, whose primary interest was in creating
dictionaries, had been interested in collocations even before the advent of
phraseology. An early reference to the concept of collocation was in the works
of Palmer (1933) and Hornby, Gatenby, and Wakefield (1942). These scholars
examined collocations in the context of phraseology and lexicology, and
Palmer (1933) attempted to classify collocations into verb, noun, preposition,
and adverb morphologo-syntactical types. Collocations were not the primary
research focus of phraseologists and lexicographists. In fact, in early stages,
the Russian classical school of phraseology concentrated on formulaic
language in general. It wasn't until the latter part of the twentieth century
that Melcuk (1998) and Cowie (1998) focused on collocations as a specific
phenomenon.
Idioms
Definitions of idiom
Unfortunately, the definition of idiom is in some ways just as fraught as that
of collocation. Some researchers use the term in an extremely broad sense,
encompassing proverbs, slang expressions, and even individual words of
certain types. Many others, however, use the term in a much narrower sense,
to refer only to word strings which are, in the words of Moon (1998, p. 4),
fixed and semantically opaque or metaphorical, for example, kick the bucket
or spill the beans. It may be that for such a complex language phenomenon,
no specific single definition will do it justice.
The most encompassing definition of idiom is that which includes even
single words. The definition elaborated by Hockett (1958) is a classic in this
category, labeling any language item whose meaning is not visible from
its structure as an idiom, even if it is a single morpheme, for example, -ed.
To Hockett, the -ed morpheme is idiomatic because its meaning cannot
be interpreted from its structure, whereas attaching a lexical word to the
morpheme makes it not idiomatic, since the meaning of, for example, worked
can be deduced from the structure of the two morphemes it contains. In
the end, however, this definition of idiom is far too broad to be of practical
use to most researchers. However, some researchers have extended the
definition of idiom to include single words which are polymorphemes or
compounds, such as lighthouse or television (see Katz & Postal, 1963;
Makkai, 1972).
More focused and limiting criteria have been employed by some scholars
to define and identify idioms. For example, Weinreich (1969) maintained
that only multiword phenomena which have both literal and figurative
meanings can be termed idioms, which rules out such intuitively idiomatic
noncompositional and purely figurative expressions as by and large or as of.
Weinreich also excludes phenomena he calls stable collocations from the
category of idioms, such as two wrongs don't make a right, because they
lack a figurative interpretation. Weinreich's definition of idioms may also be
somewhat too narrow to work easily for most scholars.
Transformational-generative grammar featured in the origins of the work
of Katz and Postal as well as Weinreich, but it was Fraser (1970) who strictly
defined idioms as word strings with transformational power. Looking at the
range of idioms, Fraser elaborated a six-level hierarchy to encompass the
variety of manipulation and transformation a given idiom may allow (adapted
from Liu, 2008, pp. 7, 8):
6. Unrestricted: no real idioms allow this much transformation
5. Reconstruction: only nominalization of a verb, for example, she
lay down the law to her laying down of the law
4. Extraction: passivization, for example, the buck has been passed
too often, and particle and noun inversion, for example, look up the
information/look the information up
3. Permutation: inversion of direct and indirect object, for example,
cannot teach an old dog new tricks, and particle and noun inversion
when the noun is part of the idiom, for example, put on some weight/
put some weight on
2. Insertion: insertion of a nonidiomatic item into the idiom, for
example, she read the class the riot act
1. Completely frozen: no transformation or manipulation is
possible
Another strict definition of idiom is centered around semantic
noncompositionality and nonproductiveness of form. Wood (1981) adopts this
sort of definition, holding that the meaning of an idiom must not be merely
the sum of the meaning of its parts, and that the structure of an idiom must
not allow any transformations. Word combinations may then be placed on a
sort of continuum of idiomaticity, crossing a range from idioms at one end to
expressions, formulas, and free forms at the other.
Later scholars dealt with idioms in similarly restricted ways. Moon
(1998, p. 5) defines idioms as semi-transparent and opaque metaphorical
expressions such as spill the beans and burn one's candle at both ends.
She separates idioms from what she terms fixed expressions, which are
word combinations such as routine expressions, sayings, similes, and so
on (Moon, 1998, p. 2). Grant and Bauer (2004) go a step further than Moon
in the direction of exclusivity in defining idioms, adding the qualification
that an idiom is not only noncompositional, that is to say, nonliteral, but
also nonfigurative, in that its meaning cannot be interpreted from the
constituent parts. To Grant and Bauer, a sequence such as kill two birds
with one stone is not an idiom because it can be interpreted as nonliteral,
and then reinterpreted by means of studying its pragmatic intent. It is
unlikely or rare to actually kill two birds by casting one stone. When we see
this word string, we recognize that fact and then look at the context and
likely arrive at a good sense of what it actually means. On the other hand,
to Grant and Bauer, by and large is a classic true idiom, because it is not
only nonliteral, but it also gives no clue as to what its figurative meaning
might be.
Categorizations of idioms
As for categorization of types of idioms, various scholars have elaborated
taxonomies. Makkai (1972) identifies six subcategories (adapted from Liu,
2008, pp. 17, 18):
1 Phrasal verbs: verb and one or two particles, for example, come
across
2 Tournure: a verb and at least two words (often noun phrases), for
example, take the bull by the horns
3 Irreversible binomials: two nouns or adjectives in a fixed sequence,
for example, safe and sound
4 Phrasal compounds: compound nouns and adjectives, for example,
high-handed
5 Incorporating verbs: compound verbs, for example, brainwash
6 Pseudo-idioms: compound words or phrases in which one item has
no meaning by itself, for example, chit-chat

Moon (1998), meanwhile, classified idioms into three broad categories
(adapted from Liu, 2008, pp. 19, 20):
1 Anomalous collocations: uniquely formed collocations, which may:
a violate grammatical rules, for example, day in and day out
b contain items specific only to the collocation and with no meaning
outside of it, for example, to and fro
c be somehow defective, for example, foot the bill, in which the
word foot carries a meaning unique to this collocation
d be phraseological, or allow variation in structure, for example, with
regard to or in regard to
2 Formulae: grammatical in structure and compositional in meaning,
yet pragmatically specialized in function
a Sayings, for example, an eye for an eye
b Proverbs, for example, every cloud has a silver lining
c Similes, for example, as right as rain
3 Metaphors: expressions which link the concrete and the imaginary
or abstract, with three degrees of transparency
a Transparent, for example, stepping stone
b Semi-transparent, for example, throw in the towel
c Opaque, for example, pull one's leg
This overview of the history and range of definitions of idioms is somewhat

complex. But we can summarize the definition of idiom as centered around
five defining criteria (see Skandera, 2004):
1 At least two words in length: this is common to all categories of
formulaic language.
2 Semantic opacity (adding up meanings does not yield the whole):
spic and span and to and fro are examples of this phenomenon,
although here the lexical items involved are in and of themselves
opaque, since we do not see spic, span, or fro used in other contexts
(see Allerton, 1984 for more on this). Other examples of semantic
opacity have roots in history, such as kick the bucket (die), which
derives from a phenomenon known in the procedures involved in the
slaughter of pigs, and look for a needle in a haystack, which is actually
more a figurative use of a sequence which also has a possible literal
interpretation.
3 Noncompositionality: similar to semantic opacity, but more the
idea that an idiom is unanalyzable in terms of meaning or function.
If we look again at the previous examples, we can see that this
criterion is a bit flexible, in that many idioms are actually figurative
interpretations of a word sequence which can also be taken
literally.
4 Mutual expectancy: also referred to as lexicality, this means
that the items which comprise an idiom tend to occur together
in a more or less fixed way, frequently making the idiom appear
more like a single lexical item than a collection of individual
words.
5 Lexicogrammatical invariability/frozenness/fixedness: similar to the
idea of lexicality, this implies that the words in an idiom are fixed and
cannot be substituted by synonyms. Some idioms are fixed to the
point of not allowing any syntactic or morphological variation, such
as hook, line, and sinker, by the way, or beat around the bush; we
cannot pluralize any of the items in these word sequences nor, for
example, passivize the last one to read the bush is beaten around.
However, some idioms allow a limited amount of such variation, such
as red herring or teach an old dog new tricks; it is possible to say red
herrings, plural, and to reverse the order of an idiom, as in teach
new tricks to an old dog.
Lexical phrases
Lexical phrases are a particular subset of formulaic language first publicized
by Nattinger and DeCarrico (1992), based largely on previous work by
Becker (1975). They outline two large categories of the phrases, strings of
specific lexical items and generalized frames. The former are generally unitary
lexical strings and may or may not be canonical in the grammar, while the
latter consist of category symbols and specific lexical items. Four criteria
help in classifying the phrases: length and grammatical status; canonical
or noncanonical shape; variability or fixedness; whether it is a continuous,
unbroken string of words, or discontinuous, allowing lexical insertions (pp.
37, 38). They also identify four large categories of lexical phrases which
display aspects of the four criteria: polywords, which operate as single
words, allowing no variability or lexical insertions, and including two-word

collocations (e.g., for the most part, so far so good); institutionalized
expressions, which are sentence-length, invariable, and mostly continuous
(e.g., a watched pot never boils, nice meeting you, long time no see);
phrasal constraints, which allow variations of lexical and phrase categories,
and are mostly continuous (e.g., a ___ ago, the ___er the ___er); sentence
builders, which allow construction of full sentences, with fillable slots
(e.g., I think that X, not only X but Y) (pp. 38–45). Nattinger and DeCarrico's
comprehensive taxonomy covers a large proportion of the types of utterances
which are produced in a language.
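For readers who work with text computationally, the idea of a phrasal constraint as a frame with fillable slots can be sketched in a few lines of Python. This is only an illustration, not Nattinger and DeCarrico's own procedure: the two frames come from the examples above, but the regular expressions, the function name fill_slots, and the restriction of each slot to a single word are assumptions made for the sake of a short example.

```python
import re

# Two phrasal constraints from the text, written as patterns with slots.
# Assumption: each slot is filled by exactly one word.
FRAMES = {
    "a ___ ago": re.compile(r"\ba (\w+) ago\b"),
    "the ___er the ___er": re.compile(r"\bthe (\w+er) the (\w+er)\b"),
}

def fill_slots(text):
    """Return, for each frame, the slot fillers found in the text."""
    lowered = text.lower()
    return {
        name: [match.groups() for match in pattern.finditer(lowered)]
        for name, pattern in FRAMES.items()
    }

result = fill_slots("I saw her a year ago, and the sooner the better.")
print(result)
```

The fixed words of each frame are matched literally, while the slots accept any word of the right shape. A polyword such as for the most part would need no slots at all, which is one way of seeing the difference between the categories.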
Lexical bundles
Lexical bundles (Biber & Conrad, 1999; Biber, Johansson, Leech, Conrad, &
Finegan, 1999) are a category of formulaic language characterized by the
means by which they are identified and their purely functional nature: they
are not meaning units per se, but rather, units of function which serve to
characterize particular types of discourse. The work on lexical bundles has
been overwhelmingly conducted on academic language, especially academic
written text.
Lexical bundles are combinations of three or more words which are
identified in a corpus of natural language by means of corpus analysis
software programs. An additional characteristic of lexical bundles is that they
occur across a range of texts or, in the case of academic language, a range of
disciplines. Biber and Conrad (1999) noted that these word combinations are
so common, it might be assumed that lexical bundles are simple expressions,
and that they will be acquired easily (p. 188). However, the acquisition and
use of lexical bundles does not appear to occur naturally. Lexical bundles have
been shown to be used at high frequency in published academic writing,
and particular types of the bundles are characteristic of particular disciplines
(Cortes, Jones, & Stoller, 2002). Academic disciplines have different ways
of seeing the world, connected with different communicative conventions
(Hyland & Hamp-Lyons, 2001).
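The two identifying characteristics described above, frequency of a word combination and its occurrence across a range of texts, can be sketched as a small Python routine. This is a minimal illustration rather than the procedure of any particular corpus analysis program: the function name find_bundles, the threshold values, and the three toy texts are all assumptions, and real studies set frequency cutoffs relative to corpus size.

```python
from collections import Counter, defaultdict

def find_bundles(texts, n=4, min_count=3, min_texts=2):
    """Return n-word sequences that are both frequent and spread across texts.

    min_count and min_texts are illustrative thresholds, not standard values.
    """
    counts = Counter()          # total occurrences of each sequence
    spread = defaultdict(set)   # which texts each sequence occurs in
    for text_id, text in enumerate(texts):
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            counts[gram] += 1
            spread[gram].add(text_id)
    return {
        " ".join(gram): count
        for gram, count in counts.items()
        if count >= min_count and len(spread[gram]) >= min_texts
    }

# Toy "corpus" of three texts, invented for this example.
texts = [
    "on the other hand the results suggest that on the other hand",
    "the data show that on the other hand the effect is small",
    "it is clear that the effect is small",
]
bundles = find_bundles(texts, n=4, min_count=3, min_texts=2)
print(bundles)
```

Here only on the other hand meets both thresholds. The sequence the effect is small appears in two texts but only twice overall, so it is excluded, which shows how the frequency and range criteria work together.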
Biber (2006) presented a comprehensive corpus-based analysis of
university language, including an examination of lexical bundles in textbooks.
He found that academic disciplines differed in their use of lexical bundles, with
natural and social sciences relying on them more than the humanities. Overall,
the distribution of lexical bundles across functional categories in Biber's study
shows that referential bundles, which make direct reference to real or abstract
entities or to textual content or their attributes, are the most common.
Stance bundles, which express attitudes or assessments of certainty, are the
second most common type of function for lexical bundles in the textbooks,
whereas discourse organizers, which reflect relationships between previous
and subsequent discourse, are the least common. Within the category
of referential functions, it appears that quantity and intangible framing
subfunctions represent the largest categories.
Other researchers have deviated from the somewhat
narrowly defined methodologies stipulated by lexical bundle researchers.
Simpson-Vlach and Ellis (2010) note that many of the items identified
by lexical bundle research are characterized by context-specific lexical
components and are of limited utility for the purposes of teaching L2
learners to be competent with academic discourse. They conducted a
modified type of such research wherein they added elements of native
speaker judgment to the process of identifying lexical bundles, producing
lists of items they call instead formulaic sequences. Liu (2011) took this a
step further and, analyzing items extracted from the British National Corpus
(BNC), produced lists of what he terms multiword constructions, which
stand as units of meaning or function. This terminology was also employed
by Wood and Appel (2014) in an analysis of multiword phenomena in
first-year university textbooks.
Metaphors
Metaphor is essentially a semantic principle centered around an
unconventional act of reference, a word that is used to describe an entity
which is essentially outside of its denotational range, and there is tension
between a literal and a metaphoric interpretation. The structure of a metaphor
is like this: a vehicle is the term used in an interpreted sense which cannot
be understood literally because of the unusual context of use. The topic is
the referent of the vehicle. The grounds are the analogies or features shared
between vehicle and topic. Take, for example, life is a highway. Highway
is the vehicle, the word being used in an interpreted, not literal sense. Life
is the topic, and the grounds are the analogy between the passing of time
and the covering of distance. The metaphor can, of course, use a marker
such as is like or kind of, as in life is like a box of chocolates. The vehicle
can be single words or phrases or full clauses or sentences. The strength of
a metaphor depends on the degree of semantic tension between vehicle
and topic, linguistic markers such as like, kind of, and the implicitness or
prominence of the vehicle. The metaphor calls on us to compare the two
components, the vehicle and the topic.
Proverbs
Proverbs are hard to define, but their key features are the opacity of the
relationship between literal and figurative meanings, and sentence-like length.
Pragmatic characteristics of proverbs include advice and warning (better late
than never, don't put the cart before the horse), instruction and explaining
(an apple a day keeps the doctor away, the ball is in your court), and
communicating common experience and observations (you can't get blood from
a stone, just as the sun rises in the east). They are not the words of the speaker, but
quotations from a canon of proverbs shared by members of a community.
They feature the linguistic characteristics of brevity and directness, simple
and/or parallel syntax, metaphorical quality, and sometimes archaic
structures.
Compounds
Compounds are special cases in formulaic language study, being more a
branch of word formation. A compound is, in fact, the creation of a word
with a unique meaning by combining two existing words, and in English
many compounds in fact are written as two separate words (see ten
Hacken, 2004).
Compounds show asymmetry, with the second of the two words
usually the head or core of the combination; for example, desk computer
describes a type of computer and computer desk describes a type of desk
(see Williams, 1981). The head word is subject to rules of pronoun reference
and has some freedom of syntactic form. The head represents a type and
the nonhead serves to classify the head. There are three forms of compound
words:
• closed form, in which the words are written as one, such as
secondhand, childlike, or notebook;
• hyphenated form, in which the lexical items are separated by
hyphens, such as mother-in-law or mass-produced;
• open form, in which the two words are written separately, such as
post office or real estate.
Compounds written as single words indicate a stronger lexicalization.

Words are combined into compound structures in various ways and they
can change over time. Two words may be joined by a hyphen and then be
blended into a single word. The rules for writing compounds are not universal
or specific in English, and it is common for even experienced and highly
educated writers to need to consult dictionaries or online resources to
determine whether a given item is two words, a hyphenated compound, or
a single word.
Words modified by adjectives, for example, an old school, are different
from a compound word, for example, a high school in the degree to which
the nonhead word changes the essential character of the head, or the
degree to which the modifier and the noun are inseparable. In the example
of high school, the compound represents a single entity, a particular type
of school which is always identified as such, whereas old school is simply a
school being described as old. The adjective slot in the combination can be
filled by any number of items.
Modifying compounds are often hyphenated. For example, an old-furniture
salesman sells old furniture, but an old furniture salesman is an old man. When
compound modifiers precede a noun, they are often hyphenated: part-time
worker, high-speed chase. Adverbs ending in -ly are not hyphenated
when compounded with other modifiers: highly rated university, a partially
refundable purchase.
In pluralizing, the most significant word, the head, takes the plural form.
Examples include also-rans, fathers-in-law, and go-betweens.
Phrasal verbs
Phrasal verbs are a particularly English type of formulaic language phenomenon.
They are verbs combined with a preposition or particle, or both, with often
nonliteral meanings, or both literal and figurative interpretations, like idioms.
Three structural categories exist:
Verb + preposition (prepositional phrasal verbs)
Help me look after Jake's dog for the weekend.
Other children often picked on Sebastian.
What if you run into your ex-wife at the party?
Verb + particle (particle phrasal verbs)
You should bring that up at the next meeting.
Try not to give in when you see the dessert table.
Come over and let's hang out for the afternoon.
Verb + particle + preposition (particle-prepositional phrasal verbs)
I am not putting up with any more outbursts from her.
Jane is looking forward to a long sunny vacation.
The kids loaded up on chocolates before we got there.
Three criteria exist for determining whether an item is in fact a phrasal verb
(adapted from Liu, 2008, p. 22):
1 A phrasal verb does not permit insertion of an adverb between its
components, for example, we cannot say, The kids loaded slowly up

on chocolates before we got there.
2 A phrasal verb particle cannot be forefronted in a sentence, for
example, we cannot say, Up with I am not putting any more outbursts.

3 A phrasal verb never exists as only literal in meaning, but must have
some degree of figurative meaning, as seen in the examples above.

Some researchers in the area of idioms have actually included phrasal
verbs as a subcategory. As is the case with idioms, the meaning of these
items cannot be understood in a literal fashion, or interpreted from the
component words. For example, pick on has little to do with actually
picking, not to mention on. Hanging out has nothing to do with actually
hanging. Phrasal verbs are common in everyday, informal speech, and their
synonyms, which are often borrowings from Latin, Greek, or French, are
reserved for more formal discourse registers or for more specific high-level
usage. We tend to say get together rather than congregate, put off rather
than postpone.
Concgrams
A concgram is, like all formulaic language, a combination of two or more
words. However, a concgram is a noncontinuous sequence, in which the
constituent words are separated by others. The idea dates back to the 1980s
when the Cobuild team at the University of Birmingham tried to find a way
to search corpora by machine for noncontiguous sequences of associated
words.
The ability to discover noncontinuous word combinations in corpora
increases the likelihood that researchers will discover not only a more
extensive description of patterns of collocation and their meanings, but also,

and more importantly, new patterns of language use.
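To make the notion of a noncontinuous sequence concrete, here is a minimal Python sketch of a search for two associated words separated by intervening words. It is not the Cobuild team's method or the algorithm of any existing concgram software: the function name find_concgram, the max_gap parameter, and the example sentence are assumptions, and the sketch looks only for the two words in one fixed order.

```python
def find_concgram(words, first, second, max_gap=3):
    """Return (i, j) index pairs where `first` is followed by `second`
    with at least one and at most `max_gap` intervening words."""
    hits = []
    for i, word in enumerate(words):
        if word != first:
            continue
        # j starts at i + 2 so that at least one word separates the pair.
        for j in range(i + 2, min(i + max_gap + 2, len(words))):
            if words[j] == second:
                hits.append((i, j))
    return hits

sentence = "she played a very important role in the key role of advisor".split()
hits = find_concgram(sentence, "played", "role", max_gap=4)
print(hits)
```

The search finds played ... role across three intervening words, a pairing that a search for contiguous strings such as played role would miss entirely. The second role lies outside the window and is ignored.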
In summary
From this general overview of categories of formulaic language, it can
be surprising how many and varied the types are. The phenomenon we
are dealing with is by no means unitary, and the classifications and the
taxonomies are somewhat leaky or slippery: the distinctions between,
for example, a collocation and an idiom are blurry, and it also appears
that particular researchers have somewhat arbitrarily composed their
own sets of descriptions and classifications and definitions of the various
types. Other types of formulaic sequences seem lost in the shuffle,
uncategorized but intuitively formulaic: look at items like and then or
sooner or later. Where do they fit? The advent of corpus analysis technology
and techniques has done much to help us identify new types of formulaic
sequences, but what makes the exact determination of a lexical bundle
different from a sequence identified using frequency and other statistical
measures such as mutual information (see Chapter 2)? Is the distinction
even worthy of debate? A few of the many themes and patterns which the
research shows are:
• Formulaic sequences can be classified in various ways.
• The nature of the classifications and the criteria used to determine
them has changed over time.
• The classifications which exist are by no means exhaustive, and
some types of word strings are difficult to classify.
• Some categories overlap with others.
• There is no firm consensus that all the categories are similarly
processed semantically or psycholinguistically.
One interesting conclusion to be drawn from this survey of types and

categories of formulaic language is that the classifications are in some ways
arbitrary. We might even wonder why the categories are even valuable to us
as researchers or teachers. Does it matter whether a sequence is a phrasal
verb or a collocation? Or are many of the classifications really just carryovers
from an early era of armchair phraseology, with little particular relevance to
those who do applied research or language teaching?

1 Before reading this chapter, what categories of formulaic sequences
would you have identified?
2 Look at the list of word strings at the beginning of Chapter 1 in this
book. Can you classify them on the basis of the information presented
in this chapter?
3 Take several word strings which you think may be formulaic from a
particular text. Can you categorize them according to the information
in this chapter? Does this exercise help enhance your understanding
of the word strings in some way?
4 For each of the categories of formulaic language make a list of its
characteristics. Find several examples of each one from the literature
and/or from your own intuitions.
5 For each of the categories you have dealt with in activity #4, see if
you can find examples from a text. Do these examples differ from
those you drew from intuition?
6 What is a workable definition of formulaic language which takes all
the categories and their characteristics into account?
7 How would your methods of conducting research into collocations
differ from those you might use to conduct research into phrasal
verbs? Lexical phrases? Lexical bundles?
8 How would your methods of teaching collocations differ from those
you might use to teach phrasal verbs? Lexical phrases? Lexical
bundles?
9 Which of the categories of formulaic language make intuitive sense,
and which appear to require specific research techniques to uncover?
10 What are the implications of the nature of the categories for language
testers and assessors? For writers and editors?
4
Mental Processing of Formulaic Language: Holistic and Automatized
In any discussion of formulaic language and its definition, one is bound
to encounter the assertion made most famously by Wray (2002) that a
formulaic sequence is "a sequence, continuous or discontinuous, of words
or other elements, which is, or appears to be, prefabricated: that is, stored
and retrieved whole from memory at the time of use, rather than being
subject to generation or analysis by the language grammar" (2002, p. 9). This
claim is attractive to many because it allows us to make additional claims for
formulaic language, most notably that it facilitates fluent communication
by allowing us to bypass laborious creative construction, and to produce
and comprehend chunks of words with particular meanings or functions,
helping with fluent and accepted use of language. Rather than assemble
utterances or sentences in a word-by-word, step-by-step manner, we are
able to use holistically stored formulaic sequences instead. Does this sound
like an attractive model of language use? Of course it does, and it handily
explains away some tricky things about communication. It simply makes
sense, right?
Attractive claims are one thing, but it is essential that they not be taken
on faith. What is really meant by stored and retrieved as wholes, what does
that imply for formulaic language research, and, most importantly, what is the
evidence that it is so? To answer this it is necessary to take a look at some
basic concepts of mental processing and models of language production
and see where formulaic language may fit. As well, we need to survey the
evidence of holistic storage and retrieval of formulaic sequences.
Mental processing: A primer

Language production, and, to a great extent, comprehension, is largely
a function of mental processes and skills. Some key concepts, such as
declarative and procedural knowledge, automatization and proceduralization,
controlled and automatic processing, are essential to an understanding of
these mental processes.
Declarative and procedural knowledge

One foundational distinction which is essential to an understanding of
language production is that between declarative knowledge and procedural
knowledge. Declarative knowledge is information which is consciously
known, while procedural knowledge is more a sense of how to accomplish
things, and is linked closely to skilled behavior. The distinction can be reduced
to a comparison between knowledge of what, as opposed to knowledge
of how. For example, it is one thing to know a particular grammatical rule
or transformation. It is quite another to be able to accurately and fluently
produce an utterance containing that rule or transformation. Knowing the
rule is basically declarative knowledge. We can explain things we know
declaratively. We can analyze things we know declaratively. We can produce
accurate language containing a declaratively known rule or transformation if
we have time and focus to deal with it in an explicit way such as answering
test questions or filling in blanks on grammar worksheets or even practicing
it in pair work which is contrived to force us to produce the form. However,
procedural knowledge is quite different. Use of procedural knowledge gives
us the ability to simply feel what is accurate in a particular context, and to
produce spontaneous speech with accuracy, or correct forms, and fluency,
or speed and flow. Procedural knowledge is related to skill, or ability to do or
perform, rather than encyclopedic or analytic knowledge of things. If you know
how to perform skilled behavior, you can relate to this distinction. Remember
learning to swim or ride a bicycle, or drive a car? Not even a world of abstract
instruction about strokes or pedals, or brakes and steering could have made
you actually capable of these skills. Only some subconscious senses of what
to do and how to do it can really enable you to perform in a skilled manner.
That is procedural knowledge.
But can declarative knowledge become proceduralized? Can explicit
understandings of things be transformed into relatively effortless skill in
using or implementing it? Yes, and connected to all of this is the concept of
automatization, or proceduralization, by which declarative knowledge may be
changed into procedural knowledge. Think of how you might have learned
to swim, ride a bicycle, or drive a car. The key to your ultimate success was
probably practice, right? In fact, automatization or proceduralization occurs
generally through a process of repetition and repeated use and recall. Through
automatization, content originally stored in the conscious mind can become
available for efficient use in real time. Knowledge which is proceduralized can
be said to be available for use more or less subconsciously, and one may
perform other tasks simultaneously. Let's take one of our examples and break
it into steps: learning to drive a car. At first all the necessary steps must be
explained and shown. The novice driver struggles to control the steering, the
brakes, the accelerator, and, in a standard shift vehicle, the delicate clutch, shift,
and accelerator control while shifting. Any distractions will tend to make the
novice driver lose control: noise, the need to carry on a conversation, or any
physical effort in addition to the driving itself, such as controlling windshield
wipers, attending to other traffic, and so on. With repetition, however, the
performance becomes much smoother and less effortful. Eventually one
gets to the point where the driving itself is more or less automatic, and
one can chat, sing to music, drink beverages, smoke cigarettes, and so
on, while driving. This is an example of how a complex set of knowledge
or actions can, over time and with practice, become automatized, that is,
skilled and performable concurrently with other tasks. So it is with language.
Think about someone progressing to be more and more fluent over time.
He or she first struggles to produce even content words, let alone anything
approaching grammar. With exposure and practice he or she can start to
laboriously create roughly grammatical utterances. But this takes quite a bit
of effort. His or her cognitive and affective resources are almost entirely taken
up with formulating utterances: retrieving words from the mental lexicon,
applying rules of syntax and morphology to them, lining them up, and articulating
them. All of this takes up most of his or her head space. Distractions,
stresses, or interruptions may cause him or her to lose the train of thought or
communication and need to start all over. With time, aspects of this process
become automatized and the student is able to produce utterances with less
excruciating effort, depending on context and so on. He or she can attend to
coming up with ideas to communicate, to planning the next things to say, and
so on, while actually producing language. Like the driver of a car, he or she is
now able to multitask and be more skilled and flexible with the passing of
time and plenty of practice.
Spontaneous speech
Producing spontaneous language is a shockingly complex thing, if you pause
to really look at it. Watch a person speaking in conversation or in a skilled way
in any context. Mind and the muscles of articulation are operating in synch in
a truly remarkable way. Ideas roll around the brain, simultaneous to weighing
contextual factors such as a sense of who is listening and what shared
knowledge and opinion exist. Clauses and phrases and words encapsulating
the ideas and meanings and nuances of the speaker's mind clump together
and roll out into the air. And, remarkably, a listener can attend to and react to
the utterances in real time, showing comprehension, and even be driving a
car or cooking a meal, or watching a television program at the same time. This
seems miraculous and may be at least partly explained by the automatization
or proceduralization processes described earlier. It seems that one's memory
of language bits and pieces and rules, and so on, is the underpinning of all
this. We recall what we have learned and implement it using another type
of memory. But if we take a look at the nature of human memory, we find a
surprising limitation which at the same time helps us to understand a bit more
the role of formulaic language in communication.
Long- and short-term memory

The concepts of short- and long-term memory are extremely relevant to how
language may be processed mentally. Long-term memory is essentially a
repository of all kinds of language knowledge. But digging around this store
of memory is really difficult and time consuming. For this knowledge to be
used, therefore, it must first be assembled in short-term memory. From long-term memory, words need to be selected to express concepts, and then
morphological, syntactic, and phonological rules need to be applied to them.
And here is where the nature of human cognition presents a huge barrier
to our ability to assemble language in this way. Unfortunately, or fortunately,
human short-term memory is limited to approximately seven or eight items
at a time (Anderson, 1983), making construction of utterances from scratch
as described earlier unlikely. There is a role here for another component
of memory termed working memory. The originator of the model is
Baddeley (1988), who proposed working memory together with
an accompanying phonological loop, conceived of as a site
in which items encountered in input or retrieved from long-term memory are
rehearsed for production. So between long-term memory and the actual point
of utterance, working memory represents a launch pad for language. Here
the form-meaning relationship can be developed and retained. An example
of how this may work is the mental repetition of a newly encountered seven-digit telephone number so as to be able to recall it later. What happens
when someone tells you a seven-digit telephone number and you have no
immediate place to write it down or otherwise record it? What happens if
you need to cross the room to your phone or a pen and paper to record the
number? You need to resort to chanting or repeating the sequence of digits

as you go. If there is a distraction or interruption in the meantime, you may
forget the number. So you keep it in a loop in your mind in order to retain it.
Your short-term memory has been maxed out. As for language production,
Hulstijn (2001) and Robinson (1995) point out that such rehearsal is important
in lexical acquisition in particular. This may help to explain how formulaic
sequences may be acquired. They are then retrieved directly from long-term
memory as chunks, bypassing the restrictions of short-term memory.
Think of it this way: assembling the number of words we need to produce
any reasonable sentence in English would overtax the short-term memory
rapidly. But assembling seven or eight formulaic sequences in short-term
memory allows many words and clauses and phrases to be lined up and
produced with much less effort, and so allows a greater volume of language
and thought to be produced or comprehended than seven or eight individual words!
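The arithmetic behind this point can be shown with a toy calculation in Python. It is a deliberately crude sketch, not a model of memory: the limit of seven slots follows the approximate figure cited above, while the function name words_held, the sample utterance, and its division into three sequences are assumptions chosen only to make the comparison visible.

```python
SLOTS = 7  # approximate short-term memory span cited in the text

def words_held(items, slots=SLOTS):
    """Count the words carried by the items that fit in the available slots.

    Each item, whether a single word or a whole sequence, occupies one slot.
    """
    held = items[:slots]
    return sum(len(item.split()) for item in held)

# The same twelve-word utterance, stored two ways (an invented example).
single_words = "i think that it is a good idea to put it off".split()
sequences = ["i think that", "it is a good idea", "to put it off"]

print(words_held(single_words))  # word by word, only part of the utterance fits
print(words_held(sequences))     # as sequences, all of it fits with slots to spare
```

Stored word by word, the utterance exceeds the available slots. Stored as three formulaic sequences, it occupies only three slots, leaving the remainder free for planning what to say next.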
Second language acquisition theory and spontaneous speech
Second language acquisition theory provides clues as to how the ability to
produce spontaneous speech in these ways may develop over time. It may
be that frequency of exposure to input and language experience is key to
acquisition and automatic processing. From a connectionist perspective (see
Chapter 5), language production and comprehension are determined by vast
amounts of statistical information about how language behaves, how words
fit together, how they collocate, and so on. This experience with input allows
us to implicitly understand the likelihood of certain language items occurring
together. Ellis (2002) distinguishes between explicit and implicit memory,
noting that explicit memory is a conscious process of remembering a
prior episodic experience or fact, as in answering a question like "What did
you have for breakfast?" (p. 145), whereas implicit memory is a result of repeated
encounters with a particular stimulus, and does not require any conscious
recollection or knowledge of particular events. In connectionism, each
encounter with or repetition of a stimulus strengthens memory connections
between the stimulus and the category to which it belongs, as well as the
characteristics of the stimulus which link it to the category. One basically
subconsciously recalls past encounters and assesses their similarity to the
new one, which is then classified accordingly.
These associations and categorizations are likely made based on multiple
sources. Research evidence with children segmenting words from continuous
speech shows that, using just phonotactic information, they can achieve a
success rate of 47 percent. Using utterance boundary and relative stress
increases the success rate to 70 percent (Ellis, 2002, p. 140). There is also
evidence of a cohort effect in lexical retrieval, in which exposure to or retrieval
of an initial phoneme of a word activates all words in the mental lexicon which
share that initial phoneme. As more information is retrieved, we reduce the
range of possibilities and the most high-frequency words are activated the
most in memory.
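The cohort effect described above can be pictured as progressive filtering of a frequency-weighted word list. The sketch below is a loose analogy only: the tiny lexicon, its frequency counts, and the use of letters in place of phonemes are all invented for illustration.

```python
# Hypothetical mini-lexicon: word -> invented frequency count.
# Letters stand in for phonemes purely for convenience.
LEXICON = {
    "chip": 120,
    "chin": 40,
    "child": 300,
    "chill": 25,
    "ship": 90,
}


def cohort(heard_so_far, lexicon=LEXICON):
    """Return candidate words matching the input so far, most frequent first."""
    candidates = [word for word in lexicon if word.startswith(heard_so_far)]
    return sorted(candidates, key=lambda word: lexicon[word], reverse=True)


# As more of the word is heard, the set of active candidates shrinks.
for heard in ["ch", "chi", "chil", "chill"]:
    print(heard, cohort(heard))
```

Each additional segment narrows the cohort, and within any cohort the higher-frequency candidates are ranked first, which mirrors the activation pattern described in the text.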
For formulaic sequences, then, it would seem that they are probably
automatized through repeated exposure in input, constrained of course by the
pragmatic requirements of the communication contexts one encounters most
often. They are probably stored and retrieved over time by a number of cue
sources, including initial phoneme classification and so on.
Storage and retrieval of formulaic sequences as wholes
What exactly is meant by the notion of formulaic sequences being produced
and recalled as wholes? Obviously this has not been proven beyond a doubt,
and there are likely a number of possible answers to the question. It may be,
as Weinert (1995) pointed out some years ago, that they are retrieved based
on the order of their parts, by phonological cues, or by recall of first and last
words first, which then trigger recall of the entire sequence. Alternatively, the
sequences may be cognitively stored as clusters or bundles, and retrieved
based on salient pragmatic aspects, functions, meanings, and so on. As
well, there may be a continuum or spectrum of processing. Depending on a
particular context or due to cognitive stress from a communication situation,
a sequence might be retrieved sometimes as a whole automatically, at other
times only partially as a whole, and sometimes in a controlled step-by-step
way. It is also likely the case that some types of sequences are retrieved
more holistically than others, for example, a frame or discontinuous string
such as not only X but also Y, or a frame with a fillable slot such as a X
ago may be retrieved less holistically than, for example, a syntactically and
semantically opaque string such as by and large, or an idiomatic item such as
beat around the bush or a proverb such as a stitch in time saves nine.
It is one thing to consider how formulaic sequences (or language in
general, for that matter) may be retrieved and processed, but how can strings
of words become holistic units like this? There are several means by which
this might occur, but it is important to bear in mind that the answers are fairly
speculative. Perhaps it is a process of recognizing a meaning or function in a
string as a whole and storing it accordingly. Perhaps it is because of the utility
of a particular string. As an example, as reported earlier in this book, in my
past as an ESL teacher, I encountered a woman in my class named Marie
who had just arrived from Cambodia and spoke absolutely no English. She
was observant and silent for some time, and her first English utterance was
something which sounded like "I no stan." Later, this adapted to sound more
like "I do no stan." Clearly, she picked up the most useful utterance she could,
as a complete beginner, the utterance which could help to deflect attention
and make clear she was unable to communicate: "I don't understand." In
this case, a learner took, stored, and retrieved a lexical string based on its
pragmatic utility and function first and foremost.
Perhaps, not unlike children acquiring their first language, learners initially
may segment formulaic sequences from input and alter, fuse or combine
them. Perhaps consciousness, noticing, and awareness of sequences in
input lead to initial registration of the sequence as a lexical item and it is
then automatized through repeated exposure and/or use. There are a number
of potential research projects in this area, some virgin territory for new
researchers looking to uncover some of the processes involved in acquisition
of and perception of formulaic language.
Wray's heteromorphic mental lexicon

Wray (2002, 2008), a theorist, commentator, and critic in this area, has
elaborated a model of the mental lexicon as it applies to formulaic language,
based on syntheses of research conducted by a number of scholars. The
model basically consists of three rather commonsense ideas (adapted from
Wray, 2008, pp. 15-21):
The mental lexicon is heteromorphic: The mental lexicon consists of
a variety of linguistic material which ranges from single morphemes
to lengthy multiword units. Some, but not all, of the multiword units
are formulaic. A formulaic sequence is in essence the equivalent
of a single morpheme in terms of the space it takes in the lexicon
and the processing effort required to produce or comprehend it. The
term Wray (2008) uses here for formulaic sequences is morpheme
equivalent unit.
The content of the lexicon is determined by needs-only analysis: In
dealing with input as children in first language acquisition, or as
adults in second language acquisition, we will only break down or
segment linguistic material to the extent that it is necessary. In
other words, if a string of words is readily assigned to a meaning
or function as a whole, we will not break it into parts for analysis
or productive use because the string functions perfectly well as a
unit. For example, the string How do you do? requires no analysis,
since it is readily assigned to the role of what is said when meeting
someone, and is fixed and invariant in form. Other multiword strings
may be broken into useful separate pieces, yet also remain stored as
a whole.
Morpheme equivalent units allow the speaker to connect with the
hearer: Paying attention to the hearer often requires a speaker to be
more formulaic, so that the hearer may more efficiently grasp meaning
and affect. The more formulaic the utterances, the less processing
required.
Evidence of holistic processing

Despite the amount of theorizing and reasoning which can be brought
to bear on the notion that formulaic sequences are stored and retrieved
as wholes, researchers have struggled to find ways to uncover empirical
support. Some of the concepts outlined earlier in this chapter are helpful
in getting an overall sense of how formulaic sequences may be integrated
into cognitive theories of language storage and processing. But the burning
question still remains from the very first paragraph here: what is the
evidence that formulaic sequences are stored as wholes? It is obvious
that we store individual words in the mental lexicon, but it is less clear
whether we store multiword items in a similar way. Researchers have been
chipping away at this topic for some years, and we have reached a point
where there is growing support in the research for a claim that formulaic
language is processed faster and differently from nonformulaic language.
Research on idiom processing

One category of formulaic language which has received attention from
researchers who are curious about mental processing is idioms. Recall that
idioms are generally defined as having unitary meanings. But the research
linking idioms to mental processing as wholes has tended to use the fact that
there are basically two ways of interpreting many idioms, figurative and literal.
So the researchers in this area have zeroed in on the activation of the figurative
or literal meaning of idioms, and have given us some interesting ideas. One
such example is the work of Swinney and Cutler (1979), who propose that
when an ambiguous (literal and figurative interpretations both possible) word
string is encountered by a proficient speaker, both the analysis of the literal
meaning and the retrieval of the figurative interpretation are initiated, but the
figurative one will be activated first because it is faster. A figurative meaning
is generally holistic, in that it is a singular interpretation of the word string,
whereas a literal meaning might more likely involve some analysis of the
string as a syntactic/morphological unit, which would be significantly slower
than merely interpreting it as a whole.
One obvious angle of attack for researchers in trying to determine whether
idioms are processed holistically is to compare how we deal with them to
how we deal with novel or literal word strings. This type of idiom-focused
research has investigated the activation of idioms in comparison to novel
phrases with nonfigurative interpretations. One way to determine whether
idioms are processed differently from nonidiomatic strings is, of course, by
measuring the speed at which people react to them. Perhaps unsurprisingly,
a range of studies have found that native speakers process idioms faster
than they process novel strings of words (e.g., Gibbs & Gonzalez, 1985;
Swinney & Cutler, 1979; Van Lancker, Canter, & Terbeek, 1981). However,
similar research using nonnative speakers as subjects is ongoing and
in some ways quite complex. For example, Van Lancker-Sidtis (2003) found
that native speakers were able to use prosodic (pronunciation and accent/
stress/word blending) cues to determine whether particular idioms were
used in a figurative or literal way, whereas nonnative speakers were not.
Cieslicka (2006) found that, in contrast to native speakers, as discussed
earlier, nonnative speakers reacted more quickly to literal interpretations of
idioms than to figurative ones. Conklin and Schmitt (2008) and Underwood,
Schmitt, and Galpin (2004) found that native speakers and proficient
nonnative speakers shared an ability to process idioms in texts faster than
novel language. Siyanova-Chanturia, Conklin, and Schmitt (2011) conducted
a study using idioms in reading passages and eye-movement tracking, and
found that native speakers processed the idioms more quickly than novel
language, while nonnative speakers did not, and in fact often processed
idioms more slowly than other language. These studies are described in
more detail elsewhere in this book.
So we have, from the studies described briefly here, a sense of idioms
being processed differently from nonidiomatic strings. But, while this
research involving idioms presents some tantalizing evidence of processing
phenomena, unfortunately it is hard to generalize from it to any great extent.
For one thing, the fact that idioms often have two interpretations, the literal and
figurative, can slow processing as choices have to be made while processing
them. As well, idioms are not all equally transparent or decompositional, which
makes it hard to present sweeping conclusions even about idiom processing,
let alone processing of all other types of formulaic language. And finally, it is
a fact that idioms represent a subset of formulaic language and are actually
not very frequent in language, meaning that nonnative speakers are unlikely
to have encountered many idioms very often (Conklin & Schmitt, 2012). In
light of all of these limits on what can be concluded from the idiom research, we
still need to approach the study of mental processing with a mix of caution
and enthusiasm. While the idiom research provides us with a few tantalizing
bits of knowledge, the study of mental processing of other types of formulaic
sequences may be a much richer and more rewarding area to investigate.
Research on formulaic language other than idioms

Fortunately, a look at some research focusing on nonidiomatic formulaic
language shows us that there is some evidence of processing advantages.
Classic work by Kuiper (1996, 2004) noted that the speech production of
certain performers such as auctioneers and sports commentators, who need
to speak at much faster rates than normal, is replete with formulaic language.
This gives us an impression that perhaps it is the formulaic language which
is easing the processing load of producing utterances at such rapid and
unrelenting speed, a clue to a processing advantage.
A range of quite delicately and carefully designed studies have attempted
to tease out the nature of processing of various formulaic sequence types. The
key variable in most of these has been the relative frequency of sequences.
One such study by Sosa and MacFarlane (2002) investigated the word "of"
used in high-frequency and low-frequency two-word combinations. Using
auditory word monitoring, they found that native speakers reacted more
slowly to the word when it was in a high-frequency combination, indicating
that they were used to processing the frequent two-word sequences as
wholes and were slowed down by the need to deal with the sequence word
by word. Similarly, Bod (2000, 2001) had native speakers read frequent and
less frequent three-word sentences and found that they reacted more quickly
to the higher frequency items. Arnon and Snider (2010) conducted a similar
study using compositional (comprehensible from individual words) four-word phrases at different frequency levels and found that higher frequency
items were processed faster than lower frequency ones. Tremblay, Derwing,
Libben, and Westbury (2011) found that sentences containing lexical bundles
(see Chapter 8) were processed faster than those without lexical bundles, and
Tremblay and Baayen (2010) found that electrophysiological measures show
that native speaker processing of frequent four-word sequences was faster
than for less frequent sequences, and that the frequent items appeared to
be stored as both parts and wholes. Eye-tracking work by Siyanova-Chanturia
et al. (2011) found that both native and nonnative speakers read more frequent
formulaic sequences faster than less frequent ones. Taken together, these
studies appear to be unanimous in indicating a faster processing speed for
frequent sequences compared with less frequent ones. It is intriguing to see
that in some cases this effect was present in the processing by nonnative
speakers as well as native speakers.
Research on brain-damaged individuals

The study of processing by brain-damaged individuals has also helped us to
understand how formulaic sequences are processed. Van Lancker and Kempler
(1987) studied how left- and right-brain-damaged people processed formulaic
and novel phrases. Note that the right brain is generally understood to be the
site of holistic processing of information, and that the left brain is the analytic
and discrete-point processor. Using picture-matching auditory comprehension
tests, they discovered that left-brain-impaired participants performed better
on recognizing formulaic phrases, whereas right-brain-impaired participants
dealt more readily with novel phrases. This indicates that the right brain, which
specializes in holistic processing, has much to do with formulaic language
processing, giving some support to the notion that formulaic sequences may
be processed as wholes. In a later study, Van Lancker-Sidtis and Postman
(2006) found that left-hemisphere-damaged individuals produced significantly
more formulaic language than participants with right hemisphere damage,
extending the notion of right brain holistic processing into production as well
as recognition. These types of studies offer much in the way of understanding
of how formulaic sequences may be processed. The comparison of the two
hemispheres in damaged individuals gives us a sense of where processing
may be happening in undamaged brains. Taken together with the frequency
studies and the idiom studies discussed above, this brings the nature of
processing of formulaic language into clearer and clearer focus.
Other types of research

Other types of research into processing of formulaic language have used
dictation tasks as a means of data collection. A study examining whether
formulaic sequences extracted from corpora are also psycholinguistically
processed as formulaic is that of Schmitt, Grandage, and Adolphs (2004).
The researchers extracted a range of types of formulaic language from
corpora and embedded them in a dictation task for both native and nonnative
speakers. The utterances in the dictations contained more words than short-term memory could manage, generally twenty to twenty-five words, forcing
participants to reconstruct the language. Many of the formulaic sequences
were in fact reconstructed intact, reflecting some evidence of holistic
processing, but some were not, perhaps indicating that some individuals may
process formulaic sequences as wholes under some circumstances, and as
strings of individual words under other circumstances. This type of research
has rich implications for our understanding of these phenomena, however,
and certainly bears replication and refinement in future work.
What can be concluded?

An overview of some basic concepts in cognition and language, as well
as a review of some relevant research, leads us to an interesting place.
The evidence, as discussed earlier, certainly appears to indicate that
adult native speakers and proficient nonnative speakers have, in some
ways, mental representations of formulaic sequences as wholes. Thinking
back to the discussions of automatization and procedural knowledge, it
makes some sense that frequency seems to play an important role in the
holistic storage and processing of formulaic sequences. Automatization
will occur since repeated exposure to a word string with a particular
meaning or function over time likely leads to it becoming entrenched in
memory and stored as a whole. Researchers who take a generative
grammar approach to language such as Pinker (1999) would likely argue
that any effect of frequency would apply only to single words, since, in
their view, the lexicon and the grammar are separate. However, when
dealing with formulaic language, it is much more logical to integrate more
data-based and encompassing theories of language. For example, usage-based (e.g., Goldberg, 2006; Tomasello, 2003) and exemplar-based models
of language (e.g., Bod, 2006) see language learning as the acquisition of
constructions which vary in length and complexity. Constructions are seen
as units of form and meaning which may be as small as a single morpheme
or word, or as long as an entire sentence. These models would include
formulaic language as a type of construction just as a single word would be,
and so a formulaic sequence could be subject to the effects of frequency
in acquisition, storage, and retrieval. In this view, faster processing of any
frequent item, be it a word or a sequence, is logical.
Formulaic sequences, then, do indeed appear to be represented mentally
as if they were single words, and are processed faster than novel language. The exact
nature of the processes underlying the faster processing is still extremely
uncertain. It may be that words which co-occur often are more strongly
connected mentally and semantically. For example, when we encounter a
word or set of words such as "fish and," our brain activates "chips." Interestingly,
this particular combination occurs in standard American English as well as
British English, despite the fact that outside of this combination the term chip
is not used for this type of fried potato in American English, which labels
it a French fry or a fry instead. In accordance with the idea that language
acquisition is fundamentally a matter of exposure to input and subconscious
compilation of statistics of frequency of co-occurrence, it may be that the
probability of these words occurring together is greater than other possible
combinations, and so chips wins the race for what is to be activated next. If
this is the case, then the sequence is not being dealt with as a chunk, but as
a rational and probable combination of items.
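The idea of subconsciously compiled co-occurrence statistics can be made concrete with a toy calculation. In the sketch below the miniature corpus is invented for illustration, and estimating the probability of the next word from simple counts is an assumption of the example, not a claim about how the brain actually does it.

```python
from collections import Counter

# Invented toy corpus; real estimates would require a large corpus.
CORPUS = (
    "we had fish and chips . they sell fish and chips . "
    "fish and rice is nice . salt and pepper . fish and chips again ."
).split()


def next_word_probabilities(tokens, context):
    """Estimate P(next word | preceding words) from raw counts."""
    n = len(context)
    followers = Counter(
        tokens[i + n]
        for i in range(len(tokens) - n)
        if tuple(tokens[i:i + n]) == tuple(context)
    )
    total = sum(followers.values())
    return {word: count / total for word, count in followers.items()}


probs = next_word_probabilities(CORPUS, ("fish", "and"))
# The most probable continuation "wins the race" for activation.
print(max(probs, key=probs.get), probs)
```

On this view nothing needs to be stored as a single chunk: the sequence emerges because one continuation is far more probable than its competitors.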
Regardless of the details of whether sequences are dealt with as
rational combinations or as chunks in and of themselves, we have seen
some interesting evidence of some type of unitary mental representations
of formulaic sequences. Idioms tend to be dealt with faster by means of
their figurative or holistic meanings. Higher frequency word sequences are
processed more quickly and efficiently by both native speakers and high
proficiency nonnative speakers. Taken together with the brain lateralization
evidence, all of this indicates that formulaic sequences are processed as
wholes, or, if we take the approach that words activate each other according
to subconsciously held frequency data, as sets of probability formulas.
In summary
From this general overview of research and theory on mental processing of
formulaic language, it is really interesting to note that despite the quantity of
research which has explored the notion of "retrieved and stored as wholes,"
there are still unanswered questions. It may be stated that formulaic language
is likely to some extent dealt with holistically. But is this always the case?
Can a given sequence sometimes be dealt with holistically and sometimes
constructed in a more synthetic, conscious manner? If so, what are the factors,
cognitive and contextual, which might influence whether the sequence is
dealt with holistically or not? Is it also perhaps the case that sequences may
fit on a spectrum of holistic processing, with, for example, collocations and
idioms being on the holistic end of the spectrum, and lexical bundles or lexical
phrases being dealt with in a more constructed way? Are there answers in
second language acquisition theory that we have not yet encountered?
A few of the many themes and patterns which the research shows are:
• Formulaic sequences are probably mentally processed more or
  less holistically.
• Frequency and automatization likely play a role in the holistic
  processing.
• A large amount of the evidence for holistic processing comes from
  research on idioms.
• A great deal of the research in this area is highly experimental and
  does not deal much with real life language use.
• Studies of brain lateralization of language processing indicate
  that formulaic language is dealt with holistically in the right
  hemisphere.
One interesting conclusion to be drawn from this survey of research in the
area of mental processing is that higher frequency sequences appear to be
processed holistically. This makes us wonder about the nature of input and
the power of exposure to language, especially naturally occurring language.
It may be that the research in this area does indeed reinforce the ideas of
the associative, usage-based schools of second language acquisition theory.
But do the evidence and the theory have real implications for our work as
researchers and as language teachers, assessors, editors?

1 Before reading this chapter, were you willing to take the notion of
holistic processing as a given?
2 Summarize the evidence for holistic processing based on the idiom
research. What is it about idioms that lends them to this type of work?
3 Summarize the evidence for holistic processing based on brain
lateralization research. How relevant is this to our work as researchers
and practitioners?
4 Summarize the evidence for holistic processing involving nonidiom
sequences. Does this research bolster the claim of holistic
processing? If so, how?
5 Is holistic processing the central element of a definition of formulaic
language? What other features are important?
6 Is holistic processing important for other areas of research into
formulaic language, for example, corpus studies, acquisition and
teaching research, and so on?
7 As a language teacher, how does the notion of holistic processing
affect how you might present and introduce formulaic language to
learners?
8 As a language teacher, how does the notion of holistic processing
affect how you might provide feedback to learners?
9 Does the idea of holistic processing affect the spoken language
differently from written language? If so, how?
10 Create or obtain a small corpus of spoken language, either first or
second language. Can you detect evidence of holistic processing in
the way in which the speech is produced?
5
Formulaic Language and Acquisition: First and Second Language
It is interesting to note that, despite the wealth of research which exists
on formulaic language from a range of perspectives, there is relatively little
empirical work on its role in language acquisition, or how it is itself acquired.
The reasons for this are obscure, but it is likely the case that the fixation of
linguistics and applied linguistics on acquisition of morphosyntax is at least
partly to blame. Also, while the research on vocabulary acquisition grows
ever richer, there is only a certain amount of overlap between that body of
work and that which concentrates on formulaic language. Vocabulary research
tends to focus on single words and their meanings, while much of formulaic
language constitutes, as we have seen, more than mere meaning units, often
with functions in discourse and so on.
This is not to say that the work which has been done on formulaic language
and language acquisition is of low quality or marginal worth. On the contrary,
the relatively small body of work into this area has yielded some tantalizing and
useful results in the area of child first language acquisition. For adult second
language acquisition, however, the research results, while still tantalizing,
are much thinner. This is an area which is crying out for quality research. The
coming years will no doubt bring us plenty to consider and learn.
First language acquisition

The study of formulaic language in child language acquisition has given us
some serious knowledge with which to understand child language. For one
thing, there is a certain amount of evidence of formulaic sequences being
used as learning and communication strategies by children in first and
second language acquisition. It appears that initial first and second language
acquisition in children includes attending to formulaic sequences in language
input, adopting them for use, and later segmenting and analyzing them. The
analysis may take place later partly as a result of neurological development
and a resultant increase in analytic cognitive skills.
Early research
The first serious study of formulaic language in child language acquisition
dates back to the 1970s. The first such studies were basically case studies
of individual children and their progress through the acquisition of language.
Wong-Fillmore (1976) was one of the very first to study the second language
acquisition of a child and find that one prominent process involved formulaic
chunk acquisition. Her data further revealed that this was followed by a
process of segmentation or syntactic and semantic analysis and breakdown
of the acquired chunks. This in turn furthered development of overall linguistic
competence. Another early researcher in the area, Hakuta (1974), conducted
a sixty-week study of the second language acquisition of a Japanese child and
found evidence of initial acquisition of prefabricated chunks later analyzed
and used to facilitate overall language development. Much later, in a similar
vein, Hickey (1993), in a longitudinal examination of a child's acquisition of Irish
Gaelic, also discovered a role for formulas in acquisition. Again, she
found that they were later broken down and analyzed, providing grist for the
linguistic competence mill.
A turning point in the first language acquisition research came in the early
1980s with Anne Peters's seminal piece of work on child language acquisition.
Peters (1983) documented how the process of formulaic chunk acquisition and
later segmentation, as established by Wong-Fillmore and Hakuta, might actually
work. Peters claims that there is evidence for eight assertions about the process:
1 First acquisition units by children often consist of more than one
morpheme.
2 There is no difference between these units and minimal ones in terms
of storage.
3 All of the polymorphemic units can be segmented (broken down).
4 Smaller units from segmentation are stored in the lexicon.
5 Both the original unit and the segmented ones can be stored in the
lexicon.
6 Segmentation produces structural information, starting with simplest
frames with slots, then generalized into patterns.
7 The lexicon grows through units perceived in conversation and their
segmentation, as well as fusion (storage of combinations).
8 Fusion continues into adulthood.
According to Peters, the child very early and quickly develops strategies for
extracting meaningful chunks from the flow of conversation. This may be
based on any of a range of cues, for example:
1 the utility of the chunk for his/her own needs
2 the result he/she observes occurring when the chunk is used among
adults
3 the frequency with which he/she is exposed to the chunk
4 some attractive aspect of the phonetics or prosody of the chunk
He/she is able to remember the chunks, compare them phonologically with
others, and remember them as new lexical units. They are initially stored
as wholes in the lexicon as individual words or as multiword units. Later in
his/her cognitive development, he/she is able to analyze the stored chunks
and then recognize and remember structural patterns and information about
distribution classes revealed by the analysis. He/she is then ready to develop
an ability to utilize lexical and syntactic information already acquired to analyze
new chunks in the linguistic environment.
Wong-Fillmore, Hakuta, and Peters set the stage for further analysis of
these child language acquisition processes. These dynamics of acquisition of
formulaic sequences and their use as a basis for creative construction were
investigated much later by Myles, Hooper, and Mitchell (1998) and Myles,
Mitchell, and Hooper (1999) in child learners of French as a second language
in a classroom context. As expected, Myles and her research associates
found that the young learners in their studies did in fact acquire and use
formulaic sequences as wholes, but they also used segmentation of the
formulas to enhance their increasingly complex communication needs over
the two years of the research project. In other words, as the learners' worlds
expanded due to increased experience and cognitive development, and they
became more aware of the world around them and the need to participate in
communication, they were forced to move away from simply using multiword
chunks to communicate. Initially, the learners were able to use unanalyzed
wholes to communicate simply, but they began to break the formulas apart and
use components in different ways as their routine classroom communication
needs developed beyond simple communication of personal information into
a need to discuss third person activities and characteristics. When the third
person communication needs grew, the segmentation process began and
then accelerated (Myles, Hooper, and Mitchell, 1998, p. 359).
A role for pragmatic competence

What might underlie the need to expand language abilities with increased
exposure to others? Some researchers have been able to determine that
processes related to pragmatic competence are at work when children
acquire formulaic sequences. Bahns, Burmeister, and Vogel (1986)
investigated the second language acquisition of a group of children and
found evidence of a formula segmentation process at work. They found two
particular pragmatic factors at work in the use of formulas by the children,
namely, participation in situational frames requiring their use, and frequency
of occurrence of the formulas. The authors note that it was common for
researchers to discover exceptionally sophisticated language in stretches of
child learner speech in research:
In their attempts to write grammars for different stages of development,
mainly in structural areas like negation or interrogation, child language
researchers were very often confronted with utterances of a rather complex
nature. The structure of these utterances was somehow outside the rules
written to account for the bulk of data representing syntactic development
for the stage in question. (pp. 696, 697)
In their study, Bahns et al. found a large range of formulas used by the
children, accounting for the complex utterances noted by earlier researchers.
The categories found included:
1 Expressive formulas: indicators of a sudden state of mind, for
example, shut up, stupid idiot, thank you
2 Directive formulas: intended to change the hearer's behavior, for
example, let's go, knock it off, wait a minute
3 Game or play formulas: tied to specific play activities, for example,
who's up, you're out
4 Polyfunctional formulas: exceed a single semantic-pragmatic value,
for example, what is it? I don't know
5 Question formulas: elicit information, for example, how come? What
time is it?
6 Phatic formulas: to establish, prolong, or discontinue interaction, for
example, good bye, see you later, You wanna see X?
The researchers also found signs of a pattern of development of use of the
formulas, starting with use of the simpler expressive and game formulas. This
was followed by a broadening of the range of formulas and scope of use as
pragmatic awareness and ability grew, and, eventually, full nativelike selection
and use of formulas with more precise knowledge of when an expression is
pragmatically appropriate.
A double role for formulaic language

An important and interesting point to note is the double role of formulaic
sequences as an element of child language acquisition. First, they are acquired
and retained in and of themselves, linked to pragmatic competence and
expanded as this aspect of communicative ability and awareness develops.
Second, overlapping in time, they are segmented and analyzed, broken down,
and combined as cognitive skills of analysis and synthesis grow. Both the
original formulas and the pieces and rules which come from analysis are
retained.
It has been observed that children basically acquire their first language
by attending to and imitating the speech of others. Traditionally, linguists
operated on the assumption that the acquisition process in children passed
through four general stages:
1 acquisition of words
2 classifying words into categories
3 inferring rules for combining words
4 producing and understanding by combining or analyzing word
sequences on the basis of rules

However, recent theory and research tend to focus more on the guidance
and scaffolding provided by attending to and communicating with adults in a
child's environment, rather than how children may break down and reconstruct
language in a linear fashion.
Children in fact tend to pick up and use pieces of language which are
useful for them in satisfying their needs, or which provide opportunities for
language play. The form-meaning combinations which they pick out of input
are stored separately yet overlap in memory, and provide a source of growing
or emergent grammar through subconscious cognitive processes on various
levels, forming schematic patterns which eventually become available for
analysis. Yet, at the same time, the original chunks perceived and taken from
input remain for use as wholes, which helps explain the formulaic nature of
language use.
Studies which examine children's communications with caregivers provide
plenty of data about the nature of the language children hear and what they
pick up. The quantity of data has increased over the past five decades or so,
and evidence has grown that the language children experience is, for one
thing, quite repetitive. Cameron-Faulkner, Lieven, and Tomasello (2003) note
that over 50 percent of all utterances mothers direct at two-year-old children
begin with one of fifty-two item-based phrases. Most of these contained two words
or two morphemes. Therefore, we can conclude that children are exposed
to language which is highly formulaic in nature, and which is restricted in
variety, and very repetitive. As well, given the nature of this input, and the
restricted linguistic needs of a child, it stands to reason that he/she would
take up the word sequences more or less as wholes and retain them
in memory.
Moving beyond the nature of the input, if we look at children's language
production, we can see that indeed they tend to reproduce multiword
constructions from the input. A typical research method used to
examine children's language output is the traceback method, in which
researchers look at the multiword sequences produced by a child after a
period (perhaps weeks) of recording, and compare them to the productions
from the previous time periods. In a classic study of this kind, Lieven, Salomo,
and Tomasello (2009) studied thirty hours of the speech of each of four
two-year-old children. It was found that 20 to 40 percent of the utterances in
the final two hours were sequences the children had used in the previous
twenty-eight hours. Furthermore, 40 to 50 percent of these were identical
to previous sequences except for single fillable content word slots,
generally references to a person, place, or thing. This constitutes strong
evidence that children's speech, at early stages at least, is highly formulaic
and limited in range.
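The logic of the traceback comparison described above can be illustrated with a short Python sketch. It is a deliberate simplification and not the procedure used by Lieven, Salomo, and Tomasello (2009): the function names, the three labels, and the rule that a "one-slot" match means exactly one word differs between two utterances of equal length are all assumptions made for illustration.

```python
def classify_utterance(target, earlier_utterances):
    """Label a target utterance relative to a child's earlier speech.

    Returns "exact" if the utterance was produced before, "one-slot" if it
    differs from an earlier utterance of the same length in exactly one word,
    and "novel" otherwise. (Hypothetical, simplified categories.)
    """
    target_words = target.lower().split()
    earlier = [u.lower().split() for u in earlier_utterances]
    if target_words in earlier:
        return "exact"
    for prev in earlier:
        if len(prev) == len(target_words):
            mismatches = sum(a != b for a, b in zip(prev, target_words))
            if mismatches == 1:
                return "one-slot"
    return "novel"


def traceback_summary(test_utterances, earlier_utterances):
    """Return the proportion of test utterances falling under each label."""
    counts = {"exact": 0, "one-slot": 0, "novel": 0}
    for utt in test_utterances:
        counts[classify_utterance(utt, earlier_utterances)] += 1
    total = len(test_utterances)
    if total == 0:
        return counts
    return {label: n / total for label, n in counts.items()}


# Invented toy data: earlier recordings versus the final session.
earlier = ["where's the ball", "I want juice", "more milk"]
final_session = ["I want juice", "where's the dog", "daddy go work now"]
print(traceback_summary(final_session, earlier))
```

A real traceback analysis works on transcribed corpora and allows further operations than a single word substitution, so the proportions from a sketch like this are only indicative.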
How, then, do children create mental representations of formulaic
language? Intriguing evidence has been uncovered of the power of frequency
in this process. Looking at the nature of the input we also see evidence of
children's creation of representations of frequent sequences. A key study
investigating this is that of Bannard and Matthews (2008), who had children
repeat back frequent word sequences and less frequent sequences which
were different in the final word only. They found that the children were faster and
more accurate in uttering the initial, shared part of the sequence when
repeating the higher frequency one. This speed and accuracy of uttering the
common, shared, part of a high frequency sequence points to a strong role
for frequency in retention and storage of formulaic sequences by children. It
is as if they store the entire high-frequency sequence for ready accessibility,
while a similar sequence with lower frequency will be stored more or less as
a whole too, but in a less readily accessed way.
Earlier, researchers noted that morphosyntax in child
language develops from segmentation and breaking down of initially stored
multiword chunks. The growth of grammar from this process of creation of
representations of strings has actually been tested. In an important such study,
Bannard, Lieven, and Tomasello (2009) analyzed the speech of two children
when they were at age two and again at age three, discovering that reliance
on formulaic language to communicate lessens over time. The researchers
inferred grammars from the data and found that lexically specific grammars
covered 84 and 75 percent of the utterances at age two, and 70 and 81 percent
at age three. At age two, 60 percent of one child's utterances consisted of a
single simple grammatical operation, and at age three roughly 50 percent
consisted of only two operations. For the other child, at age two, 60 percent
of utterances consisted of two operations, and at age three, 60 percent of
utterances were accounted for by five operations. It appears, then, that at
age two the children are communicating in a very formulaic fashion, and by
age three their language starts to appear more productive. In general, evidence
points to a process in which children extract formulaic sequences from input
and use them to develop productive language. This seems to be accomplished by
perceiving and playing with the phonological and semantic shared features of
the formulaic sequences.
What is the nature and dynamic of the balance between formulaic and
productive language in child language acquisition? In classic linguistic research
fashion, looking at the errors children produce while speaking can help shed
light on competition between embedded formulas and productive language.
This helps to show how formulaic language remains active even as productive
grammar emerges. For example, children make errors in which they use me
in the subject position, which can be taken as evidence of their extracting
formulaic pieces out of more complex sentences. Kirjavainen, Theakston,
and Lieven (2009) found that this type of error was linked to a grammatical
structure common in caregiver speech, the use of me directly before
a nonfinite verb, for example, let me do it. The seventeen children in the
study even tended to use the me form incorrectly in utterances in which their
caregivers had used these particular types of verbs. As well, children tend to
make errors of using nonfinite forms when a finite one is correct, for example,
go to work instead of goes to work. Some researchers (e.g., Freudenthal,
Pine, & Gobet, 2010) suggest that this type of error may vary according to
how often nonfinite verb forms occur at the end of caregivers' utterances.
Utterance-final word sequences in caregiver speech are much more likely to
be adopted by children for use in their own speech.
Question inversion is another area of errors which has significance for
determining whether children retain formulas even while generating productive
grammar. In making these types of errors, children will tend to simply add the
interrogative word to the front of the declarative form of the sentence, as in
what he is doing or what does he doesn't want. Rowland and Pine (2000)
examined these types of errors in the utterances of and the input received
by a child between the ages of two and five, and found more errors if the
combination was relatively rare in the input, and fewer if the combination was
frequent in the input. Ambridge, Rowland, Theakston, and Tomasello (2006)
noted this type of error in the utterances of three- and four-year-olds and found
that the most likely cause of errors was in fact the nature of the specific word
combinations. Taken together, these studies provide some evidence of the
possibility that children take and retain sequences as wholes from caregiver
input, and that these sequences still have a strong influence on speech even as productive
grammar begins to emerge. The evidence of a sort of competition between
the retained formulas and their use as data to structure emergent grammar
lends credence to the claims of Peters (1983) and Wong Fillmore (1976) from
many years ago.
Second language acquisition

A body of evidence has also been collected over the years of a role for
formulaic sequences in the process of adult language acquisition, but the
development processes uncovered by researchers in this area are not exactly
like those found in the child language acquisition studies. The evidence is also
more limited for adult language acquisition.
It was the 1980s before serious work in this area was undertaken. Yorio
(1980) was an early investigator of adult language development and formulaic
sequences. Examining several longitudinal studies based on instructed adult
learners' written work, he found that unlike children, adult learners do not
make extensive use of prefabricated formulaic language, and when they do,
they do not appear to use it to further their language development. Instead,
they appeared to use it more as a production strategy, to economize effort and
attention in spontaneous communication.
A keynote study by Schmidt (1983) consisted of an in-depth case study of
the English language development of a Japanese adult in Hawaii, uncovering
a definite role for formulaic sequences. In fact, the learner under study used
a great and ever-increasing number and range of formulaic sequences as a
communication strategy, while appearing fossilized and grammatically inept
in other aspects of language. Schmidt found that, while the research subject
was highly motivated and rapidly acculturating, he remained resistant to error
correction and yet managed to develop linguistically and adapt socioculturally
almost exclusively through use of formulaic sequences. It is important to note
that, in this case at least, there was little or no evidence of the processes of
segmentation and analysis which so characterize the child acquisition studies.
Other early researchers found that, as appears to be the case with
child language learners, adult learners tend to use formulaic sequences
as communication and learning strategies. For example, as noted earlier,
Bolander (1989), in a study of acquisition of Swedish by adults, found that
formulaic sequences contributed to a greater facility and economy in learning
and use. The adults in this longitudinal study consistently used prefabricated
language units which contained target language structures well in advance
of demonstrating that they had actually acquired the structures themselves.
Like the child subjects of Hickey (1993) and Peters (1983), they produced
formulaic sequences which contained language which outstripped their
normal abilities. As well, Bolander noted that the learners appeared to
sometimes use standard or reliable canonical formulas to help in acquiring
specific rules of Swedish syntax.
Much later, in the 1990s, Ellis (1996), in an overview of sequencing in
language acquisition, finds a role for formulas in adult language acquisition.
He asserts that much of language acquisition is really acquisition of
memorized sequences, and that short-term repetition and rehearsal permit
the development of long-term sequence information for language. In turn,
this information allows chunking of working memory contents to these
established patterns. Long-term storage of frequent language sequences
allows them to more easily serve as labels for meaning reference, and they
can be accessed more automatically. The result is more fluent language use,
freeing attentional resources for dealing with conceptualizing and meaning.
Ellis asserts that multiword units in long-term storage serve as a database for
grammar acquisition.
It appears that adults in naturalistic second language learning
environments, like children, tend to acquire and use formulaic sequences.
However, the established cognitive and learning styles of adults, their
diverse acquisition contexts, knowledge of the first language, and other
factors make for more variety in the route of language acquisition generally,
and with regard to use of formulaic sequences specifically. Some adults
may be more analytic and seek to infer rules from chunked units or from
pieces of input, while others, such as Schmidt's (1983) subject, may rely
heavily on acquired formulas and not attempt to break them down or analyze
them. Furthermore, degree of literacy and type and degree of instruction
may play a part.
Second language acquisition theoretical models

Perhaps the most promising area of study of adult language acquisition and
formulaic language has been in examining the links between use of formulaic
language and specific second language acquisition theoretical models.
Emergentist or associative models of second language acquisition lend
themselves to connections to formulaic language research. Ellis (1996, 2002,
2012) has been a strong proponent of emergentist models of language
acquisition and the importance of formulaic language. In 2002 he pointed
out that second language learner sensitivity to sequence information, or
the statistical probabilities of linguistic elements, was likely evidence of
implicit knowledge of formulaic language. A model of the developmental
sequence of acquisition therefore runs from formulaic sequence, to limited-scope
slot-and-frame pattern, to productive grammar. Note that, in emergentist or
associative models of language acquisition, this principle should apply to
second language as well as first language acquisition.
The power of memory in language acquisition and its link to the role
of formulaic language in acquisition is undisputed. Phonological short-term
memory (PSTM) appears to be crucial to language acquisition, with
evidence showing that learners with better ability to sequence linguistic
items in PSTM are more successful in acquiring vocabulary and grammar
(Ellis, 1996). Other researchers have worked to show the power of PSTM
for various aspects of language use. O'Brien, Segalowitz, Collentine, and
Freed (2006) showed that fluency gains in second language were linked to
strong PSTM in second language learners, and Kormos and Safar (2008)
found that PSTM correlated to a certain extent with second language
writing ability and with speech fluency and vocabulary in intensive English
as a foreign language study. Wen (2011) found that PSTM correlated with
lexical diversity and syntactic complexity in second language speech. Martin
and Ellis (2012), using an artificial language, discovered that vocabulary
and grammar development were strongly influenced by PSTM. It appears,
then, that PSTM affects the learning of word forms and the retention of
sequences of forms.
Attention to formulaic sequences in
comprehension and production
A certain amount of recent research has focused on attention to formulaic
sequences in comprehension and production. Some of this research has been
conducted with first language participants, but the implications for second
language acquisition are clear. The following several paragraphs present some
of the relevant research in these areas.
It seems that both lexical processing and phonetic processing are
influenced by knowledge of formulaic language. As for phonetic processing,
a classic example of this type of research is that of Hilpert (2008), who used
the make-causative construction. In this construction, make occurs with cry
seventy-three times and with the verb try just eleven times. However, try is
some ten times more frequent as a word in general discourse, making the
make X cry construction appear quite formulaic. In the study, first language
participants were required to identify whether they heard cry or try after
the carrier phrase they made me, and the signal ranged from try to cry on
an eight-step continuum. The ambiguous sounds were more often perceived
as cry, showing that the formulaic nature of the make-causative construction
was quite powerful.
Reading time also appears to be influenced by formulaic knowledge. Bod
(2001) showed that higher frequency three-word sentences such as I like it
were reacted to faster by native speakers than low frequency ones. Ellis,
Frey, and Jalkanen (2008) showed that native speakers are quick to read and
process frequent collocations with verb agreement and booster-maximizer
adjectives. Arnon and Snider (2010) showed that more frequent phrases were
processed faster than less frequent ones even when they were matched
as to the frequencies of individual words. Tremblay, Derwing, Libben, and
Westbury (2011) used three self-paced reading tasks with lexical bundles
(see Chapter 8) and matched control sentence fragments to show that the
lexical bundles were read faster.
Studies involving retention of material in short-term memory and accurate
subsequent reproduction show an influence of knowledge of formulaic
language. Bannard and Matthews (2008) found that children were more likely
to reproduce familiar sequences correctly than less frequent or familiar ones,
and to reproduce them faster. Studies of priming, in which sequences recently
encountered in communication are reproduced later, show a priming effect
for hearing, speaking, reading, or writing sequences (see e.g., McDonough
and Trofimovich, 2008 and a growing body of subsequent work).
Most of the aforementioned studies involved native speaker or child
participants. With second language participants, some remarkable work has
also been done. Conklin and Schmitt (2008) found that formulaic phrases were
read faster than matched nonformulaic phrases by both native and second
language participants. Ellis and Simpson-Vlach (2009) and Ellis, Simpson-Vlach,
and Maynard (2008) found that second language learners processed
formulaic language more effectively if it was of high frequency, as opposed to
native speakers, who processed faster those sequences which also exhibit a
high rate of mutual information (MI), which measures the statistical likelihood
of words collocating.
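As a rough illustration of what an MI score captures, the sketch below computes the standard pointwise mutual information for a two-word sequence: the base-2 logarithm of the observed frequency of the pair divided by the frequency expected if the two words were independent. This is the general textbook formula, not the specific procedure of the studies cited above; the function name, the toy token list, and the use of the total token count to normalize both word and pair frequencies are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(tokens, w1, w2):
    """Pointwise MI (base 2) for the adjacent pair (w1, w2) in a token list.

    MI = log2( f(w1 w2) * N / (f(w1) * f(w2)) ), where N is the number of
    tokens. Returns None if the pair never occurs. (Simplified sketch: N is
    used to normalize both single-word and pair counts.)
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair_count = bigrams[(w1, w2)]
    if pair_count == 0:
        return None
    n = len(tokens)
    return math.log2(pair_count * n / (unigrams[w1] * unigrams[w2]))


# Invented toy sample, far too small for real corpus work.
sample = "you know what i mean you know what you mean".split()
print(mutual_information(sample, "know", "what"))
print(mutual_information(sample, "you", "know"))
```

A higher score means the two words co-occur more often than their individual frequencies would predict, which is why a rare but tightly bound pair can outscore a pair made up of very common words.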
Extensive exposure to formulaic language appears to aid fluency of
speech. For example, Taguchi (2007) studied the development of speech
abilities in students drilled in word chunks and found that they used more
correct chunks after instruction and that they were more aware of discourse
features. Wood (2006, 2009a, 2009b) and Wood and Namba (2013) have
shown that exposure to and practice with formulaic sequences has positive
effects on speech fluency and effective communication. See Chapters 6 and
9 of this book for details on these studies.
Developmental sequence of acquisition

To determine the extent to which formulaic language affects the overall
acquisition process would entail finding a process somehow similar to that
we have seen with children, in which embedding of formulaic sequences
somehow seeds or bleeds into overall language acquisition. This is
complicated. For example, some formulaic sequences are not necessarily
acquired as wholes by adults, but are easily learnable by virtue of their high
frequency and extreme functionality. Hasselgren (1994) remarks that even
advanced second language learners will use high frequency words rather
than risk the time and cognitive energy to search for and utter alternatives.
These safe words and phrases, "islands of reliability" (Dechert, 1980) or
"teddy bears" (Ellis, 2012; Hasselgren, 1994), are most probably the
sources of the seeding which may happen between formulaic language
and overall second language acquisition in adults. Other less frequent and
more semantically opaque and less functionally obvious sequences are much
harder to acquire, which explains why second language learners tend to
underuse formulaic language.
In any case, evidence does exist of learners using formulaic sequences
with much more complex syntax than their creatively constructed language.
This is clearly a communication strategy of great value to learners. Myles
(2004) and Myles et al. (1999) examined the oral language of second language
learners of French and found that their use of formulaic sequences was full of
complex syntax which did not show up in creatively constructed language
until later, and those learners who acquired formulaic sequences readily at
first appeared to be the ones who acquired complex grammar faster later on,
likely as a result of analyzing the sequences: "these chunks seem to provide
these learners with a databank of complex structures beyond their current
grammar, which they keep working on until they can make their current
generative grammar compatible with them" (Myles, 2004, p. 153). Eskildsen
and Cadierno (2007) noted that a Mexican learner of English only used do-negation
correctly in English L2 in the utterance I don't know, but gradually
expanded this to use with other pronouns and verbs as he abstracted from
the exemplar.
Sugaya and Shirai (2009) in a ten-month analysis of acquisition of Japanese
tense-aspect morphology by a Russian learner discovered that she tended
to use particular verbs only with particular aspect markers. A follow up study
with groups of low- and high-proficiency learners found that lower proficiency
learners tended to link particular verbs to aspect markers but that this
tendency shifted as proficiency developed, which can be taken to indicate
that learners tend to begin with very context- and item-specific pattern control
which evolves over time to allow more actual control over the syntactic rules
themselves.
In summary
From this short overview of the research into first and second language
acquisition of formulaic language, some patterns and themes emerge. One
important element is the notion of segmentation of formulaic sequences from
the input, and subsequent breakdown of the stored sequences and use of
their constituent elements for development of the language system, grammar,
and so on. The research into adult second language acquisition of formulaic
language is heavily concentrated in naturalistic contexts of acquisition, and
leaves little for classroom or formal language teaching practitioners to work
with. A great deal of the work with adults has involved native speakers or
second language speakers of high proficiency. Is it possible that second
language learners do acquire some formulaic language units as wholes at
first and then break them down as time passes? Or do they tend to recognize
the sequences as strings of discrete and separately recognizable units, and
only when instructed to, perceive them as wholes? Is there a blend of both
of these types of processing and acquisition? More research is needed to
determine whether this is so and, if so, how it works: what makes some
sequences salient as wholes and others not? A few of the many themes and
patterns which the research shows are:
• Formulaic sequences are acquired as wholes by children.
• Formulaic sequences are likely retained as wholes by children and
also later broken down and their constituent parts used as material
for subsequent acquisition of morphosyntax and so on.
• Formulaic language appears to be dealt with holistically by adult
native speakers and highly proficient second language speakers.
• Formulaic language may be used as a strategy for second
language acquisition by adult learners.
• There is still a wide range of questions about how adult language
learners perceive and acquire formulaic language.
Certainly, not all the questions have been answered yet in any particular
area. How can we determine whether and how an adult learner might perceive
and process formulaic language? Do the basic assumptions about formulaic
language acquisition in children apply wholesale to adult acquisition of a
second language? What types of research methods are needed in order to
answer these questions? Clearly, this is fertile ground for future researchers!

1 In your own second language learning experience, have you ever
perceived and recalled multiword items and then later realized that
they were actually composed of single elements? Did you use this as
a source of language information later? Give an example or two.
2 If you have had experience as a caregiver of a small child, see if you
can recall some examples of a child using a formulaic sequence as a
whole. Give an example or two. Perhaps the child's pronunciation and
prosody were nonstandard? Why would this be?
3 Think of a research plan to study an aspect of child first language
acquisition of formulaic language. If possible, collect a corpus of
speech samples and conduct a small study.
4 Think of a research plan to study adult second language acquisition
of formulaic language. If possible, collect a corpus of spoken or
written samples, or create a small corpus, or conduct an intervention
study.
5 From the descriptions of the research on first language acquisition in
this chapter, can we draw any conclusion of value for early childhood
educators, kindergarten teachers, or caregivers in facilitating child
language acquisition?
6 As a parent, what would you look for in early child speech to help with
language acquisition? Can you imagine any areas of study which have
not been covered in the research traditions described in this chapter?
7 From the descriptions of the research on second language acquisition
in this chapter, can we draw any conclusion of value for second
language teachers?
8 Based on what we have read here about adult second language
acquisition, where do you feel the most important areas of
investigation are likely to be in the coming years?
9 If you were to begin a plan of research in the area of adult second
language acquisition, what would you focus on?
10 Does the study of the acquisition of formulaic language have any
serious implications for the work of language testers and assessors
and editors?
6
Formulaic Language and
Spoken Language: Fluency
and Pragmatic Competence

As we have already seen, a great deal of the research around formulaic
language has to do either directly or indirectly with speech. It has
been noted that formulaic sequences may make up as much as 58.6
(Erman & Warren, 2000) to 80 percent (Altenberg, 1998) of spoken
language. Indeed, the historical roots of the study of formulaic language
are in examination of speech: recall deKuyper, Pawley and Syder, and
Sinclair, discussed elsewhere in this book. The discussions of cognitive
processing focus on spoken language: recall the idea of, for example,
"stored and retrieved as wholes." A good portion of the categories
described earlier in this book are also quite speech-focused: idioms,
collocations, phrasal verbs. As far as direct links between formulaic
language research and spoken language are concerned, however, three
main areas are worthy of some in-depth consideration: speech fluency,
phonological characteristics, and speech pragmatics. Let's take each in
turn and see what the research may have to tell us.
It has become fairly well established that formulaic language is
fundamental to spoken language, and that second language speakers
can certainly benefit from using it. In a pivotal study, Boers et al. (2006)
illustrate that formulaic sequences may help second language speakers
in three important ways: they may help speakers appear more nativelike,
as they provide ready-made chunks of language which are appropriate
to specific contexts; they provide an opportunity for error-free speech
and may allow speakers to produce language that outstrips their actual
competence; they facilitate fluent speech.
Lists of formulaic sequences in spoken language

A number of researchers have worked to construct lists of formulaic
sequences used in spoken language. These are largely centered around
corpus analysis of academic registers. Coxhead (1998) was among the
first to construct a list of words used in academic language, the influential
Academic Word List (AWL). She later focused on formulaic language to
a certain extent (2008). Biber, in a 2006 book reporting on corpus study
of academic language from various perspectives, identified lists of lexical
bundles (multiword sequences identified by means of frequency and range)
in university spoken language across a range of registers: lectures and other
classroom discourse, service encounters, and so on. See Chapter 8 of this
book for details about lexical bundle research.
While the research into lexical bundles has contributed immensely
to descriptions of academic discourse in general, researchers such as
Simpson-Vlach and Ellis (2010) and Liu (2012) point out that the lexical
bundle research has often provided us with lists of multiword units which
are semantically and structurally incomplete, such as "to do with the" or "I
think it was." The obvious problem with these sequences is that they are
"neither terribly functional nor pedagogically compelling" (Simpson-Vlach &
Ellis, 2010, p. 493).
Largely in response to the perceived weaknesses of some of the lists of
lexical bundles found in previous research, Simpson-Vlach and Ellis (2010)
created the Academic Formulas List (AFL). The AFL was developed by
comparing a corpus of 2.1 million words of academic speech and writing
to a nonacademic corpus. As such, the AFL represents an effort to identify
formulas which are both genuinely academic and classroom-worthy. The
corpora used to develop the AFL were scanned at a frequency cutoff
of ten per million, using a range criterion of three out of four academic
disciplines for the written corpus. To avoid the often limited psychological
saliency and pedagogical inutility of the types of units listed in the lexical
bundle research, the AFL was compiled using mutual information (MI)
scores as a measure of collocation strength, combined with frequency
data and a rating by instructors and testers, to produce a composite score
which determined the final lists. Table 6.1 presents the top most frequent
formulaic sequences in spoken language uncovered by Simpson-Vlach and
Ellis (2010):
FORMULAIC LANGUAGE AND SPOKEN LANGUAGE
Table 6.1 Top most frequent sequences in spoken language: Simpson-Vlach and Ellis (2010)
Blah blah blah
This is the
You know what I mean
You can see
Trying to figure out
A little bit about
Does that make sense
You know what
The university of Michigan
For those of you who
Do you want me to
Thank you very much
Look at the
We're gonna talk about
Talk a little bit
If you look at
And this is
If you look at the
No no no no
At the end of
We were talking about
In Ann Arbor
It turns out that
You need to
See what I'm saying
Take a look at
You have a
Might be able to
At the end
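The collocation-strength measure mentioned in the description of the AFL can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the procedure Simpson-Vlach and Ellis actually used: it computes a common form of the mutual information score (the base-2 logarithm of observed over expected frequency), normalizes a raw count to a per-million rate, and applies a frequency cutoff and a range criterion of the kind described above. The corpus size, all counts, and the function names are hypothetical, and the instructor ratings and composite score are not modeled.

```python
import math

def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million running words."""
    return count * 1_000_000 / corpus_size

def mutual_information(sequence_count, word_counts, corpus_size):
    """MI = log2(observed / expected), where the expected count assumes
    the words of the sequence occur independently of one another."""
    expected = corpus_size
    for count in word_counts:
        expected *= count / corpus_size
    return math.log2(sequence_count / expected)

def passes_cutoffs(count, corpus_size, disciplines_attested,
                   min_per_million=10, min_disciplines=3):
    """Frequency cutoff plus range criterion (hypothetical helper)."""
    return (per_million(count, corpus_size) >= min_per_million
            and disciplines_attested >= min_disciplines)

# Hypothetical figures: a two-word sequence seen 40 times in a
# 2,000,000-word corpus, whose two words occur 400 and 200 times.
rate = per_million(40, 2_000_000)
mi = mutual_information(40, [400, 200], 2_000_000)
keep = passes_cutoffs(40, 2_000_000, disciplines_attested=3)
print(rate, round(mi, 2), keep)
```

A high MI score indicates that the words co-occur far more often than chance would predict, which is why it can surface sequences that are less frequent but more cohesive than many lexical bundles.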
Shin and Nation (2008) worked to identify the most common or high-frequency
collocations in English using specific criteria. Using the spoken
subcorpus of the BNC as a basis, they used the most common single
content words (nouns, verbs, adjectives, and adverbs) as a starting point,
and used a frequency cutoff of thirty per ten million running words. The list
is presented in Table 6.2:
Table 6.2 Most frequent sequences in spoken language: Shin and Nation (2008)
You know
I think (that)
A bit
(always/never) used to + infinitive
As well
A lot of
# pounds
Thank you
# years
In fact
Very much
# pound
Talking about (something)
(about) # percent (of something/ in something/ on something/ for something)
I suppose (that)
At the moment
A little bit
Looking at (something)
This morning
(not) any more
Come on
Number (#)
Come in (somewhere/something)
Come back
Have a look
In terms of (something)
Last year
So much
(#) years ago
Determiner the/this/a county council
Martinez and Schmitt (2012) developed a phrasal expressions list consisting
of 505 multiword items extracted from the British National Corpus (BNC) by
means of frequency and distributional criteria. The resulting list was then
narrowed down by applying a range of judgment criteria including whether
the sequence was semantically transparent and whether it had a one-word
equivalent. The researchers included the entire BNC in their search and
therefore the list contains sequences relevant to both spoken and written
registers, and is focused on receptive skills rather than both receptive and
productive. The most frequent items are presented in Table 6.3:
Table 6.3 Most frequent sequences in spoken language: Martinez and Schmitt (2012)
I mean
A lot
Rather than
So that
A little
A bit (of)
As well as
In fact
(be) likely to
Go on
Is to
A number of
At all
As if
Used to (past)
Was to
Not only
Those who
Deal with
Lead to (cause)
Sort of
The following
In order to
Have got (+NP)
Have got to
Set up
As to
As well
Based on
Carry out
Speech fluency
If you look around for descriptions of spoken language abilities, such as
second language syllabuses or assessment criteria and so on, you are likely to
see the word "fluency" represented somewhere. It is often used as a synonym
for nativelike ability in a second language, or for having good command of
language. In terms of speech in particular, ask people what it means and you
may hear words like "smoothness" or "flow of speech." In the end, though,
the research literature on fluency has generally attended to temporal variables
of speech. These are measurable, quantifiable aspects of speech: speed;
pauses and hesitations; length of runs.
Speed of speech is usually measured as syllables uttered per second or per
minute. Research shows that second language learners tend to produce more
rapid speech over time as the acquisition process unfolds, and that speed of
speech correlates with judges' perceptions of fluency. Examples are Towell's
1987 case study of a learner of French over four years, which found that speech
rate increased by 65 percent, and Riggenbach's 1991 study of Chinese students
learning English, which found that syllables uttered per minute were linked to
judges' ratings of fluency; in other words, faster speakers were rated as more
fluent. Another well-known large study of fluency, by Freed in 1995, found
that speech rate was the only measure that actually increased when American
students of French spent time abroad in France.
A more complex aspect of fluency is pause phenomena. By this we mean
the amount and frequency of hesitations and pauses (the two terms "pause"
and "hesitation" are interchangeable as used here), as well as their location.
Location means, basically, syntactic location. In general, pause times are
measured by the proportion of the total speaking time spent in pauses or
silence. This is called the pause/time ratio, or PTR in the literature. Less time
spent pausing is, not surprisingly, an indicator of higher fluency. Researchers
like Möhle (1984) and Riggenbach (1991) found that shorter and less
frequent pausing was linked to higher fluency.
Thinking about fluency as a function of increased speed of speaking, or of
less pausing, tells us something useful, but something is missing at the same
time. The missing element is an understanding of how fluent speech occurs,
and what role there is for formulaic language. For this, it is productive to look
at pause location.
The research shows that pauses which occur at phrase and clause
boundaries are more linked to fluency than pauses occurring at other syntactic
locations. Dechert (1980) found that, after a study abroad experience, a student
telling a story was more able to pause at breaks between story segments
which established the setting, locations, and so on. Lennon (1984) found that
L1 speakers paused at clause boundaries, whereas L2 speakers also paused
within clauses. Deschamps (1980), Riggenbach (1991), and Freed (1995) found
similar phenomena in their studies. What this means is that the production of
clauses and phrases more or less as wholes is a sort of hallmark of fluent
speech production.
The most important variable of speech associated with fluency is the
length of runs of speech produced between pauses. An early investigation,
which focused on temporal variables in L2 speech, is Raupach's (1980) study
of participants telling a story from picture prompts in their L1s and L2s, in
which the L2 speech displayed shorter runs between pauses. Later, Möhle
(1984) discovered the same dynamic at play when the participants in her study
produced shorter runs between pauses in L2 than in L1 speech. Another key
study was that of Towell (1987), in which a British learner of French over a
four-year period increased the mean length of runs by 95 percent over the
first three years. Lennon (1990b) found that the mean length of runs between
pauses in the L2 speech of his participants increased by 20–26 percent over
a 23-week period. As well, in a large-scale examination of the development
of speech fluency by American students of French, Freed (1995) identified a
strong tendency toward longer runs over time.
Here is where we can find a possible role for formulaic language in
fluency. It may be that a repertoire of formulaic sequences can help
speakers to produce phrases and clauses more or less as wholes, without
internal pausing, which would account for the less frequent pausing and the
longer runs of speech between pauses which appear to be key indicators
of fluency.
A key study, which investigates the role of formulaic language in L2 speech
fluency, is that of Wood (2010). Wood examined the fluency development of a
group of eleven L2 learners in an intensive study abroad English program over
a six-month period. The participants were two female Japanese L1 speakers,
two male Japanese L1 speakers, two female Spanish L1 speakers, two
male Spanish L1 speakers, one female Mandarin L1 speaker, and two male
Mandarin L1 speakers. Drawing on the research into temporal variables of
speech and the possible role of formulaic language, Wood hypothesized that,
with continuous experience with their L2 over the six months, the participants'
spontaneous speech in English would show a faster speech rate, a lower ratio
of pause time to speech time, longer runs of speech between pauses, and
more frequent occurrence of formulaic language within the longer runs of
speech.
Participants met once a month over the six months to watch a silent
animated film prompt and then spontaneously retell the narrative of the
film. Three films were used, 8–10 minutes long, with eight narrative moves
each and a similar number of characters and level of plot complexity. The
films were seen in a staggered sequence so as to ensure that participants
would not become overly familiar with any one story: one film was seen
in months one and four, another in months two and five, another in months
three and six.
The resulting data were analyzed first for evidence of fluency gain over
the six months. Speed of speech was calculated as speech rate (SR),
syllables uttered per minute, and as articulation rate (AR), syllables uttered
per minute with pauses removed. Pause time was calculated as phonation/
time ratio (PTR), time spent actually articulating, divided by the total speech
time. Mean length of runs (MLR) was calculated by dividing the total number
of syllables uttered by the total number of runs for each sample. Finally, a
formula/run ratio (FRR) was calculated by dividing the number of formulaic
sequences by the number of runs in each sample.
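The temporal measures described above lend themselves to a short worked sketch. The code below is not Wood's analysis procedure; it is a minimal illustration that assumes each speech sample has already been segmented into runs (with syllable counts, articulation time, and a judged number of formulaic sequences) and pauses (with durations). The `Run` class, the function name, and all the numbers are hypothetical, and FRR is taken here, as the name suggests, to be formulaic sequences per run.

```python
from dataclasses import dataclass

@dataclass
class Run:
    syllables: int     # syllables uttered in this run
    duration: float    # seconds spent articulating the run
    formulas: int = 0  # formulaic sequences judged to occur in the run

def temporal_measures(runs, pause_durations):
    """Return SR, AR, PTR, MLR, and FRR for one speech sample."""
    total_syllables = sum(r.syllables for r in runs)
    speaking_time = sum(r.duration for r in runs)        # pauses removed
    total_time = speaking_time + sum(pause_durations)    # pauses included
    return {
        "SR": total_syllables * 60 / total_time,      # syllables per minute
        "AR": total_syllables * 60 / speaking_time,   # per minute, no pauses
        "PTR": speaking_time / total_time,            # phonation/time ratio
        "MLR": total_syllables / len(runs),           # mean length of runs
        "FRR": sum(r.formulas for r in runs) / len(runs),  # formulas per run
    }

# Hypothetical sample: three runs separated by two pauses.
sample = [Run(13, 4.0, 2), Run(19, 5.0, 1), Run(10, 3.0, 3)]
pauses = [1.0, 2.0]
measures = temporal_measures(sample, pauses)
for name, value in measures.items():
    print(name, round(value, 2))
```

Computing the same measures for each monthly retell would allow the kind of comparison over time that the study's hypotheses call for.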
As noted in Chapter 2, determining what constitutes a formulaic sequence
in speech can be challenging. In Wood's study, native speaker judgment was
used. Three informed judges who had read key pieces of research literature on
formulaic language used five judgment criteria to make decisions about what
constituted a formulaic sequence:
• The taxonomy used by Nattinger and DeCarrico (1992). It was
  a guide for selection, as sequences from the transcripts were
  identified with an eye to categories:
  1 Syntactic strings are strings of category symbols, such as
    NP + Aux + VP (...).
  2 Collocations are strings of specific lexical items, such as "rancid
    butter" and "curry favor," that co-occur with a mutual expectancy
    greater than chance (...).
  3 Lexical phrases are collocations, such as "how do you do?" and "for
    example," that have been assigned pragmatic functions (...) (p. 36).
  The authors go on to refine these categories and further refine their shared
  characteristics.
• Phonological coherence. Coulmas (1979) and Peters (1983) state
  that if a sequence is to be considered formulaic, it must be at
  least two morphemes long and cohere phonologically, that is, be
  produced without internal hesitation or pausing. This was one of
  the most important aspects of the formula identification process
  in the present study. For more information about phonological
  coherence see later in this chapter.
• Greater length/complexity than other output. Also pointed out
  by Coulmas (1979) and Peters (1983), chunks which are uttered in
  a longer run and/or show greater semantic or syntactic complexity
  than the rest of a speaker's output are likely formulas. Examples
  would include using "I would like ..." to express a desire for
  something, or "I don't understand" to show a lack of comprehension,
  while never using "would" or negatives using "do" in other contexts.
• Semantic irregularity. According to Wray and Perkins (2000),
  formulas are often not composed semantically, but are holistic
  items like idioms and metaphors.
• Syntactic irregularity. Formulas often do not follow rules of
  syntax (Wray & Perkins, 2000). This can restrict the manipulation
  of elements in a formula (one cannot pluralize "beat around the
  bush" or passivize "face the music"), or require the flouting of normal
  syntactic restrictions as in the intransitive verb + direct object
  construction of "go the whole hog" or the gross violation of syntactic
  laws in "by and large."
No one or combination of these criteria were deemed necessary for a
sequence to be determined formulaic, they were to serve as general
guidelines only. The judges had a discussion and benchmark session in which
they worked together on two transcripts to prepare them to then work alone
on the rest (Wood, 2010, pp. 111, 112).
As might be expected, the participants showed significant gains in fluency
on the SR, AR, and MLR measures over the six months. They also showed
strong gains in FRR, or formula/run ratio. This indicated that the use of
formulaic language actually may have facilitated the reduction of pauses and
increased length of runs.
Wood (2006, 2010) also noted that the study participants used formulaic
sequences in certain ways and for certain functions in order to facilitate
their fluency. He compared sequences of the speech samples in which a
participant retold a certain narrative move after the first viewing of a film, and
again three months later. Five categories of uses and functions emerged from
this analysis:
repetition of a formula; stringing together of multiple formulas;
reliance on one formula; use of self-talk and filler formulas; use of
formulas as rhetorical devices.
Repetition of a formula helped the speakers to lengthen runs and avoid
pausing in later retells. An example of this would be the following, from a
Japanese female participant's speech:
• And he came back the cat came back to the his house and ah
This results in a run of thirteen syllables, only one of which is a filler nonlexical
item, "ah."
• I forget I forget the order but maybe the f he went to the forest
Here she appears to think aloud, buying time to recall the next event in the
narrative and uses a very simple subject + verb formula to repeat her lack of
clear recall. It helps her to produce a 19-syllable run.
Stringing together multiple formulaic sequences also helped speakers to
avoid pausing and to extend runs. For instance, in later retells of the film
Strings (speech samples 3 and 6; these examples are from sample 6), several
participants described the old man in the story "making music by himself in
his room," a combination of three short formulas: "making music," "by himself,"
and "in his room." This produces a very fluent ten-syllable run.
Reliance on a single formulaic sequence also helped speakers to avoid
pausing. To introduce the next action in the story, for example, it was common
to use "and then" or "and next."
Use of self-talk and fillers was a relatively sophisticated strategy used by
the speakers. This includes use of self-referential collocations such as "I know,"
"I think," or "I guess." Also included in this category are long strings used for
self-talk or circumlocution such as "I don't know" or "I don't know the
thing's name." These allowed them to produce longer runs.
Similarly, use of formulaic sequences as rhetorical devices was a relatively
sophisticated means of avoiding pauses and extending runs. Wood notes that,
in later retells, the study participants tended to use beginning formulas such
as "at the beginning," narrative move markers such as "when the story
is go ahead," and endings such as "that is the end of the story." All of
these add greatly to the length of runs as well as to the effectiveness of the
storytelling.
Interestingly, this study still stands alone as the only effort to determine
the role of formulaic language in speech fluency. It shows both quantitatively
and qualitatively the importance of formulaic language in second language
speech fluency development.
Phonological characteristics
As noted in Chapter 2, formulaic sequences appear to display particular
phonological characteristics in speech. Lin (2010, 2012) cataloged these
characteristics.
Phonological coherence is the term most often used to summarize the
nature of the prosody of formulaic sequences in speech. It was Peters (1977,
1983) who first noted that children can be observed to produce sequences
which surpass their grammatical competence and which exhibit no internal
hesitations and a smooth intonation contour, making them stand out from
the rest of the speech flow. The idea is, then, that these characteristics of
formulaic sequences can give researchers a clue as to what has been acquired
as a chunk. Child language researchers such as Hickey (1993) note that
phonological coherence is basic to spoken formulaic sequences in children's
speech. As for adult speech, it is not quite so clear, although numerous
researchers have assumed that a similar dynamic applies as to child language
(e.g., Moon, 1997; Wray, 2004).
It has been shown that high-frequency phrases undergo phonological
reduction more quickly than other phrases (e.g., Bybee, 2002, 2006). This
means reduced schwa and sometimes t/d deletion (Bybee, 2000).
As for stress placement, Ashby (2006) looked at idioms and determined
that there are actually three classifications of idioms according to their accent
patterns in comparison with literal uses of the same word sequences: in one
case idioms have the same pattern as a literal version, for example, "to have a
CHIP on one's shoulder," "to have a BEE on one's shoulder." In a second case,
the accent pattern is different from a literal version, for example, "POUR down"
(idiomatic, as in heavily raining) versus "pour DOWN" (literal). In a third case,
the idiom is very restricted, for example, "I could eat a HORSE" (falling tone)
versus "I could eat a horse" (falling-rising tone, never used). Ashby notes that
the second type of case signals or invites a listener to note an interpretation
of the utterance which is nonliteral.
Theories of holistic storage of formulaic sequences, as discussed in
Chapter 4, are often linked to phonological coherence. Lin (2010, p. 179),
however, notes that this connection is often an assumption and used in a
circular fashion, with researchers pointing out that phonological coherence
is an indicator of holistic storage, while others state that sequences are
phonologically coherent because they are holistically stored. If we look instead
from a frequency-based perspective, we may conclude that the phonological
coherence of a frequently used/frequently encountered sequence may simply
be a result of the fact that it is uttered and heard often. The neurological and
motor underpinnings of the processing and production of the sequence may
simply have become faster with frequent use.
Research has indicated that the nature of pausing before the uttering
of formulaic sequences in spontaneous speech differs from that before
nonformulaic language. Erman (2006, 2007) examined the London-Lund
Corpus and the Bergen Corpus of London Teenage Language, and noted
that retrieval of formulaic language appeared to entail shorter pre-sequence
pauses. This may be taken as something of an indication that the formulaic
sequences are processed more quickly.
Other research has pointed to an alignment between formulaic sequences
and intonation units. Lin and Adolphs (2009) found that the sequence "I don't
know why," taken from the Nottingham Corpus of Learner English, was a
single intonation unit in over 50 percent of cases. In further research, Lin
(2010) investigated formulaic sequences in a university lecture and found that
82 percent aligned with intonation boundaries on one side of the sequence,
and 40 percent aligned on both sides.
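The kind of alignment count reported above can be illustrated with a small sketch. It is not Lin's method; it simply assumes that a transcript has been tokenized, that each formulaic sequence is recorded as a (start, end) pair of token positions, and that intonation unit boundaries are recorded as a set of token positions. It also assumes that "aligned on one side" means aligned on at least one side. The function name and all data are hypothetical.

```python
def alignment_rates(sequences, boundaries):
    """Return the share of sequences whose edges coincide with an
    intonation unit boundary on at least one side, and on both sides."""
    at_least_one = both = 0
    for start, end in sequences:
        left = start in boundaries   # sequence begins at a boundary
        right = end in boundaries    # sequence ends at a boundary
        if left or right:
            at_least_one += 1
        if left and right:
            both += 1
    total = len(sequences)
    return at_least_one / total, both / total

# Hypothetical data: boundary positions and five formulaic sequences.
unit_boundaries = {0, 4, 9, 12}
formulaic_spans = [(0, 4), (4, 7), (5, 8), (9, 12), (10, 12)]
one_side, both_sides = alignment_rates(formulaic_spans, unit_boundaries)
print(one_side, both_sides)
```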
Clearly, there is plenty of evidence that formulaic sequences are marked
by certain phonological characteristics in speech. The evidence points to
phonological coherence being a function of holistic mental storage and
retrieval of formulaic sequences, possibly in addition to frequency effects.
Pragmatics
One extremely important aspect of spoken language is pragmatic competence.
The idea of communicative competence (e.g., Bachman, 1990) led the
language teaching field into new ways of approaching its purpose. In addition
to grammar, communicative competence includes knowledge of discourse,
genre and text, social aspects of language, and a focus on the learner and
what he/she does. Current models of communicative competence see
communication as composed of knowledge or competence in four key areas:
organizational competence, pragmatic competence, sociolinguistic competence,
and strategic competence.
Pragmatic competence is key to successful ability to communicate in social
interactions, and is the basis of what we might call "small c" culture. We
can define pragmatic competence as the knowledge and skill necessary for
successful and appropriate use of language in communication, and subdivide
it into several broad categories:
• Pragmalinguistics: actual language ability to perform language
  functions such as requesting help, inviting, refusing invitations,
  making requests, giving commands, and so on.
• Sociopragmatics: ability to assess the context in which the
  function occurs, that is, what is the appropriate means of achieving what
  I need, given the nature of the situation and the people involved?
Pragmatic competence involves both of the above, in real-time communication.
For example, in a communication situation we might need to determine what
grammar and vocabulary are needed to refuse an invitation, and, at the same
time, assess whether such a refusal is acceptable, and what to say, under
the specific circumstances.
Clearly, knowledge of conventional formulaic language units to use in
specific contexts is a key part of sociopragmatic competence. According to
Bardovi-Harlig (2012, p. 207), research on formulaic language and pragmatics
involves three areas of focus:
1 the form as a recurring sequence
2 its use in specific situations
3 the social contract or bonds, which include members of a speech
community
Formulaic language has been referred to by a range of labels in pragmatics,
including conventional expression, pragmatic routine, and situation-based
utterance (SBU). The general agreement in pragmatics research is that
formulas in pragmatics are conventional expressions representing ways
of saying things agreed upon by a speech community (Bardovi-Harlig,
2012, p. 209).
In pragmatics, frequency-based research is relatively rare.
Formulaic sequences are identified in various ways depending largely on
the goals of the research. It is common to either start with the formulaic
sequence and work from there or to start with the situation and context,
specifically the illocutionary force or the speech acts, and determine the
sequences used.
Specific studies of formulaic language in pragmatics
Some studies have identified a formulaic sequence in a specific situation and
identified its use in a quantitative way. Manes and Wolfson (1981) looked
at almost 700 examples of compliments and determined that 53.6 percent
took the form NP (is/looks) ...
This is an area of importance and a place for practitioners and researchers
to work together. More intervention studies and more effort to link
knowledge of formulaic language with state-of-the-art methods such as
task-based teaching and focus-on-form are needed. Also, it is important to
attempt to integrate this knowledge with post-method theory and practice
such as Dogme (Meddings & Thornbury, 2009), Kumaravadivelu's (2002)
post-method ideas, and ecological approaches (Kramsch, 2008; van Lier,
2004), to name a few.
Research focuses
We certainly seem obsessed with academic language in our research on
formulaic language. Indeed, virtually the entire body of work on lexical bundles
and formulaic language in writing is focused on academic writing. It is time
to move the focus out of the academy and look at other areas of importance.
For example, studying the use of formulaic language in service encounters,
doctor–patient discourse, debates, counseling, and psychotherapeutic
situations, to name a few, can provide insights into how these types of
communication are structured, the nature of the discourse therein, and how
to foster facility with this language in native speakers and second language
learners. Ben Rejeb (2014) conducted a study of formulaic language used in
official business meetings of university student government, using a corpus of
meeting minutes spanning many years. More of this type of research is useful.
Similarly, perhaps it is time to take the focus away from productive skills
and toward receptive skills. We have studied the use of formulaic language
in speech and in writing, but comparatively little in reading and listening.
More research in these areas can help us not only to discover the ways in
which language users handle formulaic sequences, but also to uncover more
psycholinguistic processes. These types of research can help augment our
ways of teaching language as well.
A final area ripe for research is to examine the ways formulaic language works
with discourse analysis and writing studies. In empirical discourse analysis
such as conversation analysis, there are some obvious roles for formulaic
language to play, but we have not yet examined them. In critical discourse
analysis and the systemic-functional linguistic ways of deconstructing text
and determining ideologies, there seem to be many ways that research can
incorporate knowledge of formulaic language. Similarly, in writing studies,
the ways in which people learn to write and the ways that discourses evolve
seem to be natural places where the interface with formulaic language study
can exist. Collaborative and multifaceted research projects can help to push
formulaic language into the forefront and help us all to create an unlimited
amount of new knowledge in a wide range of fields.
References
Ädel, A. & Erman, B. (2012). Recurrent word combinations in academic writing
by native and non-native speakers of English: A lexical bundles approach.
English for Specific Purposes, 31, 81–92.
Al Hassan, L. & Wood, D. (2015). The effectiveness of focused instruction of
formulaic sequences in augmenting L2 learners' academic writing skills: A
quantitative research study. Journal of English for Academic Purposes.
Allerton, D. J. (1984). Three (or four) levels of cooccurrence restriction. Lingua, 63,
17–40.
Allerton, D. J., Nesselhauf, N., & Skandera, P. (Eds.). (2004). Phraseological units:
Basic concepts and their application. Basel: Schwabe.
Altenberg, B. (1993). Recurrent word combinations in spoken English. In
J. D'Arcy (Ed.), Proceedings of the Fifth Nordic Association for English
Studies Conference (pp. 17–27). Reykjavik: University of Iceland.
Altenberg, B. (1998). On the phraseology of spoken English: The evidence of
recurrent word combinations. In A. P. Cowie (Ed.), Phraseology: Theory,
analysis and application (pp. 101–122). Oxford: Clarendon Press.
Altenberg, B. & Tapper, M. (1998). The use of adverbial connectors in advanced
Swedish learners' written English. In S. Granger (Ed.), Learner English on
computer (pp. 3–18). New York: Longman.
Ambridge, B., Rowland, C. F., Theakston, A. L., & Tomasello, M. (2006).
Comparing different accounts of inversion errors in children's non-subject
wh-questions: What experimental data can tell us? Journal of Child Language,
33, 519–557.
Amosova, N. N. (1963). Osnovui angliiskoy frazeologii [The foundations of English
phraseology]. Leningrad: University Press.
Anderson, J. (1983). The architecture of cognition. Cambridge, MA: Harvard
University Press.
Appel, R. & Wood, D. (in press). Recurrent word combinations in EAP test-taker
writing: Differences between high and low proficiency levels. Language
Assessment Quarterly.
Arnon, I. & Snider, N. (2010). More than words: Frequency effects for multi-word
phrases. Journal of Memory and Language, 62, 67–82.
Ashby, M. (2006). Prosody and idioms in English. Journal of Pragmatics, 38(10),
1580–1597.
Austin, J. L. (1962). How to do things with words. Oxford: Clarendon Press.
Bacha, N. N. (2002). Developing learners' academic writing skills in higher
education: A study for educational reform. Language and Education, 16(3),
161–177.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford:
Oxford University Press.
Baddeley, A. D. (1988). Working memory. Oxford: Oxford University Press.
Bahns, J., Burmeister, H., & Vogel, T. (1986). The pragmatics of formulas in L2
learner speech: Use and development. Journal of Pragmatics, 10, 693–723.
Bamberg, B. (1983). What makes a text coherent? College Composition and
Communication, 34, 417–429.
Bannard, C. & Lieven, E. (2012). Formulaic language in L1 acquisition. Annual
Review of Applied Linguistics, 32, 3–16.
Bannard, C. & Matthews, D. (2008). Stored word sequences in language learning:
The effect of familiarity on children's repetition of four-word combinations.
Psychological Science, 19, 241–248.
Bannard, C., Lieven, E., & Tomasello, M. (2009). Modeling children's early
grammatical knowledge. Proceedings of the National Academy of Sciences,
106, 17284–17289.
Bardovi-Harlig, K. (2012). Formulas, routines, and conventional expressions in
pragmatics research. Annual Review of Applied Linguistics, 32, 206–227.
Bardovi-Harlig, K. & Bastos, M.-T. (2011). Proficiency, length of stay, and
intensity of interaction and the acquisition of conventional expressions in L2
pragmatics. Intercultural Pragmatics, 8, 347–384.
Bardovi-Harlig, K., Bastos, M.-T., Burghardt, B., Chappetto, E., Nickels, E., &
Rose, M. (2010). The use of conventional expressions and utterance length
in L2 pragmatics. In G. Kasper, H. T. Nguyen, D. R. Yoshimi & J. K. Yoshioka
(Eds.), Pragmatics and language learning: Vol. 12 (pp. 163–186). Honolulu, HI:
University of Hawaii, National Foreign Language Resource Center.
Barron, A. (2003). Acquisition in interlanguage pragmatics: Learning how to do
things with words in a study abroad context. Amsterdam: John Benjamins.
Becker, J. D. (1975). The phrasal lexicon. TINLAP 75 Proceedings of the 1975
workshop on theoretical issues in natural language processing (pp. 60–63).
Stroudsburg, PA: Association for Computational Linguistics.
Ben Rejeb, R. (2014). Lexical bundles in meeting minutes: The case of a graduate
students' association. Unpublished Master of Arts Research Essay. Ottawa,
School of Linguistics and Language Studies, Carleton University.
Benson, M., Benson, E., & Ilson, R. (1997). The BBI dictionary of English word
combinations. Amsterdam: John Benjamins.
Beréndi, M., Csábi, S., & Kövecses, Z. (2008). Using conceptual metaphors and
metonymies in vocabulary teaching. In F. Boers & S. Lindstromberg (Eds.),
Cognitive linguistic approaches to teaching vocabulary and phraseology
(pp. 65–99). Berlin, Germany: Mouton de Gruyter.
Biber, D. (2006). University language: A corpus-based study of spoken and
written registers. Philadelphia, PA: John Benjamins.
Biber, D. & Barbieri, F. (2007). Lexical bundles in university spoken and written
registers. English for Specific Purposes, 26, 263–286.
Biber, D. & Conrad, S. (1999). Lexical bundles in conversation and academic
prose. In H. Hasselgard & S. Oksefjell (Eds.), Out of corpora: Studies in
honour of Stig Johansson (pp. 181–190). Amsterdam: Rodopi.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at ...: Lexical bundles in
university teaching and textbooks. Applied Linguistics, 25, 371–405.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman
grammar of spoken and written English. Harlow, UK: Pearson.
Bod, R. (2000). The storage vs. computation of three-word sentences. Paper
presented at AMLaP2000, University of Leiden, Leiden, the Netherlands.
Bod, R. (2001). Sentence memory: Storage vs. computation of frequent
sentences. Paper presented at CUNY 2001, University of Pennsylvania,
Philadelphia, PA.
Bod, R. (2006). Exemplar-based syntax: How to get productivity from exemplars.
Linguistic Review, 23, 291–320.
Boers, F. & Lindstromberg, S. (2005). Finding ways to make phrase learning
feasible: The mnemonic effect of alliteration. System, 33, 225–238.
Boers, F. & Lindstromberg, S. (2009). Optimizing a lexical approach to instructed
second language acquisition. Basingstoke, UK: Palgrave Macmillan.
Boers, F. & Lindstromberg, S. (2012). Experimental and intervention studies
on formulaic sequences in a second language. Annual Review of Applied
Linguistics, 32, 83–110.
Boers, F., Demecheleer, M., & Eyckmans, J. (2004). Etymological elaboration
as a strategy for learning figurative idioms. In P. Bogaards & B. Laufer (Eds.),
Vocabulary in a second language: Selection, acquisition and testing (pp.
53–78). Amsterdam, the Netherlands: John Benjamins.
Boers, F., Eyckmans, J., & Stengers, H. (2007). Presenting figurative idioms
with a touch of etymology: More than mere mnemonics? Language Teaching
Research, 11, 43–62.
Boers, F., Eyckmans, J., Kappel, J., Stengers, H., & Demecheleer, M. (2006).
Formulaic sequences and perceived oral proficiency: Putting a lexical approach
to the test. Language Teaching Research, 10, 245–261.
Bolander, M. (1989). Prefabs, patterns and rules in interaction? Formulaic
speech in adult learners' L2 Swedish. In K. Hyltenstam & L. K. Obler (Eds.),
Bilingualism across the lifespan: Aspects of acquisition, maturity, and loss
(pp. 73–86). Cambridge: Cambridge University Press.
Broca, P. (1863). Localisations des fonctions cérébrales. Siège de la faculté du
langage articulé. Bulletin de la Société d'Anthropologie, 4, 200–208.
Butler, C. S. (2003). Multi-word sequences and their relevance for recent models
of functional grammar. Functions of Language, 10(2), 179–208.
Bybee, J. (2000). The phonology of the lexicon. In M. Barlow and S. Kemmer
(Eds.), Usage-based models of language (pp. 65–85). Stanford, CA: CSLI
Publications.
Bybee, J. (2002). Phonological evidence for exemplar storage of multiword
sequences. Studies in Second Language Acquisition, 24, 215–221.
Bybee, J. (2006). From usage to grammar: The mind's response to repetition.
Language, 82(4), 711–733.
Byrd, P. & Coxhead, A. (2010). On the other hand: Lexical bundles in academic writing
and in the teaching of EAP. University of Sydney Papers in TESOL, 5, 31–64.
Cameron-Faulkner, T., Lieven, E., & Tomasello, M. (2003). A construction-based
analysis of child-directed speech. Cognitive Science, 27, 843–873.
Chafe, W. (1968). Idiomaticity as an anomaly in the Chomskyan paradigm.
Foundations of Language, 4, 109–127.
Chafe, W. L. (1980). Some reasons for hesitating. In H. W. Dechert & M. Raupach
(Eds.), Temporal variables in speech (pp. 169–180). The Hague: Mouton.
Chan, T.-P. & Liou, H.-C. (2005). Effects of web-based concordancing instruction
on EFL students' learning of verb-noun collocations. Computer Assisted
Language Learning, 18, 231–250.
Chen, L. (2010). An investigation of lexical bundles in ESP textbooks and
electrical engineering introductory textbooks. In D. Wood (Ed.), Perspectives
on formulaic language: Acquisition and communication (pp. 107–128). London/
New York: Continuum.
Chen, Y. & Baker, P. (2010). Lexical bundles in L1 and L2 academic writing.
Language Learning & Technology, 14(2), 30–49.
Cieslicka, A. (2006). Literal salience in on-line processing of idiomatic expressions
by second language learners. Second Language Research, 22, 115–144.
Clevedon: Multilingual Matters.
Columbus, G. (2010). Processing MWUs: Are MWU subtypes psycholinguistically
real? In D. Wood (Ed.), Perspectives on formulaic language: Acquisition and
communication (pp. 194–210). New York/London: Continuum.
Conklin, K. & Schmitt, N. (2008). Formulaic sequences: Are they processed more
quickly than nonformulaic language by native and non-native speakers? Applied
Conklin, K. & Schmitt, N. (2012). The processing of formulaic language. Annual
Connor, U. (1990). Linguistic/rhetorical measures for international students'
persuasive writing. Research in the Teaching of English, 24, 67–87.
Connor, U. (2003). Changing currents in contrastive rhetoric: Implications for
teaching and research. In B. Kroll (Ed.), Exploring the dynamics of second
language writing (pp. 218–240). Cambridge: Cambridge University Press.
Cook, V. & Bassetti, B. (2005). An introduction to researching second language
writing systems. Second Language Writing Systems, 1–67.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing:
Examples from history and biology. English for Specific Purposes, 23,
397–423.
Cortes, V. (2007). Teaching lexical bundles in the disciplines: An example from a
writing intensive history class. Linguistics and Education, 17(4), 391–406.
Cortes, V., Jones, J., & Stoller, F. (2002, April). Lexical bundles in ESP reading and
writing. Paper presented at TESOL Conference, Salt Lake City, Utah.
Coulmas, F. (1979). On the sociolinguistic relevance of routine formulae. Journal
of Pragmatics, 3(3/4), 239–266.
Coulmas, F. (Ed.). (1981). Conversational routines. The Hague: Mouton.
Cowie, A. P. (1992). Multiword lexical units and communicative language
teaching. In P. J. L. Arnaud & H. Béjoint (Eds.), Vocabulary and applied
linguistics (pp. 1–12). Basingstoke: Macmillan.
Cowie, A. P. (1994). Phraseology. In R. E. Asher (Ed.), The encyclopedia of
language and linguistics (pp. 3168–3171). Oxford: Pergamon.
Cowie, A. P. (Ed.). (1998). Phraseology: Theory, analysis and application. Oxford:
Clarendon Press.
Coxhead, A. (1998). An academic word list. English language institute occasional
publication No. 18. New Zealand: Victoria University of Wellington.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213–238.
Coxhead, A. (2008). Phraseology and English for academic purposes: Challenges
and opportunities. In F. Meunier & S. Granger (Eds.), Phraseology in language
learning and teaching (pp. 149–161). Amsterdam: John Benjamins.
Coxhead, A. & Byrd, P. (2007). Preparing writing teachers to teach the vocabulary
and grammar of academic prose. Journal of Second Language Writing, 16(3),
129–147.
Culpeper, J. (2010). Conventional impoliteness formula. Journal of Pragmatics,
42, 3232–3245.
Dai, Z. & Ding, Y. (2010). Effectiveness of text memorization in EFL learning of
Chinese students. In D. Wood (Ed.), Perspectives on formulaic language:
Acquisition and communication (pp. 71–87). New York/London: Continuum.
Davis, J. (2007). Resistance to L2 pragmatics in the Australian ESL context.
Language Learning, 57, 611–649.
Davis, P. & Rinvolucri, M. (1989). Dictation: New methods, new possibilities.
Cambridge: Cambridge University Press.
De Jong, N. & Perfetti, C. A. (2011). Fluency training in the ESL classroom: An
experimental study of fluency development and proceduralization. Language
Learning, 61(2), 533–568.
De Pablos-Ortega, C. (2011). The pragmatics of thanking reflected in the
textbooks for teaching Spanish as a foreign language. Journal of Pragmatics,
43, 2411–2433.
Dechert, H. W. (1980). Pauses and intonation as indicators of verbal planning in
second-language speech productions: Two examples from a case study. In H.
W. Dechert & M. Raupach (Eds.), Temporal variables in speech (pp. 271–285).
The Hague: Mouton.
Deschamps, A. (1980). The syntactical distribution of pauses in English spoken as
a second language by French students. In H. W. Dechert & M. Raupach (Eds.),
Temporal variables in speech (pp. 255–262). The Hague: Mouton.
Ding, Y. (2007). Text memorization and imitation: The practices of successful
Chinese learners of English. System, 35, 271–280.
Durrant, P., & Mathews-Aydınlı, J. (2011). A function-first approach to identifying
formulaic language in academic writing. English for Specific Purposes, 30(1), 58–72.
Ellis, N. C. (1996). Sequencing in SLA: Phonological memory, chunking, and
points of order. Studies in Second Language Acquisition, 18, 91–126.
Ellis, N. C. (2002). Frequency effects in language processing. Studies in Second
Language Acquisition, 24, 143–188.
Ellis, N. C. (2012). Formulaic language and second language acquisition: Zipf and
the phrasal teddy bear. Annual Review of Applied Linguistics, 32, 17–44.
Ellis, N. C. & Simpson-Vlach, R. (2009). Formulaic language in native speakers:
Triangulating psycholinguistics, corpus linguistics, and education. Corpus
Linguistics and Linguistic Theory, 5, 61–78.
Ellis, N. C., Frey, E., & Jalkanen, I. (2008). The psycholinguistic reality of
collocation and semantic prosody (1): Lexical access. In U. Romer & R.
Schulze (Eds.), Exploring the lexis-grammar interface. Amsterdam, the
Netherlands: John Benjamins.
Ellis, N. C., Simpson-Vlach, R., & Maynard, C. (2008). Formulaic language in
native and second-language speakers: Psycholinguistics, corpus linguistics,
and TESOL. TESOL Quarterly, 42, 375–396.
Ellis, R. (2005). Planning and task-based performance: Theory and research. In
R. Ellis (Ed.), Planning and task performance in a second language (pp. 3–36).
Amsterdam: John Benjamins.
Ellis, R. & Yuan, F. (2005). The effects of careful within-task planning on oral and
written task performance. In R. Ellis (Ed.), Planning and task performance in a
second language (pp. 167–192). Amsterdam: John Benjamins.
Ellis, R., Basturkmen, H., & Loewen, S. (2001). Learner uptake in communicative
ESL lessons. Language Learning, 51, 281–318.
Erman, B. (2006). Non-pausing as evidence of the idiom principle. Paper
presented at the first Nordic Conference on Syntactic Freezes. University of
Joensuu, Finland. May 19–20, 2006.
Erman, B. (2007). Cognitive processes as evidence of the idiom principle.
International Journal of Corpus Linguistics, 12(1), 25–53.
Erman, B. & Warren, B. (2000). The idiom principle and the open choice principle.
Text, 20(1), 29–62.
Eskildsen, S. W. & Cadierno, T. (2007). Are recurring multi-word expressions
really syntactic freezes? Second language acquisition from the perspective
of usage-based linguistics. In M. Nenonen & S. Niemi (Eds.), Collocations
and idioms 1: Papers from the First Nordic Conference on Syntactic Freezes.
Joensuu, Finland: Joensuu University Press.
Eyckmans, J., Boers, F., & Stengers, H. (2007). Identifying chunks: Who can see
the wood for the trees? Language Forum, 33, 85–100.
Ferris, D. (1994). Lexical and syntactic features of ESL writing by students at
different levels of L2 proficiency. TESOL Quarterly, 28, 414–420.
Firth, J. R. (1951). Modes of meaning. In J. R. Firth (Ed.), Essays and studies
(pp. 118–149). London: Oxford University Press.
Firth, J. R. (Ed.). (1957). Papers in linguistics 1934–1951. Oxford: Oxford
University Press.
Forsberg, F. (2010). Using conventional sequences in L2 French. International
Fraser, B. (1970). Idioms within a transformational grammar. Foundations of
Language, 6, 22–42.
Freed, B. F. (1995). What makes us think that students who study abroad become
fluent? In B. F. Freed (Ed.), Second language acquisition in a study abroad
context (pp. 123–148). Philadelphia, PA: John Benjamins.
Freudenthal, D., Pine, J. M., & Gobet, F. (2010). Explaining quantitative variation in
the rate of Optional Infinitive errors across languages: A comparison of MOSAIC
and the Variational Learning Model. Journal of Child Language, 37, 643–669.
Gatbonton, E. & Segalowitz, N. (1988). Creative automatization: Principles for
promoting fluency within a communicative framework. TESOL Quarterly,
22(3), 473–492.
Gatbonton, E. & Segalowitz, N. (2005). Rethinking communicative language
teaching: A focus on access to fluency. The Canadian Modern Language
Review, 61(3), 325–353.
Gibbs, R., & Gonzales, G. (1985). Syntactic frozenness in processing and
remembering idioms. Cognition, 20, 243–259.
Goffman, E. (1971). Relations in public. London: Allen Lane/The Penguin Press.
Goldberg, A. (1995). Constructions: A construction grammar approach to
argument structure. Chicago, IL: University of Chicago Press.
Goldberg, A. (2006). Constructions at work: The nature of generalization in
language. Oxford: Oxford University Press.
Goldman-Eisler, F. (1968). Psycholinguistics: Experiments in spontaneous speech.
New York, NY: Academic Press.
Granger, S. & Rayson, P. (1998). Automatic profiling of learner texts. In S. Granger
(Ed.), Learner English on computer (pp. 119–131). New York, NY: Longman.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations
and formulae. In A. P. Cowie (Ed.), Phraseology: Theory, analysis and
applications (pp. 145–160). Oxford: Clarendon Press.
Granger, S. & Meunier, F. (Eds.). (2008). Phraseology: An interdisciplinary
perspective. Philadelphia, PA: John Benjamins.
Granger, S. & Paquot, M. (2008). Disentangling the phraseological web. In
S. Granger & F. Meunier (Eds.), Phraseology: An interdisciplinary perspective
(pp. 27–50). Philadelphia, PA: John Benjamins.
Grant, I. E. & Bauer, L. (2004). Criteria for redefining idioms: Are we barking up
the wrong tree? Applied Linguistics, 25, 38–61.
Greenbaum, S. (1974). Some verb-intensifier collocations in American and British
English. American Speech, 49, 79–89.
Gries, S. T. (2008). Corpus-based methods in analyses of SLA data. In P. Robinson
and N. C. Ellis (Eds.), Handbook of cognitive linguistics and second language
acquisition (pp. 406–431). New York, NY: Routledge, Taylor & Francis.
Gries, S. T. (2012). Frequencies, probabilities, association measures in usage-/
exemplar-based linguistics: some necessary clarifications. Studies in
Language, 36(3), 477–510.
Hakuta, K. (1974). Prefabricated patterns and the emergence of structure in
second language acquisition. Language Learning, 24(2), 287–297.
Handl, S. (2008). Essential collocations for learners of English: The role of
collocational direction and weight. In F. Meunier & S. Granger (Eds.),
Phraseology in foreign language learning and teaching (pp. 43–66).
Amsterdam: John Benjamins.
Hasselgren, A. (1994). Lexical teddy bears and advanced learners: A study into
the ways Norwegian students cope with English vocabulary. International
Journal of Applied Linguistics, 4, 237–260.
Hickey, T. (1993). Identifying formulas in first language acquisition. Journal of
Child Language, 20, 27–41.
Hill, J. & Lewis, M. (1997). LTP dictionary of selected collocations. EMEA British
English.
Hilpert, M. (2008). New evidence against the modularity of grammar: Constructions,
collocations, and speech perception. Cognitive Linguistics, 19, 491–511.
Hockett, C. F. (1958). A course in modern linguistics. New York: MacMillan.
Hoey, M. (2005). Lexical priming: A new theory of word and language. London/
New York: Routledge.
Hornby, A. S., Gatenby, E. V., & Wakefield, H. (1942). Idiomatic and syntactic
English dictionary. Oxford: Oxford University Press.
Hsu, J.-Y. & Chiu, C.-Y. (2008). Lexical collocations and their relation to speaking
proficiency of college EFL learners in Taiwan. Asian EFL Journal, 10, 181–204.
Hulstijn, J. H. (2001). Intentional and incidental second language vocabulary
learning: A reappraisal of elaboration, rehearsal and automaticity. In P.
Robinson (Ed.), Cognition and second language instruction (pp. 258–286).
Hyland, K. (1998). Hedging in scientific research articles. Amsterdam: John
Benjamins.
Hyland, K. (2003). Second language writing. Cambridge: Cambridge University
Press.
Hyland, K. (2004). Disciplinary discourses: Social interactions in academic
writing. University of Michigan Press.
Hyland, K. (2006). English for academic purposes: An advanced resource book.
Abingdon.
Hyland, K. (2007). Genre pedagogy: Language, literacy and L2 writing instruction.
Journal of Second Language Writing, 16(3), 148–164.
Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation.
English for Specific Purposes, 27(1), 4–21.
Hyland, K. (2012). Bundles in academic discourse. Annual Review of Applied
Hyland, K. & Hamp-Lyons, L. (2001). EAP: Issues and directions. Journal of
English for Academic Purposes, 1, 1–12.
Hymes, D. (1962). The ethnography of speaking. In T. Gladwin and
W. C. Sturtevant (Eds.), Anthropology and human behaviour (pp. 13–53).
Washington, DC: Anthropological Society of Washington.
Jesperson, O. (1924). The philosophy of language. London: Allen and Unwin.
Jiang, N. & Nekrasova, T. M. (2007). The processing of formulaic sequences by
second language speakers. The Modern Language Journal, 91, 433–445.
Jones, M., & Haywood, S. (2004). Facilitating the acquisition of formulaic
sequences. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing
and use (pp. 269–300). Amsterdam: John Benjamins.
Jones, S. & Sinclair, J. (1974). English lexical collocations. Cahiers de Lexicologie,
24, 15–61.
Katz, J. J. & Postal, P. (1963). The semantic interpretation of idioms and
sentences containing them. MIT Research Laboratory of Electronics Quarterly
Progress Report, 70, 275–282.
Kecskes, I. (2000). Conceptual fluency and the use of situation-bound utterances.
Links & Letters, 7, 145–161.
Kemmer, S. & Barlow, M. (2000). Introduction: A usage-based conception
of language. In M. Barlow & S. Kemmer (Eds.), Usage-based models of
language. Chicago, IL: University of Chicago Press.
Keshavarz, M. H. & Salimi, H. (2007). Collocational competence and cloze test
performance: A study of Iranian EFL learners. International Journal of Applied
Kirjavainen, M., Theakston, A., & Lieven, E. (2009). Can input explain children's
me-for-I errors? Journal of Child Language, 36, 1091–1114.
Kjellmer, G. (1984). A dictionary of English collocations: Based on the Brown
Corpus. Oxford: Clarendon Press.
Koprowski, M. (2005). Investigating the usefulness of lexical phrases in
contemporary coursebooks. ELT Journal, 4, 322–332.
Kormos, J. & Safar, A. (2008). Phonological short-term memory, working
memory and foreign language performance in intensive language learning.
Bilingualism: Language and Cognition, 11, 261–271.
Kramsch, C. (2008). Ecological perspectives on foreign language education.
Language Teaching, 41(3), 389–408.
Krashen, S. & Scarcella, R. (1978). On routines and patterns in language
acquisition and performance. Language Learning, 28(2), 283–300.
Kress, G. (1994). Learning to write. London: Routledge.
Kuiper, K. (1996). Smooth talkers: The linguistic performance of auctioneers and
sportscasters. Mahwah, NJ: Lawrence Erlbaum.
Kuiper, K. (2004). Formulaic performance in conventionalised varieties of speech.
In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use
(pp. 37–54). Amsterdam: John Benjamins.
Kuiper, K. & Haggo, D. (1985). The nature of ice hockey commentaries. In R. Barry
and J. Acheson (Eds.), Regionalism and national identity: Multidisciplinary
essays on Canada, Australia and New Zealand (pp. 167–175). Christchurch
Association for Canadian Studies in Australia and New Zealand.
Kumaravadivelu, B. (2002). Beyond methods: Macrostrategies for language
teaching. New Haven, CT: Yale University Press.
Kunin, A. V. (1955). English-Russian phraseological dictionary (2nd ed., 1967;
3rd ed., 1984). Moscow: Russkii Yazik.
Lakoff, G. & Johnson, M. (1980). Metaphors we live by. Chicago, IL: University of
Chicago Press.
Laufer, B. (2011). The contribution of dictionary use to the production and
retention of collocations in a second language. International Journal of
Lexicography, 24, 29–49.
Laufer, B. & Girsai, N. (2008). Form-focused instruction in second language
vocabulary learning: A case for contrastive analysis and translation. Applied
Laufer, B. & Roitblat-Rozovski, B. (2011). Incidental vocabulary acquisition: The
effects of task type, word occurrence and their combination. Language
Teaching Research, 15, 391–411.
Laufer, B. & Waldman, T. (2011). Verb-noun collocations in second language
writing: A corpus analysis of learners' English. Language Learning, 61,
647–672.
Leki, I. (2006). The legacy of first-year composition. In P. K. Matsuda, C. Ortmeier-Hooper &
Lennon, P. (1984). Retelling a story in English. In H. W. Dechert, D. Möhle &
M. Raupach (Eds.), Second language productions (pp. 50–68). Tubingen:
Gunter Narr Verlag.
Lennon, P. (1990a). The advanced learner at large in the L2 community:
Developments in spoken performance. International Review of Applied
Linguistics in Language Teaching, 28, 309–321.
Lennon, P. (1990b). Investigating fluency in EFL: A quantitative approach.
Language Learning, 40(3), 387–417.
Levy, S. (2003). Lexical bundles in professional and student writing (Doctoral
dissertation). Retrieved from CSA Linguistics and Language Behaviour
Abstracts. (ISSN: 0419-4209).
Levy, S. (2008). Lexical bundles in professional and student writing. Saarbrucken:
VDM Verlag.
Lewis, M. (1997). Pedagogical implications of the lexical approach. In J. Coady
& T. Huckin (Eds.), Second language vocabulary acquisition (pp. 255–270).
Lewis, M. (2000). Materials and resources for teaching collocation. In M. Lewis
(Ed.), Teaching collocations: Further developments in the lexical approach
(pp. 186–204). Boston, MA: Heinle.
Lewis, M. (2008). Implementing the lexical approach. London: Heinle.
Li, J. & Schmitt, N. (2009). The acquisition of lexical phrases in academic writing:
A longitudinal case study. Journal of Second Language Writing, 18(2), 85–102.
Lieven, E., Salomo, D., & Tomasello, M. (2009). Two-year-old children's production
of multiword utterances: A usage-based analysis. Cognitive Linguistics, 20,
481–508.
Lin, P. (2010). The phonology of formulaic sequences: A review. In D. Wood (Ed.),
Perspectives on formulaic language: Acquisition and communication (pp.
174–193). New York/London: Continuum.
Lin, P. (2012). Sound evidence: The missing piece of the jigsaw in formulaic
language research. Applied Linguistics, 33(3), 342–347.
Lin, P. M. S. & Adolphs, S. (2009). Sound evidence: Phraseological units in
spoken corpora. In A. Barfield and H. Gyllstad (Eds.), Collocating in another
language: Multiple interpretations. Basingstoke, UK: Palgrave MacMillan.
Lindstromberg, S. & Boers, F. (2008a). The mnemonic effect of noticing
alliteration in lexical chunks. Applied Linguistics, 29, 200–222.
Lindstromberg, S. & Boers, F. (2008b). Phonemic repetition and the learning of
lexical chunks: The mnemonic power of assonance. System, 36, 423–436.
Liu, D. (2008). Idioms: Description, comprehension, acquisition, and pedagogy.
New York/London: Routledge.
Liu, D. (2011). The most frequently used English phrasal verbs in American and
British English: A multicorpus examination. TESOL Quarterly, 45(4), 661–688.
Liu, D. (2012). The most frequent multiword constructions in academic written
English: A multi-corpus study. English for Specific Purposes, 31(1), 25–35.
Llach, A. (2011). Lexical errors and accuracy in foreign language writing. Bristol:
Multilingual Matters.
Lord, A. (1960). The singer of tales. Cambridge, MA: Harvard University Press.
Makkai, A. (1972). Idiom structure in English. The Hague: Mouton.
Malinowski, B. (1935). Coral gardens and their magic: A study of the methods
of tilling the soil and of agricultural rites in the Trobriand Islands. New York:
Routledge.
Manes, J. & Wolfson, N. (1981). The compliment formula. In F. Coulmas (Ed.),
Conversational routine: Explorations in standardized communication situations
and prepatterned speech (pp. 115–132). The Hague, the Netherlands: Mouton.
Martin, K. I. & Ellis, N. C. (2012). The roles of phonological STM and working
memory in L2 grammar and vocabulary learning. Studies in Second Language
Acquisition, 34, 379–413.
Martinez, R. & Murphy, V. A. (2011). Effect of frequency and idiomaticity on
second language reading comprehension. TESOL Quarterly, 45, 267–290.
Martinez, R. & Schmitt, N. (2012). A phrasal expressions list. Applied Linguistics,
33(3), 299–320.
Matsuda, P. K., Ortmeier-Hooper, C., & You, X. (Eds.). (2006). The politics of
second language writing. West Lafayette, Indiana: Parlor Press.
McCarthy, M. & O'Dell, F. (2002). English idioms in use. Cambridge: Cambridge
University Press.
McCarthy, M. & O'Dell, F. (2004). English phrasal verbs in use intermediate.
McCarthy, M. & O'Dell, F. (2006). English collocations in use. Cambridge:
Cambridge University Press.
McCully, G. (1985). Writing quality, coherence, and cohesion. Research in the
Teaching of English, 19, 269–282.
McDonough, K. & Trofimovich, P. (2008). Using priming methods in second
language research. London: Routledge.
Meddings, L. & Thornbury, S. (2009). Teaching unplugged: Dogme in English
language teaching. Peaslake UK: Delta.
Mel'čuk, I. (1988). Semantic description of lexical units in an explanatory
combinatory dictionary: Basic principles and heuristic criteria. International
Journal of Lexicography, 1(3), 165–188.
Mel'čuk, I. (1998). Collocations and lexical functions. In A. P. Cowie (Ed.), Phraseology:
Theory, analysis and application (pp. 23–53). Oxford: Clarendon Press.
Meunier, F. (2012). Formulaic language and language teaching. Annual Review of
Applied Linguistics, 32, 111–129.
Mitchell, T. F. (1971). Linguistic goings-on: Collocations and other lexical matters
arising on the syntactic record. Archivum Linguisticum, 2 (new series), 35–69.
Möhle, D. (1984). A comparison of the second language speech production of
different native speakers. In H. W. Dechert, D. Möhle & M. Raupach (Eds.),
Second language productions (pp. 26–49). Tubingen: Gunter Narr Verlag.
Moon, R. (1977). Vocabulary connections: Multi-word items in English. In
M. McCarthy (Ed.), Vocabulary: Description, acquisition and pedagogy
Moon, R. (1997). Vocabulary connections: Multi-word items in English. In
M. McCarthy (Ed.), Vocabulary: Description, acquisition and pedagogy
Moon, R. (1998). Frequencies and forms of phrasal lexemes in English. In
A. P. Cowie (Ed.), Phraseology: Theory, analysis, and applications (pp. 79–100).
Oxford: Clarendon Press.
Myles, F. (2004). From data to theory: The over-representation of linguistic
knowledge in SLA. Transactions of the Philological Society, 102, 139–168.
Myles, F., Hooper, J., & Mitchell, R. (1998). Rote or rule? Exploring the role
of formulaic language in classroom foreign language learning. Language
Learning, 48(3), 323–363.
Myles, F., Mitchell, R., & Hooper, J. (1999). Interrogative chunks in French L2: A basis
for creative construction. Studies in Second Language Acquisition, 21, 49–80.
Nassaji, H. (1999). Towards integrating form-focused instruction and
communicative interaction in the second language classroom: Some
pedagogical possibilities. Canadian Modern Language Review, 55(3), 386–404.
Nation, I. S. P. (1990). Teaching and learning vocabulary. Boston, MA: Heinle & Heinle.
Nation, P. (1989). Improving speaking fluency. System, 17(3), 377–384.
Nattinger, J. R. & DeCarrico, J. S. (1992). Lexical phrases and language teaching.
Oxford: Oxford University Press.
Nesselhauf, N. (2004). What are collocations? In D. J. Allerton, N. Nesselhauf &
P. Skandera (Eds.), Phraseological units: Basic concepts and their application
(pp. 1–21). Basel: Schwabe.
Nesselhauf, N. (2005). Collocations in a learner corpus. Philadelphia, PA: John
Benjamins.
O'Brien, I., Segalowitz, N., Collentine, J., & Freed, B. (2006). Phonological
memory and lexical, narrative, and grammatical skills in second-language oral
production by adult learners. Applied Psycholinguistics, 27, 377–402.
O'Donnell, M. B., Romer, U., & Ellis, N. C. (2013). The development of formulaic
sequences in first and second language writing: Investigating effects of
frequency, association, and native norms. International Journal of Corpus
Linguistics, 18(1), 83–108.
Opie, I. & Opie, P. (1959). The lore and language of schoolchildren. Oxford:
Oxford University Press.
Palmer, H. E. (1933). Second interim report on English collocations. Institute for
Research in English Teaching.
Palmer, H. E. (1938). A grammar of English words. London: Longmans Green.
Paltridge, B. (2004). Academic writing. Language Teaching, 37(2), 87–105.
Paquot, M. (2008). Exemplification in learning writing: A cross-linguistic
perspective. In F. Meunier & S. Granger (Eds.), Phraseology in foreign
language learning and teaching (pp. 101–119). Amsterdam: John Benjamins.
Paré, A. (2009). What we know about writing, and why it matters. Compendium
2, 2(1), 1–12.
Parry, M. (1928). L'Epithète traditionnelle dans Homère. Paris: Société Editrice Les
Belles Lettres.
Parry, M. (1930). Studies in the epic technique of oral verse-making. I: Homer and
Homeric style. Harvard Studies in Classical Philology, 41, 73–147.
Parry, M. (1932). Studies in the epic technique of oral verse-making. II: The
Homeric language as the language of an oral poetry. Harvard Studies in
Classical Philology, 43, 1–50.
Pawley, A. (1986). Lexicalization. In D. Tannen & J. Alatis (Eds.), Georgetown
round table in languages and linguistics: The interdependence of theory, data
and applications (pp. 98–120). Washington, DC: Georgetown University.
Pawley, A. (1991). How to talk cricket: On linguistic competence in a subject
matter. In R. Blust (Ed.), Currents in Pacific linguistics: Papers on Austronesian
languages and ethnolinguistics in honour of George Grace (pp. 339–368).
Canberra: Pacific Linguistics.
Pawley, A. (2007). Developments in the study of formulaic language since 1970.
In P. Skandera (Ed.), Phraseology and culture in English (pp. 3–45). Berlin/New
York: Mouton de Gruyter.
Pawley, A. & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike
selection and nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.),
Language and communication (pp. 191–226). New York, NY: Longman.
Peters, A. M. (1977). Language learning strategies: Does the whole equal the
sum of the parts? Language, 53(3), 560–573.
Peters, A. M. (1983). Units of language acquisition. Cambridge: Cambridge
University Press.
Peters, E. (2012). Learning German formulaic sequences: The effect of two
attention-drawing techniques. Language Learning Journal, 40, 65–79.
Pinker, S. (1999). Words and rules: The ingredients of language. New York, NY:
HarperCollins.
Poos, D. & Simpson, R. (2002). Cross-disciplinary comparisons of hedging: Some
findings from the Michigan corpus of academic spoken English. In R. Reppen,
S. Fitzmaurice & B. Douglas (Eds.), Using corpora to explore linguistic variation
Raimes, A. (2002). The steps in planning a writing course and training teachers of
writing. In J. C. Richards & W. A. Renandya (Eds.), Methodology in language
teaching: An anthology of current practice (pp. 306–314). Cambridge:
Cambridge University Press.
Raupach, M. (1980). Temporal variables in first and second language speech
production. In H. W. Dechert & M. Raupach (Eds.), Temporal variables in
speech (pp. 263–270). The Hague: Mouton.
Rehbein, J. (1987). On fluency in second language speech production. In H. W.
Dechert & M. Raupach (Eds.), Psycholinguistic models of language production
(pp. 97–105). Norwood, NJ: Ablex.
Reiter, R. M., Rainey, I., & Fulcher, G. (2005). A comparative study of certainty
and conventional indirectness: Evidence from British English and Peninsular
Spanish. Applied Linguistics, 26, 131.
Ricard, E. (1986). Beyond fossilization: A course on strategies and techniques
in pronunciation for advanced adult learners. TESL Canada Journal Special
Edition, 1, 243253.
Riggenbach, H. (1991). Toward an understanding of fluency: A microanalysis of
nonnative speaker conversations. Discourse Processes, 14, 423–441.
Riggenbach, H. (1999). Discourse analysis in the language classroom. Ann Arbor,
MI: University of Michigan Press.
Robinson, P. (1995). Attention, memory and the noticing hypothesis. Language
Learning, 45, 283–331.
Roever, C. (2005). Testing ESL pragmatics: Development and validation of a web-based assessment battery. Berlin, Germany: Peter Lang.
Rowland, C. F., & Pine, J. M. (2000). Subject-auxiliary inversion errors and wh-question acquisition: What children do know! Journal of Child Language,
27, 157–181.
Rumsey, A. (2001). Tom yaya kange: A metrical narrative genre from the New
Guinea Highlands. Journal of Linguistic Anthropology, 11(2), 193–239.
Sadoski, M. (2005). A dual coding view of vocabulary learning. Reading & Writing
Quarterly, 21, 221–238.
Salazar, D. & Verdaguer, I. (2009). Polysemous verbs and modality in native and
non-native argumentative writing: A corpus-based study. International Journal
of English Studies, 9, 209–219.
Schauer, G. A. & Adolphs, S. (2006). Expressions of gratitude in corpus and DCT
data: Vocabulary, formulaic sequences, and pedagogy. System, 34, 119–134.
Schloff, L. & Yudkin, M. (1991). Smart speaking: Sixty-second strategies.
New York, NY: Henry Holt and Company.
Schmidt, R. (1992). Psycholinguistic mechanisms underlying second language
fluency. Studies in Second Language Acquisition, 14, 357–385.
Schmidt, R. W. (1983). Interaction, acculturation, and the acquisition of
communicative competence: A case study of an adult. In N. Wolfson &
E. Judd (Eds.), Sociolinguistics and language acquisition (pp. 137–174).
Rowley, MA: Newbury House.
Schmitt, N. (2004). Formulaic sequences: Acquisition, processing and use.
Philadelphia, PA: John Benjamins.
Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual.
London: Palgrave Macmillan.
Schmitt, N., Grandage, S. & Adolphs, S. (2004). Are corpus-derived recurrent
clusters psycholinguistically valid? In N. Schmitt (Ed.), Formulaic sequences:
Acquisition, processing, and use (pp. 127–151). Philadelphia, PA: John
Benjamins.
Scott, M. (2007). Oxford Wordsmith Tools: Version 5.0. Released June 2007 from
http://www.lexically.net.
Searle, J. (1968). Speech acts: An essay on the philosophy of language.
Sharifian, F. (2008). Cultural schemas in L1 and L2 compliment responses:
A study of Persian-speaking learners of English. Journal of Politeness
Research: Language, Behavior, Culture, 4, 55–80.
Shei, C. C. (2008). Discovering the hidden treasure on the internet: Using Google
to uncover the veil of phraseology. Computer Assisted Language Learning,
21(1), 67–85.
Sifianou, M. & Tzanne, A. (2010). Conceptualizations of politeness and
impoliteness in Greek. Intercultural Pragmatics, 7, 661–687.
Silva, T. (1993). Toward an understanding of the distinct nature of L2 writing: The
ESL research and its implications. TESOL Quarterly, 27(4), 657–677.
Simpson, R. (2004). Stylistic features of academic speech: The role of formulaic
expressions. In T. Upton & U. Connor (Eds.), Discourse in the professions:
Perspectives from corpus linguistics (pp. 37–64). Amsterdam: John
Benjamins.
Simpson-Vlach, R. & Ellis, N. (2010). An academic formulas list: New methods in
phraseology research. Applied Linguistics, 31, 487–512.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University
Press.
Sinclair, J. (2005). Corpus and text: Basic principles. In M. Wynne (Ed.),
Developing linguistic corpora: A guide to good practice (pp. 1–16). Oxford:
Oxbow Books.
Siyanova-Chanturia, A., Conklin, K., & Schmitt, N. (2011). Adding more fuel to
the fire: An eye-tracking study of idiom processing by native and non-native
speakers. Second Language Research, 27, 1–22.
Siyanova-Chanturia, A., Conklin, K., & van Heuven, J. B. (2011). Seeing a phrase
time and again matters: The role of phrasal frequency in the processing
of multiword sequences. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 37, 776–784.
Skandera, P. (2004). What are idioms? In D. J. Allerton, N. Nesselhauf &
P. Skandera (Eds.), Phraseological units: Basic concepts and their applications
(pp. 23–36). Basel: Schwabe AG.
Smirnitsky, A. I. (1956). Lexicology of the English language. Moscow: Foreign
Literature.
Sosa, A. & MacFarlane, J. (2002). Evidence for frequency-based constituents in
the mental lexicon: Collocations involving the word of. Brain and Language,
83, 227–236.
Spears, R. A., Birner, B., & Kleinelder, S. (1994). NTC's dictionary of everyday
American English expressions. New York, NY: McGraw-Hill.
Staehr, L. S. (2009). Vocabulary knowledge and advanced listening
comprehension in English as a foreign language. Studies in Second Language
Acquisition, 31, 577–607.
Staples, S., Egbert, J., Biber, D., & McClair, A. (2013). Formulaic sequences and
EAP writing development: Lexical bundles in the TOEFL iBT writing section.
Journal of English for Academic Purposes, 12(3), 214–225.
Steinel, M. P., Hulstijn, J. H., & Steinel, W. (2007). Second language idiom
learning in a paired-associate paradigm: Effects of direction of learning,
direction of testing, idiom imageability, and idiom transparency. Studies in
Second Language Acquisition, 29, 449–484.
Stengers, H., Boers, F., Housen, A., & Eyckmans, J. (2010). Does chunking
foster chunk-uptake? In S. De Knop, F. Boers & A. De Rycker (Eds.), Fostering
language teaching efficiency through cognitive linguistics (pp. 99–117). Berlin,
Germany: Mouton de Gruyter.
Stubbs, M. (2002). Words and phrases: Corpus studies of lexical semantics.
Oxford: Blackwell.
Sugaya, N., & Shirai, Y. (2009). Can L2 learners productively use Japanese
tense-aspect markers? A usage-based approach. In R. Corrigan, E. Moravcsik,
H. Ouali & K. Wheatley (Eds.), Formulaic language: Vol. 2. Acquisition, loss,
psychological reality, functional applications. Amsterdam, the Netherlands:
John Benjamins.
Swain, M. (1985). Communicative competence: Some roles of comprehensible
input and comprehensible output in its development. In S. Gass &
C. G. Madden (Eds.), Input in second language acquisition (pp. 235–253).
New York, NY: Newbury House.
Swain, M. (1995). Three functions of output in second language learning. In
G. Cook & B. Seidlhofer (Eds.), Principles and practice in the study of language
(pp. 125–144). Oxford: Oxford University Press.
Swinney, D. & Cutler, A. (1979). The access and processing of idiomatic
expressions. Journal of Verbal Learning and Verbal Behaviour, 18, 523–534.
Taguchi, N. (2007). Chunk learning and the development of spoken discourse in a
Japanese as a foreign language classroom. Language Teaching Research, 11,
433–457.
Taguchi, N. (2011). The effect of L2 proficiency and study-abroad experience on
pragmatic comprehension. Language Learning, 61, 1–36.
ten Hacken, P. (2004). What are compounds? In D. J. Allerton, N. Nesselhauf &
P. Skandera (Eds.), Phraseological units: Basic concepts and their applications
(pp. 53–66). Basel: Schwabe AG.
Terkourafi, M. (2002). Politeness and formulaicity: Evidence from Cypriot Greek.
Journal of Greek Linguistics, 3, 179–201.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language
acquisition. Cambridge, MA & London, UK: Harvard University Press.
Towell, R. (1987). Variability and progress in the language development of
advanced learners of a foreign language. In R. Ellis (Ed.), Second language
acquisition in context (pp. 113–127). Toronto: Prentice-Hall.
Traverso, V. (2006). Aspects of polite behaviour in French and Syrian service
encounters: A data-based comparative study. Journal of Politeness Research:
Language, Behavior, Culture, 2, 105–122.
Tremblay, A. & Baayen, H. (2010). Holistic processing of regular four-word
sequences: A behavioural and ERP study of the effects of structure,
frequency, and probability on immediate free recall. In D. Wood (Ed.),
Perspectives on formulaic language: Acquisition and communication (pp.
151–172). New York/London: Continuum.
Tremblay, A., Derwing, B., Libben, G., & Westbury, C. (2011). Processing
advantages of lexical bundles: Evidence from self-paced reading and sentence
recall tasks. Language Learning, 61(2), 569–613.
Tucker, G. (2005). Extending the lexicogrammar: Towards a more comprehensive
account of extraclausal, partially clausal, and non-clausal expressions in
spoken discourse. Language Sciences, 27, 679–709.
Underwood, G., Schmitt, N., & Galpin, A. (2004). The eyes have it: An eye-movement
study into the processing of formulaic sequences. In N. Schmitt
(Ed.), Formulaic sequences: Acquisition, processing, and use (pp. 153–172).
Philadelphia, PA: John Benjamins.
Van Lancker, D., & Kempler, D. (1987). Comprehension of familiar phrases by
left-but not by right-hemisphere damaged patients. Brain and Language, 32,
265–277.
Van Lancker, D., Canter, G., & Terbeek, D. (1981). Disambiguation of ditropic
sentences: Acoustic and phonetic cues. Journal of Speech and Hearing
Research, 24, 330–335.
Van Lancker-Sidtis, D. (2003). Auditory recognition of idioms by first and second
language speakers of English. Applied Psycholinguistics, 24, 45–57.
Van Lancker-Sidtis, D. & Postman, W. A. (2006). Formulaic expressions in
spontaneous speech of left- and right-hemisphere damaged subjects.
Aphasiology, 20, 411–426.
van Lier, L. (2004). The ecology and semiotics of language learning: A
sociocultural perspective. Dordrecht: Kluwer Academic.
Vinogradov, V. V. (1947). Ob osnovnuikh tipakh frazeologicheskikh edinits v
russkom yazike. [About the basic types of phraseological units in Russian]. In
A. A. Shakhmatov (Ed.), Sbornik statei i materialov [The collection of articles
and materials] (pp. 339–364). Moscow: Nauka.
Vinogradov, V. V. (1977). Ob osnovnuikh tipakh frazeologicheskikh edinits v
russkom yazike. [About the basic types of phraseological units in Russian]. In
V. V. Vinogradov (Ed.), Izbrannie trudi. Leksikologia i leksikografia [Selected
works. Lexicology and lexicography] (pp. 140–161). Moscow: Nauka.
Virtanen, T. (1998). Direct questions in argumentative student writing. In
S. Granger (Ed.), Learner English on computer (pp. 94–106). New York, NY:
Longman.
Wajnryb, R. (1990). Grammar dictation. Oxford: Oxford University Press.
Walker, I. & Utsumi, T. (2006). Memorizing dialogues: The case for performative
exercises. In W. M. Can, K. N. Chin & T. Suthiwan (Eds.), Foreign language
teaching in Asia and beyond: Current perspectives and future directions
(pp. 243–269). Singapore: Centre for Language Studies.
Webb, S., Newton, J., & Chang, A. C. S. (2013). Incidental learning of collocation.
Language Learning, 63(1), 91–120.
Weinert, R. (1995). The role of formulaic language in second language acquisition:
A review. Applied Linguistics, 16(2), 180–205.
Weinreich, U. (1969). Problems in the analysis of idioms. In J. Puhvel (Ed.),
Substance and structure of language (pp. 23–81). Berkeley: University of
California Press.
Wen, Z. (2011). Working memory and second language learning. Bristol, UK:
Multilingual Matters.
Williams, E. (1981). On the notions "lexically related" and "head of a word." Linguistic
Inquiry, 12, 245–274.
Willis, D. (1990). The lexical syllabus: A new approach to language teaching.
London: Harper Collins.
Wong, M. L.-Y. (2010). Expressions of gratitude by Hong Kong speakers of
English: Research from the International Corpus of English in Hong Kong
(ICE-HK). Journal of Pragmatics, 42, 1243–1257.
Wong-Fillmore, L. (1976). The second time around: Cognitive and social
strategies in second language acquisition. Unpublished doctoral dissertation,
Stanford University.
Wood, D. (1998). Making the grade: An interactive course in English for academic
purposes. Toronto: Prentice Hall Allyn and Bacon.
Wood, D. (2001). In search of fluency: What is it and how can we teach it?
Canadian Modern Language Review, 57(4), 573–589.
Wood, D. (2002). Formulaic language in acquisition and production: Implications
for teaching. TESL Canada Journal, 20(1), 1–15.
Wood, D. (2006). Uses and functions of formulaic sequences in second language
speech: An exploration of the foundations of fluency. Canadian Modern
Language Review, 63(1), 13–33.
Wood, D. (2009a). Preparing ESP learners for workplace placement. ELT Journal,
63(4), 323–331.
Wood, D. (2009b). Effects of focused instruction of formulaic sequences on
fluent expression in second language narratives: A case study. Canadian
Journal of Applied Linguistics, 12(1), 39–57.
Wood, D. (2010a). Formulaic language and second language speech fluency:
Background, evidence, and classroom applications. London/New York:
Continuum.
Wood, D. (2010b). Lexical clusters in an EAP textbook corpus. In D. Wood
(Ed.), Perspectives on formulaic language: Acquisition and communication
(pp. 8106). New York/London: Continuum.
Wood, D. & Appel, R. (2013). Lexical bundles in first year university business and
engineering textbooks: A resource for EAP. In H. M. McGarrell & D. Wood
(Eds.), Special research symposium issue of CONTACT. Refereed Proceedings
of TESL Ontario Research Symposium, October 2012. Vol. 39, No. 2
(pp. 92–102).
Wood, D. C. & Appel, R. (2014). Multiword constructions in first year university
textbooks and in EAP textbooks. Journal of English for Academic Purposes,
15, 1–13.
Wood, D. & Namba, K. (2013). Focused instruction of formulaic language: Use
and awareness in a Japanese university class. The Asian Conference on
Language Learning Official Conference Proceedings 2013, pp. 203–212.
Wood, M. M. (1981). A definition of idiom. Bloomington: University of Indiana
Linguistics Club.
Wray, A. (1999). Formulaic sequences in second language teaching: Principles
and practice. Applied Linguistics, 21(4), 463–489.
Wray, A. & Fitzpatrick, T. (2008). Why can't you just leave it alone? Deviations
from memorized language as a gauge of nativelike competence. In
F. Meunier & S. Granger (Eds.), Phraseology in language learning and teaching
Wray, A. & Perkins, M. R. (2000). The functions of formulaic language: An
integrated model. Language and Communication, 20, 1–28.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge
University Press.
Wray, A. (2004). Here's one I prepared earlier: Formulaic language learning on
television. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing
and use (pp. 249–268). Amsterdam/Philadelphia, PA: John Benjamins.
Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford: Oxford
University Press.
Wray, A. & Namba, K. (2003). Use of formulaic language by a Japanese-English
bilingual child: A practical approach to data analysis. Japanese Journal for
Multilingualism and Multiculturalism, 9(1), 24–51.
Wu, S., Witten, I. H., & Franken, M. (2010). Utilizing lexical data from a web-derived corpus to expand productive collocation knowledge. ReCALL, 22,
83–102.
Yeung, L. (2009). Use and misuse of besides: A corpus study comparing native
speakers' and learners' English. System, 37, 330–342.
Yorio, C. (1980). Conventionalized language forms and the development of
communicative competence. TESOL Quarterly, 16(4), 433–442.
Yorio, C. (1989). Idiomaticity as an indicator of second language proficiency.
In K. Hyltenstam & L. K. Obler (Eds.), Bilingualism across the lifespan:
Aspects of acquisition, maturity, and loss (pp. 55–71). Cambridge: Cambridge
University Press.
Zhu, W. (2006). Understanding context for writing in university content
classrooms. In P. K. Matsuda, C. Ortmeire-Hooper & X. You (Eds.), The politics
of second language writing: In search of the promised land (pp. 129–146).
West Lafayette, IN: Parlor Press.
Index
Academic Formulas List (AFL) 82,
11012, 123
academic textbook language 1312
Academic Word List (AWL) 82, 114
academic writing 16, 45, 1038, 110,
11719, 124, 131, 1347, 1645,
172
beneficial effects, lexical bundles
1345
corpus-focused studies 10817
historical perspectives 1056
learner corpora 1068
lists of formulaic sequences
10817
nature 1025
acquisition theory
adult language 745
associative models 76
child language 678
developmental sequence 789
flooding the input 142, 144
Focus on Form (FonF) pedagogy
147
formulaic language, classroom
lessons 6970, 166
lexical bundles 124
power of memory in language 76,
146
pragmatic competence 95, 139
second language 25, 578, 656,
157, 1623, 168
speed of speech 87
thematic context 145
Ädel, A. 107
Adolphs, S. 21, 63, 92, 96
Al Hassan, L. 108
Allerton, D. J. 2, 43
Ambridge, B. 74
Amosova, N. N. 7, 39
Anderson, J. 56
Annual Review of Applied Linguistics
35
anthropologists 46
Appel, R. 21, 46, 107, 11216, 124,
1323, 136, 145, 165
Arnon, I. 62, 77, 101
Ashby, M. 92
Austin, J. L. 6
Baayen, H. 22, 62
Bacha, N. N. 102
Bachman, L. F. 93, 95
Baddeley, A. D. 56
Bahns, J. 70
Baker, P. 108, 124, 133, 135
Bannard, C. 723, 77
Barbieri, F. 106, 122, 133, 135
Bardovi-Harlig, K. 936
Barlow, M. 169
Barron, A. 956
Bassetti, B. 102
Bastos, M.-T. 96
Basturkmen, H. 147
Bauer, L. 42
Becker, J. D. 44
Ben Rejeb, R. 172
Benson, E. 144
Berndi, M. 143
Biber, D. 16, 21, 45, 82, 1067, 114,
122, 124, 126, 1315
bilingual dictionary 142
Birner, B. 144
Bod, R. 62, 64, 77, 101
Boers, F. 81, 107, 137, 13944, 146,
148, 171
Bolander, M. 15, 75
British National Corpus (BNC) 323,
46, 845, 10910, 160
Broca, P. 6
Burmeister, H. 70
Butler, C. S. 10
Bybee, J. 29, 92
Byrd, P. 104, 11416, 119, 134
Cadierno, T. 78
Cameron-Faulkner, T. 72
Canadian Academic English Language
Assessment (CAEL) 107
Canter, G. 61
Chafe, W. L. 7
Chan, T.-P. 142
Chang, A. C. S. 142
Chen, L. 131, 144
Chen, Y. 108, 124, 133, 135
Chiu, C.-Y. 140
Cieslicka, A. 61
COBUILD 49, 143
Collentine, J. 76
collocation
anomalous 43
Firth's definition 38
frequency-based 389
lexicography 40
phraseological approaches
3940
taxonomy 29
two-word 11, 45
collocation researchers 45
Columbus, G. 140
complexity, phonetic 256, 31
Conklin, K. 22, 612, 77, 140
Connor, U. 102, 106
Conrad, S. 16, 21, 45, 107, 122
Cook, V. 102
Corpus of Contemporary American
English (COCA) 22, 323, 109,
160
Cortes, V. 16, 21, 45, 122, 124, 131,
133, 135
Coulmas, F. 8, 25, 2930, 89
Cowie, A. P. 2, 3940, 104
Coxhead, A. 82, 104, 105, 11416, 119,
134
criteria checklists
Coulmas 25
frequency statistics 234
gradience of formulaicity 267
judgment procedure 2830
Peters 256, 302
themes and patterns 323
Wood 278, 302
Wray and Namba 267, 302
Csábi, S. 143
Culpeper, J. 94
current research
acquisition 1623
categorization 161
focus of academic writing 172
grammar construction 16970
history 15960
identification 160
language teaching 1667
lexical bundles 1656
Mel'čuk's Meaning-Text Theory
1678
mental processing 1612
semantics and priming, lexicals
1701
spoken language 1634
teaching models 1712
usage-based theories 1689
written language 1645
Cutler, A. 601
Dai, Z. 140, 147
Davis, J. 95
Davis, P. 151
DeCarrico, J. S. 2, 10, 2931, 445,
89, 143
Dechert, H. W. 78, 87
De Jong, N. 146, 151
Demecheleer, H. 140
Demecheleer, M. 143
De Pablos-Ortega, C. 96
Derwing, B. 62, 77
Deschamps, A. 87
Ding, Y. 140, 147
discourse analysis activities (classroom) 1536
discourse organizing bundles 127
Ellis, N. 15, 19, 21, 24, 323, 46,
578, 758, 823, 101, 104, 106,
10912, 116, 119, 123, 12931,
1334, 136, 140, 147, 165
Ellis, R. 145
English as a Second Language (ESL)
59, 95, 1424
English for Academic Purposes (EAP)
103, 105, 108, 11112, 115,
1234, 129, 1313, 1435
English for specific purposes (ESP)
131, 1434, 156
epic sung poetry 5
Erman, B. 11, 81, 92, 107
Eskildsen, S. W. 78
Eyckmans, J. 1401, 143
Ferris, D. 106
Finegan, E. 21, 107
first language
children's use 6874
double role, formulaic language
714
pragmatic competence 701
vocabulary acquisition 678
Firth, J. R. 45, 38
Fitzpatrick, T. 146
fluency workshop
automatization stage 152
case study 153
free-talk stage 152
input stage 152
production stage 152
Focus on form (FonF), teaching
method 147
folklorists 56
formulaic language. See also current
research; specific activities
children's use 1415, 256 (See
also criteria checklist, Peters)
classification 1011
comprehension 1114
definition 24
identification criteria 910, 160
oral formulaic genres 8
speech production 1114, 1634
writing process 1645
formulaic language, pedagogical
principle
feedback 147
practice 1457
preparation 145
Formulaic Language Research
Network (FLaRN) 2, 35
formulaic sequences
automatization stage 152
benefits 1345
categorization 161
Coulmas's criteria 25
communication strategy 15
in corpora 23
corpus analysis 4
discourse analysis, classroom
1536
evidence 153
fillable slots 9
fluency workshop 1512
free talk stage 153
identification 20, 224, 278, 32
input stage 23
language models 10
length analysis 1324
multiword 14
native speaker usage 24
pragmatic function criteria 10
production stage 152
semantic and syntactic irregularities 11
speech fluency 6, 11
as vocabulary 1479
Forsberg, F. 139
Franken, M. 142
Fraser, B. 7, 41
Freed, B. F. 76, 878
frequency statistics
corpora 204, 312
criteria checklists 234
native speaker judgment 235
phonological characteristics 23
psycholinguistic measures 223
Freudenthal, D. 73
Frey, E. 77
Fulcher, G. 94
Galpin, A. 22, 61
Gatbonton, E. 146
Gatenby, E. V. 7, 40
Gibbs, R. 61
Girsai, N. 142
Gobet, F. 73
Goffman, E. 6
Goldberg, A. 64, 16970
Goldman-Eisler, F. 6
grammar construction 16970
grammarians 7
Grandage, S. 21, 63
Granger, S. 2, 38, 1056
Grant, I. E. 42
Greenbaum, S. 5, 38
Gries, S. T. 21
Haggo, D. 8
Hakuta, K. 689
Hamp-Lyons, L. 45, 131
Handl, S. 105
Hasselgren, A. 78
Haywood, S. 103
Hickey, T. 68, 75, 91
Hill, J. 144
Hilpert, M. 76
Hockett, C. F. 401
Hoey, M. 1701
holistic storage, sequence 14, 212,
29, 53, 58, 601, 636, 92, 104,
1613
Hooper, J. 6970
Hornby, A. S. 7, 40
Housen, A. 141
Hsu, J.-Y. 140
Hulstijn, J. H. 57, 143
Hyland, K. 45, 1046, 110, 114, 123,
128, 131, 1345
Hymes, D. 6
ideational functions, language
participant- oriented 128
research- oriented 128
text- oriented 128
idioms
categorization 424
defining criteria 434
definition 401
Fraser's definition 41
Hockett's definition 401
identification 41
Moon's definition 40, 42
morphemes in 41
polymorphemes in 41
transformational-generative
grammar 41
Wood's definition 42
Ilson, R. 144
Implementing the Lexical Approach
(Lewis) 144
Irreversible binomials 42
Jalkanen, I. 77
Jesperson, O. 7
Jiang, N. 140
Johansson, S. 21, 45, 107
Johnson, M. 148
Jones, J. 16, 45, 131
Jones, M. 103
Jones, S. 39
Kappel, J. 140
Katz, J. J. 41
Kecskes, I. 95
Kemmer, S. 169
Kempler, D. 63
Keshavarz, M. H. 140
Kirjavainen, M. 73
Kjellmer, G. 5, 389
Kleinelder, S. 144
Koprowski, M. 144
Kormos, J. 76
Kramsch, C. 172
Krashen, S. 8
Kress, G. 102
Kuiper, K. 8, 62
Kumaravadivelu, B. 172
Kövecses, Z. 143
labels
collocation 378
concgrams 37, 4950
lexical bundles 21
lexical phrases 10
n-grams 37
terminologies 357
Lakoff, G. 148
Laufer, B. 139, 142
learning psychologists 67
Leech, G. 21, 45, 107
Leki, I. 102
Lennon, P. 878
Levy, S. 106, 124
Lewis, M. 103, 1434, 171
lexical bundles
academic discipline 1516
acquisition 124
class fragments 1256
components 1214
corpus analysis 21, 456, 1656
frequency-based method 21, 62
functional characteristics 1268
noun-phrase and prepositional
fragments 126
research findings 1315
structural characteristics 1246
themes and patterns of research
136
verb phrase fragments 125
lexical phrases 10, 29, 31, 37, 44, 65,
89, 143, 162
lexical priming theory 1712
lexical semantics 167, 1701
lexicographers 4, 8, 40
lexicographists 40
lexicography 40
Li, J. 105
Libben, G. 62, 77
Lieven, E. 723
Lin, P. M. S. 23, 912
Lindstromberg, S. 139, 142, 144, 146,
148, 171
Liou, H.-C. 142
Liu, D. 21, 41, 42, 43, 46, 49, 82, 109,
110, 112, 116, 119, 133, 136, 147,
165
Llach, A. 102
L1 learners
acquisition contexts 15
class boundaries 878
pragmatic goals 95
translated formulas 96
L2 learners
natural learning environments 15
proficiency levels 107
purposes of teaching 46, 140
sequence of activities 149
speech production 878
syntactic rules 78
use of lexical bundles 124, 134
writing abilities 147
Loewen, S. 147
Lord, A. 5
Martinez, R. 85, 86, 140
Matthews, D. 72, 77
Maynard, C. 77
McCarthy, M. 143, 171
McClair, A. 107
McDonough, K. 77
Meaning-Text Theory, Mel'čuk's 39,
1678
Meddings, L. 172
Mel'čuk, I. 7, 39, 40, 167, 168
mental processing
brain-damaged individuals 63
concepts of cognition 645
declarative knowledge 545
heteromorphic lexicon 5960
idiom 602
long-term memory 567
nonidiomatics 623
other types of research 634
procedural knowledge 545
real-life language use 1612
second language acquisition theory
578
short-term memory 567
spontaneous language 556
storage and retrieval, formulaic
sequence 589
themes and patterns of research
656
Meunier, F. 2
Mitchell, R. 69, 70
Möhle, D. 87
monolingual dictionary 142
Moon, R. 9, 40, 42
Mouton de Gruyter 2
multiword constructions (MWC)
10910, 11214, 1323
Murphy, V. A. 140
mutual expectancy 44, 89
mutual information (MI) statistics
212, 323, 77, 82, 110, 123,
140, 160
Myles, F. 69, 70, 78
MacFarlane, J. 62
Makkai, A. 41, 42
Malinowski, B. 6
Manes, J. 94
Martin, K. I. 76
Namba, K. 10, 22, 25, 26, 27, 30, 31,
77, 146
Nassaji, H. 147
Nation, I. S. P. 102
Nation, P. 145, 146, 150, 152
native speaker
clause-chaining fluency 12
intuition 20, 224, 31
judgement 235, 278, 32 (See
also criteria checklist, Wood)
Nattinger, J. R. 2, 10, 29, 30, 31, 44,
45, 89, 143
Nekrasova, T. M. 140
Nesselhauf, N. 2, 5, 38, 39
neurologists 4, 6
Newton, J. 142
noncompositionality 39, 42, 44
nonlexical-bundles 12931
O'Brien, I. 76
O'Dell, F. 143, 171
O'Donnell, M. B. 24
Opie, I. 6
Opie, P. 6
Palmer, H. E. 7, 40
Paltridge, B. 102
Paquot, M. 105
Paré, A. 102
Parry, M. 5
Pawley, A. 5, 8, 9, 10, 12, 13, 14, 81
Perfetti, C. A. 146, 151
Perkins, M. R. 11, 14, 29, 30, 35, 37,
89
Peters, A. M. 14, 15, 25, 30, 31, 68,
69, 74, 75, 89, 91
Peters, E. 141
philosophers 6
phonological short-term memory
(PSTM) 76
phrasal compounds 42
phrasal verbs 910, 37, 42, 489, 81,
109, 143
Pine, J. M. 73, 74
Pinker, S. 64
Poos, D. 106
Postal, P. 41
Postman, W. A. 63
pragmatics 3, 8, 81, 936
pre-task planning 145
Raimes, A. 102
Rainey, I. 94
Raupach, M. 87
Rayson, P. 106
referential bundles 1278
Rehbein, J. 7
Reiter, R. M. 94
research history, formulaic language
early research 47
lexical bundles 1516
since 1970s 7
source of information 2
themes and patterns 1617
use of strange items 34
word sequences, examples 3
Ricard, E. 150
Riggenbach, H. 87, 153, 154
Rinvolucri, M. 151
Robinson, P. 57
Roever, C. 95
Römer, U. 24
Rowland, C. F. 74
Rumsey, A. 8
Sadoski, M. 143
Safar, A. 76
Salazar, D. 106
Salomo, D. 72
Scarcella, R. 8
Schauer, G. A. 96
Schloff, L. 150
Schmidt, R. W. 15, 74, 75
Schmitt, N. 2, 21, 22, 61, 62, 63, 77,
85, 86, 105, 140
Scott, M. 121
Searle, J. 6
second language
developmental sequence
789
formulaic knowledge 768
themes and patterns, research
7980
theoretical models, acquisition
756
vocabulary acquisition 745,
1623
Segalowitz, N. 76, 146
semantic opacity 434
Sharifian, F. 96
Shei, C. C. 22
Shirai, Y. 78
Silva, T. 102
Simpson-Vlach, R. 21, 32, 46, 77, 82,
83, 106, 109, 110, 111, 112, 116,
119, 123, 129, 130, 131, 133,
134, 135, 136, 165
Sinclair, J. 2, 5, 8, 38, 39, 81
Siyanova-Chanturia, A. 61, 62
Skandera, P. 2, 43
Snider, N. 62, 77, 101
sociologists 6
Sosa, A. 62
Spears, R. A. 144
specific activities
chain dictations 151
chat circles 151
mingle jigsaw 150
productive skills 97
receptive skills 97
shadowing 150
student dictations 151
4/3/2 technique 145, 1501
spoken language
lists of formulaic sequence
826
phonological characteristics 912
pragmatics competence, teaching
937
speech production 1634
themes and patterns, research 98
word fluency 8291
Staehr, L. S. 140
stance bundles 1267
Staples, S. 107
Steinel, M. P. 143
Steinel, W. 143
Stengers, H. 140, 141
Stoller, F. 16, 45, 131
Stubbs, M. 39, 171
Sugaya, N. 78
Swain, M. 145
Swinney, D. 60, 61
Syder, F. H. 8, 12, 13, 14, 81
Taguchi, N. 77, 96
Tapper, M. 106
task-based language teaching (TBLT)
144
teaching
English for Academic Purposes
(EAP) 115, 1312
Focus on form (FonF) approaches
147
materials 1435
models 171
pedagogical intervention 1413,
1457, 1667
practical applications 1435
strategies 13941
teaching models 1712
ten Hacken, P. 47
Terbeek, D. 61
Terkourafi, M. 94
Theakston, A. L. 73, 74
Thornbury, S. 172
Tomasello, M. 64, 72, 73, 74
tournure 42
Towell, R. 87, 88
Traverso, V. 95
Tremblay, A. 22, 62, 77, 101
Trofimovich, P. 77
Tucker, G. 10
Tzanne, A. 95
Underwood, G. 22, 61
usage-based models of language 66,
109, 162, 16770
Utsumi, T. 147
Van Lancker-Sidtis, D. 61, 63
Verdaguer, I. 106
Vinogradov, V. V. 7, 39
Virtanen, T. 106
vocabulary formulas
macro strategies 148
sequence integration 1489
Vogel, T. 70
Wajnryb, R. 152
Wakefield, H. 7, 40
Waldman, T. 139
Walker, I. 147
Warren, B. 11, 81
Webb, S. 142
Weinert, R. 58
Weinreich, U. 41
Wen, Z. 76
Westbury, C. 62, 77
Williams, E. 47
Willis, D. 143
within-task planning 145
Witten, I. H. 142
Wolfson, N. 94
Wong, M. L.-Y. 94
Wong-Fillmore, L. 68, 69
Wood, D. C. 2, 7, 11, 21, 22, 25, 27,
28, 29, 30, 31, 33, 46, 77, 88,
89, 90, 91, 99, 107, 108, 112,
113, 114, 115, 116, 119, 124, 132,
133, 136, 140, 144, 145, 146,
150, 152, 153, 154, 165, 171
Wood, M. M. 42
Wray, A. 2, 9, 10, 11, 14, 23, 24, 25,
26, 27, 29, 30, 31, 35, 37, 53, 59,
89, 91, 146
Wu, S. 142
Yearbook of Phraseology, The
(Europhras) 2
Yeung, L. 106
Yorio, C. 14, 74
Yuan, F. 145
Yudkin, M. 150
Zhu, W. 102
