
READER PERCEPTIONS OF LINGUISTIC VARIATION

IN PUBLISHED ACADEMIC WRITING

By Jesse Egbert

A Dissertation

Submitted in Partial Fulfillment

Of the Requirements for the degree of

Doctor of Philosophy

in Applied Linguistics

Northern Arizona University

May 2014

Approved:

Douglas Biber, Ph.D., Chair

Mark Davies, Ph.D.

William Grabe, Ph.D.

Randi Reppen, Ph.D.


UMI Number: 3621088

All rights reserved


UMI 3621088
Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author.
Microform Edition © ProQuest LLC.
All rights reserved. This work is protected against
unauthorized copying under Title 17, United States Code

ProQuest LLC.
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106 - 1346
ABSTRACT

READER PERCEPTIONS OF LINGUISTIC VARIATION


IN PUBLISHED ACADEMIC WRITING

JESSE EGBERT

The purpose of this dissertation is to investigate relationships between the
linguistic choices of writers and reader perceptions of writing quality and style in
published academic writing. The dual methodology introduced in this study consists of
Biber’s well-established Multi-Dimensional (MD) analysis, which is used to measure and
interpret co-occurrence patterns among linguistic features, and Stylistic Perception (SP)
analysis, a new method of investigating linguistic variation from the perspective of reader
perceptions. The three major goals of this study are to describe published academic
writing in terms of (1) its linguistic variation, (2) its perceived quality and style, and (3)
relationships between the linguistic choices of authors and the perceptions of readers.
The research in this study is based on a corpus of 150 samples of published
academic writing from three publication types (journal articles, university textbooks, and
popular academic books) in two disciplines (biology and history). After presenting a
detailed overview of this corpus, I describe the development and application of a
comprehensive framework for situational analysis to the different varieties in the corpus.
A new MD analysis of linguistic variation in the corpus reveals five dimensions of
variation that are functionally interpreted. Key patterns in language use along these
dimensions are explored within and across registers, disciplines and publication types.
For the SP analysis, an instrument was developed to measure reader perceptions of
writing quality and style and administered to 25 participant readers per text. A battery of
reliability assessments strongly supports the reliability and usefulness of this instrument.
Similar to the MD analysis, co-occurrence patterns among the perceptual items are
measured, revealing two underlying dimensions of perceptual variation. After
interpreting these dimensions and applying them to the texts in the corpus, I explore
variation in reader perceptions within and across registers, disciplines, and publication
types. Correlational techniques are used to measure relationships between the language
used by authors of published academic prose and the perceptions of readers. Regression
analyses show that the linguistic choices of authors can predict reader perceptions. All of
the quantitative results are complemented by thorough qualitative investigations of
discourse patterns in individual texts in order to provide a more complete description of
academic prose.
In summary, this dissertation helps to increase our understanding of the linguistic
and stylistic characteristics of published academic writing. These findings can be used to
prepare students to successfully read and comprehend writing in the university context.
This research also reveals the perceptions readers have towards different writing styles,
which can be used to help writers produce successful register-specific prose and improve
the effectiveness of their individual writing styles.

Jesse Egbert

© 2014

ACKNOWLEDGEMENTS

When I began my dissertation research, I felt like I was starting a long and
challenging journey, alone. As I look back, I realize that I was right on two counts but
wrong on the third. The process was long and challenging, but I never felt alone. How
could I with so many great mentors, peers, family members, and friends? In many ways,
this dissertation represents my greatest collaborative research endeavor to date. And this
section is my chance to express gratitude to the greatest research team I’ll ever have.
Just over 50 years ago John Carroll published a paper titled ‘Vectors of prose
style’. Just over 25 years ago Doug Biber published his first book, Variation across
Speech and Writing. Those two publications were the inspiration and foundation for my
study. Thank you, John and Doug, for blazing a trail that I could follow.
Working closely with Doug Biber for the past several years has been one of the
greatest experiences of my life. For Doug, it was never really about my dissertation, my
Ph.D., or my academic future. It was about me. Thanks, Doug. Thanks for understanding
that my family is my first priority, and for reminding me to put them first when I got lost
in my work. Thanks for listening to all of my ideas—good and bad. Thanks for asking
hard questions and teaching me about good judgment. Thanks for pushing me as a
programmer and statistician. Thanks for getting me outdoors to climb and giving me a
tour of Sedona from the tops of peaks, buttes, and spires. I hope you are planning to
continue climbing for many years to come because I’m just getting started.
I owe much gratitude to the members of my dissertation committee—Randi
Reppen, William Grabe, and Mark Davies. Randi’s optimism, attention to detail, and
unfailing support have combined to help make my dissertation a strong study and a great
experience. Bill Grabe’s perceptive questions and insatiable desire to make sense of data
have permanently altered the way I do research. A two-sentence email I received from
Bill not long ago sums up his role in my life during the past several years. He simply
wrote, “Don’t get discouraged. Just keep pushing.” Mark Davies introduced me to corpus
linguistics and to the program at Northern Arizona University. I’m grateful to Mark for
being by my side for the past 6 years as I have grown from being a student in his
undergraduate grammar class to being his academic colleague and peer.
I am grateful for many other professors in Applied Linguistics, Educational
Psychology, and Statistics at Northern Arizona University who have taught and mentored
me. Bill Martin, Brent Burch, Derek Sonderegger, and Luke Plonsky have each spent
many hours with me discussing the methods, research design, and statistical procedures
of this and other studies. I am grateful to Shelley Staples who was the first person to read
every chapter of my dissertation; I’ll always be glad that we did this together. I am also
thankful to Geoff LaFlair for support and useful suggestions during our early morning car
rides to work.
My Mom and Dad deserve special thanks for teaching me from a young age that I
can do hard things, and for believing in me even when I didn’t believe in myself. My
older brother Rob has an uncanny ability to call me on the phone just before I reach my
breaking point. Thanks, Rob, for being my best friend. I have more siblings than space to
write about them. So, in brief, thank you Rob, Laura, Chris, Cameron, Chaz, Emily,
Lizzy, Ashlyn, Melissa, and Jenna; you have each made unique and valuable
contributions to what I’ve done and who I am.

To Landon, Travis, Kaitlyn, and Lincoln: when you’re old enough to read this, I
want you to know that your Dad loves you and, in many ways, he wrote this big, long,
boring book for you. Until then, I added some pictures for you to look at (see Chapters 5-7).
Finally, no one in this wide world has a more supportive wife than I. Rachelle
knows, by name, every professor and colleague I have ever worked with. When she asks
me about my day, I can tell her exactly who I met with and what research I worked on.
She then asks me very specific questions about the status of my various ideas, projects,
grant proposals, and writing. She is an exceptional listener when I talk through my ideas,
and she has offered valuable suggestions that have greatly improved my work. I am
convinced that Rachelle deserves a degree in Applied Linguistics when I graduate. I love
you, Rachelle. You have been the one by my side throughout nearly everything I have
experienced during the past 10 years, and that makes me the luckiest guy I know.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
DEDICATION
Chapter 1. Introduction
1.1. Introduction
1.1.1. Journal articles
1.1.2. University textbooks
1.1.3. Popular academic books
1.2. Goals of the dissertation
1.3. Outline of the dissertation
Chapter 2. Review of the Literature on Academic Writing and Writing Quality
2.1. Introduction
2.2. Publication Type and Discipline Variation in Academic Writing
2.2.1. Register as a predictor of linguistic variation
2.2.2. Research on the register of academic writing
2.2.3. Publication type and discipline variation within academic writing
2.2.4. Interaction effects in academic language
2.3. Stylistics and Writing Quality
2.3.1. Previous quantitative approaches to measuring author style
2.3.2. Previous approaches to stylistic perception
2.3.3. Previous research on objective measures of writing quality
2.3.4. Measuring reader perceptions of writing quality
2.3.5. Triangulating reader perceptions and text-linguistics
2.4. Conclusion
Chapter 3. Constructing and Analyzing a Corpus of Published Academic Writing
3.1. Introduction
3.2. Corpus collection procedures
3.2.1. Operational definitions for the disciplines and registers
3.2.2. Source text selection
3.2.3. Text formatting and cleaning
3.3. Corpus Description
3.4. Assessing the representativeness of the corpus
3.5. Methods for annotating and quantifying variables
3.5.1. Text-linguistic features
3.5.2. Stylistic perceptions
3.6. Summary
Chapter 4. The Situational Characteristics of Published Academic Writing
4.1. Introduction
4.2. Definitions for target registers
4.3. A framework for the situational characteristics
4.3.1. Participants
4.3.2. Relations among participants
4.3.3. Setting
4.3.4. Subject matter
4.3.5. Purpose
4.3.6. Nature of data or evidence
4.4. Analyzing the situational characteristics of the six registers
4.5. Trends in the situational characteristics of academic writing
4.5.1. Common characteristics across the registers and disciplines
4.5.2. Popular academic books
4.5.3. University textbooks
4.5.4. Journal articles
4.5.5. Situational differences between biology and history texts
4.6. Summary
Chapter 5. A Multi-Dimensional Analysis of Linguistic Variation in Published Academic Writing
5.1. Introduction and background
5.2. Carrying out the Multi-Dimensional analysis
5.2.1. Published academic writing along Biber’s (1988) Dimension 1
5.2.2. Linguistic features
5.2.3. Factor analysis
5.2.4. Dimension scores
5.3. Dimensions of register variation in academic writing
5.3.1. Dimension 1: ‘Non-technical Synthesis versus Specialized Information Density’
5.3.2. Dimension 2: ‘Definition and Evaluation of New Concepts’
5.3.3. Dimension 3: ‘Author-centered Stance’
5.3.4. Dimension 4: ‘Colloquial Narrative’
5.3.5. Dimension 5: ‘Abstract Observation and Description’
5.4. Linguistic variation in published academic writing
5.4.1. Dimension 1: ‘Non-technical Synthesis versus Specialized Information Density’
5.4.2. Dimension 2: ‘Definition and Evaluation of New Concepts’
5.4.3. Dimension 3: ‘Author-centered Stance’
5.4.4. Dimension 4: ‘Colloquial Narrative’
5.4.5. Dimension 5: ‘Abstract Observation and Description’
5.5. Comparing the Register and Publication Type x Discipline Models
5.6. Interpreting variation among publication types
5.6.1. Popular academic books
5.6.2. University textbooks
5.6.3. Journal articles
5.7. Conclusion
Chapter 6. A Stylistic Perception Analysis of Published Academic Writing
6.1. Introduction
6.2. Methods
6.2.1. Developing an instrument to measure writing quality
6.2.2. Data collection
6.2.3. Perceptual variation across demographic groups
6.3. Reliability
6.3.1. Item reliability across raters and texts
6.3.2. Text reliability across raters and items
6.3.3. Rater reliability across items and texts
6.3.4. Summary of reliability results
6.4. Performing the Multi-Dimensional analysis
6.4.1. Perceptual Variables
6.4.2. Factor Analysis and Dimension Scores
6.5. Dimensions of perceived writing quality in academic writing
6.5.1. Dimension 1: ‘Engaging and Easy to Read vs. Boring and Difficult to Read’
6.5.2. Dimension 2: ‘Interactive Author Interpretation vs. Objective Information Focus’
6.5.3. Relationship between the two dimensions
6.6. Stylistic perceptions of published academic writing
6.6.1. Dimension 1: ‘Engaging and Easy to Read vs. Boring and Difficult to Read’
6.6.2. Dimension 2: ‘Interactive Author Interpretation vs. Objective Information Focus’
6.7. Summary: Instrument development and data collection
6.8. Summary: Dimensions of stylistic perception in published academic writing
6.8.1. Dimensions of stylistic perception in published academic writing
6.8.2. Interpreting variation between disciplines
6.8.3. Interpreting variation among publication types
6.9. Conclusion
Chapter 7. Correlating the Text-Linguistics and Stylistic Perceptions of Published Academic Writing
7.1. Introduction
7.2. Linguistic predictors of perceived writing quality
7.2.1. Complete corpus
7.2.2. Discipline results
7.2.3. Publication type results
7.3. Summary of findings
Chapter 8. Synthesis and Conclusion
8.1. Introduction
8.2. Linguistic variation and perceived quality of published academic writing
8.2.1. Journal articles in biology and history
8.2.2. University textbooks in biology and history
8.2.3. Popular academic books in biology and history
8.2.4. Linguistic predictors of perceived writing quality
8.3. Methodological advantages of this study
8.3.1. Corpus design
8.3.2. Instrument development
8.3.3. Interaction effects
8.3.4. Methodological triangulation
8.4. Implications
8.4.1. For teachers
8.4.2. For authors, publishers, and editors
8.4.3. For researchers
8.4.4. For EAP instructors and administrators
8.5. Limitations and Future research
8.5.1. Limitations
8.5.2. Need for replication studies
References
Appendix A. Detailed Information for Each Text in the Corpus
Appendix B. Semantic Classes of Nouns, Verbs, and Adjectives and Formulaic Language Lists
Appendix C. Scree Plot of the Six-Factor Solution for the Linguistic Data
Appendix D. Full Factorial Structure Matrix of the Six-Factor Solution for the Linguistic Data
Appendix E. Significance Testing for the Linguistic Data
Appendix F. Stylistic Perception Survey
Appendix G. Scree Plot of the Two-Factor Solution for the Perceptual Data
Appendix H. Full Factorial Structure Matrix of the Two-Factor Solution for the Perceptual Data
Appendix I. Significance Testing for the Perceptual Data
Appendix J. Multiple Regression Output
Appendix K. Full Correlation Matrix of All Linguistic Features and Dimensions and Perceptual Dimensions and Items

LIST OF TABLES

Table 2.1. Normed Counts of Nouns and Verbs in Speech and Academic Writing
Table 2.2. Overview of previous corpus-based studies on journal articles, with lists of the disciplines and linguistic features included.
Table 2.3. Overview of previous corpus-based studies on university textbooks, with lists of the disciplines and linguistic features included.
Table 2.4. Overview of previous corpus-based studies on popular academic writing, with lists of the disciplines and linguistic features included.
Table 2.5. Empirical studies on perceived writing quality
Table 3.1. Sources for the texts in the corpus
Table 3.2. Text and Word Counts in the Academic Written English Corpus
Table 4.1. Situational characteristics of journal articles, university textbooks, and popular academic books
Table 5.1. Summary of the 56 linguistic features included in the final factor analysis
Table 5.2. Final factor structure of the five-factor solution
Table 6.1. Final 38 items in the Stylistic Perceptions Scale.
Table 6.2. Intraclass Correlation Coefficients for the 38 perceptual differential items.
Table 6.3. Intraclass Correlation Coefficient Results for Texts by Publication Type and Discipline.
Table 6.4. Final dimension structure of the two-factor solution.
Table 7.1. Correlations between the two perceptual dimensions and the five linguistic dimensions for the entire corpus.
Table 7.2. Correlations between the two perceptual dimensions and the five linguistic dimensions for the biology sub-corpus.
Table 7.3. Correlations between the two perceptual dimensions and the five linguistic dimensions for the history sub-corpus.
Table 7.4. Correlations between the two perceptual dimensions and the five linguistic dimensions for the journal article sub-corpus.
Table 7.5. Correlations between the two perceptual dimensions and the five linguistic dimensions for the journal articles in biology sub-corpus.
Table 7.6. Correlations between the two perceptual dimensions and the five linguistic dimensions for the journal articles in history sub-corpus.
Table 7.7. Correlations between the two perceptual dimensions and the five linguistic dimensions for the university textbook sub-corpus.
Table 7.8. Correlations between the two perceptual dimensions and the five linguistic dimensions for the university textbooks in biology sub-corpus.
Table 7.9. Correlations between the two perceptual dimensions and the five linguistic dimensions for the university textbooks in history sub-corpus.
Table 7.10. Correlations between the two perceptual dimensions and the five linguistic dimensions for the popular academic book sub-corpus.
Table 7.11. Correlations between the two perceptual dimensions and the five linguistic dimensions for the popular academic books in biology sub-corpus.
Table 7.12. Correlations between the two perceptual dimensions and the five linguistic dimensions for the popular academic books in history sub-corpus.
Table 8.1. Summary: Distinctive linguistic and perceptual characteristics by register.
Table B1. Semantic classes of nouns (see Biber, 2006b).
Table B2. Semantic classes of verbs.
Table B3. Semantic classes of adjectives (see Biber, 2006b).
Table B4. Formulaic Language
Table D1. Factorial structure matrix of the six-factor solution for the linguistic data.
Table E1. ANOVA results for linguistic dimension 1.
Table E2. Within Discipline Simple Effects for Linguistic Dimension 1
Table E3. Within Register Simple Effects for Linguistic Dimension 1
Table E4. ANOVA results for linguistic dimension 2.
Table E5. Within Discipline Simple Effects for Linguistic Dimension 2.
Table E6. Within Register Simple Effects for Linguistic Dimension 2.
Table E7. ANOVA results for linguistic dimension 3.
Table E8. Within Discipline Simple Effects for Linguistic Dimension 3.
Table E9. Within Register Simple Effects for Linguistic Dimension 3.
Table E10. ANOVA results for linguistic dimension 4.
Table E11. Tukey’s post-hoc results for linguistic dimension 4.
Table E12. ANOVA results for linguistic dimension 5.
Table E13. Within Discipline Simple Effects for Linguistic Dimension 5.
Table E14. Within Register Simple Effects for Linguistic Dimension 5.
Table H1. Full factorial structure matrix for the perceptual data.
Table I1. ANOVA results for perceptual dimension 1.
Table I2. Within discipline simple effects for perceptual dimension 1.
Table I3. Within register simple effects for perceptual dimension 1.
Table I4. ANOVA results for perceptual dimension 2.
Table I5. Within discipline simple effects for perceptual dimension 2.
Table I6. Within register simple effects for perceptual dimension 2.
Tables J1 – J2. Multiple regression output for linguistic dimension predictors of perceptual dimension 1 in the complete corpus.
Tables J3 – J4. Multiple regression output for linguistic dimension predictors of perceptual dimension 2 in the complete corpus.
Tables J5 – J6. Multiple regression output for linguistic feature predictors of perceptual dimension 1 in the complete corpus.
Tables J7 – J8. Multiple regression output for linguistic feature predictors of
perceptual dimension 2 in the complete corpus. ................................................. 207
Tables J9 – J10. Multiple regression output for linguistic dimension predictors of
perceptual dimension 1 in the biology sub-corpus. ............................................ 208
Tables J11 – J12. Multiple regression output for linguistic dimension predictors of
perceptual dimension 2 in the biology sub-corpus. ............................................ 209
Tables J13 – J14. Multiple regression output for linguistic dimension predictors of
perceptual dimension 1 in the journal article sub-corpus. ................................ 210
Tables J15 – J16. Multiple regression output for linguistic dimension predictors of
perceptual dimension 2 in the journal article sub-corpus. ................................. 211

LIST OF FIGURES

Figure 3.1. Mean Biber (1988) Dimension Scores for the History University
Textbook Sub-corpora in Conrad (1996a) and the present study
Figure 3.2. Mean Biber (1988) Dimension Scores for the Biology Journal Article
Sub-corpora in Conrad (1996a) and the present study
Figure 5.1. Distribution of the registers and disciplines along Biber’s (1988)
Dimension 1 (Involved versus Informational Production) compared to 5
registers from Biber (1988)
Figure 5.2. Registers along Dimension 1: ‘Non-technical synthesis versus specialized
information density’
Figure 5.3. Marginal Means Plot for Dimension 1: ‘Non-technical Synthesis vs.
Specialized Information Density’
Figure 5.4. Registers along Dimension 2: ‘Definition and evaluation of new concepts’
Figure 5.5. Marginal Means Plot for Dimension 2: ‘Definition and Evaluation of
New Concepts’
Figure 5.6. Registers along Dimension 3: ‘Author-centered stance’
Figure 5.7. Marginal Means Plot for Dimension 3: ‘Author-centered Stance’
Figure 5.8. Registers along Dimension 4: ‘Colloquial Narrative’
Figure 5.9. Marginal Means Plot for Dimension 4: ‘Colloquial Narrative’
Figure 5.10. Registers along Dimension 5: ‘Abstract Observation and Description’
Figure 5.11. Marginal Means Plots for Dimension 5: ‘Abstract Observation and
Description’
Figure 6.1. Distribution of participant ages.
Figure 6.2. Distribution of participant educational background.
Figure 6.3. Mean Dimension Scores for Males and Females.
Figure 6.4. Mean Dimension Scores for Five Education Groups.
Figure 6.5. Mean Dimension Scores for Six Age Groups.
Figure 6.6. Registers along Dimension 1: ‘Engaging and Easy to Read vs. Boring
and Difficult to Read’
Figure 6.7. Marginal Means Plot for Publication Type and Discipline along
Dimension 1.
Figure 6.8. Registers along Dimension 2: ‘Interactive Author Interpretation vs.
Objective Information Focus’
Figure 6.9. Marginal Means Plot for Publication Type and Discipline along
Dimension 2.
Figure 7.1. Case study 1: PD scores, sample excerpts, and LD profiles for two texts
from the overall corpus.
Figure 7.2. Case study 2: PD scores, sample excerpts, and LD profiles for two texts
from the biology sub-corpus.
Figure 7.3. Case study 3: PD scores, sample excerpts, and LD profiles for two texts
from the journal article sub-corpus.
Figure 7.4. Case study 4: PD scores, sample excerpts, and LD profiles for two texts
from the biology journal article sub-corpus.
Figure 7.5. Profile plot of mean ‘Abstract Observation and Description’ scores for
journal articles in biology across article sections.
Figure 7.6. Case study 5: PD scores, sample excerpts, and LD profiles for two texts
from the biology textbook sub-corpus.
Figure 7.7. Case study 6: PD scores, sample excerpts, and LD profiles for two texts
from the history popular academic books sub-corpus.
Figure C1. Scree plot of the six-factor solution for the linguistic data.
Figure G1. Scree plot of the two-factor solution for the perceptual data.

DEDICATION

For Rachelle.

Because you made this possible and meaningful.

And because Grandpa would have wanted it this way.

CHAPTER 1. INTRODUCTION

1.1. Introduction

It is difficult to say exactly how long humans have been inquiring about scientific
matters. However, we do know that humans have been publishing their writing on
scientific topics for many centuries (see, e.g., Kronick, 1976; Meadows, 1981). The
emergence of published scientific writing has enabled the scientific community to use
previous scientific literature to help them generate research questions, develop research
methods, and interpret research findings. Additionally, science writing has allowed
members of the scientific community to focus their research efforts on generating and
transmitting new knowledge rather than needlessly repeating research done by others.
While the scientific community was, and arguably still is, an exclusive group of academic
elites, scientific findings are now published in many forms and for many audiences,
including students and the general public. Published academic writing is one of the most
important ways that scientists and humanists leave a permanent record of their questions,
methods, observations, and conclusions. However, our understanding of the linguistic
nature of academic writing is limited in many ways. There has been substantial research
on the linguistic characteristics of published academic writing, but very little is known
about the perceptions and attitudes of readers towards academic writing across registers,
disciplines, publication types, and individual author styles.
The overarching goal of this dissertation study is to measure relationships
between the language characteristics of published academic writing and the perceptions
of lay readers. This goal is accomplished through four major steps: (1) compilation and
description of a representative corpus of published academic writing (Chapters 3 and 4),
(2) analysis of the linguistic characteristics of the texts in the corpus (Chapter 5), (3)
analysis of stylistic perceptions of lay readers (Chapter 6), and (4) analysis of the
relationships between language use and reader perceptions (Chapter 7). The results of
these various steps contribute a wealth of information about the linguistic nature of
published academic writing and the relationships between the linguistic choices of
authors and reader perceptions of writing quality and style.
This chapter begins with a brief overview of the three publication types that are
investigated in this study. In this study, two disciplines (biology and history) are
represented within each of these publication types. Each of the resulting six textual
domains (e.g., journal articles in biology) comprises a register of published academic
writing. For the purposes of this study, a register is defined as “a [language] variety
associated with a particular situation of use (including particular communicative
purposes)” (Biber & Conrad, 2009, p. 6). A complete situational analysis of these six
registers of academic writing is presented in Chapter 4.

1.1.1. Journal articles

Previous researchers have categorized scientific writing into the following four-stage
continuum: (1) intraspecialist, (2) interspecialist, (3) pedagogical, and (4) popular
(Cloitre & Shinn, 1985). The main distinction made between the first two stages is that
intraspecialist articles are written to a narrower audience of experts who specialize in
a particular area of research, whereas interspecialist articles are written to a broader, more
interdisciplinary audience. While there are subtle differences between these two groups,
they share many things in common. Most importantly, they are both written by scholars
for other scholars. In this study, intraspecialist and interspecialist writing will be grouped
together within the publication type of journal articles. This scholar-to-scholar
publication type functions as a venue for the transmission of generated scientific
knowledge.
According to Gray (2011), “research articles typically report on new, developing
knowledge in the field, with the aim of increasing and proposing current disciplinary
knowledge” (p. 3). New knowledge generated through scientific research typically
appears in journal articles before any other printed form. Another key attribute of journal
articles is the rigorous peer-review process designed to uphold a high standard of quality
in research design and interpretation. In sum, as a publication type, published academic
journal articles represent the accepted body of developing knowledge within the scientific
community.

1.1.2. University textbooks

Cloitre and Shinn (1985) used the label ‘pedagogical’ to describe the third stage
of scientific writing, which they describe as the stage in which scientific knowledge is
developed and consolidated into a cumulative manual of instruction on a topic or set of
topics within a particular discipline. The publication type of university textbooks is
included in this study to represent pedagogical texts. In contrast with the scholar-to-
scholar nature of journal articles, university textbooks are typically scholar-to-student
texts. Gray (2011) describes textbooks as books written to readers with a developing
knowledge of a particular topic in order to introduce them to the established knowledge
within a particular discipline (p. 3). Textbooks are also an important part of academic
discourse socialization, used to help students learn the language, conventions, and tools
necessary to function and progress in academic settings, such as the university (see
Hyland, 2009). To summarize, textbook authors have a goal of transmitting the
established knowledge generated through scientific research to students or novice
readers.

1.1.3. Popular academic books

The final category in Cloitre and Shinn’s (1985) continuum of scientific writing is the
‘popular’ stage. Popular academic writing is the most recently developed of the four
stages. It is also the most dynamic and widespread of the three publication types
introduced here. Popular academic writing can be characterized as a scholar-to-public
publication type. However, it is recognized that many authors of popular academic prose
are not scholars (i.e., they are not experts in the discipline discussed in their publication).
It is also important to acknowledge that ‘the general public’ is an extremely
heterogeneous group in terms of background knowledge, interests, and reading ability,
one that no author of popular academic writing tries to accommodate with a single
publication. Popular academic texts are written in an effort to bridge the gap identified by
Fahnestock (1998) “between the public’s right to know and the public’s ability to
understand” (p. 330). According to Cloitre and Shinn (1985), popular academic writing
also gives greater attention to issues such as health, technology, and the economy.
The body of popular academic writing has grown at an accelerated rate in recent
years. One sign of this growth is the establishment and increasing membership of the
National Association of Science Writers, which aims to “foster the dissemination of
accurate information regarding science through all the media normally devoted to
informing the public” (http://www.nasw.org/about-national-association-science-writers-inc).
Another sign of this growth is the establishment of science writing degree programs at
major universities, such as the program at the Massachusetts Institute of Technology, which
teaches skills in “writing about science, medicine, and technology for general readers”
(http://sciwrite.mit.edu/program-information/what-is-science-writing). In summary,
popular academic writing functions to transmit scientific findings and ideas to the general
public.

1.2. Goals of the dissertation

The previous section has established the intellectual and cultural importance of
published academic writing. In order to more fully understand the linguistic
characteristics of this macro-register, this dissertation has three main goals:

1. To investigate linguistic variation across registers, publication types, and
disciplines of published academic writing.

2. To investigate variation in reader perceptions across registers, publication types,
and disciplines of published academic writing.

3. To investigate linguistic predictors of perceived writing quality and style across
registers, publication types, and disciplines.

The first goal is to measure linguistic variation in published academic writing,
taking into account variation across registers, publication types, and disciplines. Many
previous studies have investigated the linguistic characteristics of registers of academic
writing. However, some of these studies are focused on describing these registers either
in isolation or in comparison with non-academic registers such as conversation or
newspaper articles. Although these studies provide useful descriptions of academic
writing, they do not provide a comparison of linguistic variability along Cloitre and
Shinn’s (1985) intraspecialist-interspecialist-pedagogical-popular continuum of scientific
writing. This study aims to take a comparative approach to describing linguistic variation
across six registers within the macro-register of published academic writing.
One often ignored aspect of academic writing is the relationship between
discipline and publication type. Despite the small amount of research that has shown that
these two variables often interact, most studies of academic writing fail to consider and
measure possible interaction effects. This can result in incomplete and even inaccurate
results regarding the nature of the linguistic variation in published academic prose. This
study will measure potential interaction effects between publication type and discipline in
order to fully account for these variables in the data.

The second goal of this study is to measure the perceptions of readers towards
published academic texts, both within and across different registers. The discussion thus
far has focused mostly on textual characteristics of academic writing. An equally
important yet often overlooked consideration is the impact academic writing has on the
reader. According to Gopen and Swan (1990), “It may seem obvious that a scientific
document is incomplete without the interpretation of the writer; it may not be so obvious
that the document cannot ‘exist’ without the interpretation of each reader” (p. 558). Most
previous research on the language of academic prose has ignored the reader, focusing
solely on the characteristics of the text. This study will measure and report reader
perceptions of academic writing across registers, disciplines, and publication types using
a newly developed instrument.
The third and final goal of this study is to investigate linguistic predictors of
perceived writing quality and style across registers. Goals 1 and 2 investigate published
academic writing in terms of its linguistic variability and reader perceptions towards that
variability, respectively. In order to accomplish the third goal, I use quantitative and
qualitative techniques to measure relationships between the linguistic characteristics of
texts and reader perceptions of writing quality and style. This will ultimately provide a
linguistic framework for the analysis of writing quality and style.

1.3. Outline of the dissertation

This study comprises eight chapters that are organized into two parts. In the first part
(Chapters 1-4), I establish a foundation for the dissertation by introducing its aims,
situating it in previous literature, describing the corpus, and building and applying a
framework to describe the situational characteristics of the texts in the corpus. In Chapter
2, I present an overview of previous literature related to (a) register variation in published
academic writing and (b) writing quality and stylistic perception. The concluding section
of Chapter 2 summarizes gaps in the literature that I aim to fill in this dissertation.
Chapter 3 includes a description of the corpus used in this study, including a detailed
overview of the procedures used for text collection, a description of the contents of the
corpus, and an assessment of its representativeness. Chapter 3 also briefly introduces the
linguistic and perceptual variables that are measured in this study. Chapter 4 describes the
development of a comprehensive framework for the analysis of the situational
characteristics of published academic prose, and applies this framework to describe
situational variation within and across the registers included in the corpus.
In the second part (Chapters 5-8), I report the results of a series of quantitative
linguistic and perceptual analyses and interpret these results through the use of qualitative
techniques. In Chapter 5 I describe the methods and results of a new Multi-Dimensional
analysis of linguistic variation in the academic writing included in the corpus. In this
chapter I also interpret the dimension structures and use them to investigate register
variation and the relative effects of publication type and discipline on linguistic variation
in the corpus. In Chapter 6 I describe the development and application of Stylistic
Perception analysis, a new method of describing writing style from the perspective of
reader perceptions. After subjecting the results of a newly developed survey to a battery
of reliability assessments, I conduct a factor analysis and apply the resulting dimension
structures to the texts in the corpus in order to measure variability in reader perceptions
within and across the registers, publication types, and disciplines in the corpus. In
Chapter 7 I report the results of a series of correlational analyses to measure the
relationships between linguistic variation and perceived quality and style across registers,
publication types, and disciplines. I also use qualitative discourse analytic methods to
interpret the linguistic patterns in texts that vary in terms of perceived quality. I then draw
tentative conclusions regarding linguistic predictors of writing quality and style in
published academic prose.
Finally, in Chapter 8 I summarize the results of the empirical analyses carried out
in Chapters 5-7. I also highlight some of the advantages of the methods used in this study,
discuss implications of the study for several groups of people, and explain the limitations
of the study and areas for future research.

CHAPTER 2. REVIEW OF THE LITERATURE ON ACADEMIC WRITING
AND WRITING QUALITY

2.1. Introduction

The purpose of this chapter is to review previous research in two major areas. The
first area of research is linguistic variation in published academic writing. After
introducing the importance of register and sub-register as predictors of linguistic
variation, I introduce important situational variables of registers, focusing mostly on the
variables of academic discipline and publication type.
The literature review in Section 2.3 focuses on issues and research related to
stylistic variation and writing quality. I begin by presenting a brief overview of major
approaches to research on stylistic variation. I then review recent literature on writing
quality.
I conclude this chapter with a discussion of the need for additional research
studies that account for statistical interactions among factors related to linguistic
variability in published academic writing. I also establish the need for research that (a)
investigates reader perceptions of writing quality and (b) correlates reader perceptions
with linguistic and stylistic variables.

2.2. Publication Type and Discipline Variation in Academic Writing

The goal of this section is to present a review of the literature on published
academic writing, with an emphasis on linguistic variation across registers. Section 2.2.1
focuses on research that has established register as an important predictor of language
use. Section 2.2.2 presents research focused on linguistic variation within the macro-
register of academic writing. Section 2.2.3 reviews the body of research that has
investigated language use across various registers of academic writing, focusing on the
three publication types included in this study: journal articles, university textbooks, and
popular academic writing. Finally, in Section 2.2.4 I present a sample of research that has
looked at statistical interactions in academic language.

2.2.1. Register as a predictor of linguistic variation

Corpus linguistic research has established register as a key predictor of linguistic
variation (see, e.g., Atkinson and Biber, 1994; Biber & Conrad, 2009: Appendix A).
However, Biber (2012) recently noted that register differences are often disregarded by
authors of reference works and linguistic research on grammatical and lexico-
grammatical patterns in English. These authors attempt to describe “general English,”
ignoring wide variation in patterns of English across registers, or situations of use.
Descriptions of “general English” are commonly based on a corpus designed to
represent the full range of speech and writing in English. Patterns of language use are
then identified within the entire corpus without reference to register-based variation
within the corpus. An analysis of nouns and verbs in Davies’ (2008-) Corpus of
Contemporary American English (COCA) will suffice to illustrate some of the problems
that result from ignoring register variation. Taken as a whole, COCA seems to show that
nouns are used more than verbs in “general English” (i.e., the average of Speech and
Academic Writing; see Table 2.1).

Table 2.1. Normed Counts of Nouns and Verbs in Speech and Academic Writing

          “General”    Speech    Academic writing
Nouns     215.57       163.93    258.90
Verbs     181.20       207.88    148.24

However, when we compare the rates of occurrence for nouns and verbs in speech to
those in academic prose, we discover that although nouns are more common than verbs in
academic texts, the opposite is true in speech. A look at results from COCA across all of
the registers and sub-registers revealed that the “general” English findings reported above
do not accurately represent any single variety of English. These numbers are central
tendencies calculated based on the full range of variability in the corpus, and they fall
either well above or well below the rates of occurrence for each of the register and sub-
register categories in COCA. Two important points to notice here are that ignoring
register differences (a) causes us to miss important information about variability in the
use of nouns and verbs across registers and (b) gives us a misleading representation of
how these features are used in English.
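The reversal shown in Table 2.1 can be checked with a short computation. The following Python sketch is offered only as an illustration: it takes the normed counts exactly as they appear in the table and computes a noun-to-verb ratio for each column. The variable and function names are hypothetical, and the code is not part of the COCA interface or of the analyses reported in this study.

```python
# Normed counts copied from Table 2.1 (the norming basis is not stated here).
rates = {
    "general": {"nouns": 215.57, "verbs": 181.20},
    "speech": {"nouns": 163.93, "verbs": 207.88},
    "academic": {"nouns": 258.90, "verbs": 148.24},
}


def noun_verb_ratio(counts):
    """Return the number of nouns per verb for one column of the table."""
    return counts["nouns"] / counts["verbs"]


# A ratio above 1 means nouns outnumber verbs; below 1 means the opposite.
ratios = {name: noun_verb_ratio(counts) for name, counts in rates.items()}
dominant = {name: ("nouns" if r > 1 else "verbs") for name, r in ratios.items()}

for name in rates:
    print(f"{name:>8}: noun/verb ratio = {ratios[name]:.2f} "
          f"({dominant[name]} more frequent)")
```

A ratio above 1 in the “general” and academic writing columns, alongside a ratio below 1 for speech, is the reversal that the pooled figures conceal.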
Although many reference works and research studies ignore register variation, there
have been important advances in the area of register analysis in recent years. The most
comprehensive cross-register description of the English language is Biber et al.'s (1999)
Longman Grammar of Spoken and Written English (LGSWE). The LGSWE presents
descriptions of grammatical and lexico-grammatical features in English that are based on
the results of corpus research. For many of these features, information is included about
their distributions across four major registers of English (conversation, news, fiction, and
academic prose). The register comparisons in the LGSWE bring to light important
variability in English use that is ignored by most other reference grammars.

2.2.2. Research on the register of academic writing

The register of academic writing has received a great deal of attention during the past
few decades. Research on academic writing has focused on a variety of individual
linguistic features. For example, research on lexical bundles, or frequent formulaic
sequences, has shown them to be less frequent in academic writing than in spoken
registers (e.g., Biber, Conrad, and Cortes, 2004; Hyland, 2012). Additionally, recent
research has refuted earlier claims about academic writing by showing it to be phrasally
compressed rather than clausally elaborated (Biber and Gray, 2010; Biber, Gray, and
Poonpon, 2011). There has also been an abundance of research revealing the prevalence
of markers of stance (Biber, 2006a) and evaluation (Hunston, 2011) in academic prose.
Analyses of individual linguistic features have revealed meaningful differences
between academic writing and other registers. However, some of the most important
recent findings about academic writing have been revealed through the use of
Multi-Dimensional (MD) analysis. MD analysis is a method developed by Biber (1988) in
which the researcher calculates the normed rates of occurrence for linguistic variables in
a corpus. Factor analysis is then used to reduce these variables down to a much smaller
set of functionally interpretable dimensions of linguistic variation (see Introduction to
Conrad and Biber, 2001). MD analyses have repeatedly shown academic writing to be
more informationally dense, explicit, and abstract than informal speech (see, e.g., Biber,
1988).
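
The first step of this procedure, norming, can be sketched briefly. The Python fragment below is a minimal illustration under stated assumptions: it assumes a norming basis of 1,000 words, which is not specified in this section, and it uses invented counts for two hypothetical texts. The subsequent factor analysis is not shown, and the function name is hypothetical.

```python
def normed_rate(raw_count, text_length, basis=1000):
    """Rate of occurrence per `basis` words (basis of 1,000 is an assumption).

    Norming makes counts from texts of different lengths comparable.
    """
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return raw_count * basis / text_length


# Hypothetical texts: word totals and feature counts are invented for illustration.
texts = {
    "text_a": {"words": 2500, "counts": {"nouns": 650, "verbs": 370}},
    "text_b": {"words": 4000, "counts": {"nouns": 660, "verbs": 830}},
}

# Convert every raw count into a normed rate for its text.
normed = {
    name: {
        feature: normed_rate(count, text["words"])
        for feature, count in text["counts"].items()
    }
    for name, text in texts.items()
}

for name, features in normed.items():
    print(name, features)
```

Although the two invented texts contain nearly the same raw number of nouns, their normed rates differ considerably because the texts differ in length, which is why raw counts cannot be compared directly.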

2.2.3. Publication type and discipline variation within academic writing

Linguistic descriptions of academic writing have focused on a number of potential
predictors of linguistic variation, including time, sub-section, publication type, and
discipline. Diachronic change in academic writing has received some attention in recent
years (see, e.g., Biber, 2004; Biber & Finegan, 1997; Biber, Egbert, Gray, Szmrecsanyi,
& Oppliger, to appear; Szmrecsanyi & Hinrichs, 2008). However, academic writing is not
considered from a diachronic perspective in this study.
Research on linguistic variation across sub-sections of academic texts has also
revealed important insights. For example, Conrad’s (1996b) dissertation research
revealed that the methods sections of ecology research articles tend to be much more
informational than the discussion, results, and introduction sections. Similarly, using MD
analysis Biber and Finegan (2001) showed important variability in dimension scores
across the major sections of medical research articles. While not the primary focus of the
present study, variation across sub-sections of journal articles is discussed briefly in
Chapter 7.
It is clear that register differences should be considered as potential predictors of
linguistic variation. However, this raises questions about the most appropriate level of
granularity, or specificity, in defining register categories. Gries (2006) addresses
questions related to granularity in defining register categories in a study on linguistic
variability within and between corpora. He concludes that the degree of granularity (i.e.,
register, sub-register, etc.) is closely related to the variability within a sample of texts. As
with any sample of data, we expect the variance to be larger within a less homogeneous
sample (e.g., register) and smaller within a more homogeneous sample (e.g., sub-register).
One important finding from Gries’ study is that registers (or sub-registers) differ not only
in their rates of occurrence for a given feature but also in the degree of variability in the
use of that feature across the texts in the sample.
This supports the results of Biber’s (1988) Multi-Dimensional analysis, which
revealed a wide range of variation within certain registers (e.g., academic prose) and a
much smaller amount of variability within other registers (e.g., official documents) (see,
e.g., p. 176). While these findings seemed surprising at the time, especially with regard
to the large range of variation in academic writing, subsequent empirical research during
the past two and a half decades has attributed much of this variability to the many
publication types and disciplines within the general register of academic writing. Of all
the situational parameters relevant to the definition of academic registers, publication
type (e.g., journal article, textbook) is the most important predictor of linguistic variation
in this study. However, discipline-based variation is also thoroughly investigated
throughout this study.
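The point made above, that categories of texts can differ in their variability as well as in their mean rates, can be illustrated with a small computation. The Python sketch below is hypothetical throughout: the normed rates are invented, the category labels are placeholders, and the standard deviation is used simply as one convenient measure of spread rather than as the measure used by Gries (2006) or Biber (1988).

```python
from statistics import mean, stdev

# Invented normed rates of a single feature for the texts in each category.
samples = {
    "register": [120.0, 180.0, 95.0, 210.0, 150.0],      # mixes several sub-registers
    "sub_register": [148.0, 152.0, 150.0, 155.0, 145.0],  # a single, narrower category
}

# Summarize each category by its central tendency and its spread.
summary = {
    name: {"mean": mean(values), "sd": stdev(values)}
    for name, values in samples.items()
}

for name, stats in summary.items():
    print(f"{name:>12}: mean = {stats['mean']:.1f}, sd = {stats['sd']:.1f}")
```

In this invented example the two categories have almost identical means, so a comparison of central tendency alone would suggest that they are alike, while the spread shows that the broader category is far less homogeneous.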

The next three sections include summary tables of basic information about previous
empirical linguistic studies of the three publication types of academic writing that are
investigated in this study: journal articles, university textbooks, and popular academic
books. These summary tables contain a broad (although not exhaustive) survey
of research that is both relevant and relatively recent. While there is a substantial amount
of research that compares disciplinary variation, comparative research on publication
type variation is relatively limited. The vast majority of the studies included in the
surveys below include more than one discipline. However, many of them include only
one publication type. This, combined with the fact that publication type is the primary
variable of interest in this study, was the main reason for organizing the following
sections based on publication type rather than discipline. In the few cases where the
same study included more than one of the publication types considered here (e.g., Grabe,
1984), the study appears in each relevant table.

2.2.3.1. Journal articles

Table 2.2 contains basic information about 33 empirical studies of academic
journal articles. Several of these studies focus primarily on reporting patterns in the use
of features related to stance and evaluation (Koutsantoni, 2006; McGrath & Kuteeva,
2012; Parkinson & Adendorff, 2005). One specific set of features that has received a
great deal of attention is hedges, which are linguistic devices used to qualify or otherwise
lessen the impact of a statement or claim. Hedging has been identified as a key
characteristic of professional science writing that is used by authors to present new
claims in a persuasive manner (Vazquez & Giner, 2009) without risking
overgeneralization or hasty conclusions (Hu & Cao, 2011; Hyland, 1996).
The use of nouns and noun phrases has recently emerged as an important
characteristic of academic journal articles. In recent work on noun phrase complexity in
journal articles, Biber & Gray (2010) and Gray (2011) have shown that linguistic features
associated with phrasal compression (e.g., pre-modifying nouns) are extremely common
in informationally dense written registers such as academic journal articles. Additionally,
Kooyalan and Mumford (2011) and Parkinson and Adendorff (2005) have investigated
patterns in the use of nominalizations in journal articles.
MD analysis has been a useful methodology for measuring variability within
journal articles and differences between journal articles and other registers. For example,
Grabe (1984) and Conrad (1996b) both used MD analysis and found that journal articles
are extremely informationally focused in relation to other registers. Gray (2011) built on
this previous work by using MD analysis to investigate variation across journal articles
from a variety of academic disciplines. She showed that the natural sciences tend to
contain more linguistic features associated with dense information packaging than the
social sciences and humanities.
Other areas of fruitful research in the literature on professional academic writing
include personal pronoun use (Harwood, 2005; Hyland, 2001; Kuo, 1999; Martinez,
2005) and vocabulary features (Biber et al., 2002; Hyland & Tse, 2007; Martinez et al.,
2009; Vongpumivitch et al., 2009).

Table 2.2. Overview of previous corpus-based studies on journal articles, with lists
of the disciplines and linguistic features included.

Study | Disciplines | Linguistic Features
Biber, Csomay, Jones & Keck (2002) | various | vocabulary-based discourse units
Biber & Gray (2010) | biology, medicine, ecology, physiology, education, psychology, history | grammatical features of complexity
Conrad (1996b) | ecology | various grammatical and lexico-grammatical features (MD analysis)
Dahl (2008) | economics, linguistics | new knowledge claims
Diani (2008) | linguistics, history, economics | emphasizers
Grabe (1984) | various | various grammatical and lexico-grammatical features (MD analysis)
Gray (2011) | applied linguistics, biology, history, philosophy, physics, political science | various grammatical and lexico-grammatical features
Harwood (2005) | business management, computer science, economics, physics | 1st person pronouns
Hewings & Hewings (2002) | business | anticipatory it, extraposed subjects
Hu & Cao (2011) | applied linguistics | hedges, boosters
Hyland & Tse (2007) | sciences, engineering, social sciences | academic vocabulary (AWL)
Hyland (1996) | molecular biology | hedges
Hyland (2001) | philosophy, sociology, applied linguistics, physics, electrical engineering, marketing, mechanical engineering, biology | 2nd person pronouns, interjections, inclusive pronouns, questions, directives
Hyland (2008) | electrical engineering, biology, business, applied linguistics | lexical bundles
Koutsantoni (2006) | electrical engineering | stance features
Kooyalan & Mumford (2011) | humanities, social sciences | nominalization, subordination, non-finite clauses, prepositional phrases, nominal pre-modification
Kuo (1999) | sciences | personal pronouns
Lin & Evans (2012) | various | article structure
Marco (2000) | medicine | collocational frameworks
Martinez (2005) | biology | 1st person pronouns
Martinez, Beck & Panza (2009) | agriculture | vocabulary
McGrath & Kuteeva (2012) | mathematics | stance, engagement
Murillo (2012) | business | reformulation markers
Oliveira & Pagano (2006) | agrarian sciences, biology, engineering, health sciences, human sciences, linguistics, social sciences | reporting verb functions
Parkinson & Adendorff (2004) | various | various organizational and discourse-level features
Parkinson & Adendorff (2005) | natural sciences | passivization, nominalization, evaluation, hedging
Peacock (2011) | biology, chemistry, physics, environmental science, business, language and linguistics, law, public and social administration | introductory it
Varttala (1999) | medicine | hedging
Vazquez & Giner (2009) | marketing, biology, mechanical engineering | hedges, boosters
Vogel (2010) | theoretical physics | lexical cohesion
Vongpumivitch, Huang & Chang (2009) | applied linguistics | vocabulary (AWL)
Warchal (2010) | linguistics | conditional clauses

2.2.3.2. University textbooks

Table 2.3 displays basic information about 21 corpus-based studies of the language of
university textbooks. Several of these linguistic descriptions have used MD analysis as
the major methodology. In a large-scale study of linguistic variability across university
registers, Carkin (2001) used MD analysis to compare the linguistic characteristics of
biology and macroeconomics textbooks with classroom teaching in the same disciplines.
Biber et al. (2002) described patterns of language use in university textbooks across a
large number of disciplines. Both of these studies included spoken registers, and many of
the differences they found relate to key differences between spoken and written academic
language. Grabe (1984), on the other hand, used MD analysis to compare university
textbooks to two other academic publication types: journal articles and popular science
periodicals. MD analysis has also been used to identify meaningful differences between
pedagogical and professional academic prose. Conrad (1996a) showed that university
textbooks are generally less informationally dense and more overt in their use of
argumentation than published journal articles. In a more recent MD analysis, Egbert
(2013) investigated stylistic variation within the writing of individual authors and across
multiple authors in two disciplines of introductory university textbooks.

Table 2.3. Overview of previous corpus-based studies on university textbooks, with
lists of the disciplines and linguistic features included.

Study | Disciplines | Linguistic Features
Bertilson & Fierke (1980) | art history, economics, English, history, mathematics, music, physics, political science, psychology, sociology | pronouns, examples, images
Biber et al. (2002) | business, education, engineering, humanities, natural science, social science | various grammatical and lexico-grammatical features (MD analysis)
Biber, Conrad & Cortes (2004) | business, education, engineering, humanities, natural science, social science | lexical bundles
Byrd (1997) | U.S. history | names
Carkin (2001) | biology, macroeconomics | various grammatical and lexico-grammatical features (MD analysis)
Chen (2008) | electrical engineering | lexical bundles
Cline (1972) | various | Dale-Chall Readability Formula (vocabulary, sentence length)
Conrad (1996a) | ecology, history | various grammatical and lexico-grammatical features (MD analysis)
Egbert (2013) | psychology, geology | various grammatical and lexico-grammatical features (MD analysis)
Grabe (1984) | various | various grammatical and lexico-grammatical features (MD analysis)
Hsu (2011) | business | vocabulary
Hyland (1994) | various | hedging
Hyland (1999) | microbiology, marketing, applied linguistics | metadiscourse
Kurzman (1974) | anthropology, Black history, economics, history, political science, sociology | Nelson Denny scores (polysyllabic words)
Love (1991) | geology | various discourse features
Love (1993) | geology | various discourse features
Miller (2011) | various | sentence length, word length, vocabulary (AWL), nominalizations, noun modifiers
Parkinson & Adendorff (2004) | various | various organizational and discourse-level features
Parkinson & Adendorff (2005) | natural sciences | passivization, nominalization, evaluation, hedging
Pride (1975) | biology, English, history | Dale-Chall Readability Formula (vocabulary, sentence length)
Tadros (1989) | law | various discourse features

Researchers have also investigated a variety of organizational and discourse-level
features in university textbooks, including examples and images (Bertilson & Fierke, 1980),
metadiscourse (Hyland, 1999), discourse structure (Love, 1991; 1993), and enumeration
(Tadros, 1989).

2.2.3.3. Popular academic writing

Table 2.4 contains information on the disciplines and linguistic features included in 16
studies of popular academic writing.

Table 2.4. Overview of previous corpus-based studies on popular academic writing,
with lists of the disciplines and linguistic features included.

Study | Disciplines | Linguistic Features
Adams-Smith (1987) | medicine | various lexical, syntactic and organization features
Biber & Gray (2013) | various | nouns, nominalizations, relative clauses, noun + of, noun + noun
Fahnestock (2004) | natural sciences | figures of speech, metaphor, antithesis
Hyland (2010) | various | organization, stance, pronouns
Kapon (2013) | physics | various discourse-level features
Kidd (1988) | natural sciences | various lexical, readability and discourse-level features
Kranich (2011) | various | hedging
Lischinsky (2008a) | business management | examples
Lischinsky (2008b) | business management | various rhetorical and discourse-level features
Myers (2003) | various | various discourse-level features
Oliveira & Pagano (2006) | agrarian sciences, biology, engineering, health sciences, human sciences, linguistics, social sciences | reporting verb functions
Parkinson & Adendorff (2004) | various | various organizational and discourse-level features
Parkinson & Adendorff (2005) | natural sciences | passivization, nominalization, evaluation, hedging
Varttala (1999) | medicine | hedging
Varttala (2001) | business, technology, medicine | hedging
Vogel (2010) | theoretical physics | lexical cohesion

Compared with university textbooks and journal articles, there is a relatively small body
of empirical research on popular academic writing. The research that has been done has
typically focused on the natural sciences rather than the social sciences. For the purposes
of this study, popular academic writing will be defined as publications written for a
general, non-specialist audience on scientific or academic topics.
Important recent work has looked at various phrasal characteristics (e.g., nouns,
nominalizations, noun + noun sequences) in non-specialist (multi-disciplinary) journals
(Biber & Gray, 2013). There has also been some research on the organizational
characteristics (Adams-Smith, 1987; Hyland, 2010; Parkinson & Adendorff, 2004) and
the rhetorical features (Fahnestock, 2004; Lischinsky, 2008a; 2008b) of popular academic
writing. Additional research has looked at the use of hedging in popular academic writing
(Kranich, 2011; Parkinson & Adendorff, 2005; Varttala, 1999; 2001). Finally, there have
also been studies focused on features of stance and evaluation in the literature on popular
academic writing (Hyland, 2010; Parkinson & Adendorff, 2005).
Despite the body of research on register and discipline variation, there is still a great
deal we do not yet understand about registers and disciplines of academic writing. On
this note, Biber (2006b:227) calls for additional studies that investigate “particular
university registers at a much more specified level,” and Hyland (2008:20) suggests that
more cross-discipline comparisons are needed to create “a fuller picture of community-
specific practices”.

2.2.4. Interaction effects in academic language

In addition to the need for further investigations of the independent effects of
variables such as publication type and discipline on linguistic variability within registers
of academic language, more research is needed to investigate statistical interactions
between these and other factors. In language research, an interaction exists between two
factors when the linguistic patterns in a particular level of one factor (e.g., publication
type) depend on the level of another factor (e.g., discipline). In other words, the
effects of the two factors are non-additive: the effect of one factor differs across the levels of the other.
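The idea of non-additivity can be made concrete with a minimal Python sketch of a 2 x 2 design. The cell means below are invented for illustration and do not come from any study cited here; the factor labels and the `simple_effect` helper are hypothetical.

```python
# Hypothetical mean scores on some linguistic measure for each cell of a
# 2 x 2 design. The values are invented for illustration only.
cell_means = {
    ("journal article", "ecology"): 8.0,
    ("journal article", "history"): 2.0,
    ("textbook", "ecology"): 4.0,
    ("textbook", "history"): 3.0,
}

def simple_effect(publication_type):
    """Discipline difference (ecology minus history) within one publication type."""
    return (cell_means[(publication_type, "ecology")]
            - cell_means[(publication_type, "history")])

effect_in_articles = simple_effect("journal article")
effect_in_textbooks = simple_effect("textbook")

# If the two factors were purely additive, the discipline difference would be
# the same in both publication types and this contrast would be zero.
interaction_contrast = effect_in_articles - effect_in_textbooks

print(effect_in_articles, effect_in_textbooks, interaction_contrast)
```

This sketch is purely descriptive. Deciding whether a non-zero contrast reflects a reliable interaction would still require an inferential test on the text-level data, such as a factorial ANOVA.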
There is some research that has shown the existence of interacting factors in
descriptive language research. For example, Biber, Egbert, Gray, Szmrecsanyi, and
Oppliger (to appear) identified a strong register (news, personal letters, science articles)
by time (1650-1999) interaction in the use of genitives and pre-modifying nouns in the
ARCHER corpus. Likewise, Biber and Gray (2013) showed a sub-register (specialist
science, specialist social science, multidisciplinary science, humanities) by time (1965,
1985, 2005) interaction in the use of nouns and noun + noun constructions. Biber, Gray,
and Staples (under review) found that, for many linguistic variables associated with
grammatical complexity, mode (spoken, written) interacted significantly with task
(independent, integrated) in TOEFL iBT responses. Csomay (2007) identified statistical
interactions between discipline (business, education, engineering, humanities, natural
sciences, social sciences) and speaker (teacher, student) as well as discipline and level of
instruction (lower division, upper division, graduate). Finally, using a research approach
similar to the one adopted in the present study, Conrad (1996a) applied Biber’s (1988)
MD analysis framework and reported significant interaction effects between discipline
(ecology, history) and publication type (research articles, textbooks) on two of the five
dimensions. However, interaction between factors was not the focus of Conrad’s (1996a)
study. Therefore, her interpretations were mostly focused on main effect differences.
This small handful of studies shows that factors such as time, publication type,
mode, and discipline can interact in meaningful ways. However, the possibility of
interaction is often overlooked in analyses of corpora that contain multiple factors. In
studies where statistical tests are used to measure interaction, inappropriate subsequent
analyses are sometimes carried out (e.g., testing for main effects instead of simple effects
in the presence of an interaction), thus affecting the interpretations of the interaction
effects. Finally, in studies where these analyses are appropriately carried out, the
qualitative interpretation of interaction effects and corresponding main effects or simple
effects often lacks the detail and depth necessary for a complete understanding of the
relationships between the interacting factors.

2.3. Stylistics and Writing Quality

This section presents a review of the literature on author style and writing quality.
Section 2.3.1 begins with a broad overview of quantitative approaches to stylistics
research, and Section 2.3.2 reviews the literature on stylistic perception. Section 2.3.3
contains a review of efforts to develop and apply objective measures of writing quality.
Section 2.3.4 focuses on previous studies that have considered reader perceptions of
writing quality, and Section 2.3.5 concludes with a review of the few studies that have
attempted to measure the relationships between linguistic variation and reader
perceptions of writing style.

2.3.1. Previous quantitative approaches to measuring author style

The definition of linguistic style adopted in this study comes from Biber &
Conrad (2009): “linguistic patterns associated with styles are not functional. Rather these
are features associated with aesthetic preferences, influenced by the attitudes of the
speaker/writer about language. That is, [an author] has attitudes about what constitutes
‘good style’ resulting in the manipulation of language for aesthetic purposes” (p. 18).
Although quantitative investigations of stylistic variation, or studies in ‘statistical
stylistics’, are increasing in popularity, a great deal of research on author style is
performed using qualitative methods. It is also the case that many statistical stylistics
studies, especially those that apply multivariate statistics, are focused on issues of
authorship identification and natural language processing rather than linguistic variation
(Tuldava, 2004). Additionally, quantitative stylistic analyses of linguistic variation often
focus on the language of literature, typically investigating the use of only a small set of
linguistic features by just one author (see discussions in Biber, 2011; Egbert, 2012).
These studies have played an important role in identifying and quantifying patterns of
stylistic variation. However, most statistical stylistics studies use the author of a text as
their only independent variable, and their findings are solely based on the interpretations
of the researcher, which often lack depth and substance (Sandell, 1977).

2.3.2. Previous approaches to stylistic perception

There are only a few studies that report reader perceptions of stylistic or linguistic
variation in published academic prose. However, the goal of assessing subjective
judgments of text characteristics has been discussed for several decades. As far back as
the 1950s, there have been debates regarding the utility of two traditionally opposing
frameworks for stylistic analysis: subjective and objective. In an essay originally published in 1959,
Riffaterre (1967) argues that subjective perceptions of an author’s writing are the only
real way of studying its style. He maintains that quantitative linguistic norms are both
“unobtainable” and “irrelevant,” and that the concept of style is only realized subjectively
by an individual decoder within a given context (p. 425). Milic (1967) echoes these ideas
by criticizing any attempt to create stylistic taxonomies, claiming that writing style is
entirely idiosyncratic and thus cannot be systematically categorized. In contrast, others
maintain the reality of ‘group styles’, which can be identified through the investigation of
linguistic variation across texts (Hendricks, 1976; DiMarco & Hirst, 1993). A large body
of more recent linguistic research supports the latter position by identifying linguistic
norms that are both obtainable and relevant.
However, Riffaterre’s (1967) claims are intriguing because of his passionate focus
on the perceptions of intended readers, or “the consciously selected target of the author”
(p. 419). Hirst (2005) highlights the need for more emphasis on the reader in his
discussion of the three major traditional approaches to computational linguistics:
objective text meaning, authorial intent, and subjective text meaning. He proposes that
“computational linguistics needs to move away from the solely objective in-text view of
text-meaning…and reclaim both the subjective in-reader and authorial in-writer views”
(2007, p. 8; see also Hirst, 2008). One relatively recent attempt at this was a small-scale
study of reader perceptions by Morris (2009), which measured the perceived lexical
cohesion and lexical semantic relations in Reader’s Digest articles, discovering
variability that supports the aforementioned need to return to a “view of text meaning that
includes text, reader, and writer” (p. 148). The main challenge associated with including
subjective perceptions of texts is to take an entirely subjective reaction to writing style
and transform it into “an objective analytic tool,” in order to “transform value judgments
into judgments of existence” by identifying underlying patterns (Riffaterre, 1967, p. 419).
Several decades ago, Crystal (1972) emphasized the value of subjective measures
of style, including those that assess the “intuition of the lay language-user,” and argued
for the development of “much more refined statistical and data analysis” in order to
“establish the generalizability of our stylistic intuitions” (p. 110). He boldly called on
stylisticians “to face up to the necessity of devising techniques for coping with evaluative
criteria and relating these to our own, more familiar, linguistic ones. And such techniques
do not exist” (p. 106).
Sandell (1977) overviews a small handful of early attempts to fill this gap. The
most notable is Carroll’s (1960) innovative use of factor analysis to study both objective
and subjective variables of texts (see Section 3.2.2). Carroll set out to measure prose style
using variables of two types, objectively quantified linguistic variables and subjective
perceptions from a handful of ‘expert judges’. The results of Carroll’s factor analysis
revealed clear underlying dimensions of variation in prose style. An equally important
finding was his discovery that subjective variables correlated more strongly with other
subjective variables than they did with objective linguistic variables. Although the results
of Carroll’s study may seem unsurprising at first, they present two important findings.
First, underlying dimensions of stylistic variables exist and can be reliably measured; and
second, subjective reader perceptions and objective linguistic variables represent separate
yet related constructs. In essence, Carroll showed that there is value in both objective
text-meaning and subjective text-meaning approaches to measuring prose style. In other
words, it seems that the meaning of a text is both in the text and in the reader.
One of the primary goals of this dissertation will be to gain a better understanding
of underlying dimensions among two sets of variables: (a) the objective linguistic
characteristics of a text; and (b) the subjective perceptions of readers. This will enable us
to then investigate existing correlations between the linguistics of a text and the
subjective reactions and attitudes of readers.

2.3.3. Previous research on objective measures of writing quality

In order to better understand writing quality, this review of literature surveys
previous research that has attempted to measure this construct using objective measures.
In recent decades, two of the major approaches to measuring writing quality have been
(1) cohesion and coherence and (2) readability. After a brief overview of the theoretical
underpinnings of these two constructs, I present a summary table of eighteen studies that
have (a) empirically investigated writing quality from these approaches, and (b) included
some measure of reader perceptions. Finally, the results of these studies will be
synthesized and applied to the goals of this dissertation study.
The first area of writing quality that has been researched extensively is the
relationship between textual cohesion and coherence. Although cohesion and coherence
have been discussed for many decades, a resurgence of interest has followed the work of
Halliday & Hasan (1976) in Systemic Functional Linguistics (SFL), which emphasizes
the important roles of these two textual properties. Within SFL, cohesion “occurs when
the interpretation of some element in the discourse is dependent on that of another”
(Halliday & Hasan, 1976, p. 4). Coherence is related to the connection between discourse
structure and meaning (Schiffrin, 1987). A great deal of the early work in this area has
been theoretical in nature, with few efforts to empirically validate any of the claims (see
Mannes & Kintsch, 1987). Reinhart (1980) and Johns (1986), for example, propose
lengthy lists of linguistic features that characterize coherent writing. Among these
characteristics is the property of cohesion, or what Reinhart terms ‘linear connectedness’
(1980, p. 167). Textual cohesion and coherence are also the foundation of the Rhetorical
Structure Theory of discourse organization (see Mann & Thompson, 1988).
Another strand of research in this area comprises objective, empirical studies on
variables related to cohesion and coherence. For example, Witte & Faigley (1981) found
a positive relationship between subjective ratings of text quality and lexical, reference,
and conjunctive cohesive ties. In contrast, McNamara et al. (2009) found no statistically
significant relationship between text quality and coreference or connectives, two types of
cohesion measured by the Coh-Metrix program. Coh-Metrix is a tool designed to
automatically compute textual coherence and readability using measures based on theory
and findings from computational linguistics and psycholinguistics. In an earlier study,
McNamara et al. (1996) used variables that were eventually integrated into the Coh-Metrix
tool to show that readers with less background knowledge benefit from frequent
signals of coreference, whereas the opposite is true for readers with more background
knowledge.
Additional research has used Coh-Metrix as a new measure of text readability.
However, in order to understand the variables measured by Coh-Metrix it will be useful
to briefly review the history of text readability. Since the late nineteenth century,
researchers and educators have attempted to develop objective measures of text
readability. The resulting ‘readability formulas’ are still popular and frequently include
variables such as word difficulty, word frequency, word length, sentence length, or a
combination of these. Arguably the most widely used readability formula is the Flesch
Reading Ease Score, which accounts for words per sentence and syllables per word, and
standardizes the score to a scale of 0 – 100 (see Flesch, 1948). While these formulas have
been shown to be highly reliable and, in a few cases, strong predictors of actual reading
comprehension, empirical research has ultimately shown them to be poor measures of
text readability (see Bruce, Rubin, & Starr, 1981). As enticing as it is to find a simple
measure of text readability, the reality is that readability is a complex, multifaceted
construct (Bailin & Grafstein, 2001; Benjamin, 2012). This is sufficiently demonstrated
by a number of studies in Table 2.5 (Leroy, Helmreich, & Cowie, 2010; Fulcher, 1997;
Pitcher & Fang, 2007; Davison & Cantor, 1982). However, in each of these studies, the
authors propose other measures of text readability that are better predictors of perceived
readability or reading comprehension. The most widely used alternative measure of text
readability in recent years has been the Coh-Metrix tool introduced above (Green, Unaldi,
& Weir, 2010; Swenson, 2008; Crossley, Greenfield, & McNamara, 2008; Crossley,
Allen, & McNamara, 2011). However, not all of the Coh-Metrix measures are reliable
predictors of readability, and there seems to be a consensus that the three best Coh-
Metrix predictors of readability are lexical coreferentiality, sentence similarity, and word
frequency. These studies, along with the other empirical studies mentioned above, are
summarized in Table 2.5 below.
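As an illustration of how a traditional readability formula works, the following minimal Python sketch implements the Flesch Reading Ease Score in its commonly cited form, 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word). The function name and the input counts are hypothetical; the sketch assumes that words, sentences, and syllables have already been counted, since syllable counting itself requires a separate heuristic or dictionary.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Compute the Flesch Reading Ease Score from pre-computed counts.

    Higher scores indicate text that the formula treats as easier to read.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Hypothetical counts for a short passage (invented for illustration):
# 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(100, 5, 150)
print(f"Flesch Reading Ease: {score:.1f}")
```

Because the formula depends only on two surface ratios, it cannot register properties such as cohesion or noun phrase complexity, which is consistent with the limitations of readability formulas discussed above.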

Table 2.5. Empirical studies on perceived writing quality

Study Texts Participants Subjective Text-linguistic Results


measures variables

Hidi & Baird Three versions 44 Grade 4 students The students were The authors wrote three The base texts and salient
(1988) of a short text and 66 Grade 6 assessed using versions of one text. texts were recalled better
about the lives students written free recalls The ‘base text’ was than the resolution texts
of great immediately after written to be coherent by both age groups, but
inventors reading the text and interesting. The not significantly so.
and after one ‘salient text’ had However, the salient and
week. They were additional elaborations. resolution texts were rated
also asked to rate The ‘resolution text’ as more interesting than
which texts were induced surprise. the base text.
the most
interesting.

Kelly, Knight, News texts 117 undergraduate Seven 7-point ‘straight-news’ vs. Narrative style was
Peck, & Reel (environmental students semantic ‘narrative news’ perceived as ‘less
(2003) and crime) differential scales interesting’, but more
(categories = ‘clear’, ‘informative’,
interest, ‘accurate’, and
informativeness, ’believable’
credibility)

Hilton, Motes, & CPA Review 180 upper-division Thirty-one 5-point The ad was For each of the four style
Fielden (1989) advertisement accounting majors Likert scale items manipulated based on constructs (personal,
(categories = personal features forceful, colorful, and
personal, forceful, (personal pronouns, direct), there was a
colorful, direct) personal names, statistically significant
questions), forceful ‘writer/reader perceptual
features (imperatives, match’.
SVO sentences,
subordinators), colorful
features (figures of
Table 2.5. Empirical studies on writing quality

Study Texts Participants Subjective Text-linguistic Results


measures variables

speech,
metaphor/simile,
adjectives, adverbs),
and direct features
(action first vs. last)

Witte & Faigley 10 freshman Two independent 4-point holistic -number of errors Higher rated essays are
(1981) essays, 5 with raters scale -essay length longer, and have larger T-
highest scores -T-unit length units and fewer errors.
and 5 with -restrictive modifiers High rated essays had
lowest scores -no. of cohesive ties more cohesive ties overall
(out of 90) -types of cohesive ties and per T-unit. High rated
essays also had more
immediate and mediated
cohesive ties and more
reference, conjunctive,
and lexical cohesion. On
the other hand, low rated
essays had more mediated
and mediated-remote
cohesive ties.

McNamara, Science 56 students entering Background Linguistic coherence Readers with little
Kintsch, Songer, encyclopedia Grades 7 – 10 questionnaire, signals at the local and background knowledge
& Kintsch entry forty-one short global levels. benefit from more text
(1996) answer -pronouns coherence signals, but
comprehension -sentence connectives high-knowledge readers
questions and a -topic headers benefitted more from
concept card fewer.
sorting task.

20
Table 2.5. Empirical studies on writing quality

Study Texts Participants Subjective Text-linguistic Results


measures variables

McNamara, 120 essays Five writing tutors Standardized Five classes of The two measures of
Crossley, & from holistic writing variables from Coh- cohesion (coreference and
McCarthy (2009) undergraduate rubric (1 – 6) Metrix: connectives) did not
students -word features (n = 14) distinguish between high
-syntactic complexity and low quality essays.
(n = 8) However, syntactic
-coreference (n = 13) complexity, lexical
-connectives (n = 13) diversity, and word
-lexical diversity frequency were strong
(n = 5) predictors of essay quality.

Study: Leroy, Helmreich, & Cowie (2010)
Texts: 16 sentences from online medical documents administered in sets of four
Participants: Eighty-six undergraduate and graduate students
Subjective measures: Participants labelled the text they perceived to be the most difficult for them and also for others.
Text-linguistic variables: Four specific sentence structures: passive voice; extraposed subjects; complex noun phrases; function words
Results: Classic readability measures cannot explain any of the results. There were very strong effects of noun phrase complexity (as measured by head noun frequency and number of NP constituents) where simpler noun phrases were recognized as simpler. Some sentences with more function words were perceived as easier.

Study: Britton, Gulgoz, & Glynn (1993)
Texts: Two versions of 20 university textbook passages (original and rewritten)
Participants: Thirty undergraduate students
Subjective measures: Participants read each pair of textbook passages (original and rewritten) and indicated which they would remember better in 24 hours.
Text-linguistic variables: Any features that the rewriters thought would improve the learnability (knowledge retention after 24 hours) of the text.
Results: The focus of this study was not the particular linguistic features that improve learnability. Rather, the researchers were interested in whether student perceptions of learnability are congruent with actual measured text learnability. In 19/20 (95%) of the cases, the students’ perceptions of learnability accurately reflected a text’s actual learnability.

Study: Fulcher (1997)
Texts: Texts from eight Overseas Development Administration publications
Participants: Five ‘experts’ in the field of reading and writing
Subjective measures: Participants were asked to individually rank the texts in order of perceived difficulty, agree on sequence of difficulty, and agree on the reasons for this order.
Text-linguistic variables: Flesch Reading Ease Scores: syllables per word; words per sentence. Expert judge criteria: linguistic structure; contextual structure; conceptual structure; reader-writer relationship (pronouns, tense, and voice)
Results: The correlation between predicted rank order based on Flesch scores and the agreed ranks was .31. The judges differed radically in the criteria they used to judge text difficulty, but they all agreed that sentence length was a poor measure of readability.

Study: Green, Unaldi, & Weir (2010)
Texts: 42 textbook passages; 42 IELTS reading test passages
Participants: Two expert judges (PhD in applied linguistics)
Subjective measures: Judges rated the texts for degree of subject specificity, rhetorical organization and cultural references.
Text-linguistic variables: Web VocabProfile: characters per word; lexical density; word frequency; type-token ratio. Coh-Metrix: words/sentence; sentences/paragraph; NP length; modifiers per NP; words before main vb; logical operators; high-level constituents; anaphor reference; argument overlap; content word overlap; sentence similarity; hypernym value-nouns. Flesch Reading Ease.
Results: The expert judges’ ratings ranged from 33% to 52% agreement, and their classifications ranged from 80% to 85% agreement. IELTS passages did not differ significantly from introductory undergraduate textbook passages for most features. IELTS passages did use fewer academic words and infrequent words.

Study: Swenson (2008)
Texts: Six short stories about historical figures
Participants: Sixty graduate students
Subjective measures: Nelson Denny Reading Test for comprehension and reading speed
Text-linguistic variables: Coh-Metrix: LSA; causal cohesion. Flesch Reading Ease.
Results: Together, LSA and Flesch scores accounted for 15% of the variance in comprehension and 10% of the variance in speed. Causal cohesion was not a significant predictor of readability.

Study: Pitcher & Fang (2007)
Texts: Twenty sample texts from levelled readers across 4 levels
Participants: The authors
Subjective measures: Text quality was examined by rating story structure, story endings, rhythmic language, and naturalness.
Text-linguistic variables: reading recovery level; text length; high frequency words; Flesch-Kincaid level; Fry readability level
Results: Reading recovery levels could not be reliably measured by any of the objective linguistic or subjective quality variables. However, this is more likely a function of the levelling system than the variables.

Study: Davison & Cantor (1982)
Texts: Four original adult texts and their ‘simplified’ counterparts (designed for grades 8-10)
Participants: The authors
Subjective measures: The differences between the two versions of each text were catalogued.
Text-linguistic variables: Fry readability level; Dale-Chall scale
Results: The authors conclude that the features considered by these two formulas are overly simplistic. They support this by showing that sentence length does not always contribute to complexity. They also discuss the many aspects of a text that are not measured by these scales.

Study: Heilman, Collins-Thompson, Callan, & Eskenazi (2007)
Texts: Corpus 1: 362 online texts, each rated for grade level. Corpus 2: ESL reading textbook passages from across levels.
Participants: Grade school teachers and textbook authors
Subjective measures: Grade school teachers assigned Corpus 1 texts into L1 levels (1-12); textbook authors assigned Corpus 2 texts into L2 levels (2-5)
Text-linguistic variables: NLP language modelling approach, which builds a prediction model based on a sample of ‘training texts’ with known categories. Grammatical features: passive voice; past participle; perfect tense; relative clauses; continuous tenses; modals
Results: Language modelling predicted readability level more accurately than the grammatical features, but the grammar features did add additional predictive power. Grammar appears to play a more important role in L2 readability than in L1 readability.


Study: Stubbs (2001)
Texts: Technical manual text in two versions: unedited and edited for clarity
Participants: Sixty US Navy sailors
Subjective measures: Multiple choice comprehension test; Likert scale reading attitude measure
Text-linguistic variables: word frequency; word ambiguity; metaphor; sentence complexity; inferences; syntax consistency; syntax complexity; clause order; passive voice; negation; relative clauses; +/- that complements; genitives; pronoun use; paragraph topics; cognitive load
Results: The edited texts were modified according to features identified in previous research as having an impact on readability. No statistically significant differences were found between the readers of the edited and unedited texts. It was hypothesized that this was due to the overall easiness of the unedited version to begin with.

Study: Crossley, McCarthy, Louwerse, & McNamara (2007)
Texts: Two corpora of beginning ESL textbooks: simplified and authentic texts
Participants: Textbook authors and editors
Subjective measures: Authors and editors modified authentic texts into ‘simplified’ texts.
Text-linguistic variables: Seven Coh-Metrix feature categories: causal cohesion; connective and logical operators; coreference measures; density of POS; polysemy and hypernymy; syntactic complexity; word frequency
Results: Authentic texts use significantly more causal verbs, stem overlap, noun overlap, argument overlap, conditional constructions, and low frequency parts of speech. On the other hand, simplified texts had significantly more semantic similarity between sentences, nouns, syntactic complexity, familiar words and frequent words. The authors state that the greater syntactic complexity in the simplified texts is likely due to their heavy dependence on noun phrases to form meaning.

Study: Crossley, Greenfield, & McNamara (2008)
Texts: 32 academic texts selected based on the Dale-Chall scale
Participants: 200 Japanese EFL learners
Subjective measures: Cloze tests to measure reading comprehension
Text-linguistic variables: Three Coh-Metrix measures: lexical coreferentiality; sentence similarity; word frequency
Results: The combination of the three Coh-Metrix variables in a multiple regression model resulted in an R² of 0.86. It was determined that the Coh-Metrix measures were much more accurate predictors of text readability than the Flesch, Dale-Chall, Bormuth, or Miyazaki readability formulas.

Study: Crossley, Allen, & McNamara (2011)
Texts: 300 news texts that were also simplified into three levels of 100 texts each
Participants: Small team of authors
Subjective measures: Texts were simplified into three levels. Idioms, passives, and phrasal verbs were removed.
Text-linguistic variables: Three Coh-Metrix measures: lexical coreferentiality; sentence similarity; word frequency
Results: Again, the combination of the three Coh-Metrix measures accounted for much more variance than the Flesch Reading Ease Scores.

Although the results of the eighteen studies in Table 2.5 represent only a sample
of the research on text quality, these studies are particularly relevant to the dissertation
research I have proposed because they each measured reader perceptions. Two important
patterns emerged from the studies reviewed above. First, the findings from research on
cohesion and coherence have not always agreed, but there seems to be growing support
for the positive relationship between measures of textual cohesion (e.g., sentence
similarity; lexical coreferentiality) and the variables of reading comprehension and
subjective readability measures (Crossley et al., 2008; Crossley et al., 2011). Second, the
research has shown repeatedly and quite conclusively that simple ‘readability formulas’
are inadequate measures of text readability and comprehension. However, more recent
studies have introduced a handful of linguistic variables that may be better measures of
text readability and quality. Most of these features are included in the linguistic analyses
presented in Chapters 5 and 7.
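
To illustrate why such formulas are often described as overly simple, the following minimal sketch computes the standard Flesch Reading Ease score from raw counts. The formula itself is the published one; the function name and the two sets of counts are hypothetical and serve only as an illustration. Note that the score depends on nothing but words per sentence and syllables per word, the same two variables listed for Fulcher (1997) in Table 2.5.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease from raw counts (higher scores indicate easier text)."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    # Only two surface variables enter the formula.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Invented counts for two hypothetical 100-word passages.
plain = flesch_reading_ease(total_words=100, total_sentences=10, total_syllables=130)
dense = flesch_reading_ease(total_words=100, total_sentences=4, total_syllables=180)
print(round(plain, 1), round(dense, 1))
```

Because cohesion, noun phrase complexity, and word frequency never enter the calculation, two passages with identical counts receive identical scores regardless of how they are organized.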
In conclusion, the empirical studies that have investigated perceived text quality
(see Table 2.5) have demonstrated the complexity and multidimensionality of this
construct. Most of these studies have at some point suggested the need to consider
variables of individual reader differences. The next section will present a summary of
research that has included perceptual measures of writing quality and style.

2.3.4. Measuring reader perceptions of writing quality

Although there has been a great deal of interest in the topic of writing style, most
of this research has focused on the measurement of textual variables. Exceptions to this
include several of the studies included in Table 2.5. One relatively common approach to
text quality is the construct of interestingness. In the 1980s Hidi and Baird began
proposing interestingness as an important affective component of reading (Hidi & Baird,
1986; Hidi, 2001). Empirical research in this area seems to suggest that more interesting
texts and text segments facilitate better reading comprehension and recall (Hidi & Baird,
1988). For example, Schraw et al. (1995) proposed the following six potential
characteristics of interesting texts: comprehensibility, cohesion, vividness, reader
engagement, evocative emotional reactions, and prior knowledge. However, not all of
these features were consistently related to perceived interest. This seems to be a trend
with much of the research in the area of interest. Many of the proposed predictors of text
interestingness are either unsupported by empirical evidence or plagued by radically
inconsistent results in the literature. Another example of this is the results of Kelly et al.
(2003), which showed that narrative news texts were less interesting and more
informative and credible than ‘straight-news’ texts. Schraw et al. (1995) attribute the
inconsistency of the research in this area, at least in part, to the interaction between text
characteristics and readers’ individual differences.
Other measures of reader perceptions have included measures of perceived text
difficulty. For example, Leroy, Helmreich, and Cowie (2010) and Fulcher (1997) both
measured relationships between perceived text readability and traditional readability
measures. In both studies, the researchers failed to identify meaningful relationships
between perceived text difficulty and objective measures of text readability, thus raising
questions about the validity of these measures.

Other research studies have included reader perceptions in order to measure text
memorability and the match between the perceptions of the reader and the writer. In a
small-scale study, Britton, Gulgoz, and Glynn (1993) found that student perceptions of
text memorability accurately reflected how well participants learned the material in 95%
of the cases. In another study, Hilton, Motes, and Fielden (1989) revealed a strong
relationship between the text perceptions of the readers and the writers of the texts.
Although this handful of studies covers a wide range of measured perceptual
parameters, there are at least three important insights that can be drawn from these studies.
First, based on the quantitative reports from the studies in this survey, reader perceptions
seem to be a reliable measure of writing quality. Second, it is quite clear that the
perceptions of readers are an extremely important consideration when assessing the
quality of written texts. Finally, these studies suggest that there is a wide range of
parameters that comprise reader perceptions, including interestingness, difficulty, and
memorability. Despite the limited pool of previous research studies on reader perceptions
of writing quality, the data from these studies strongly supports the inclusion of
perceptual variables in studies of writing quality. The next section proposes a dual
approach to measuring writing quality and style that triangulates objectively measured
linguistic features and reader perceptions.

2.3.5. Triangulating reader perceptions and text-linguistics

Methodological triangulation has been used for decades by social scientists as a
means of explaining behavior by studying it from two or more perspectives (Cohen &
Manion, 2000, p. 254). One example of methodological triangulation in the field of
corpus linguistics is the relatively common practice of combining quantitative and
qualitative methods in an analysis in order to better understand linguistic variation.
Although corpus linguists typically have extensive knowledge of the discourses they
study, they are often outside of the target audience of the texts in their corpora. This
outsider status limits their ability to determine and interpret the impact that linguistic and
stylistic characteristics have on readers within the target audience of the texts under
study. One method of overcoming this limitation is to triangulate reader perceptions with
the results of text-linguistic research.
The previous section contains an overview of the few research studies that have
endeavored to measure relationships between text-linguistics and reader perceptions.
However, these studies are limited in (a) the range of the perceptual items included and
(b) the range of the linguistic items they included. To my knowledge, there have only
been two previous studies that have attempted to measure relationships between a wide
range of reader perceptions and a comprehensive set of relevant linguistic features:
Carroll (1960) and Egbert (2013).
In Carroll’s (1960) study ‘expert judges’ were asked to read passages and report
their perceptions on 29 scales. These responses, along with 38 objective linguistic
measurements, were included in a factor analysis. This resulted in six factors, or
underlying variables, based on the co-occurrence patterns of the objective and subjective
measures on each text. The first and most important factor in Carroll’s results was labeled
‘General Stylistic Evaluation’. This factor was composed entirely of subjective perceptual
variables that denoted a general positive or negative assessment of the texts. With the
exception of factor 3 and factor 6, Carroll’s factor interpretations were based more
heavily on the perceptual variables than the objective linguistic features. To use Carroll’s
words, it seems that “although the style of literary passages can be indexed in certain
ways mechanically, it cannot be evaluated mechanically!” (p. 289).
In a more recent study, Egbert (2013) used a dual methodology to measure
student perceptions of linguistic variation in textbook passages. In that study
undergraduate university students read textbook passages from two disciplines
(psychology and geology) and rated them using a new instrument, the Perceptions of
Effectiveness, Comprehensibility, and Organization (PECO) Scale. After quantifying 74
key linguistic features of university textbooks, Biber’s MD analysis was used to identify
and interpret underlying ‘dimensions’ of linguistic variation in introductory textbook
prose. This resulted in five interpretable dimensions of variability in textbook language.
Statistical correlations between the perceptual and linguistic variables suggested that
academic involvement and elaboration, colloquial discourse, academic clarity, and
contextualized narration are related to student perceptions of textbook effectiveness,
comprehensibility, and organization.
The studies reviewed in Section 2.3.4 suggest that reader perceptions are reliable,
important, and varied. Additional research from Carroll (1960) and Egbert (2013) offers
strong support for the usefulness of reader perceptions. Moreover, these two studies have
demonstrated the usefulness of triangulating reader perceptions with text-linguistic
variables in order to achieve a more complete understanding of the stylistics and quality
of written discourse.
The methodology proposed in this dissertation uses Stylistic Perception (SP)
analysis to investigate linguistic and stylistic variation from the perspective of audience
perceptions. In SP analysis multiple participants are asked to read each text sample in a
corpus and respond to a series of semantic differential items designed to measure their
perceptions of the style of the text (e.g., quality, readability, relevance). Factor analysis is
then used to reduce the items to a smaller set of stylistically interpretable dimensions
which can be used to explain variation in the corpus. The results of this method will then
be correlated with the results of an MD analysis of linguistic variation in the same texts. It
is hypothesized that the combined strengths of these two methodological approaches will
result in a richer and more complete understanding of register variation in published
academic writing.
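
The final correlation step of this dual approach can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in this dissertation: the factor analysis step is omitted, each text is represented by the mean rating on a single hypothetical 7-point semantic differential item, and all ratings, dimension scores, and function names are invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_rating_per_text(ratings):
    """Average the ratings of all readers for each text."""
    return {text_id: sum(vals) / len(vals) for text_id, vals in ratings.items()}


# Hypothetical data: three readers rate each text on one 7-point item.
ratings = {"text_a": [2, 3, 4], "text_b": [4, 5, 6], "text_c": [6, 7, 8]}
# Hypothetical MD dimension scores for the same texts.
dimension_scores = {"text_a": -1.0, "text_b": 0.5, "text_c": 2.0}

means = mean_rating_per_text(ratings)
text_ids = sorted(means)
r = pearson_r([means[t] for t in text_ids], [dimension_scores[t] for t in text_ids])
print(round(r, 3))
```

Averaging across readers before correlating treats the text, rather than the individual rating, as the unit of analysis, which matches the text-level nature of the linguistic dimension scores.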

2.4. Conclusion

In this chapter I began with a review of the literature on register and discipline
variation in academic writing. In Section 2.2 I identified the following two gaps in the
literature:

1. We need more linguistic research on publication type and discipline variation in
order to fully understand register variation in published academic prose.
2. We need to explore possible interaction effects between situational variables such
as discipline and publication type in order to accurately and completely account
for patterns of variation.

The most important research gaps were identified in Section 2.3, which focused on
stylistics and writing quality:

1. We need more research on reader perceptions, which have been shown to be an
important measure of writing style.
2. We need more research that triangulates the text-linguistics and reader
perceptions of written texts in order to better understand writing quality and style.

The previous section introduces a dual method of measuring stylistic variation by
correlating the results of a method of measuring reader perceptions (SP analysis) and a
method of measuring linguistic variation (MD analysis). There are several advantages of
simultaneously investigating linguistic variation within texts and reader perceptions of
writing style and quality. This method allows us to quantify the relationships between the
linguistic choices of authors and the perceptions of readers, which improves our
understanding of what makes quality writing. This provides the information needed to
assess long-held yet largely untested assumptions about the effects of different writing
styles on readers. Another benefit of this method is that it allows researchers to account
for the perceptions of readers, allowing them to base their conclusions about linguistic
variation on more than their own interpretations of linguistic variation. The dual
methodology proposed in this dissertation makes it possible to achieve interpretations
that are more objective and reliable.
Before reporting the methods and results of the dual methodology introduced
here, I introduce the corpus used in this study (Chapter 3) and the development and
application of a comprehensive framework for describing the situational characteristics of
published academic writing (Chapter 4).

CHAPTER 3. CONSTRUCTING AND ANALYZING A CORPUS OF
PUBLISHED ACADEMIC WRITING

3.1. Introduction

In the first two chapters I introduced the need for more studies that explore reader
perceptions of the linguistic and stylistic choices made by academic writers. In an effort
to fill this gap, I needed a corpus that met specific criteria of representativeness, including
the number, length, topic, and source of texts. As there were no existing corpora that met
these criteria, it was necessary to design and construct a new corpus to achieve the
objectives of this study.
In this chapter I describe the design and construction of the corpus used in this
study and assess its representativeness. Chapter 4 explains the development and
application of a framework for the comprehensive description of the situational
characteristics of published academic writing. Chapters 5 and 6 present the methods and
results of the MD analysis and SP analysis, respectively. Chapter 7 contains a series of
analyses used to measure the relationships between linguistic variability in published
academic writing and the perceptions of lay readers. Finally, Chapter 8 synthesizes and
interprets the findings from this study and addresses implications and future research.
The purpose of this chapter is to present a detailed description of the methods
used to construct and analyze the corpus used throughout this dissertation. Section 3.2
contains the operational definitions and the text selection process for each of the three
registers included in this study: journal articles, university textbooks, and popular
academic writing (see also Sections 1.1.1 – 1.1.3 and 2.2.3.1 – 2.2.3.3). The contents and
representativeness of the actual corpus are then described and assessed in Sections 3.3
and 3.4. Finally, in Section 3.5 the methods used to identify and quantify the variables of
interest are outlined.

3.2. Corpus collection procedures

3.2.1. Operational definitions for the disciplines and registers

Before designing and constructing the corpus it was necessary to develop
operational definitions for the disciplines and publication types I set out to represent in
the corpus. Publication type variation is the primary focus of the analyses performed in
this study. However, in order to increase the generalizability of the findings and answer
additional questions regarding interaction effects, I sampled an equal number of texts
from two disciplines (biology and history) within each publication type. By way of
definition, biology is a natural science concerned with the study of living organisms.
Biology is usually studied quantitatively using experimental or observational methods.
History, on the other hand, is the study of past human and societal actions and events, and
is traditionally classified with the humanities (see Gray, 2011, p. 73). History is most
often studied qualitatively through observing and interpreting primary sources.
For the purposes of this study, journal articles are defined as primary research
articles published in peer-reviewed academic journals that transmit new or developing
knowledge to specialists. While there are a variety of publication types other than
primary research reports that may be included with academic journals (e.g., book
reviews, letters to the editor, introductions to a special issue), these have been excluded
from the corpus in this study.
University textbooks are defined in this study as published resources of
instruction that transmit established knowledge to students as the core reading material
for a university course. University professors often assign supplementary reading
material, such as reading packs, reference books, and online information. However, only
core pedagogical books were considered for inclusion in the corpus.
Operationalizing popular academic writing proved to be more difficult. In some
ways, the definition presented here is narrower than the traditional conceptualization of
popular academic writing; in other ways it is broader. For this study, I defined popular
academic writing as published non-fiction books that transmit interesting, entertaining, or
newsworthy academic findings or ideas to the non-specialist public. Although a great
deal of popular academic writing is published in newspapers and other periodicals,
writing from those publications was excluded from this corpus.

3.2.2. Source text selection

This section outlines the decisions made during the text selection process. It
should be noted that the actual texts in the corpus are only short excerpts (500-600
words) from the source texts described in this section. The selection process for the text
excerpts varied by register. However, in every case, the text passages are coherent and
largely self-contained, beginning and ending at paragraph boundaries, where possible.
This was done to reduce the impact of using only partial text samples on the linguistic
analyses and reader perceptions.
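
The paragraph-boundary constraint described above can be expressed as a small search procedure. The sketch below is only an illustration: the excerpts in this corpus were selected by hand, and the function name, the whitespace-based word count, and the choice to return the first qualifying run are all assumptions made for the example.

```python
def select_excerpt(paragraphs, min_words=500, max_words=600):
    """Find the first run of consecutive paragraphs totalling min_words..max_words.

    Returns (start, end) indices usable as paragraphs[start:end], or None when
    no run of whole paragraphs fits the target length.
    """
    counts = [len(p.split()) for p in paragraphs]  # whitespace-delimited words
    for start in range(len(counts)):
        total = 0
        for end in range(start, len(counts)):
            total += counts[end]
            if total > max_words:
                break  # adding further paragraphs only makes the run longer
            if total >= min_words:
                return start, end + 1
    return None


# Hypothetical source text with paragraphs of 400, 300, 250, and 90 words.
paragraphs = ["word " * n for n in (400, 300, 250, 90)]
print(select_excerpt(paragraphs))
```

A return value of None corresponds to the "where possible" qualification: for such a source text the excerpt would have to begin or end inside a paragraph.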
The selection criteria for journal article samples in the corpus were essentially the
same as the method used by Gray (2011). Gray’s (2011) corpus of academic journal
articles was selected from journals identified by experts in each of the six disciplines. For
each of the two disciplines, I chose to sample from seven of the journals identified by
Gray’s (2011:50) expert informants. Each of the fourteen journals is peer-reviewed and
currently in print (see list in Table 3.1).
Previous research has identified important linguistic differences across the four
traditional journal article sections (introduction, methods, results, discussion), especially
in quantitative research reports, such as those published in biology journals. Therefore the
short passages were selected in a stratified manner from across the various sections of the
journal articles. For the biology texts this process consisted of simply alternating among
the four key article sections, resulting in a roughly equal number of texts from each
section. There tends to be more variability in the organization of journal articles in
history. However, each of the articles contained an introduction, a main body, and a
conclusion. Therefore I chose to alternate my selections among those three main sections
in order to represent potential linguistic variability. All of the journal articles were
published between 2005 and 2012.
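
The alternating selection among article sections can be sketched as a simple round-robin assignment. This is a hypothetical illustration that assumes one excerpt per article and 25 articles per discipline; the article identifiers and the function name are invented, and the actual assignment was carried out manually.

```python
from collections import Counter
from itertools import cycle


def assign_sections(article_ids, sections):
    """Assign each article the next section in a repeating cycle."""
    return dict(zip(article_ids, cycle(sections)))


BIOLOGY_SECTIONS = ["introduction", "methods", "results", "discussion"]
HISTORY_SECTIONS = ["introduction", "main body", "conclusion"]

# Invented identifiers for 25 articles in each discipline.
biology = assign_sections([f"bio_{i:02d}" for i in range(1, 26)], BIOLOGY_SECTIONS)
history = assign_sections([f"his_{i:02d}" for i in range(1, 26)], HISTORY_SECTIONS)

biology_counts = Counter(biology.values())
history_counts = Counter(history.values())
print(biology_counts)
print(history_counts)
```

Because 25 is not divisible by four or by three, the resulting counts differ by at most one across sections, which is consistent with the "roughly equal" distribution reported above.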
For convenience, three textbook samples for each discipline were taken from the
TOEFL 2000 Spoken and Written Academic Language (T2K-SWAL) corpus (see Biber,
2006b:23-31). The remaining textbook samples were selected from one of two electronic
academic book databases: ebrary.com or MyiLibrary.com (see Table 3.1). These are
subscription-based online repositories of academic books that were accessed through
Northern Arizona University’s library system. While a large proportion of the books
available through these two sources are not university textbooks according to my
operational definition above, texts were only selected if they were explicitly distinguished
as university course textbooks by the book’s author(s) or publisher. Text excerpts were
selected from a variety of chapters and locations within chapters in order to avoid a
sampling bias. The textbooks in this sample were published between 1999 and 2012.

Table 3.1. Sources for the texts in the corpus

Journal Articles (Biology)
1. Journal of Natural History
2. Microbial Ecology
3. Journal of Cell Biology
4. American Journal of Physiology
5. PNAS
6. Applied and Environmental Microbiology
7. Conservation Biology

University Textbooks (Biology)
1. T2K-SWAL Corpus
2. ebrary

Popular Academic Books (Biology)
1. New York Times Books

Journal Articles (History)
1. American Historical Review
2. Historical Research
3. Journal of Urban History
4. The Western Historical Quarterly
5. Journal of Colonialism and Colonial History
6. Journal of World History
7. Journal of Women’s History

University Textbooks (History)
1. T2K-SWAL Corpus
2. MyiLibrary
3. ebrary

Popular Academic Books (History)
1. New York Times Books

The popular academic text samples were selected from the New York Times
Books website which contains the first one or two chapters from books reviewed by the
New York Times. All of the popular academic books in this sample were published after
1997. Samples were drawn from the second chapters of the relatively few books for
which they were included. In each case, these were texts that clearly fit each aspect of the
operational definition established above and clearly belonged to the appropriate discipline
(history or biology). While there have been no studies that have investigated linguistic
variability across chapters of popular academic writing, this is a potential limitation of the
study as only writing from the beginning of these books has been represented in this
corpus.
Each text sample in the corpus is relatively short, being only 500-600 words long,
with an overall mean of 566 words. This was done for purposes of practicality, in order
to ensure that participants could read the full text sample and rate it on each of the
perceptual scales in a reasonable amount of time. (See Appendix A for detailed
information regarding each of the individual texts in the corpus).

3.2.3. Text formatting and cleaning

The 150 text passages were copied from the source text. All visual aids (pictures,
figures, tables, charts, graphs) were removed along with any references to them. Use of
visuals ranged from almost none to several per page, and including these would have
introduced an unnecessary confounding variable. Use of indentation, bold, italics,
headings, and sub-headings in the source text was retained. Each text was saved as a
plain text file for the linguistic analysis and as a Word document for use in the survey.
Finally, a header was included in each of the plain text files that contained the following
information: publication type, discipline, author(s), title, source, page(s), and word count.

3.3. Corpus Description

The corpus in this study is a balanced collection of texts from three publication
types and two disciplines of academic writing, comprising a total of 150 texts and 84,908
words. In Section 3.4 I address the representativeness of the text samples in the corpus.
Table 3.2 displays a descriptive summary for the contents of the final corpus used in this
study.

Table 3.2. Text and Word Counts in the Academic Written English Corpus

Register           Biology        History        Total
Popular Academic   25 (14,306)    25 (14,349)    50 (28,655)
Textbooks          25 (13,875)    25 (14,042)    50 (27,917)
Journal Articles   25 (14,067)    25 (14,269)    50 (28,336)
Total              75 (42,248)    75 (42,660)    150 (84,908)

*Word counts are in parentheses
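
As a quick consistency check, the marginal totals in Table 3.2 and the mean text length can be recomputed from the six cell values. The figures below are copied from the table; the data structure and variable names are merely one convenient way to hold them.

```python
# (number of texts, number of words) for each register and discipline in Table 3.2
CORPUS = {
    "Popular Academic": {"Biology": (25, 14306), "History": (25, 14349)},
    "Textbooks": {"Biology": (25, 13875), "History": (25, 14042)},
    "Journal Articles": {"Biology": (25, 14067), "History": (25, 14269)},
}

total_texts = sum(texts for cells in CORPUS.values() for texts, _ in cells.values())
total_words = sum(words for cells in CORPUS.values() for _, words in cells.values())
mean_words = total_words / total_texts

# Word totals by discipline (column totals) and by register (row totals).
words_by_discipline = {
    discipline: sum(cells[discipline][1] for cells in CORPUS.values())
    for discipline in ("Biology", "History")
}
words_by_register = {
    register: sum(words for _, words in cells.values())
    for register, cells in CORPUS.items()
}

print(total_texts, total_words, round(mean_words))
print(words_by_discipline)
print(words_by_register)
```

The recomputed totals agree with the table, and the mean of roughly 566 words per text matches the figure reported in Section 3.2.2.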

3.4. Assessing the representativeness of the corpus

There was a time in the not-so-distant history of corpus linguistics research when
it was commonplace to construct corpora composed of relatively short text segments
rather than full texts. Some examples include the Brown corpus and the
Lancaster-Oslo-Bergen (LOB) corpus, both of which comprise text samples of c. 2,000 words, and the
London-Lund corpus, which contains 5,000 word samples. However, dramatic increases
in computing speed and power, as well as the increasing availability of electronic texts,
scanners, and optical character recognition programs, have led to a surge in the construction
of full text corpora. For most contemporary corpus-based research investigations,
constructing full text corpora is both appealing and achievable. This study is
unconventional because in addition to subjecting the corpus texts to linguistic analysis, I
also gathered data regarding the perceptions of participants who actually read each of the
texts in the corpus. For the purposes of this analysis of perceptions I determined that
readers could reasonably be expected to read about 1 single-spaced page (1”
margins; Times New Roman; 12 point font). This left me with texts that are between 500
and 600 words in length.
While the short text samples can be seen as a limitation of this study, Biber (1993)
found that frequent lexico-grammatical features are quite stable in relatively short texts.
In an effort to investigate the “optimal text length,” Biber (1993) used texts from the
LOB and the London-Lund corpus to measure the stability of a variety of linguistic
features across 200-word text segments. He summarized his findings by stating:
“Common linear linguistic features are distributed in a quite stable fashion within texts
and can thus be reliably represented in relatively short text segments” (p. 252).
Furthermore, Biber (1993) reported that most grammatical features “occur in the first 200
words, with relatively few grammatical categories being added after 600 words” (p. 251).
These findings offer strong evidence for the representativeness of the text length used in
this study. However, the representativeness of the specific texts in this corpus is an
empirical question.
In order to determine the extent to which the 500-600 word texts in this corpus
represent the registers they were sampled from, I compare the linguistic patterns in this
corpus with those in two sub-corpora from Conrad’s (1996a) dissertation study that were
designed to represent the exact same registers (university textbooks in history and journal
articles in biology). Conrad’s (1996a) corpus contained 3,200 word samples of journal
articles and 5,000 word samples from university textbooks, both from two disciplines:
biology, specifically ecology, and history, specifically American history. Conrad (1996a)
presents the results of an MD analysis, using the dimension structures of Biber’s (1988)
first five dimensions. I performed the same analysis on the texts in my corpus, using the
same set of linguistic features to calculate dimension scores for each text and mean
dimension scores for each register. The results from my analyses can be compared with
Conrad’s (1996a) results for university textbooks in history in Figure 3.1 and journal
articles in biology in Figure 3.2.
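
The dimension-score calculation described above can be illustrated with a short sketch. It assumes the usual MD procedure, in which each normalized feature rate is standardized against reference means and standard deviations, and the standardized scores of negatively loading features are subtracted from the sum for positively loading features. The feature names, reference values, and function names below are hypothetical placeholders; they are not the Biber (1988) values, and this is not the software used in this study.

```python
from statistics import mean


def dimension_score(rates, ref_mean, ref_sd, positive, negative):
    """Return one text's score on one dimension.

    rates:     normalized feature rates for the text (per 1,000 words)
    ref_mean:  reference mean for each feature (hypothetical values here)
    ref_sd:    reference standard deviation for each feature
    positive:  features assumed to load positively on the dimension
    negative:  features assumed to load negatively on the dimension
    """
    def z(feature):
        # Standardize the rate against the reference distribution.
        return (rates[feature] - ref_mean[feature]) / ref_sd[feature]

    return sum(z(f) for f in positive) - sum(z(f) for f in negative)


def register_mean(scores):
    """Mean dimension score across all texts sampled from one register."""
    return mean(scores)


# Hypothetical reference values and feature names, for illustration only.
REF_MEAN = {"feature_a": 40.0, "feature_b": 78.0}
REF_SD = {"feature_a": 30.0, "feature_b": 34.0}

text_rates = {"feature_a": 70.0, "feature_b": 44.0}
score = dimension_score(
    text_rates, REF_MEAN, REF_SD,
    positive=["feature_a"], negative=["feature_b"],
)
print(score)
```

Under these assumptions, two corpora can be compared dimension by dimension by computing `register_mean` over the text-level scores from each corpus.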
The results presented in Figures 3.1 and 3.2 reveal that the patterns of linguistic
variation across the various dimensions are almost identical between my corpus and
Conrad’s (1996a) corpus, which is composed of texts that are much longer than those in
my corpus. This lack of deviation between the two corpora offers strong support for the
representativeness of the corpus used in this study. The results of these case studies,
combined with Biber’s (1993) evidence for the representativeness of short texts, are
sufficient to support the use of 500-600 word text samples in this corpus.

Figure 3.1. Mean Biber (1988) Dimension Scores for the History University
Textbook Sub-corpora in Conrad (1996a) and the present study

[Figure not reproduced. Horizontal axis: Dimensions D1-D5; vertical axis: mean
dimension score; series: Present study and Conrad (1996).]

Figure 3.2. Mean Biber (1988) Dimension Scores for the Biology Journal Article
Sub-corpora in Conrad (1996a) and the present study

[Figure not reproduced. Horizontal axis: Dimensions D1-D5; vertical axis: mean
dimension score; series: Present study and Conrad (1996).]

3.5. Methods for annotating and quantifying variables

3.5.1. Text-linguistic features

Most of the linguistic features included in this analysis were selected based on previous
MD analyses of academic writing, such as Conrad (1996a), Biber (2006b), and Gray
(2011). In each of these studies, the Biber Tagger¹ was used to grammatically annotate
the corpus. Like other taggers, the Biber Tagger identifies the part of speech for each
word in the corpus. However, unlike most other taggers, the Biber Tagger assigns
additional information to each word, such as verb tense, aspect, and voice; semantic
categories; lexico-grammatical associations; and stance features. Most of these features
can then be counted by running an additional program called TagCount, also developed
by Biber. TagCount calculates the sum for each feature in each text and then normalizes
the count to a rate per thousand words. After performing an extensive literature review
and an analysis of potential linguistic variables, I chose to include 50 features from Biber’s
TagCount output in this study. In addition to the linguistic variables introduced above,
the following six features were included in this study: nouns as pre-nominal modifiers,
the percent of the text composed of academic vocabulary and common core vocabulary
(1-500 and 501-3,000), and the number of common phrasal verbs and academic lexical
bundles. This resulted in a total of 56 linguistic features that were included in the final
analysis.
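
The normalization step mentioned above (converting a raw count to a rate per thousand words) can be illustrated with a minimal sketch. The function names and example counts are hypothetical; this is not TagCount itself, only the arithmetic that a per-thousand-word normalization implies.

```python
def rate_per_thousand(raw_count, word_count):
    """Convert a raw feature count into a rate per 1,000 words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return raw_count * 1000 / word_count


def normalize_counts(raw_counts, word_count):
    """Normalize every feature count for one text."""
    return {
        feature: rate_per_thousand(count, word_count)
        for feature, count in raw_counts.items()
    }


# Hypothetical counts for a single 600-word text.
rates = normalize_counts({"nouns": 180, "passives": 12}, word_count=600)
print(rates)
```

Normalizing in this way makes counts comparable across texts of slightly different lengths, such as the 500-600 word samples in this corpus.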

3.5.2. Stylistic perceptions

In addition to computing rates of occurrence for the linguistic features, each text
was rated by 25 independent readers in order to measure their perceptions of the writing
quality of each text. This section offers only a brief overview of the development and use
of that survey, but more detail can be found in Chapter 6.

3.5.2.1. Developing a measure of reader perceptions of writing quality

In order to measure reader perceptions of writing quality, a survey instrument was
developed using a series of semantic differential items (see Chapter 6). After developing
an initial set of survey items based on previous literature, extensive pilot research was
conducted in order to increase the reliability of the items in the survey. The final survey
contained 38 items designed to measure a wide range of reader perceptions about writing
quality and style.
Each of the 150 texts in the corpus, along with a survey link, was then
electronically administered to 25 independent readers. After reading the text in its entirety,
each reader was instructed to rate the text on all of the perceptual items. This resulted in
38 perceptual measurements from 25 readers on each of the 150 texts. The full
methodology used during this stage of the study is explained in Chapter 6.

¹ The Biber Tagger is a probabilistic and rule-based computer program written by Biber (1988). Over the past 25 years,
this tagger has been used for many large-scale corpus analyses during which it has been revised and improved (e.g.,
Biber, 1988; Biber et al., 1999; Biber, 2006b).

3.6. Summary

I began this chapter by establishing the need to create a new, specialized corpus
that could be used to answer the research questions of this study. In Section 3.2 I gave an
overview of the procedures used to collect the corpus, including operational definitions
for the registers and disciplines, sampling procedures, and text formatting and cleaning.
Section 3.3 contains a complete description of the contents of the corpus. Section 3.4
contains a discussion of issues related to the representativeness of short text samples and
presents the results of a case study that offers strong support for the relatively short
sample texts contained in the corpus. Finally, Section 3.5 introduces the methods used to
(a) automatically annotate and quantify the linguistic variables in the corpus and (b)
collect reader perceptions of the text samples.
This corpus is used for the analyses reported in Chapters 5, 6, and 7. In the next
chapter I present the methods and results of a comprehensive analysis of the situational
characteristics of the registers in the corpus.

CHAPTER 4. THE SITUATIONAL CHARACTERISTICS OF PUBLISHED
ACADEMIC WRITING

4.1. Introduction

Registers are situationally defined language varieties. In order to perform a
complete register analysis, a researcher must: (1) define the situational characteristics of
the register(s) under study, (2) describe the linguistic features of interest, and (3) interpret
the linguistic patterns based on their functions within the situational context of the
register(s) (see Biber & Conrad, 2009, pp. 6-12). Comprehensive descriptions of registers
are usually the result of a cyclical, iterative process in which each of these three key
components is considered many different times in various stages of the research project.
In order to successfully represent a set of registers, the researcher must first establish
operational definitions for the target registers. Only then can the researcher develop and
apply an appropriate sampling design for the texts in the corpus. Once the researcher has
established the situational features of the target domain and sampled relevant texts, the
external representativeness of the corpus can be assessed through an analysis of the
situational characteristics of the actual texts in the corpus. The researcher can then use
this understanding of the situational characteristics of the corpus to select relevant
linguistic features and interpret patterns in their use.
The purpose of this chapter is to present the results of a comprehensive analysis of
the situational characteristics of the three publication types (journal articles, university
textbooks, and popular academic books) and two disciplines (biology and history)
included in this study. Section 4.2 briefly presents the operational definitions for the target
registers. Section 4.3 introduces the framework used for this analysis. Section 4.4
presents the quantitative and qualitative results of the analysis of those situational
characteristics. Finally, Section 4.5 presents an overview of the similarities and
differences among the six registers represented in the corpus.

4.2. Definitions for target registers

The following operational definitions for the three publication types are used in
the sampling of texts and throughout this study (see also Sections 1.1 and 3.2.1).

Journal Article: A primary research article published in a peer-reviewed academic
journal that transmits new or developing knowledge to specialists.

University Textbook: A published manual of instruction that transmits established
knowledge to students as the core reading material for a course in a university context.

Popular Academic Book: A published non-fiction book that transmits interesting,
entertaining, or newsworthy academic findings or ideas to the non-specialist public.

The purpose of the following section is to develop a framework that can be used
to describe the key situational characteristics of published academic writing within these
three target registers.

4.3. A framework for the situational characteristics

The situational framework adopted in this study is the product of three main
sources. The first source is situational frameworks from previous studies on register
differences, namely Biber and Conrad (2009), Biber (1988), Conrad (1996a), and Gray
(2011). As Gray (2011) noted, Biber and Conrad’s (2009) framework is particularly
useful for making general distinctions among broad register categories, such as place of
communication (private, public) and production circumstances (real time, planned,
scripted, revised and edited). This is the approach taken in Biber’s (1988) situational
analysis of spoken and written registers. The situational framework adopted here contains
characteristics from all but two of Biber and Conrad’s (2009) top-level situational
categories. As the channel and production circumstances are essentially identical across
the six registers in this study, it was not necessary to describe the texts in terms of these
categories. In order to achieve the descriptive granularity necessary to capture the more
fine-grained differences among registers in a relatively narrow corpus study, such as this
one, I also referred to the frameworks developed by Conrad (1996a) and Gray (2011) as I
developed the framework presented here. These studies were particularly useful because
they were both focused on describing written academic registers. Therefore, many of the
situational categories in my framework were informed by Conrad (1996a) and Gray
(2011). One important difference between my framework and Gray’s (2011) framework
is the absence of a ‘Textual Layout and Organization’ category because my texts were
standardized and visual elements were removed. The ‘Methodology’ and ‘Explicitness of
Research Design’ categories were also not useful for my study because they are only
applicable to journal articles.
The second source I used in order to develop this situational framework is
previous literature on the registers and disciplines included in this corpus. These studies
are among those reviewed in Chapter 2. As many of these studies are not based on
empirical research, it is possible that some of the characteristics will prove less than
useful in functionally interpreting the linguistic patterns revealed by this study. However,
in an effort to follow Conrad’s (1996b, p. 42) guidelines for situational frameworks, I
aimed to be as comprehensive as possible in the development of my situational
framework.
The final source used to develop the situational framework adopted here is my
own analysis of the characteristics of the target registers in this study. In addition to
reading many of the texts in this corpus, I reviewed several other texts from these
registers to gain a sense of their distinguishing situational characteristics. This was
especially valuable in the case of popular academic books because of my unfamiliarity
with this register.
The consideration of these three sources (previous situational frameworks,
relevant literature on these registers, and my own situational analyses) did not occur at a
single point in time. This was an iterative process that continued throughout the course of
the study. While most of the final categories in the situational framework existed in some
form before the data were analyzed linguistically and interpreted functionally, a number
of the sub-categories were added during and after these stages were carried out. In
addition, almost all of the categories underwent some amount of change during the course
of the study in order to adjust their meaning or fine-tune their roles in defining the
situational characteristics of the registers included in the study.
The situational framework developed for this study is organized into seven
categories, comprising seventeen specific sub-categories. Each of these categories will be
defined and discussed below.

4.3.1. Participants

Like Gray (2011), my focus in the participants category is the writers of the texts
in the corpus. This focus is typical in register analyses because little to nothing is known
about the actual readers of these texts, although it is possible to make assumptions about
the target or ideal audience and, to a lesser extent, about the actual readers. The first
sub-category in the participants category is a quantitative measure of the number of
authors of a given text and is divided into three categories (1, 2-4, and 5+), following
Gray (2011). The second sub-category is a measure of the educational levels of the
authors, and the third sub-category is a measure of the percent of authors who have
educational backgrounds that are relevant to the discipline they are writing in. The final
participant sub-category records the professions of the authors.
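
The three-way coding of the number of authors (1, 2-4, and 5+) can be illustrated with a short sketch. The function name and the sample author counts are hypothetical and serve only to show how the bins partition the data.

```python
from collections import Counter


def author_count_bin(n_authors):
    """Map a text's number of authors onto the bins 1, 2-4, and 5+."""
    if n_authors < 1:
        raise ValueError("a text must have at least one author")
    if n_authors == 1:
        return "1"
    if n_authors <= 4:
        return "2-4"
    return "5+"


# Hypothetical author counts for a handful of texts in one sub-corpus.
author_counts = [1, 3, 2, 6, 1, 4]
tally = Counter(author_count_bin(n) for n in author_counts)
print(tally)
```

Tallying the bins for each sub-corpus yields the kind of frequency table used to compare authorship patterns across registers and disciplines.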

4.3.2. Relations among participants

The first sub-category in the relations among participants category is a simple
qualitative generalization regarding the relationship status between the writer and reader
of the text. The second sub-category aims to define the degree of shared knowledge
between the writer and the reader. As mentioned above, one limitation of many corpus
studies of written communication is that they lack information about the actual readers
of the texts in the corpus. While this study does have information about readers, they are
not meant to represent the target readership for the six registers. Therefore, the
descriptions for both of these sub-categories are based on the assumed target audience for
the texts.

4.3.3. Setting

For the purposes of this study, setting is divided into two sub-categories:
community and time. Community is defined generally as the cultural or institutional
‘space’ or context in which this writing is generally read and interpreted. The second sub-
category refers to several time periods that are relevant to the construction and
interpretation of a text, including the time about which the author is writing, the time in
which the text is written, and the time period for which the writing was intended.

4.3.4. Subject matter

The subject matter category refers to the topic and the level of specificity the
author uses to write about that topic. The topic sub-category refers to the content,
material, or subject matter covered by the text. The level of specificity sub-category deals
with the degree of generality or specificity of the topic and the narrowness or broadness
of the text’s focus.

4.3.5. Purpose

The purpose category refers directly to the writer’s objectives for writing a
particular text. Purpose, as defined here, also refers indirectly to a reader’s purpose for
reading a text. While there are certainly many possible objectives for writing or reading a
text, I will attempt to define author and reader purposes for each register that capture the
main objectives of the respective participant groups.

4.3.6. Nature of data or evidence

The final situational category has five sub-categories, each of which plays an
important role in distinguishing between the various types of data or evidence presented
by the author and the way that evidence is explained and interpreted. The first sub-
category describes the object under study for each of the registers. Level of evidence
refers to whether the evidence presented by the author is primary or secondary (see
Conrad, 1996a, p. 114). The next two sub-categories, description of procedures and
explanation of evidence, refer to the depth or intensity of an author’s presentation of
methods and data, respectively. In both cases, the depth of an author’s presentation will
be coded on a three-point scale (none, mention, extensive) taken from Conrad
(1996a). The final sub-category, interpretation of data or evidence, deals with the
perspective used by the author to interpret the object of study for the reader.
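
The three-point scale (none, mention, extensive) can be represented as an ordered code, as in the sketch below. The class and function names are hypothetical. The sketch also assumes, purely for illustration, that a register-level entry is a summary of the range of text-level codes; the coding procedure actually used may differ.

```python
from enum import IntEnum


class Depth(IntEnum):
    """Ordered three-point scale for depth of presentation."""
    NONE = 0
    MENTION = 1
    EXTENSIVE = 2


def summarize_register(codes):
    """Summarize text-level codes as a single value or a low-to-high range."""
    low, high = min(codes), max(codes)
    if low == high:
        return low.name.capitalize()
    return f"{low.name.capitalize()} to {high.name.lower()}"


# Hypothetical text-level codes for one register.
codes = [Depth.MENTION, Depth.EXTENSIVE, Depth.MENTION]
print(summarize_register(codes))
```

Treating the scale as ordered rather than as three unrelated labels makes it possible to describe a register by the span of values its texts cover.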

4.4. Analyzing the situational characteristics of the six registers

After developing operational definitions for each of the situational characteristics,
I analyzed the three publication types (popular academic books, university textbooks, and
journal articles) in both of the two disciplines (biology and history) in terms of all
seventeen situational categories described above. Table 4.1 displays the results of these
analyses. Under each of the publication type headings there are two columns, one for
each discipline. However, in cases where the situational characteristics of a particular
publication type did not differ between the two disciplines, the two discipline columns
were merged into one column.

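Table 4.1. Situational characteristics of journal articles, university textbooks, and popular academic books

PARTICIPANTS

Number of authors (number of texts)
             Popular Academic Books    University Textbooks    Journal Articles
             Biology    History        Biology    History      Biology    History
1            23         24             10         19           1          22
2-4          2          1              15         6            19         3
5+           0          0              0          0            5          0

Author education
  Popular Academic Books, Biology: PhD: 74%; PhD/MD: 7%; MD: 4%; JD: 4%; MS: 4%; BA: 7%
  Popular Academic Books, History: PhD: 60%; JD: 12%; MD: 4%; MA: 8%; BA: 8%; Less: 8%
  University Textbooks (both disciplines): PhD: 100%
  Journal Articles (both disciplines): PhD: 100%

Relevant author education
  Popular Academic Books: Biology 93%; History 96%
  University Textbooks: Biology 100%; History 100%
  Journal Articles: Biology 100%; History 100%

Author profession
  Popular Academic Books, Biology: Professor: 67%; Researcher: 7%; Curator: 11%; Physician: 4%; Writer: 11%
  Popular Academic Books, History: Professor: 69%; Politician: 8%; Writer: 15%; Lawyer: 4%; Military: 4%
  University Textbooks (both disciplines): Professor: 100%
  Journal Articles (both disciplines): Professor or Researcher: 100%

RELATIONS AMONG PARTICIPANTS

Author-reader relationship
  Popular Academic Books: *Scholar*-public
  University Textbooks: Scholar-student
  Journal Articles: Scholar-scholar

Shared knowledge
  Popular Academic Books: Very little shared technical knowledge assumed
  University Textbooks: Limited shared technical knowledge, depending on level of textbook
  Journal Articles: High level of shared technical knowledge

SETTING

Community
  Popular Academic Books: Popular/mainstream
  University Textbooks: University
  Journal Articles: Academic/scholarly community

Time
  Popular Academic Books: Discoveries from past (often recent or ongoing) research, with a focus on
    current relevance and entertainment
  University Textbooks: Concepts from past research; often applied to current issues
  Journal Articles: Ongoing/real-time; situated in previous literature

SUBJECT MATTER

Topic
  Popular Academic Books: Newsworthy, rare, or interesting topics or issues
  University Textbooks: Important principles, concepts, theories, and ideas
  Journal Articles, Biology: Experimental or observational biological research
  Journal Articles, History: Ideas and observations about human history

Level of generality/specificity
  Popular Academic Books: Specific topic with broad focus
  University Textbooks: General topic area with some specific explanations
  Journal Articles: Specific topic with narrow focus

PURPOSE

  Popular Academic Books: Entertain, praise, and synthesize ideas and findings
  University Textbooks: Teach important ideas or concepts
  Journal Articles, Biology: Transmit information about new scientific findings
  Journal Articles, History: Transmit author’s ideas and observations

NATURE OF DATA OR EVIDENCE

Object of study
  Popular Academic Books, Biology: Living organisms
  Popular Academic Books, History: Historical events, cultures, peoples, and artifacts
  University Textbooks, Biology: Living organisms
  University Textbooks, History: Historical events, cultures, peoples, and artifacts
  Journal Articles, Biology: Measures of living organisms
  Journal Articles, History: Historical documents and artifacts

Level of evidence
  Popular Academic Books: Biology: Secondary; History: Secondary
  University Textbooks: Biology: Secondary; History: Secondary
  Journal Articles: Biology: Primary; History: Primary

Description of procedures
  Popular Academic Books: Biology: None or mention; History: None or mention
  University Textbooks: Biology: Mention to extensive; History: Mention
  Journal Articles: Biology: Extensive; History: Mention to extensive

Explanation of evidence
  Popular Academic Books: Biology: Mention; History: Mention
  University Textbooks: Biology: Mention to extensive; History: Mention
  Journal Articles: Biology: Extensive; History: Mention to extensive

Interpretation of data or evidence
  Popular Academic Books: Interpreted based on implications for public
  University Textbooks: Interpreted based on relevance to other concepts and educational objectives
  Journal Articles, Biology: Interpreted based on previous literature and data
  Journal Articles, History: Interpreted based on author claims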

4.5. Trends in the situational characteristics of academic writing

The results of the analysis of situational characteristics displayed in Table 4.1
reveal many important patterns across the registers, publication types, and disciplines in
the corpus. Sub-section 4.5.1 summarizes a number of general similarities shared by all or
most of the texts in the corpus; sub-sections 4.5.2 through 4.5.5 then focus on trends
within each of the various sub-corpora. Text excerpts are included in these last four
sub-sections to exemplify the key patterns revealed by the situational analysis, with a
sample of relevant features bolded for emphasis.

4.5.1. Common characteristics across the registers and disciplines

As mentioned above, there are some situational characteristics that are shared by
all or most of the texts in the corpus. These shared characteristics are not less important
than the characteristics that revealed differences across sub-corpora. On the contrary,
these similarities are critical in that they make it possible to control for some amount of
variability in the corpus, thus enabling a more focused investigation of the linguistic
variability that can be attributed to register differences.
The texts in this corpus are taken from published writing that was planned and
extensively revised and edited. Each of these texts is directed toward an un-enumerated
set of addressees. These texts will exist for decades or centuries and, therefore, there is no
possible way of identifying the number or nature of those groups of addressees. In
addition, as displayed in Table 4.1, almost all of the authors are scholars with advanced
degrees, and most have educational backgrounds that are relevant to their writing. Most
of the authors also hold professional positions as professors or researchers. Finally, at a
general level, each of the texts in the corpus shares the purpose of disseminating
scientific or academic information.

4.5.2. Popular academic books

Before I describe the situational characteristics of the popular academic books in
this corpus, it should be stressed that the sample in this corpus contains only one type of
popular academic writing: high-profile popular academic books. The majority of the texts
in the popular academic books sub-corpus were single-authored. Eighty-one percent of
the fifty popular academic books were written by scholars with doctoral degrees, in
comparison with the university textbooks and journal articles, which were written
exclusively by authors with PhD degrees. A small number of the authors of popular
academic books had an educational background that is not relevant to the discipline they
are writing about. Furthermore, some of the authors of popular academic books are
writers or journalists rather than scholars. These findings are quite different from claims
made by others that popular academic writing is typically written by journalists or non-
experts (see, e.g., Charney, 2003; 2004). While this may be the case in other types of
popular academic writing, the popular academic books in this corpus are most often
written by authors who possess advanced degrees in their field.
Popular academic writing is typically targeted to a heterogeneous, mainstream
public audience that has very little assumed technical knowledge or specialization. The
popular academic writing in this corpus tends to focus on specific past (often recent)
discoveries that are newsworthy, rare, or sensational in an entertaining or celebratory
manner:

4.1 This stunning hypothesis was put forward in 1980 by some of our
colleagues in the Geology and Geophysics Department when we were
graduate students at Berkeley…Almost immediately after the asteroid
hypothesis was published in 1980, both the press and the public became as
engrossed in the issues as our scientific community at Berkeley did.
[PA_BI_09]

4.2 Passionate debates about abortion derive from motivations to control
female reproduction that are far older than any particular system of
government, older than patriarchy, older even than recorded history.
Male fascination with the reproductive affairs of female group members
predates our species. Young women of my daughters' generation take for
granted a historically unique situation. [PA_BI_17]

This supports claims from Fahnestock (1984) that popular academic writing often
contains praise for specific researchers as well as the scientific community in general. In
addition, writing within this register often establishes problems and presents solutions
from science or academia (see Adams-Smith, 1987):

4.3 It is impossible to predict the number of new secessionist efforts likely
in the future. But whatever the number, it seems inevitable that some of
these efforts will present agonizing choices for U.S. leaders. If recent
history is any guide, U.S. policy makers will be unprepared to deal well
with future dilemmas. [PA_HI_07]

4.4 If all known HIV types have gained the virulence to cause AIDS and
death, is this virulence now increasing or decreasing? Does the HIV of
the Western epidemic cause AIDS faster now than in the early 1980s?
Preliminary evidence is discouraging at first glance. [PA_BI_04]

4.5 The conquest or control of any disease requires the efforts of many.
However, several who became prominent by identifying, isolating, or
curing viral infections have been singled out by history as heroes. This
book also examines the research of medical investigators which
eventually linked certain diseases with specific viruses and led to their
ultimate control. [PA_HI_02]

While the object of study, as well as the amount of emphasis on evidence and
procedures, varies by discipline, the evidence used in the popular academic texts is
secondary and interpreted based on its implications for a broad public audience:

4.6 William W. Howells […] has spent decades studying the skulls of human
populations throughout the world. He has demonstrated that Neanderthal
skulls have characteristics that never occur in modern human beings;
modern human beings, conversely, have features that never occur in
Neanderthals. In fact, the specific skeletal "evidence" cited by
multiregionalists to support their theory appears to be irrelevant.
[PA_BI_22]

4.7 Contemporary Jewish historians focus on the creation of the state of
Israel as a sign of real hope, an antidote to post-Holocaust despair and a
token of vigorous national recovery. […] We are more aware now of the
continuity of Jewish life in the Land of Israel, and the important
contributions that Israel has made to Judaism worldwide. [PA_HI_14]

4.5.3. University textbooks

The university textbooks in this corpus were authored by professors who hold
relevant doctoral degrees. More than half of the textbooks were single-authored. In each
case, the primary target audience of these textbooks comprises university students with a
limited background in the subject matter. These books are organized into chapters in
which important topics or concepts are introduced on a basic level first before the
introduction of more sophisticated information:

4.8 Before attempting to understand descriptions, comparison tables, or
identification keys here or in manuals, or to communicate with others
about tree identification, one first has to become familiar with the
morphology of vegetative and reproductive parts and descriptive
terminology for these parts-the necessary jargon of dendrology.
[TB_BI_01]

4.9 To understand the savage repressions that took place in Spain, the
character of the revolutionary civil wars of the first half of the twentieth
century must be kept in mind. These were conflicts of the transition to
“classical modernity” […] [TB_HI_12]

The evidence presented in these textbooks is secondary and is interpreted based
on its relevance to other concepts and educational objectives:

4.10 Today, the most comprehensively studied and understood fungi are the
yeasts and eukaryotic model organisms Saccharomyces cerevisiae and
Schizosaccharomyces pombe. […] For example, the cosmopolitan
(worldwide) genus Fusarium and their toxins associated with fatal
outbreaks of alimentary toxic aleukia in humans were extensively studied
by Abraham Joffe. [TB_BI_07]

4.11 The identification of spatial patterns will provide clues about how
biogeography contributes to the distribution of microbial diversity.
Recall from Section 1.1 that water availability, oxygen levels, and
nutrient status have been linked to microbial activities at the local scale.
[TB_BI_17]

4.5.4. Journal articles

Like the university textbooks, the journal articles included in this
corpus were written exclusively by professors or researchers with doctoral degrees in
their field of specialization. Whereas biology journal articles are co-authored in every
case but one, the history articles are single-authored in most cases. Peer-reviewed journal
articles are highly specialized texts. Thus the authors and readers of these texts share a
substantial amount of technical knowledge. The text in academic journal articles is often
organized as a chronological report of a study in which a researcher begins by
establishing one or more unexplained phenomena and attempts to explain those
phenomena with the use of evidence (see Adams-Smith, 1987):

4.12 One can recognize that the identification of some Chinese ceramic shards
by class remains problematic in older publications: only jar shards and
Longquan green-glazed stonewares have been identified reliably,
whereas other classes must be approached with caution. [JA_HI_17]

4.13 At the time of the present study this species was still not formally
described and was treated under the name C. ocellifer Spix, 1825, a very
common Brazilian lizard. The sample obtained in the present study
provided evidence for the entity constituting a new species, and allowed
its formal description (Rocha et al. 1997). [JA_BI_03]

In both cases, the focus of the research is narrow, and the goal is to generate and
disseminate ideas and findings from primary research in order to add to the understanding
of the academic community.

4.4.5. Situational differences between biology and history texts

There were a number of discipline-related patterns that emerged from the
situational analysis. It was much more common for biology texts to be written by more
than one author, especially in journal articles. It was also more common for the authors of
popular academic texts to hold an advanced degree. The evidence in journal articles is
primary for both disciplines. However, biology articles tend to report much greater detail
regarding the procedures and evidence from the study:

4.14 The concentration of free SO₂ was adjusted to 36 mg liter⁻¹ in three
rounds over a period of 2 weeks. Finally, the wines underwent depth
filtration with filter sheets (EK grade; Seitz-Schenk, Germany)
followed by 0.45-μm membrane filtration (142-mm nylon disk filter;
Millipore) before bottling. [JA_BI_04]

Furthermore, the evidence used by biologists is dramatically different from that used by
historians. History research is largely qualitative, with researchers reporting their own
interpretations of observational data:

4.15 Some such reading, I would contend, greatly illuminates the (at times)
disproportionate importance with which the telegraphic network is
invested in what we might call the collectively constructed mutiny ur-
narrative; […] In short, I would suggest that the telegraph in India
functions as a superbly apt trope for British colonial subjectivity […]
[JA_HI_06]

4.16 From the attention she paid to the preservation of her manuscripts on her
deathbed, we may gather how much these manuscripts meant to her.
Instead of burning them, […], Jin Yi looked to these very manuscripts to
establish, and broadcast, an iconic image of herself that closely
resembles Lin Daiyu. [JA_HI_21]

Research in biology, on the other hand, uses the scientific method in order to develop
research questions or hypotheses, collect and analyze experimental or observational data,
and interpret the findings:

4.17 No significant difference was found between groups after 1 wk of diet
supplementation (Fig. 2). However, after 5 wk of dietary intervention, all
four treated groups presented a slight but significant increase in plasma
antioxidant potential (Fig. 2). Values (mmol/l) ranged from 1.72 (SD
0.16) for untreated mdxCv animals to 2.35 (SD 0.55) for animals that
received EGCG or PTX. [JA_BI_01]

4.18 We observed some clear patterns in the aquatic hyphomycete
communities associated with leaf litter during decomposition. […] As our
study showed, taxonomic richness and sporulation rates were low after a
week of incubation, increasing thereafter. [JA_BI_18]

Sections 4.4.2 through 4.4.5 have described and illustrated some of the important
situational characteristics of journal articles, university textbooks, and popular academic
books. Section 4.5 summarizes this chapter and establishes the importance of these
situational characteristics for this study.

4.5. Summary

In this chapter, I have emphasized the importance of analyzing the situational
characteristics of the texts in a register study. I have developed a comprehensive
framework for analyzing the situational characteristics of journal articles, university
textbooks, and popular academic writing across disciplines. This situational framework
was used to describe the non-linguistic characteristics of the texts in the corpus. The
results of this analysis revealed important differences in the situational contexts of the six
registers, and also helped to identify some variation across the three publication types and
the two disciplines. Finally, trends in the situational characteristics of the six registers
were explained and exemplified using text excerpts from the corpus.
The results of this situational analysis have revealed many situational differences
among the three publication types and the two disciplines. Accordingly, these results
have established concrete and specific situational parameters that can be used to interpret
linguistic and perceptual patterns across registers in this study. This a priori situational
analysis gives me the ability to interpret the results of this study in a systematic fashion
by looking back at pre-established situational patterns in order to interpret findings.
Therefore, the patterns and findings from this analysis will be used extensively in
Chapters 5 – 8 as key evidence in the functional interpretation of the linguistic and
perceptual results of the study.

CHAPTER 5. A MULTI-DIMENSIONAL ANALYSIS OF LINGUISTIC
VARIATION IN PUBLISHED ACADEMIC WRITING

5.1. Introduction and background

The primary purpose of this chapter is to present a comprehensive description of
the linguistic characteristics of three publication types (journal articles, university
textbooks, and popular academic books) in two disciplines (biology and history). This
description is both quantitative, including the results of an MD analysis and a series of
ANOVAs and corresponding post hoc tests, and qualitative, including interpretations that
are based on situational differences among the three publication types and
methodological and stylistic differences between the two disciplines.
In Chapter 2, I discussed the need for more research that investigates linguistic
variation across publication types and disciplines of published academic writing. In
addition to the need for a better understanding of these variables individually, there is
also a need to understand the extent to which and the ways in which discipline and
publication type interact within registers. More importantly, the linguistic variation and
perceived quality and style of the texts in this corpus must be described before we can
determine the relationships between them. The MD analysis reported in this chapter is
performed in an effort to better understand the linguistic characteristics of published
academic writing across the six registers in this corpus.

5.2. Carrying out the Multi-Dimensional analysis

In this section I offer a detailed overview of the various methodological stages of the
new MD analysis carried out in this study. However, I first present a plot of the sub-
corpora in this study along Biber’s (1988) Dimension 1 ‘Involved versus Informational
Production’ (see Figure 5.1). Figure 5.1 also includes five of the major registers included
in Biber’s (1988) study for purposes of comparison. In Section 5.3 I then present the
dimension structures and their interpretations, display plots of the registers, and show text
excerpts that exemplify the dimension patterns.

5.2.1. Published academic writing along Biber’s (1988) Dimension 1

The first dimension of Biber’s (1988) MD analysis was the most important
dimension in that study, and a similar dimension structure has surfaced in nearly
every MD analysis since. Linguistically, this dimension captures a nominal versus verbal
parameter, in which verbal features load positively and nominal features load negatively.
Based on this nominal-verbal pattern, this dimension also represents an involved versus
informational parameter of communication where the positive, verbal features are
associated with involved, interactive (usually spoken) texts, and the negative, nominal
features are associated with informational, non-interactive (usually written) texts.

[Vertical scale running from Involved Production (positive scores) at the top
to Informational Production (negative scores) at the bottom]

Register                                    Mean Dimension 1 score
FACE-TO-FACE CONVERSATION*                    35.3
PREPARED SPEECHES*                             2.2
GENERAL FICTION*                              -0.8
POPULAR ACADEMIC BOOKS (BIOLOGY)             -11.5
UNIVERSITY TEXTBOOKS (BIOLOGY)               -13.6
POPULAR ACADEMIC BOOKS (HISTORY)             -14.4
ACADEMIC PROSE*                              -14.9
UNIVERSITY TEXTBOOKS (HISTORY)               -18.1
OFFICIAL DOCUMENTS*                          -18.1
JOURNAL ARTICLES (HISTORY)                   -19.32
JOURNAL ARTICLES (BIOLOGY)                   -23.74
* Comparison register from Biber (1988)

Figure 5.1. Distribution of the registers and disciplines along Biber’s (1988)
Dimension 1 (Involved versus Informational Production) compared to 5 registers
from Biber (1988)

This dimension also represents another related communicative parameter of oral versus
literate discourse. The positive features are associated with an oral style of discourse
characterized by non-edited production due to on-line processing. The negative features,
on the other hand, are associated with a literate discourse style that is planned and
carefully edited (see Biber, 1988, p. 107).
Based on these interpretations of this dimension, it is no surprise that the mean
dimension scores of the six sub-corpora in this study all fall on the negative side. It is
interesting to note the spread of the registers along this dimension. On average, the
journal articles are the most informational, followed by university textbooks and popular
academic writing. The discipline differences are not nearly as clear-cut. Biology journal
articles are more informational than those in history, but the opposite pattern occurs for
the disciplines of university textbooks and popular academic writing. This suggests a
publication type x discipline interaction effect.
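
The crossover described above can be checked descriptively from the six sub-corpus means reported in Figure 5.1. The following Python sketch is an illustration only and was not part of the original analysis. It assumes equal cell sizes, so that unweighted marginal means are appropriate, and all variable names are hypothetical.

```python
# Mean scores on Biber's (1988) Dimension 1 for the six sub-corpora (Figure 5.1).
cell_means = {
    ("journal article", "biology"): -23.74,
    ("journal article", "history"): -19.32,
    ("university textbook", "biology"): -13.6,
    ("university textbook", "history"): -18.1,
    ("popular academic book", "biology"): -11.5,
    ("popular academic book", "history"): -14.4,
}

publication_types = ["journal article", "university textbook", "popular academic book"]

# Unweighted marginal mean per publication type (assumes equal cell sizes).
marginal_by_type = {
    pub: (cell_means[(pub, "biology")] + cell_means[(pub, "history")]) / 2
    for pub in publication_types
}

# Discipline difference (biology minus history) within each publication type.
# A change of sign across publication types is a descriptive sign of an
# interaction; it is not a test of statistical significance.
discipline_gap = {
    pub: cell_means[(pub, "biology")] - cell_means[(pub, "history")]
    for pub in publication_types
}

for pub in publication_types:
    print(f"{pub}: marginal mean = {marginal_by_type[pub]:.2f}, "
          f"biology - history = {discipline_gap[pub]:.2f}")
```

The biology minus history difference is negative for journal articles and positive for the other two publication types, which is the pattern that suggests an interaction.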

5.2.2. Linguistic features

One of the most important steps in MD analysis is identifying a comprehensive set of
relevant linguistic features and calculating a normed rate of occurrence for each feature in
each text. Seventy-eight linguistic features were originally considered for inclusion in this
MD analysis. Most of this list of features was composed of features that were identified
as important variables in previous research, especially MD analyses, on academic
writing. I included additional features that I expected to be important variables based on
my own experience with academic prose. All but six of the features in this list were
annotated in the texts using the Biber Tagger and counted using Biber’s TagCount
program. The remaining six features, pre-modifying nouns, Core Vocabulary List (1-
500), Core Vocabulary List (501-3,000), Academic Vocabulary List, common phrasal
verbs, and common academic lexical bundles, were identified and counted through
alternative means.
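
As a minimal sketch of what a normed rate of occurrence involves, the Python function below converts a raw count into a rate per fixed number of words. The norming basis of 1,000 words is an assumption made for illustration, as are the function name and the example counts; this is not the TagCount implementation.

```python
def normed_rate(raw_count: int, text_length: int, basis: int = 1000) -> float:
    """Return the rate of occurrence of a feature per `basis` words.

    The default basis of 1,000 words is an assumption for this sketch.
    """
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return raw_count / text_length * basis


# Hypothetical counts: 42 nominalizations in a 2,800-word text sample.
rate = normed_rate(42, 2800)
print(round(rate, 2))
```

Norming in this way makes counts comparable across text samples of different lengths.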
Pre-modifying nouns were counted in the tagged texts using a Perl program in order
to determine the number of times a noun modified another noun. Davies and Gardner’s
online tool, Wordandphrase.info, was used to calculate the percentage of each text that
was composed of words from the (a) Academic Vocabulary List (500 most common
academic words), and (b) Core Vocabulary List (1-500 and 501-3000 most common core
words). The rate of occurrence for common phrasal verbs in each text was counted using
an AntConc batch search of the phrasal verbs that appeared in the lists of frequent phrasal
verbs reported in Gardner and Davies (2007) and Biber et al.’s (1999) Longman
Grammar of Spoken and Written English. The same method was used to calculate the
rate of occurrence for academic lexical bundles using a combined list of Simpson-Vlach
and Ellis’s (2010) Academic Formulas List and Biber, Conrad, and Cortes’s (2004) list of
frequent bundles in university textbooks.
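
The counting of pre-modifying nouns can be sketched as follows. This is a simplified Python illustration rather than the Perl program used in the study. It assumes the tagged text is available as (word, tag) pairs and that noun tags begin with "NN"; both are hypothetical conventions, and the output format of the Biber Tagger may differ.

```python
def count_premodifying_nouns(tagged_tokens):
    """Count nouns that are immediately followed by another noun.

    `tagged_tokens` is a list of (word, tag) pairs. Tags starting with "NN"
    are treated as nouns; this tag scheme is an assumption for illustration.
    """
    count = 0
    for (_, tag), (_, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if tag.startswith("NN") and next_tag.startswith("NN"):
            count += 1
    return count


# Hypothetical tagged sample: "stem" modifies "cell", and "cell" modifies "research".
sample = [("stem", "NN"), ("cell", "NN"), ("research", "NN"),
          ("is", "VBZ"), ("expanding", "VBG")]
print(count_premodifying_nouns(sample))
```

A fuller implementation would also need to decide how to treat noun sequences that span punctuation or clause boundaries.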
Two stages of analysis were used to determine which features would be most
important to include in this study. In the first stage the descriptive statistics for each
linguistic feature (minimums, maximums, and means) were reviewed. This resulted in the
elimination of sixteen features because their frequencies were too low to be included in
the factor analysis. In the second stage, I ran a series of preliminary factor analyses in
order to determine the set of linguistic variables that would result in the optimal factor
solution. Based on these analyses, I eliminated six more linguistic variables that were not
correlating at least moderately with other linguistic variables or with the sets of variables
in a particular factor. This resulted in a set of 56 features that are: (a) theoretically and
empirically interesting, (b) frequent enough in the corpus to be analyzed, and (c) related
to other features in the set. These 56 linguistic features served as the basis of the factor
analysis described in the next section. Each of these features, along with descriptions or
examples, can be seen in Table 5.1.

Table 5.1. Summary of the 56 linguistic features included in the final factor analysis

Linguistic Feature                              Description/Example

A. Nouns and Pronouns
1. 1st person pronouns                          e.g., I, we
2. 3rd person pronouns                          e.g., she, they
3. pronoun ‘it’                                 all instances of ‘it’
4. demonstrative pronouns                       this, these, that, those
5. all nouns                                    all words tagged as nouns
6. nominalizations                              e.g., explanation, investigation
7. animate nounsᵃ                               e.g., professional, researcher, student
8. process nounsᵃ                               e.g., achievement, effect, selection
9. cognition nounsᵃ                             e.g., analysis, decision, experience
10. other abstract nounsᵃ                       e.g., agreement, choice, style
11. concrete nounsᵃ                             e.g., book, retina, stem
12. technical nounsᵃ                            e.g., chromosome, thesis, word

B. Verbs
13. possibility, permission and ability modals  can, could, may, might
14. prediction modals                           will, would, shall, be going to
15. verb BE                                     all forms of verb BE
16. verb HAVE                                   all forms of verb HAVE
17. activity verbsᵃ                             e.g., follow, make, obtain
18. communication verbsᵃ                        e.g., acknowledge, explain, write
19. mental verbsᵃ                               e.g., determine, think, understand
20. aspectual verbsᵃ                            e.g., begin, cease, keep
21. suasive verbsᵃ                              e.g., agree, insist, recommend
22. phrasal verbs (frequent)ᵃ                   e.g., end up, point out, set up

C. The Verb Phrase
23. past tense                                  e.g., determined, showed, identified
24. present tense                               e.g., concludes, argues, treats
25. progressive aspect                          e.g., is becoming, are causing
26. agentless passive voice                     passives with no specified agent
27. infinitives                                 all instances of to + (adverb) + verb

D. Adjectives
28. all attributive adjectives                  adjectives that pre-modify nouns
29. all predicative adjectives                  adjectives in post-predicate position
30. topic adjectives (attributive)              e.g., political, economic, physical

E. Adverbs
31. general adverbs                             all words tagged as adverbs
32. time adverbs                                e.g., early, again, now
33. certainty adverbs                           e.g., undoubtedly, obviously
34. amplifiers                                  e.g., absolutely, extremely
35. emphatics                                   e.g., a lot, for sure, really

F. Coordination and Subordination
36. adverbial conjuncts                         e.g., however, therefore, moreover
37. phrasal coordinating conjunctions           e.g., but
38. subordinating conjunctions                  e.g., as, except

G. Clauses Marking Stance
39. that-clause controlled by non-factive       e.g., argue, claim, show, tell
    (communication) verb
40. that-clause controlled by factive           e.g., demonstrate, conclude
    (certainty) verb
41. that-clause controlled by likelihood verb   e.g., appear, seem, suggest
42. that-clause controlled by attitudinal       e.g., afraid, aware, surprised
    adjective
43. that-clause controlled by stance noun       e.g., claim, possibility, assumption
44. to-clause controlled by stance adjective    e.g., certain, appropriate, easy

H. Nominal Modifiers
45. that relative clause                        relative clauses with that as pronoun
46. wh-relative clauses on subject position     e.g., the researcher who stated…
47. wh-relative clauses on object position      e.g., the source which he cited…
48. wh-relative clauses pied piping             e.g., the source from which he cited…
49. pre-modifying nouns                         e.g., stem cell, war fund, crop yield

I. Lexical Features
50. COCA Core Vocabulary (1-500)                500 most frequent words
51. COCA Core Vocabulary (501-3,000)            501-3,000 most frequent words
52. COCA Academic Vocabulary List               3,500 frequent academic words
53. Academic lexical bundlesᵃ                   frequent academic lexical bundles
54. word length                                 average number of letters per word

J. Other
55. All prepositions                            any word tagged as a preposition
56. wh-clauses                                  all clauses with a wh-complementizer

ᵃ See Appendix B for the complete list used to calculate the rates for this feature

5.2.3. Factor analysis

After identifying and counting the linguistic features described above, factor analysis
was performed on the normed rates of occurrence for the full set of 56 features across the
texts in an effort to reduce them down to a smaller set of interpretable underlying
dimensions. The statistical software R was used to perform the factor analysis procedure
(R Development). This was done using the R function ‘fa’ (factor analysis) within the
‘psych’ library, with a principal axis factoring method with a Promax rotation (Revelle,
2012). The scree plot of eigenvalues showed a clear break between factors 6 and 7; thus,
a six-factor solution was used (see Appendix C). The full factorial structure of the
six-factor solution is displayed in Appendix D. However, it was determined that only the first
five dimensions were interpretable. The cumulative percentage of shared variance
accounted for by the first five factors was 31%. Variables were only included in the
analysis if they achieved a minimum factor loading threshold of +/- .30, and each variable
was included in the factor where it loaded the strongest. After determining which
variables belonged to each of the factors, the positive features were separated from those
that loaded negatively.
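
The assignment rule described above (a minimum loading of +/- .30, each variable placed on the factor where it loads most strongly, and positive features separated from negative ones) can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure as run in R: the function name is hypothetical, and the example loadings are invented rather than taken from Appendix D.

```python
def assign_features(loadings, threshold=0.30):
    """Group features by the factor on which they load most strongly.

    `loadings` maps a feature name to a list of loadings, one per factor.
    Features whose strongest absolute loading is below `threshold` are dropped.
    Returns {factor_index: {"positive": [...], "negative": [...]}}.
    """
    structure = {}
    for feature, values in loadings.items():
        # Index of the factor with the largest absolute loading.
        best = max(range(len(values)), key=lambda i: abs(values[i]))
        if abs(values[best]) < threshold:
            continue
        side = "positive" if values[best] > 0 else "negative"
        group = structure.setdefault(best, {"positive": [], "negative": []})
        group[side].append(feature)
    return structure


# Hypothetical loadings on two factors (invented for illustration).
example = {
    "general adverbs": [0.59, 0.10],
    "all nouns": [-0.73, 0.05],
    "present tense": [0.12, 0.73],
    "wh-clauses": [0.08, -0.21],
}
print(assign_features(example))
```

In this invented example, wh-clauses are dropped because no loading reaches the threshold, and each remaining feature appears on exactly one factor.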

5.2.4. Dimension scores

The next step was to calculate dimension scores for each text in the corpus by first
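standardizing the rates of occurrence for each linguistic feature to a mean of 0 and a
standard deviation of 1 using the z-score formula: $z = (x - \bar{x}) / s$, where $x$ is a
feature’s normed rate in a given text and $\bar{x}$ and $s$ are that feature’s mean and
standard deviation across the corpus. This was done in order to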
ensure that all features have an equal influence on a text’s dimension score. The
standardized counts for the negatively loading features were then summed and subtracted
from the sum of the counts for the positively loading features for each dimension. This
resulted in five dimension scores for each text. These dimension scores were then used to
calculate mean dimension scores for each of the disciplines and registers in the corpus.
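
The computation of dimension scores can be illustrated with a short Python sketch. It is a minimal example under stated assumptions rather than the script used in this study: the function names and the example rates are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def standardize(rates):
    """Convert a list of normed rates (one per text) to z-scores."""
    m, s = mean(rates), stdev(rates)  # sample SD; an assumption for this sketch
    return [(r - m) / s for r in rates]


def dimension_scores(rates_by_feature, positive, negative):
    """Sum of z-scores for positive features minus the sum for negative features."""
    z = {feature: standardize(r) for feature, r in rates_by_feature.items()}
    n_texts = len(next(iter(z.values())))
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(n_texts)
    ]


# Hypothetical normed rates for three texts (invented for illustration).
rates = {
    "general adverbs": [30.0, 40.0, 50.0],
    "all nouns": [320.0, 300.0, 280.0],
}
scores = dimension_scores(rates, positive=["general adverbs"], negative=["all nouns"])
print(scores)
```

Because every feature is standardized before summing, a frequent feature such as nouns does not outweigh a rarer feature such as general adverbs in the resulting score.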

5.3. Dimensions of register variation in academic writing

Each of the five dimensions was interpreted using an iterative three-stage process that
included: a) studying the co-occurrence patterns of linguistic features, b) investigating
these patterns in the texts, and c) reviewing the results of previous MD analyses. Table
5.2 displays the positive and negative features associated with each dimension. For each
dimension, the associated linguistic features are grouped according to their functional
interpretation.

Table 5.2. Final factor structure of the five-factor solution

Dimension 1: Non-technical Synthesis versus Specialized Information Density


Positive features:
Non-technical: COCA Core Vocabulary (1-500) (.61); general adverbs (.59);
amplifiers (.43); certainty adverbs (.37); emphatics (.36)
Synthesis: adverbial conjuncts (.51); phrasal coordinating conjunctions (.39)
Other: verb HAVE (.36); that-relative clauses (.36)

Negative features:
Specialized: technical concrete nouns (-.31)
Information Density: pre-nominal modifiers (-.73); nouns (-.73);
agentless passive voice (-.42)

Dimension 2: Definition and Evaluation of New Concepts


Positive features:
Definition: present tense (.73); verb BE (.59); predicative adjectives (.53); concrete
nouns (.30)
Evaluation: non-finite to-clauses controlled by stance adjectives (.68); possibility,
permission and ability modals (.67); that-clauses controlled by attitudinal adjectives (.57);
prediction modals (.38)
Concepts: demonstrative pronouns (.40); pronoun ‘it’ (.34)
Other: Academic lexical bundles (.58)

Negative features:
NONE

Dimension 3: Author-centered Stance


Positive features:
Author-centered: communication verbs (.58); mental verbs (.58); suasive verbs (.42);
human nouns (.41); cognition nouns (.39); 1st person pronouns (.36)
Stance: that-clauses controlled by communication verb (.53); that-clause controlled by
likelihood verb (.46); that-clause controlled by stance noun (.38); that-clause controlled
by certainty verb (.30)
Other: infinitives (.53)

Negative features:
NONE

Dimension 4: Colloquial Narrative
Positive features:
Colloquial: common phrasal verbs (.47)
Narrative: past tense verbs (.59); 3rd person pronouns (.39); aspectual verbs (.44)
Other: activity verbs (.54); progressive aspect (.37)

Negative features:
(Non-)colloquial: Academic Vocabulary List (-.64)

Dimension 5: Abstract Observation and Description


Positive features:
Abstract Observation: nominalizations (.71); word length (.65); process nouns (.59);
other abstract nouns (.47)
Description: attributive adjectives (.53); topic adjectives (.47)
Other: Core Vocabulary (501-3000) (.42)

Negative features:
Other: time adverbials (-.32)

5.3.1. Dimension 1: ‘Non-technical Synthesis versus Specialized Information Density’

In order to interpret Dimension 1, it may be most useful to begin by looking at the
features that loaded negatively. Three of the features on this dimension are nominal in
nature, including all nouns, pre-modifying nouns (e.g., activation sequence), and
technical concrete nouns (e.g., chromosome, diagram). Texts with these features share at
least two functional characteristics: specialization of information and density of
information. Nominal structures, in general, transmit information to the reader, but the
use of technical concrete nouns suggests that the information is specialized or technical,
and the use of nouns as nominal pre-modifiers adds additional layers of informational
density. Further evidence of the informational focus of the negative side of this
dimension is the presence of agentless passives. Passives are common in writing,
especially writing that presents abstract, decontextualized information. Agentless
passives are especially common in academic writing, a register that commonly
deemphasizes the agent, usually the researcher or writer, in order to focus on what was
done rather than who did it.
The positive end of this dimension, in contrast, contains features such as general
adverbials, certainty adverbials, amplifiers (e.g., very, highly), and emphatics (e.g., a lot,
just, really), which have interactive and affective functions. Emphatics and amplifiers are
both used as adverbial intensifiers to add emotive emphasis to a message. In addition, a
high percentage of frequent vocabulary suggests non-technical discourse, and phrasal
coordination and adverbial conjuncts (e.g., alternatively, consequently) function to
connect ideas in a coherent synthesis.
These features, contrasted with the negative features, suggest a continuum that is
based on two related functional considerations. The first functional consideration is the
amount of technical language that is used to transmit the information in the text. The
negative end of this dimension is highly technical, and the positive end is non-technical.
The second functional consideration is the number and broadness of the sources of
information used in the writing. Texts on the negative end tend to present dense
information from a small number of narrow sources (e.g., results of a scientific
experiment), while texts with high positive scores contain a synthesis of information
taken from many broad sources (e.g., historical overview of major advances in a
scientific field). Taken together, the positive and negative features on Dimension 1
support a label of ‘Non-technical Synthesis vs. Specialized Information Density’.

5.3.2. Dimension 2: ‘Definition and Evaluation of New Concepts’

Dimension 2 highlights a facet of linguistic variation that is quite different from
Dimension 1. This dimension has no negative features. The co-occurrence of present
tense verbs, BE verbs, concrete nouns, and predicative adjectives is common in
definitions and explanations of terms and concepts (e.g., sodium is a reactive mineral).
The features of possibility and prediction modals and the two clause types controlled by
stance adjectives show added evaluation from the author regarding prediction and
possibility. Finally, the features of ‘it’ pronouns and demonstrative pronouns are both
frequently used as anaphoric or exophoric referents to abstract concepts. Taken together,
these features support a Dimension 2 label of ‘Definition and Evaluation of New
Concepts’.

5.3.3. Dimension 3: ‘Author-centered Stance’

Like Dimension 2, the third dimension also has no negative features. The presence of
communication verbs (e.g., say, assert), mental verbs (e.g., think, reveal), suasive verbs
(e.g., agree, urge), and cognition nouns (e.g., consideration, idea) shows that the
language in texts with high positive scores is likely to be focused on discussions of
ideas and possibly debates. The co-occurrence of several types of that-clauses controlled
by various stance features shows that these texts are also likely to contain a large degree
of author stance. This is further supported by the presence of first person pronouns. In
sum, the co-occurrence of these features shows a pattern in which texts with high positive
scores are more likely to contain writing in which the authors use rhetorical or otherwise
subjective language in order to establish their position. On the other hand, texts with low
scores on this dimension are more likely to be objective and devoid of the authors’ views and
interpretations. These features support a Dimension 3 label of ‘Author-centered Stance’.

5.3.4. Dimension 4: ‘Colloquial Narrative’

The positive features on Dimension 4 include past tense verbs, third person pronouns,
and aspectual verbs (e.g., begin, finish). These are strong indicators of narrative prose.
Phrasal verbs (e.g., look up, take over), which also load positively, are relatively common
in informal spoken registers. Academic words, which are less common in colloquial
registers, load negatively on this dimension. Together, these features suggest a narrative
writing style that is relatively easy to process. Based on the co-occurrence of these
features, Dimension 4 was labeled ‘Colloquial Narrative’.

5.3.5. Dimension 5: ‘Abstract Observation and Description’

Positive-loading features for the final dimension include nominalizations and
moderate frequency vocabulary, suggesting the use of relatively common words with
dense informational packaging. The addition of abstract nouns (e.g., arrangement,
transition) and topic adjectives (e.g., natural, physical) shows a high degree of
abstraction. These features are contrasted with time adverbials (e.g., now, tomorrow),
which are concrete and descriptive in nature. The positive features seem to function as
linguistic tools used by authors to characterize and interpret their findings. In addition,
the positive features seem to be associated with abstract interpretation rather than the
explanation of concrete evidence. Therefore, the label of ‘Abstract Observation and
Description’ is used here.

5.4. Linguistic variation in published academic writing

In this section I present a series of quantitative and qualitative analyses of
linguistic variation in published academic writing. For each of the five linguistic
dimensions I quantitatively describe the dimension scores for each of the registers using
two different perspectives or models. In the first model, which I will label the Register
Model, I use vertical profile plots to graphically display and describe the distribution of
each of the six individual registers, which are defined situationally using the framework
established in Chapter 4, according to their dimension scores. In the Register Model I
also calculate one-way ANOVAs and Tukey HSD post hoc tests for each of the
dimensions, using register as the independent variable. The results for the ANOVAs can
be seen in the caption of the profile plot for each dimension. The results of the Tukey
HSD are reported to the left of the register labels within the profile plots in the form of
groupings using letters (e.g., A, AB, etc.). Within the Tukey HSD groupings, pairwise
differences are statistically significant in cases where two registers do not share any letter next to
them.
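
The reading of the letter groupings can be expressed as a small Python check. This is an illustrative sketch only: the function name is hypothetical, and the example groupings are invented rather than results from this study.

```python
def differ_significantly(letters_a: str, letters_b: str) -> bool:
    """Two registers differ when their Tukey HSD letter sets share no letter."""
    return not (set(letters_a) & set(letters_b))


# Hypothetical groupings (not results from this study).
groupings = {"Register 1": "A", "Register 2": "AB", "Register 3": "B"}

print(differ_significantly(groupings["Register 1"], groupings["Register 3"]))
print(differ_significantly(groupings["Register 1"], groupings["Register 2"]))
```

In this invented example, Register 1 and Register 3 differ significantly, whereas Register 2 shares a letter with each of the other two and so differs significantly from neither.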
In the second model, or Publication Type x Discipline Model, I use marginal
means plots to graphically display the six groups as levels within the factors of discipline
(biology and history) and publication type (journal article, university textbook, and
popular academic book). In this study, publication type is a combination of many
situational characteristics, including audience, purpose, community, and topic (see Table
4.1). The main reason for investigating the effects of discipline separately from
publication type is that these two variables have emerged as critically influential sources
of variability in register studies. By teasing out the differential effects of discipline and
publication type, we are able to more completely account for variability across the
registers of academic prose. In the Publication Type x Discipline Model I use 2x3
factorial ANOVAs and appropriate post hoc tests in order to perform statistical tests of
significance on the effects of publication type, discipline, and the interaction between
them. After presenting the quantitative results for the two models for each dimension, I
will interpret the linguistic patterns through a qualitative analysis of several text excerpts.
In keeping with this study’s theme of triangulating methodological approaches,
there are two goals of this dual approach to presenting the quantitative results. The first
goal is to present the results from multiple perspectives in order to learn as much as
possible about the quantitative results. The second goal is to better understand the
relationship between discipline and publication type within published academic writing.
In other words, the second goal is to learn whether (a) publication type and discipline
function as two independent predictors of language use in academic writing or (b)
publication type and discipline combine to form more narrow register categories.
The 2x3 factorial ANOVAs are performed in order to test for significant
interaction effects between publication type and discipline. An interaction term was
included in the model to test whether the effect of discipline is determined by publication
(and vice versa) or whether the effects of discipline and publication type operate
independently of one another. In each case, the significance of the interaction term was
tested and reported first because the outcome of this test determines whether it is most
appropriate to perform tests for significant main effects or simple effects. In cases where
no statistical interaction was found, the main effects of the two factors were investigated
and interpreted.
According to Howell (2007:400-401), when a statistical interaction is discovered,
main effects should be interpreted only when they can still be meaningfully interpreted.
If, however, the effect of the interaction renders the main effects uninterpretable, simple
effects ANOVAs should be used to test for the effect of one factor within one level of the
other factor (Kuehl, 2000, p. 179). In cases where a statistical interaction was discovered,
the nature and degree of the interaction were investigated to determine whether the main
effects were still interpretable and meaningful. In cases where (a) the interaction effect
was significant and (b) the main effect could not be meaningfully interpreted, simple
effects were investigated in order to measure the effect of one factor (e.g., discipline) at
one level of the other factor (e.g., journal articles). In some of the cases where a
significant simple effect or main effect was found, post hoc Tukey HSD pairwise mean
comparisons were performed to determine which pairs differed significantly.
The factorial ANOVAs were performed using an a priori alpha criterion of α = .05 to
test for differences between disciplines and publication types. In cases where a
significant interaction effect was discovered, simple effect contrasts were carried out to
test for significant differences between levels of one factor on one level of the other. In
cases where it was determined that the factors do not interact, tests for the significance of
the main effects of each factor were carried out.
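
The decision procedure described above rests on the F ratios produced by a two-way factorial ANOVA. As a minimal sketch of where those ratios come from, the following Python code partitions the sums of squares for a balanced design. The function name `two_way_anova`, the toy scores, and the assumption of equal cell sizes are illustrative only and are not taken from the analyses reported in this chapter. The p-values are omitted because they require the F distribution, which the Python standard library does not provide.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA (illustrative sketch).

    `cells` maps (level_of_A, level_of_B) -> list of scores.
    Every cell must contain the same number of scores.
    Returns {effect: (F, df_effect, df_error)} for A, B, and A x B.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # observations per cell
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch only handles balanced designs")

    grand = mean(x for v in cells.values() for x in v)
    mean_a = {a: mean(x for b in levels_b for x in cells[(a, b)]) for a in levels_a}
    mean_b = {b: mean(x for a in levels_a for x in cells[(a, b)]) for b in levels_b}
    mean_ab = {key: mean(v) for key, v in cells.items()}

    # Sums of squares for each main effect, the interaction, and error.
    ss_a = n * len(levels_b) * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = n * len(levels_a) * sum((m - grand) ** 2 for m in mean_b.values())
    ss_ab = n * sum(
        (mean_ab[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a
        for b in levels_b
    )
    ss_error = sum((x - mean_ab[key]) ** 2 for key, v in cells.items() for x in v)

    df_a = len(levels_a) - 1
    df_b = len(levels_b) - 1
    df_ab = df_a * df_b
    df_error = len(levels_a) * len(levels_b) * (n - 1)
    ms_error = ss_error / df_error

    return {
        "A": (ss_a / df_a / ms_error, df_a, df_error),
        "B": (ss_b / df_b / ms_error, df_b, df_error),
        "AxB": (ss_ab / df_ab / ms_error, df_ab, df_error),
    }


# Hypothetical toy scores (two texts per cell), NOT data from this study.
toy = {
    ("biology", "popular"): [-1.0, 1.0],
    ("biology", "textbook"): [0.0, 2.0],
    ("biology", "journal"): [2.0, 4.0],
    ("history", "popular"): [1.0, 3.0],
    ("history", "textbook"): [2.0, 4.0],
    ("history", "journal"): [4.0, 6.0],
}
results = two_way_anova(toy)
for effect, (f_ratio, df_effect, df_error) in results.items():
    print(f"{effect}: F({df_effect}, {df_error}) = {f_ratio:.2f}")
```

In this toy data set the cell means are purely additive, so the interaction F ratio is zero and the main effects would be the ones to interpret. A non-zero interaction F would instead point toward the simple effects tests described above.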
Before running the ANOVAs, the normality of the dependent variables (linguistic
dimension scores) was assessed for each group. Using an a priori z-score cutoff of ±3.29,
the standardized dimension scores for each of the observations were checked for
each dimension. Dimension 2 had two outliers (z = 3.92; 4.37) and Dimension 3 had one
outlier (z = 3.51). Despite these outliers, all of the groups met the criteria of the
Shapiro-Wilk normality statistic (p > .05). Additionally, all of the groups met the normality
criteria of the Q-Q normality plots and the standardized skewness and kurtosis scores. As such, I
proceeded with the analyses without modifying the data since the preponderance of the
evidence supported the normality of the distribution. Dimension scores for each
individual text in the corpus are contained in Appendix A. The complete results for the
factorial ANOVAs and corresponding main or simple effects results can be found in
Appendix E.
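
The outlier screening described above can be sketched in a few lines of Python. This is only an illustration of the z-score cutoff: the function name `flag_outliers` and the example scores are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

Z_CUTOFF = 3.29  # a priori cutoff reported in the text


def flag_outliers(scores, cutoff=Z_CUTOFF):
    """Return (index, z) pairs for scores whose |z| exceeds the cutoff.

    Assumption: z-scores are computed with the sample standard deviation.
    """
    m = mean(scores)
    s = stdev(scores)
    flagged = []
    for i, x in enumerate(scores):
        z = (x - m) / s
        if abs(z) > cutoff:
            flagged.append((i, z))
    return flagged


# Hypothetical dimension scores, NOT data from this study.
example_scores = [0.0] * 15 + [1.0] * 14 + [20.0]
for index, z in flag_outliers(example_scores):
    print(f"observation {index} is a potential outlier (z = {z:.2f})")
```

A flagged observation is only a candidate for closer inspection. As in the analysis above, it can be retained when the other normality checks are satisfied.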

5.4.1. Dimension 1: ‘Non-technical Synthesis versus Specialized Information Density’

Figure 5.2 displays the spread of mean Dimension 1 scores for each of the
registers. The only register with a negative mean was biology journal articles, which
received a large negative mean score of -11.52. University textbooks in biology were
approximately neutral, and popular academic books in biology received the highest
positive mean score. Interestingly, although there was much less variability across the
three history publication types, they spread out in the same order as the biology texts,
with popular academic writing receiving the highest positive score followed by textbooks
and journal articles. Clearly this dimension is very useful as a measure of intra-
disciplinary variation across publication types. Within both of the disciplines the
publication types follow a pattern that is not at all surprising. We would expect popular
academic writing to be the least technical and to contain the most synthesis of broad
sources. On the other hand, we would expect journal article writing to contain technical
language about a small number of specialized sources.
There was a significant interaction effect between publication type and discipline
for Dimension 1, F(2, 144) = 23.46, p < .001. The R² was .514, indicating that the
model containing publication type, discipline, and their interaction accounts for approximately
51.4% of the variability in Dimension 1 scores. Figure 5.3 below displays the marginal
means for the Dimension 1 scores of each publication type within the two disciplines.
The plot shows a clear difference in the trends for the publication types within biology
versus history. Therefore, I will investigate the simple effects rather than the main
effects. Whereas biology popular academic writing contains more features associated
with non-technical synthesis than history popular academic books, biology journal
articles contain much more specialized information density than those in history. Simple
effects ANOVAs comparing the two disciplines within each publication type showed no
significant differences for popular academic writing or textbooks.
However, the Dimension 1 scores for biology journal articles were significantly lower
than those for history, F(1, 144) = 64.23, p < .001. Furthermore, while there were no
simple effects differences among the history publication types, statistically significant
differences were found between the biology publication types, F(2, 144) = 64.17, p <
.001.
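
The pattern behind this interaction can be made concrete with simple arithmetic on the register means reported in Figure 5.2. The sketch below computes the biology-minus-history gap at each publication type. The dictionary layout, the short register labels, and the function name `discipline_gaps` are illustrative choices, and the calculation describes the means only; it is not a significance test.

```python
# Mean Dimension 1 scores as reported in Figure 5.2,
# keyed by (publication type, discipline).
D1_MEANS = {
    ("popular", "biology"): 5.21,
    ("popular", "history"): 3.10,
    ("textbook", "biology"): 0.17,
    ("textbook", "history"): 2.15,
    ("journal", "biology"): -11.52,
    ("journal", "history"): 0.62,
}


def discipline_gaps(cell_means):
    """Biology mean minus history mean within each publication type."""
    pub_types = sorted({pub for pub, _ in cell_means})
    return {
        pub: round(cell_means[(pub, "biology")] - cell_means[(pub, "history")], 2)
        for pub in pub_types
    }


gaps = discipline_gaps(D1_MEANS)
for pub, gap in gaps.items():
    print(f"{pub}: biology - history = {gap:+.2f}")
```

If the two factors operated independently, the three gaps would be roughly equal. Here the gap changes sign between popular academic books and textbooks and is far larger for journal articles, which is consistent with interpreting simple effects rather than main effects.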

[Figure: registers ordered by mean Dimension 1 score, from the positive pole (Non-technical Synthesis) to the negative pole (Specialized Information Density)]

A – Popular Academic Books—Biology (M = 5.21, SD = 4.95)
AB – Popular Academic Books—History (M = 3.10, SD = 4.95)
AB – University Textbooks—History (M = 2.15, SD = 5.64)
B – Journal Articles—History (M = .62, SD = 5.34)
B – University Textbooks—Biology (M = .17, SD = 5.44)
C – Journal Articles—Biology (M = -11.52, SD = 5.34)

Positive features:
Verbs: verb HAVE (.36)
Adverbs: general adverbs (.59), amplifiers (.43), certainty adverbs (.37), emphatics (.36)
Lexical: COCA Core Vocabulary (1-500) (.61)
Coordination: adverbial conjuncts (.51), phrasal coordinating conjunctions (.39)
Nominal Modifiers: that relative clauses (.36)

Negative features:
Nouns: pre-nominal modifiers (-.73); nouns (-.73), technical concrete nouns (-.31)
Verbs: agentless passive voice (-.42)

Figure 5.2. Registers along Dimension 1: ‘Non-technical synthesis versus specialized information density’
F(5, 140) = 30.40, p < .001, R² = .51

Figure 5.3. Marginal Means Plot for Dimension 1: ‘Non-technical Synthesis vs.
Specialized Information Density’

[Figure: mean Dimension 1 scores (y-axis) by publication type (x-axis: Popular Academic, Textbooks, Journal Articles), plotted separately for Biology and History]

The linguistic patterns of Dimension 1 are qualitatively illustrated below through a
series of short excerpts from texts that represent those patterns. In order to highlight the
linguistic differences between journal articles and popular academic books in biology, the
two most common negative and positive features in both excerpts are highlighted. For the
positive features, COCA Core Vocabulary is bolded and general adverbs are italicized.
For the negative features, nouns are in SMALL CAPS and pre-modifying nouns are
underlined.

5.1 BONOBOS are not on their WAY to becoming HUMAN any more than we
are on our WAY to becoming like them. Both of us are well-established,
highly evolved SPECIES. We can learn something about ourselves from
watching BONOBOS, though, because our two SPECIES share an
ANCESTOR, who is believed to have lived a "mere" six million YEARS or
so ago. [PA_BI_08, D1 score: 15.22]

Excerpt 5.1, taken from a popular academic book on biology, received a Dimension 1
score of 15.22. Eighty percent of the words in this 60-word passage are on the COCA
Core Vocabulary list of the 500 most frequent words in English. There are also three
general adverbs (highly, though, ago). In comparison, only 34% of the words in excerpt
5.2, taken from a biology journal article, come from the COCA Core Vocabulary list, and
excerpt 5.2 contains no general adverbs.
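
The vocabulary coverage percentages cited above are simple proportions: the number of tokens found on a word list divided by the total number of tokens. The following Python sketch shows the calculation under stated assumptions. The tokenization rule, the function name `coverage`, and the six-word `TOY_CORE` set are hypothetical stand-ins; the set is not the COCA Core Vocabulary list, so the resulting figure will not match the percentages reported here.

```python
import re


def coverage(text, vocabulary):
    """Percentage of word tokens in `text` that appear in `vocabulary`.

    Assumption: tokens are lowercase alphabetic strings, with internal
    hyphens or apostrophes kept inside a single token.
    """
    tokens = re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in vocabulary)
    return 100 * hits / len(tokens)


# Hypothetical stand-in for a high-frequency word list.
TOY_CORE = {"we", "can", "learn", "something", "about", "from"}

sample = "We can learn something about ourselves from watching bonobos"
print(f"{coverage(sample, TOY_CORE):.1f}% of tokens are on the toy list")
```

The same function can be applied to two excerpts with the same word list to compare their coverage, which is the kind of comparison made between excerpts 5.1 and 5.2.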

5.2 An upstream ACTIVATION SEQUENCE (UAS)–containing P ELEMENT,
GS1664, inserted upstream of the FOXO GENE, disrupts the embryonic
axonal SCAFFOLD when crossed to the panneuronal DRIVER ELAVGAL4.
The PHENOTYPE is attributable to elevated FOXO LEVELS, as ELAVGAL4-
driven OVEREXPRESSION of a UAS-FOXOWT TRANSGENE yields an
equivalent PHENOTYPE. FOXO is enriched in a SUBSET of MOTOR NEURON
NUCLEI. [JA_BI_24, D1 score: -21.67]

A comparison of two of the negatively loading features reveals that nouns make up
only 9% of excerpt 5.1 in comparison with 43% in excerpt 5.2. Additionally, excerpt 5.2
contains nine pre-modifying nouns, whereas excerpt 5.1 contains none. These stark
contrasts within a single academic discipline reveal the importance of this dimension in
making publication type distinctions. However, this contrast is much more apparent in biology
than in history.
In addition to intra-disciplinary patterns, this dimension also reveals important
patterns between the two disciplines. For example, the range of mean dimension scores is
much greater within biology than history. While not all of the reasons for this pattern are
clear, it seems that the biology writers in this corpus are more
concerned with or skilled at adapting their writing to their target audience.
With regard to the large Dimension 1 score difference between journal articles in
biology and history, it could be argued that the discipline of biology deals with more
technical concepts and specialized topics than history.

5.3 Aiming at extending WOMEN'S RIGHTS to the PRIVATE SPHERE, WOMEN'S
RIGHTS ACTIVISTS called ATTENTION to VIOLENCE against WOMEN and to
WOMEN'S reproductive RIGHTS. They symbolically erected a GLOBAL
TRIBUNAL on the VIOLATION of WOMEN'S HUMAN RIGHTS on the
OCCASION of the VIENNA WORLD CONFERENCE on HUMAN RIGHTS
(1993). One YEAR later, at the CAIRO CONFERENCE on POPULATION and
DEVELOPMENT, they explicitly demanded the RECOGNITION of
reproductive RIGHTS as HUMAN RIGHTS—and obtained it. [JA_HI_20, D1
score: -9.03]

An investigation of excerpt 5.3, taken from the history journal article text with the largest
negative Dimension 1 score, offers some support for this interpretation. This passage
contains a relatively large number of nouns (33) and pre-modifying nouns (9). However,
it is worth noting that most of the pre-modifying nouns in excerpt 5.3 are inherently
different from those used in 5.2. The pre-modifying nouns in 5.2 and 5.3 are examples of
compression within the noun phrase used to condense descriptive information into fewer
words. However, many of the pre-modifiers in 5.3 are titles that are used to label events
(Vienna World Conference) or movements (Women’s Human Rights). These proper
noun labels seem to be relatively uncommon in biology journal articles.
Excerpt 5.3 also contains a high percentage of words from the COCA Core
Vocabulary list. An investigation of the percentage of COCA Core Vocabulary in journal
articles from the two disciplines reveals that on average history journal articles contain
about 12% more frequent vocabulary than journal articles in biology. In addition, on
average, biology journal articles contain five times more technical concrete nouns than
history journal articles. While these are by no means the only important univariate
differences between these two disciplines of journal articles, they offer evidence that the
language in biology journal articles is indeed more technical and specialized than the
language found in history journal articles. This is not to say that research in history is any
less complex or specialized in nature. Rather, this suggests that the language used by
history researchers is more likely to be familiar to a non-specialist audience.

5.4.2. Dimension 2: ‘Definition and Evaluation of New Concepts’

Figure 5.4 displays the mean Dimension 2 scores for each of the registers in the
corpus. Within the discipline of history, the mean scores follow a trend that is very
similar to the history scores on Dimension 1, where popular academic books have the
highest score, followed closely by textbooks and journal articles. This shows,
unsurprisingly, that history popular academic books and textbooks both contain more
language associated with defining and evaluating new concepts than history journal
articles. Similar to Dimension 1, there is a much wider range of variation within the
discipline of biology. However, the trend within this dimension is quite different. It is not
surprising to find that biology journal articles have the least amount of language
associated with defining and evaluating new concepts.
At first glance it was surprising to discover that biology textbooks received the
highest Dimension 2 score by such a large margin. However, this pattern makes more
sense once we consider the fact that history writing deals more with events, peoples,
records, and artifacts than it does with concepts. Additionally, we would expect
pedagogical writing within biology to use more language associated with defining
concepts than popular writing because one of the major goals of pedagogical writing is to
transmit information regarding new concepts in a way that students can understand and
retain.
The publication type × discipline interaction effect was also significant for
Dimension 2, F(2, 144) = 10.53, p < .001, R² = .286. Because there is no evidence for a
clear pattern across publication types in the two disciplines, the simple effects will be
investigated rather than the main effects. The most striking difference that can be seen in
Figure 5.5 is between biology and history textbooks. The simple effects ANOVA results
showed this difference to be significant, F(1, 144) = 27.12, p < .001, whereas there were
no statistically significant discipline differences for the other two publication types. The
extremely high Dimension 2 scores for biology textbooks also contributed to significant
differences across the three publication types of biology, F(2, 144) = 23.76, p < .001.

[Figure: registers ordered by mean Dimension 2 score along ‘Definition and Evaluation of New Concepts’]

A – University Textbooks—Biology (M = 6.59, SD = 8.91)
B – Popular Academic Books—Biology (M = 1.37, SD = 3.93)
BC – Popular Academic Books—History (M = -.70, SD = 4.84)
BC – University Textbooks—History (M = -1.40, SD = 4.90)
BC – Journal Articles—History (M = -2.08, SD = 2.88)
C – Journal Articles—Biology (M = -3.99, SD = 5.11)

Positive features:
Nouns and Pronouns: demonstrative pronouns (.40), concrete (.30), pronoun ‘it’ (.34)
Verbs: possibility, permission and ability modals (.67), verb BE (.59), prediction modals (.38)
The Verb Phrase: present tense (.73)
Adjectives: predicative adjectives (.53)
Clauses Marking Stance: non-finite to-clauses controlled by stance adjectives (.68), that-clauses controlled by attitudinal adjectives (.57)
Lexical Features: Academic lexical bundles (.58)

Negative features:
NONE

Figure 5.4. Registers along Dimension 2: ‘Definition and evaluation of new concepts’
F(5, 140) = 11.55, p < .001, R² = .29

Figure 5.5. Marginal Means Plot for Dimension 2: ‘Definition and Evaluation of
New Concepts’

[Figure: mean Dimension 2 scores (y-axis) by publication type (x-axis: Popular Academic, Textbooks, Journal Articles), plotted separately for Biology and History]

Text excerpts 5.4 and 5.5, both taken from university textbooks in biology, will be
used to demonstrate the co-occurrence patterns of the positive features of this dimension.
In excerpts 5.4 and 5.5, I have highlighted modals of possibility, permission and ability and
modals of prediction in bold. BE verbs are in SMALL CAPS, predicative adjectives are
underlined, demonstrative and ‘it’ pronouns are double underlined, and present tense verb
phrases are italicized. Excerpt 5.4 illustrates how the positive features on
Dimension 2 are used to define and evaluate new concepts. The pronouns refer to
the concepts being defined and evaluated.

5.4 Besides a cellular response to infection, we ARE also protected by our
complement system. This IS a collection of proteins that act together to
produce a cascade response. Even a weak signal can BE amplified in this
way to elicit a strong response. The complement system has two major
effects. It can act directly on invading microbes or it can act in
association with antibody to cause cell lysis. It does so by puncturing
holes in the microbial cell membrane. [TB_BI_15, D2 score: 16.32]

Excerpt 5.5 further demonstrates the evaluation of new concepts. Each of the four
sentences in this excerpt contains a modal and a present tense verb. There are also three
BE verbs and two predicative adjectives. This excerpt is a clear example of university
textbook language associated with the author’s evaluation of a new concept.

5.5 An example might BE defining how we determined the end of a larval
stage. This would BE important in taxonomic groups such as fish, which
do not end larval life with the dramatic pupation found in insects. The
statement of measurement operations might BE simple, referring only to
standard units such as kilograms, meters, and seconds. The statement
might include complex procedures, such as those of Winberg (1971) for
calculating the production rate of a population. [TB_BI_20, D2 score:
16.73]

The group with the lowest mean Dimension 2 score was biology journal articles. The
texts in this group received negative scores on this dimension in almost every case.
Excerpt 5.6 below demonstrates the type of language common to biology journal articles.
With the exception of a small number of BE verbs, the language of definition and
evaluation is completely absent from this text.

5.6 Cells WERE lysed with NETN buffer (0.5% NP-40, 150 mM NaCl, 50 mM
Tris, and 1 mM EDTA) at 4°C. Cell debris WAS removed by
centrifugation, and the supernatant WAS incubated with 5 µg of the
appropriate antibody and protein A beads at 4°C for 4 h. For the S protein
IP, cell lysate WAS incubated with S protein Agarose (EMD) at 4°C for 2
h. The pellet WAS washed with NETN buffer three times, eluted in
Laemmli sample buffer, and analyzed by Western blot as described
previously.

5.4.3. Dimension 3: ‘Author-centered Stance’

Along Dimension 3, within the discipline of biology, the publication types follow a
clear trend in which popular academic books received a high positive score, university
textbooks received a slightly negative score, and journal articles received the lowest
negative score. This shows, unsurprisingly, that journal articles in biology are the least
author-centered of all of the registers.
The Dimension 3 data presented in Figure 5.7 suggest a change in direction in the
trends for the biology and history publication types, resulting from the higher amount of
Author-centered Stance in history journal articles relative to those from biology. The
ANOVA results confirmed that there is indeed a significant, yet marginal, interaction
effect, F(2, 144) = 3.15, p = .046, R² = .173. The simple effects ANOVAs revealed no
significant differences between biology and history in textbooks or popular academic
writing. However, a significant difference was discovered between the two disciplines
within journal articles, F(1, 144) = 11.24, p = .001. Simple effect differences were also
found among the three publication types of biology, F(2, 144) = 9.47, p < .001, but not
among the history publication types. These results show that biology writing contains
incrementally less Author-centered Stance as the expertise and specialization of the target
audience increases across the three publication types. In history this is true between
popular academic writing and textbooks, but history journal articles actually contain more
Author-centered Stance than university textbooks.

[Figure: registers ordered by mean Dimension 3 score along ‘Author-centered Stance’]

A – Popular Academic Books—History (M = 2.61, SD = 5.82)
A – Popular Academic Books—Biology (M = 2.57, SD = 5.81)
AB – Journal Articles—History (M = 1.26, SD = 5.36)
ABC – University Textbooks—History (M = -.96, SD = 4.62)
BC – University Textbooks—Biology (M = -1.89, SD = 5.27)
C – Journal Articles—Biology (M = -3.74, SD = 4.62)

Positive features:
Nouns and Pronouns: human (.41), cognition (.39), 1st person pronouns (.36)
Verbs: communication verbs (.58), mental verbs (.58), suasive verbs (.42)
The Verb Phrase: infinitives (.53)
Clauses Marking Stance: that-clause controlled by non-factive verb (.53), that-clause controlled by likelihood verb (.46), that-clause controlled by stance noun (.38), that-clause controlled by factive verb (.30)

Negative features:
NONE

Figure 5.6. Registers along Dimension 3: ‘Author-centered stance’
F(5, 140) = 6.02, p < .001, R² = .17

Figure 5.7. Marginal Means Plot for Dimension 3: ‘Author-centered Stance’

[Figure: mean Dimension 3 scores (y-axis) by publication type (x-axis: Popular Academic, Textbooks, Journal Articles), plotted separately for Biology and History]

Two text excerpts will be presented to demonstrate some of the key differences
between journal articles (5.7) and popular academic books (5.8) on this dimension. In
these two excerpts, 1st person pronouns are bolded, clauses marking stance have a dotted
underline, and communication verbs and cognitive nouns are in SMALL CAPS. Excerpt 5.7
contains none of the features associated with positive Dimension 3 scores. This text
contains a detailed, yet dense, description of a species of animal, but although the passage is nearly
100 words long, evidence of author-centered stance is almost entirely absent.

5.7 Thoracic segments ferruginous dorsally, yellowish ventrally. Legs brown,
tibiae without spines, tibial spurs 0:2:4, asymmetrical and well-developed;
tarsi with spines randomly distributed on posterior surface, first tarsomere
with length equal to the sum of the others, arolium cordate. Fore wing
triangular with outer margin straight. Dorsal surface ferruginous brown;
oblique, postmedian band from costal margin to inner margin, two dorsal
white spots on band: one between R3 and R4, the other between R4 and
R5, the latter not visible in males; whitish spot across the distal end discal
cell and between M1 and M2, more evident in females. [JA_BI_07, D3
score: -9.97]

Excerpt 5.8, on the other hand, contains explicit evidence of author-centered stance.
This short 68-word passage contains five 1st person pronouns, four clauses marking
stance, and three communication verbs. This text contains a large number of stance
features that are focused on the author’s position about the subject matter.

5.8 This book PROPOSES that our minds evolved not just as survival machines,
but as courtship machines. Every one of our ancestors managed not just to
live for a while, but to CONVINCE at least one sexual partner to have enough
sex to produce offspring. […] Following this insight, I shall ARGUE that the
most distinctive aspects of our minds evolved largely through the sexual
choices our ancestors made. [PA_BI_24, D3 score: 18.99]

It comes as no surprise that within each register, history writing contains more author-
centered stance than its biology counterpart. However, it was interesting to find that the
history publication types did not follow the same within-discipline trend as the biology
publication types. History textbooks were the only publication type within history to
receive a negative score on this dimension, and the dimension scores for journal articles
and popular academic books in history were more similar to each other than to the history
textbooks. Excerpt 5.9, taken from a history journal article, contains communication
verbs (inform, call), cognitive nouns (recognition, memory), and 1st person pronouns that
are used by the author to influence the readers’ interpretation of the writing.

5.9 From the ATTENTION she paid to the preservation of her manuscripts on
her deathbed, we may gather how much these manuscripts meant to her.
[…] We also become aware of the success of Jin Yi's efforts, when a
woman poet like Wang Zhenyi INFORMS us of the effect that these
writings achieve. […] In this sense also—to return to the iconized women
as mouthpieces of male sentiments—the complexity and dynamism of
women's writing culture in late imperial China well exceed a single
paradigm of male influences or male manipulation, and CALL for our
RECOGNITION of the efforts of those women who negotiated their ways
into the cultural MEMORY of their times. [JA_HI_21, D3 score: 5.89]

One of the key differences between journal articles in biology and history is the
interpretation of the data or evidence. The data or evidence in biology journal articles is
typically interpreted objectively based on previous literature and the data itself. In
contrast, the data or evidence in history articles is most often interpreted based on author
claims (see Chapter 4). This distinction becomes apparent through the co-occurrence
patterns in Dimension 3.

5.4.4. Dimension 4: ‘Colloquial Narrative’

Figure 5.8 shows the results for Dimension 4, indicating that the biology
publication types generally contain the fewest features associated with colloquial narrative,
and the history publication types generally contain the most. It is also clear based on this
figure that within the two disciplines popular academic books contain more linguistic
features associated with colloquial narrative than university textbooks and journal
articles. Within the two disciplines, university textbooks also contain more colloquial
narrative than journal articles.
Unlike the first three dimensions, there is no significant interaction effect for
Dimension 4. This shows that Dimension 4 scores for publication types do not depend on
the discipline and vice versa. This is clearly shown by the nearly parallel lines in Figure
5.9. The main effects were significant for both discipline, F(1, 144) = 32.85, p < .001,
and publication type, F(2, 144) = 8.51, p < .001. This reveals that there are significantly more
features associated with Colloquial Narrative in history than in biology. In order to
determine where the publication type differences lie, a post hoc analysis was conducted
using Tukey HSD pairwise mean comparisons. These analyses showed that journal
articles contain significantly less Colloquial Narrative than popular academic writing (p <
.001) but not textbooks (p = .056).
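
Because the factors do not interact on this dimension, the main effects can be read from the marginal means. The sketch below averages the register means reported in Figure 5.8 over the other factor. It assumes equal numbers of texts in each register, so that an unweighted average of cell means equals the marginal mean; that assumption, the short register labels, and the function name `marginal_means` are illustrative and are not stated in the text.

```python
from statistics import mean

# Mean Dimension 4 scores as reported in Figure 5.8,
# keyed by (publication type, discipline).
D4_MEANS = {
    ("popular", "history"): 2.42,
    ("textbook", "history"): 2.41,
    ("journal", "history"): 0.11,
    ("popular", "biology"): 0.35,
    ("textbook", "biology"): -2.21,
    ("journal", "biology"): -3.23,
}


def marginal_means(cell_means, position):
    """Average cell means over the other factor.

    position 0 groups by publication type, position 1 by discipline.
    Assumption: equal cell sizes, so an unweighted average is appropriate.
    """
    groups = {}
    for key, value in cell_means.items():
        groups.setdefault(key[position], []).append(value)
    return {level: mean(values) for level, values in groups.items()}


by_discipline = marginal_means(D4_MEANS, 1)
by_publication = marginal_means(D4_MEANS, 0)
for level, value in {**by_discipline, **by_publication}.items():
    print(f"{level}: {value:.2f}")
```

Under that assumption, the history marginal mean is higher than the biology one, and the publication types descend from popular academic books to textbooks to journal articles, which mirrors the main effects reported above.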

[Figure: registers ordered by mean Dimension 4 score along ‘Colloquial Narrative’]

A – Popular Academic Books—History (M = 2.42, SD = 3.95)
A – University Textbooks—History (M = 2.41, SD = 3.78)
AB – Popular Academic Books—Biology (M = .35, SD = 3.81)
AB – Journal Articles—History (M = .11, SD = 2.98)
BC – University Textbooks—Biology (M = -2.21, SD = 4.06)
C – Journal Articles—Biology (M = -3.23, SD = 2.66)

Positive features:
Nouns and Pronouns: 3rd person pronouns (.39)
Verbs: activity (.54), common phrasal (.47), aspectual (.44)
The Verb Phrase: past tense (.59), progressive aspect (.37)
Clauses Marking Stance:

Negative features:
Lexical Features: Academic Vocabulary List (-.64)

Figure 5.8. Registers along Dimension 4: ‘Colloquial Narrative’
F(5, 140) = 10.62, p < .001, R² = .27

Figure 5.9. Marginal Means Plot for Dimension 4: ‘Colloquial Narrative’

[Figure: mean Dimension 4 scores (y-axis) by publication type (x-axis: Popular Academic, Textbooks, Journal Articles), plotted separately for Biology and History]

Two text excerpts, one from history (5.10) and one from biology (5.11) will be used
to demonstrate the stark contrast between disciplines along this dimension. In each of the
excerpts for this dimension, I have highlighted the past tense and progressive aspect verbs
in bold, 3rd person pronouns in SMALL CAPS, and the academic words in italics.

5.10 The details of Pixodarus’ offer were based on an understandable mistake, but
Olympias and HER friends stirred up Alexander, telling HIM that this was
another sign that Philip was trying to replace HIM as royal heir. Alexander
took the bait. HE wrote Pixodarus on HIS own, without telling Philip. HE
offered himself as husband for the Carian’s daughter, saying that Arrhidaeus’
disability made HIM unsuitable. For Alexander to offer a political alliance
with a foreign power, without HIS father’s knowledge, represented an
enormous risk; Alexander must have felt HIMSELF to be living on a cliff edge.
[TB_HI_08, D4 score: 8.10]

Whereas only 5% of the history excerpt (5.10) is composed of academic vocabulary, 27%
of the words in the biology excerpt (5.11) are academic vocabulary. On the other hand,
excerpt 5.11 has only one 3rd person pronoun and no past tense or progressive aspect verbs,
whereas excerpt 5.10 has an abundance of both.

5.11 Salmonellae are Enterobacteriaceae that are widely distributed in the
environment and include more than 2000 serotypes. The Salmonella
numbers in wastewater range from a few to 8000 organisms/100 mL; THEY
are the most predominant pathogenic bacteria in wastewater and cause
typhoid and paratyphoid fever, and gastroenteritis. [TB_BI_06, D4 score:
-6.69]

It can also be seen that both disciplines follow the same trend in which popular
academic writing contains the most colloquial narrative, followed by university textbooks
and journal articles, in that order. This trend can be seen in excerpts 5.12 (popular
academic books-history) and 5.13 (journal article-history) below, which contain the same
highlighting as excerpts 5.10 and 5.11.

5.12 Mitrokhin's most anxious moment came when HE arrived at his weekend
dacha to find a stranger hiding in the attic. HE was instantly reminded of the
incident a few years earlier, in August 1971, when a friend of the writer
Aleksandr Solzhenitsyn had called unexpectedly at HIS dacha while
Solzhenitsyn was away and surprised two KGB officers in the attic who
were probably searching for subversive manuscripts. Other KGB men had
quickly arrived on the scene and Solzhenitsyn's friend had been badly
beaten. [PA_HI_04, D4 score: 8.93]

Text excerpts 5.12 and 5.13 contain exactly the same number of words, which allows
for convenient comparisons. Excerpt 5.12 contains 11 past tense and progressive aspect
verbs, three 3rd person pronouns, and 2% academic words. In comparison, excerpt 5.13
contains nine past tense and progressive aspect verbs, no 3rd person pronouns, and 28%
academic words.

5.13 The exchange of decorative designs and mutual influence often rendered
indistinguishable set roles of producer and consumer, creating a global
culture. Along these same themes, the circuitous routes on which visual
images of production traveled reinforce the view that "china" (porcelain)
effected a global culture. Specific to the pictorial themes discussed in this
article, production as a visual theme was consumed as a product in itself,
and the mode of viewing, a historically constructed visuality about
porcelain production, was what gained global purchase. [JA_HI_18, D4
score: -3.78]

The text excerpts presented above have demonstrated at least two patterns that exist
in the data for Dimension 4. The first pattern, shown in excerpts 5.10 and 5.11, is that the
history publication types generally contain more features associated with colloquial
narrative than their biology counterparts. The second pattern, seen in 5.12 and 5.13, is
that within both disciplines the publication types follow a similar trend in which popular
academic writing contains the most features of colloquial narrative, followed by
university textbooks and journal articles.

5.4.5. Dimension 5: ‘Abstract Observation and Description’

The results for the various registers along Dimension 5 can be seen in Figure 5.10. An
evaluation of the texts in this corpus also highlighted the degree of abstraction associated
with the use of the positive features on this dimension.

[Figure: registers ordered by mean Dimension 5 score along ‘Abstract Observation and Description’]

A – Journal Articles—History (M = 2.96, SD = 3.80)
AB – University Textbooks—Biology (M = .84, SD = 4.65)
AB – University Textbooks—History (M = .64, SD = 4.54)
B – Journal Articles—Biology (M = -1.39, SD = 4.19)
B – Popular Academic Books—History (M = -1.50, SD = 4.67)
B – Popular Academic Books—Biology (M = -1.72, SD = 4.66)

Positive features:
Nouns and Pronouns: nominalizations (.71), process nouns (.59), other abstract nouns (.47)
Adjectives: attributive (.53), topic (.47)
Lexical Features: word length (.65); COCA Core Vocabulary (501-3000) (.42)

Negative features:
Adverbs: time adverbials (-.32)

Figure 5.10. Registers along Dimension 5: ‘Abstract Observation and Description’
F(5, 140) = 4.33, p = .001, R² = .13

Whereas history journal articles used more of these features than any other group,
biology journal articles used comparatively few. In contrast, it is interesting to note the
lack of disciplinary differences along this dimension within the publication types of
popular academic books and university textbooks.
A significant interaction effect between discipline and publication type was found
for Dimension 5, F(2, 144) = 4.01, p = .02, R² = .131. Simple effects ANOVAs confirm
the results displayed in Figure 5.11. There are significant differences among the three
history publication types, F(2, 144) = 6.34, p = .002, but not among the publication types
in biology. The only significant difference between the two disciplines within a
publication type was found for journal articles, F(1, 144) = 12.01, p = .001, showing that history
journal articles contain significantly more Abstract Observation and Description than
biology journal articles.

Figure 5.11. Marginal Means Plots for Dimension 5: ‘Abstract Observation and
Description’

[Figure: mean Dimension 5 scores (y-axis) by publication type (x-axis: Popular Academic, Textbooks, Journal Articles), plotted separately for Biology and History]

In the following text excerpts I will highlight nominalizations in bold, attributive
adjectives in SMALL CAPS, abstract and process nouns in italics, and 501-3,000 frequency
words with an underline.

5.14 Along these SAME themes, the CIRCUITOUS routes on which VISUAL images of
production traveled reinforce the view that "china" (porcelain) effected a
GLOBAL culture. Specific to the PICTORIAL themes discussed in this article,
production as a VISUAL theme was consumed as a product in itself, and the
mode of viewing, a historically constructed visuality about porcelain
production, was what gained GLOBAL purchase. [JA_HI_18, D5 score: 9.76]

A comparison of excerpts 5.14 and 5.15 shows a stark contrast between the
amount of Abstract Observation and Description in history and biology journal articles.
The history excerpt contains four nominalizations, seven attributive adjectives, two
abstract or process nouns, and 32% 501-3,000 frequency words. In contrast, the biology
excerpt contains no nominalizations, one attributive adjective, no abstract or process
nouns, and 15% 501-3,000 frequency words.

5.15 MT1-MMP antibody (MAB3328) was purchased from Millipore. The
antibodies against FAK, paxillin, and p130Cas were from BD. Anti-Src
antibody (sc-18) and anti-GST (sc-138) were purchased from Santa Cruz
Biotechnology, Inc. The phospho-Src family antibody pY416 was from
Cell Signaling Technology, the MONOCLONAL vinculin antibody was from
Sigma-Aldrich, anti-Flag antibody from Cell Signaling Technology, and
anti-GFP antibody from Roche. [JA_BI_25, D5 score: -7.99]

Popular academic writing in both disciplines contained less Abstract Observation
and Description than the other registers, showing that this register uses concrete rather
than abstract language. Excerpt 5.16 exemplifies this concrete prose style.

5.16 This exposure turns them brown and converts some of their sugars first to
alcohol and then to ACETIC acid, which we know best in the form of
vinegar. The ACETIC acid kills the shoot and releases other flavour
molecules. Phenylethylamine (PEA) forms during this fermentation
stage. The beans are then roasted to remove most of the ACETIC acid, and
milled, which causes the cocoa fat to become molten. [PA_BI_12, D5
score: -8.85]

This excerpt contains two nominalizations, three attributive adjectives, zero abstract and
process nouns, and 12% 501-3,000 frequency words. A comparison between 5.14 and
5.16 shows a dramatic difference between the two. Whereas 5.14 discusses abstract
concepts such as themes, images, culture, and visuality, 5.16 explains the process of
making cocoa using concrete terms such as sugars, acid, beans, and cocoa fat. These
examples offer strong evidence for the interpretation of ‘Abstract Observation and
Description’ for this dimension, in which texts with high positive scores tend to be
verbose discussions of abstract concepts and texts with large negative scores tend to be concise
descriptions of concrete entities and processes.

5.5. Comparing the Register and Publication Type x Discipline Models

Two models were used to quantitatively analyze linguistic variability in the
corpus. In the Register Model I treated each of the six registers as separate levels along
one variable, register, by performing one-way ANOVAs and subsequent Tukey HSD post
hoc tests of significance. In the Publication Type x Discipline Model I treated publication
type and discipline as separate factors and performed two-way factorial ANOVAs and
appropriate post hoc tests. On the level of pairwise post hoc tests, the statistical results of
these two models show identical patterns. The key difference is that the Publication Type x
Discipline Model makes a distinction between the situational variables of publication
type and discipline, whereas the Register Model collapses these two variables into one
situational parameter. The most important benefit of the Register Model is that it makes it
possible to measure differences between specific register categories in a manner that is
fine-grained and efficient. The main benefit of the Publication Type x Discipline Model
is that it quantifies the effects of the situational factors of publication type and discipline,
as well as possible interactions between them.
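
The relationship between the two models can be illustrated with a minimal sketch. The scores below are invented for a small balanced 3 x 2 design and are not data from this study; all variable names are hypothetical. Under those assumptions, the sketch shows that the between-groups sum of squares in a one-way (Register Model) analysis is the same quantity that a two-way (Publication Type x Discipline Model) analysis splits into publication type, discipline, and interaction components.

```python
from itertools import product
from statistics import mean

# Invented dimension scores for a balanced 3 x 2 design (3 texts per cell).
# These numbers are illustrative only; they are not taken from the corpus.
scores = {
    ("popular", "biology"): [-2.0, -1.5, -1.0],
    ("popular", "history"): [-1.8, -1.2, -1.5],
    ("textbook", "biology"): [0.5, 1.0, 1.2],
    ("textbook", "history"): [0.2, 0.8, 1.1],
    ("journal", "biology"): [-2.0, -1.0, -1.2],
    ("journal", "history"): [2.5, 3.5, 3.0],
}

pub_types = ["popular", "textbook", "journal"]
disciplines = ["biology", "history"]
n = 3  # texts per cell (a balanced design is assumed)

grand = mean(x for cell in scores.values() for x in cell)
cell_mean = {k: mean(v) for k, v in scores.items()}
pub_mean = {p: mean(x for d in disciplines for x in scores[(p, d)]) for p in pub_types}
dis_mean = {d: mean(x for p in pub_types for x in scores[(p, d)]) for d in disciplines}

# Register Model: one factor with six levels (one-way ANOVA).
ss_register = sum(n * (m - grand) ** 2 for m in cell_mean.values())

# Publication Type x Discipline Model: two crossed factors plus their interaction.
ss_pub = sum(n * len(disciplines) * (m - grand) ** 2 for m in pub_mean.values())
ss_dis = sum(n * len(pub_types) * (m - grand) ** 2 for m in dis_mean.values())
ss_inter = sum(
    n * (cell_mean[(p, d)] - pub_mean[p] - dis_mean[d] + grand) ** 2
    for p, d in product(pub_types, disciplines)
)

# The within-cell (error) sum of squares is shared by both models.
ss_within = sum((x - cell_mean[k]) ** 2 for k, v in scores.items() for x in v)
df_within = len(scores) * (n - 1)


def f_ratio(ss_effect, df_effect):
    """F statistic for an effect tested against the within-cell error term."""
    return (ss_effect / df_effect) / (ss_within / df_within)


f_register = f_ratio(ss_register, len(scores) - 1)
f_interaction = f_ratio(ss_inter, (len(pub_types) - 1) * (len(disciplines) - 1))

print(f"SS register = {ss_register:.3f}")
print(f"SS pub + SS discipline + SS interaction = {ss_pub + ss_dis + ss_inter:.3f}")
print(f"F register = {f_register:.2f}, F interaction = {f_interaction:.2f}")
```

In this toy design the two printed sums of squares agree, which is why the pairwise patterns match across models while only the factorial model attributes variation to each factor and to their interaction.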
The Register Model was a useful first step for the analyses presented here because
it (a) allows us to graphically display the six register groups on a single parameter, (b)
establishes that the variable of register has a significant effect on linguistic variability in
the corpus, and (c) establishes the significant pairwise differences and register groupings
within the corpus. This model offers a general overview of patterns in the data. However,
this method also suggested the need for a more complex model in order to explain some
of the statistical differences and groupings that could not be explained by simple register
differences. For example, the register plot in Figure 5.6 is difficult to interpret without
reference to discipline variation. While the popular academic books and university
textbooks group together, the two journal article registers fall on opposite ends of the
plot. While the pattern itself is easy to recognize, this particular graphic display does not
lend itself to a simultaneous analysis of publication type and discipline variation.
Additionally, the statistical procedures used in the Register Model make it impossible to
simultaneously determine the statistical effect of discipline and publication type.
The Publication Type x Discipline Model is a logical next step in order to gain a
more complete understanding of the variability due to publication type and discipline. As
mentioned before, the post hoc analyses for these two approaches ultimately reveal the
same patterns of difference between textual categories. However, the results of the
previous analyses have supported the use of both models in order to describe different
types of variability in the data. For example, the results show that the effect of discipline
differs depending on the publication type. The Publication Type x Discipline Model
revealed significant differences between journal articles in biology and history on four of
the five dimensions. In contrast, significant discipline differences emerged on only two of
the dimensions for university textbooks and on none of the dimensions for popular
academic books. This suggests that discipline, at least between biology and history, is a
more important factor in some written publication types (e.g., journal articles) than in
others (e.g., university textbooks, popular academic books).
Interestingly, although there are bigger differences between the two disciplines in
journal articles than in the other two publication types, the linguistic variation (measured
using standard deviation) within the publication type of journal articles is not consistently
larger than in the other two publication types. This means that the variable of discipline
does not necessarily contribute a substantial amount of additional linguistic variability
to a publication type sample. However, discipline variation does seem to be a more
powerful explanatory variable within journal articles. This finding suggests that the
extent of discipline variation may depend on the publication type. Further investigation of
this pattern in a wider range of disciplines and publication types would be an interesting
area for future research on academic writing.
Another way of looking at interaction effects is from the perspective of within-
discipline variation. Within the discipline of biology, significant publication type
differences emerged on four of the five dimensions. In contrast, significant publication
type differences within the discipline of history were only found on one dimension. This
reveals that the variable of publication type seems to play a more important role within
biology writing than within history writing.
The results of this study strongly support the importance of discipline and
publication type as predictors of linguistic variation in academic writing. However, the
Publication Type x Discipline Model revealed that discipline and publication type
interact in complex ways. This is due, at least in part, to the situational relationships
between discipline and publication type. For example, the lack of discipline variation
within popular academic writing may be due to differences in the situational variables of
purpose and nature of the data or evidence (see Table 4.1). In contrast, there are stark
discipline differences within the publication type of journal articles; whereas history
journal articles report an author’s ideas and observations, biology journal articles transmit
empirical results of new scientific findings.
In summary, this study supports the findings of previous research by
demonstrating the importance of the situational variables of publication type and
discipline. The situational definition of a register category encompasses the variables of
discipline and publication type. However, these two variables have emerged as two of the
most important situational considerations in the definition of academic registers.
Additionally, unlike many of the other situational considerations in a register definition, it
is possible to design a corpus so that the variables of publication type and discipline are
completely crossed, as they are in this study. Balancing the design of the corpus in this
way makes it possible to measure the effects of publication type and discipline, as well as
interactions between them. This design is attractive because it allows for a more complete
description of the effects of publication type and discipline on register variation.

5.6. Interpreting variation among publication types

5.6.1. Popular academic books

Popular academic writing has the highest mean Dimension 1 scores in both
disciplines, showing that popular academic writers rely on a writing style that is more
emotional and less informationally dense. Popular academic writing also contains more
features associated with defining and evaluating new concepts than journal articles but
not nearly as many as biology textbooks. The results for Dimension 3 reveal that popular
academic writing is much more focused on the stance or position of the author. Within
both disciplines popular academic writing was the most colloquial and used the most
features associated with a narrative style. Finally, Dimension 5 shows that popular
academic books use very little ‘Abstract Observation and Description’, or elaborated,
abstract language, relative to the textbooks and history articles. However, the Dimension
5 scores for popular academic writing and biology journal articles were very similar.
This does not necessarily imply, however, that biology journal articles and popular
academic writing have the same reasons for not using ‘Abstract Observation and
Description’. A closer investigation of the texts revealed that the reason abstract
elaboration is less common in popular academic writing is that authors focus instead on
making concrete scientific findings and evidence relevant and exciting to the reader. In
summary, popular academic writing in history and biology can be described as an
emotional, author-focused publication type that is written to be readable and appealing to
a wide audience.

5.6.2. University textbooks

The results from the first dimension show that university textbooks are much less
informationally dense than journal articles. While textbooks contain less non-technical
synthesis and more informational density than popular academic books in both
disciplines, the gap between the publication types is much smaller in history than in
biology. The large difference between the Dimension 2 scores for biology and history
textbooks is one of the clearest examples in this study of the importance of accounting for
discipline variation in a corpus-based study. This difference is not altogether surprising
once we consider the nature of pedagogy in these two disciplines. Biology is highly
conceptual, requiring textbook authors to frequently define and explain new terms and
processes. Biology can also be quite difficult for learners to relate to, as much of it is
studied in the abstract and at the microscopic level. History textbooks, on the other hand,
do not rely as heavily on defining new concepts as they do on narrating and sequencing
the details of past events. In many cases, these events are highly relevant to the reader,
thus eliminating the need for extensive explanation and evaluation.
Textbooks in both disciplines are less author-centered than their popular academic
counterparts. While biology textbooks contain more author-focused stance than biology
journal articles, the opposite is true in history. The Dimension 4 scores show that
textbooks contain more colloquial narrative than articles. In both disciplines, textbooks
use fewer features of colloquial narrative than popular academic books, but this difference
is negligible in history. Finally, the last dimension reveals that textbooks contain more
Abstract Observation and Description than popular academic books in both disciplines.
However, as mentioned above, biology journal articles contain less Abstract Observation
and Description than history journal articles. To summarize, textbooks tend to fall
somewhere between journal articles and popular academic writing in most areas.
However, biology textbooks focus on defining and evaluating new concepts, whereas
history textbooks contain more colloquial narrative.

5.6.3. Journal articles

Relative to other publication types within their respective disciplines, journal
articles in biology and history tend to have more specialized language and density of
information. Journal articles in both disciplines also contain fewer definitions and
evaluations of new concepts and less colloquial narrative. However, biology journal articles
contain less author-centered stance, less colloquial narrative, and less Abstract
Observation and Description than journal articles in history. These patterns are
unsurprising when we consider the highly objective informational focus of writing in the
natural sciences. The supporting evidence used by biology writers, in most cases, results
from analyses of empirical data. This eliminates many of the factors that motivate the
use of author-centered stance and abstract elaboration. Supporting evidence in history
articles, on the other hand, is often heavily based on the interpretation and rhetoric of the
author regarding causes for and relationships among events.

5.7. Conclusion

This chapter has presented the methods for and results of an MD analysis of linguistic
variation in published academic writing. The MD analysis of 56 linguistic features within
the 150 texts of the corpus revealed the following five interpretable dimensions of
linguistic variation in published academic writing:

1. ‘Non-technical Synthesis vs. Specialized Information Density’
2. ‘Definition and Evaluation of New Concepts’
3. ‘Author-centered Stance’
4. ‘Colloquial Narrative’
5. ‘Abstract Observation and Description’

After calculating a unique dimension score for each text in the corpus, comparisons
were made across registers, disciplines, and publication types. The results have shown
that there is substantial variation within and between registers, disciplines, and
publication types within academic writing. A series of factorial ANOVAs also revealed
that the variables of discipline and publication type interact in meaningful ways. In
Chapter 7 additional analyses will be performed on the data presented here, and I will
discuss the broader implications of these findings in Chapter 8.

CHAPTER 6. A STYLISTIC PERCEPTION ANALYSIS OF PUBLISHED
ACADEMIC WRITING

6.1. Introduction

The purpose of this chapter is to introduce and apply a new method for measuring
reader perceptions of writing quality and style. As mentioned in Chapter 2, the research
design used in this dissertation was inspired, in large part, by Carroll’s (1960) use of
factor analysis to simultaneously analyze relationships among linguistic and perceptual
variables. Carroll designed a study to measure prose style using quantitative linguistic
variables and subjective perceptions from ‘expert judges’ using semantic differential
items. The results of Carroll’s factor analysis revealed that correlations tend to be
stronger among subjective perceptual variables and among objective linguistic variables
than between the two groups. These findings show that stylistic variables can be reliably
measured and that reader perceptions and linguistic variables represent two distinct yet
related constructs. Those findings informed my methodology in two ways. First, Carroll’s
results suggest that text-linguistic variables and reader perceptions are inherently
different and, therefore, best measured separately in terms of their co-occurrence patterns.
For that reason, I chose to perform two separate factor analyses, one for the linguistic
rates of occurrence and another for the perceptual ratings. Second, Carroll’s findings
strongly suggest that linguistic variation and reader perceptions are related.
Therefore, relationships between the results of the factor analyses for the linguistic
characteristics and the perceptual items are thoroughly investigated in Chapter 7.
In this chapter I present the methods used to develop a set of semantic differential
items designed to measure readers’ perceptions of writing quality and style. I then lay out
the research design for this portion of the study and describe the procedures and
participants used in the data collection stage (Section 6.2). In Section 6.3 I describe the
results of a series of analyses used to assess the reliability of the survey results. Section
6.4 contains the methods used to perform the factor analysis of the perceptual response
data, and Section 6.5 presents the dimension structures and results of that analysis.
Finally, Sections 6.6-6.12 contain a summary of the results and conclusions of this study
of reader perceptions.

6.2. Methods

In this section I present the methods used to develop and pilot the survey
instrument used in this study. I also present the data collection procedures, corpus texts,
survey administration procedures, and participants used in this study.

6.2.1. Developing an instrument to measure writing quality

6.2.1.1. The perceptual differential item

As mentioned above, Carroll’s (1960) 29 semantic differential items served as a
foundation for the survey developed here. A semantic differential item usually consists of
a scale with 7 points lying between two bipolar adjectives (see Osgood, Suci and
Tannenbaum, 1957; Snider and Osgood, 1969). Participants are asked to indicate their
attitude toward a subject by choosing a position between the two adjectives. For example,
attitude toward a subject by choosing a position between the two adjectives. For example,

bad ___ : ___ : ___ : ___ : ___ : ___ : ___ good

Semantic differential items were also used successfully in van Peer’s (1986) research on
the pragmatic concept of foregrounding in poetry. There is a vast array of options
available for the measurement of attitudes and perceptions (see, e.g., Mueller, 1986).
However, the success of Carroll’s and van Peer’s instruments and the simple and
straightforward nature of semantic differential items are the reasons I chose this item type
for this study. The semantic differential item was originally developed by Charles
Osgood as a measure of “the connotative meaning of objects” in attitude research. In this
study I adopt the semantic differential item type, but I will hereafter refer to the items in
this study as ‘perceptual differential items’ to avoid confusion with the linguistic
connotation of the ‘semantic’ label.

6.2.1.2. Developing a Comprehensive Set of Perceptual Differential Items

There were three major stages in the process of developing the Stylistic
Perceptions Scale used in this study. These stages were:

1. Develop a comprehensive set of relevant perceptual differential items
2. Assess the reliability of the items
3. Revise the items according to reliability results

This section briefly describes the methods and results of the first stage. Sections 6.2.1.3
and 6.2.1.4 present the results of the second and third stages.
In order to develop a comprehensive set of relevant perceptual differential items, I
followed two steps. The first step was to review the literature in order to identify
descriptive terms used to describe writing quality and style. This included, but was not
limited to, a review of previous studies that used semantic differential items as a measure.
I began by borrowing descriptors from Carroll’s (1960) and van Peer’s (1986) semantic
differential instruments. I then performed a broad survey of literature that has discussed
different aspects of texts and writing style, beginning with studies that cited Carroll’s
work, with a goal of identifying possible terms or descriptors that could be used in
perceptual differential items. In most cases, the words from these studies did not come
from perceptual instruments. Rather they were keywords used by authors in their
qualitative descriptions of patterns they identified in the data (e.g., informative, academic,
explanatory). The result was a pool of 40 unique terms that described various aspects of
writing style and quality. I modified these descriptors into perceptual differential items by
assigning a corresponding antonym (e.g., good—bad) to each.
The second step in developing a comprehensive list of relevant items was to
identify additional items that represent valid perceptual parameters. In order to be as
inclusive as possible it was necessary to elicit potentially important perceptual descriptors
from as many participants as possible. Therefore, I gathered descriptors from the
following two groups: (a) all students enrolled in English 223 (N = 52), a large section
linguistics course at Northern Arizona University, and (b) feedback from Amazon.com
customers (N = 63) directly relating to the writing style and quality of a sample of
popular university textbooks. Using an online survey, the English 223 students were
assigned to read two passages from their course readings and use at least three adjectives
to describe the writing style of each. They were then asked to write at least three other
adjectives they might use if describing the writing style of their other course textbooks.
The adjectives used by the student participants, combined with adjectives from the
Amazon reviews resulted in over 500 adjective tokens. A word frequency list was
generated using this list of adjectives in order to determine which words occurred
repeatedly. This was done based on the assumption that an adjective used by more than
one textbook reader might be a good candidate for use as a descriptor in a perceptual
item. Of the 104 adjectives that were mentioned two or more times, 30 were mentioned at
least five times, and 12 were mentioned twelve or more times. This provides strong
evidence that there are valid perceptual parameters that are recognized by readers.
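
The frequency-list step described above can be sketched in a few lines. The adjective tokens below are hypothetical stand-ins for the 500+ tokens that were actually collected, and the function name and the set of thresholds shown are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical adjective tokens; the real list contained over 500 tokens.
adjective_tokens = [
    "clear", "boring", "clear", "dense", "informative", "boring",
    "clear", "dry", "informative", "wordy", "clear", "boring", "clear",
]


def frequency_bands(tokens, thresholds=(2, 5)):
    """Count adjective types and list those that meet each frequency threshold."""
    counts = Counter(token.lower().strip() for token in tokens)
    bands = {t: sorted(w for w, c in counts.items() if c >= t) for t in thresholds}
    return counts, bands


counts, bands = frequency_bands(adjective_tokens)
print(counts.most_common(3))
print(bands[2])  # adjectives mentioned at least twice: candidate descriptors
print(bands[5])  # adjectives mentioned at least five times
```

Adjectives in the lowest band (two or more mentions) correspond to the candidates for additional perceptual items; the higher bands simply show how concentrated the mentions are.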
In order to create a list of additional items based on these data, I created a list of
adjectives that were mentioned at least twice but not already included in the pilot
instrument. I then narrowed the list further by keeping only items that were perceptually
interesting based on the goals of this dissertation research. Next, I added an opposite term
for each of the adjectives. This resulted in the addition of 11 items.
It was encouraging to note that 54% of the adjectives identified in the first step
were mentioned at least once, for a total of 170 occurrences of those 43 terms. However,
for 13 of the items identified during the first step, neither adjective was mentioned in
the second step. These 13 items were subsequently removed from the survey. In
summary, the methods used in this section yielded a list of 38 unique items that can be
considered a comprehensive set of perceptual differential items that are relevant to
published academic writing.

6.2.1.3. Determining the Reliability of the Perceptual Items

The second stage in developing the Stylistic Perceptions Scale was determining
whether the individual perceptual scales identified in the previous section can be reliably
evaluated by independent raters. In order to achieve this goal, two independent, untrained
participants rated a sample of 16 texts from the corpus on each of the 38 perceptual
scales. Pearson’s correlations were calculated for the two raters’ responses on each of the
38 items. These correlations were used to assess the reliability of each perceptual scale.
These results were used to determine which of the 38 items, if any, should be eliminated
or modified.
The results of the item reliability analysis revealed that some of the items were
not applied consistently across texts by the two raters. In order to determine which items
were candidates for elimination or modification, a Pearson’s r threshold of .25 was
established. The goal of this admittedly low threshold was to be as inclusive as possible
while still omitting items that participants clearly did not agree on. The reliabilities for
the items ranged from .12 to .58. Seven of the items did not achieve the Pearson’s r
threshold of .25.
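
As a minimal sketch of the item reliability check, the code below computes Pearson's r between two raters for each item and flags items that fall under the .25 threshold. The ratings, the number of texts, and the function and variable names are hypothetical; only the threshold comes from the procedure described above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: constant ratings are treated as no agreement
    return cov / sqrt(var_x * var_y)


THRESHOLD = 0.25  # Pearson's r threshold described in the text

# Hypothetical ratings by two raters on eight texts for two items.
ratings = {
    "readable-unreadable": ([1, 2, 2, 5, 6, 3, 4, 5], [2, 2, 3, 5, 5, 3, 4, 6]),
    "free-constrained": ([3, 4, 2, 5, 3, 4, 2, 5], [4, 2, 5, 3, 4, 3, 4, 3]),
}

reliability = {item: pearson_r(a, b) for item, (a, b) in ratings.items()}
flagged = [item for item, r in reliability.items() if r < THRESHOLD]

for item, r in reliability.items():
    print(f"{item}: r = {r:.2f}")
print("Items to revise:", flagged)
```

With only two raters, a simple correlation per item is about the most that can be estimated; flagged items are candidates for revision rather than automatic removal.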

6.2.1.4. Modifying the Perceptual Items

The final stage in developing the Stylistic Perceptions Scale was to revise the
perceptual differential items that were deemed unreliable in the previous stage. As
mentioned above, seven items did not meet the Pearson’s correlation threshold of .25.
However, for each of these seven items multiple participants in the previous section
identified at least one of the terms as important. Therefore, rather than eliminating these
items, an attempt was made to modify them in an effort to make them clearer and easier
to apply to published academic writing. In order to improve the items that received low
reliabilities, the list of adjectives generated in Section 6.2.1.2 was used to revise items to
include clearer or more appropriate terms. In conclusion, the three-stage process applied
in Sections 6.2.1.2 – 6.2.1.4 produced the 38 perceptual differential items that comprise
the final Stylistic Perceptions Scale displayed in Table 6.1.

Table 6.1. Final 38 items in the Stylistic Perceptions Scale.

1. effective—ineffective 20. well-organized—poorly organized
2. readable—unreadable 21. incomprehensible—comprehensible
3. biased—unbiased 22. successful—unsuccessful
4. free—constrained 23. dense—not dense
5. abstract—concrete 24. casual—formal
6. graceful—awkward 25. easy to follow—hard to follow
7. vague—explicit 26. undescriptive—descriptive
8. bad—good 27. conversational—academic
9. impartial—opinionated 28. unclear—clear
10. exciting—dull 29. focused—not focused
11. plain—expressive 30. entertaining—not entertaining
12. detached—interactive 31. unimportant—important
13. emotional—unemotional 32. informative—not informative
14. relatable—unrelatable 33. not useful—useful
15. intimate—distant 34. modern—old-fashioned
16. humorous—serious 35. relevant—irrelevant
17. profound—superficial 36. not technical—technical
18. varied—monotonous 37. personal—impersonal
19. boring—engaging 38. undetailed—detailed

6.2.2. Data collection

This section includes a detailed overview of the methods used to gather data on
readers’ stylistic perceptions of the texts in the corpus. After discussing the texts, I
introduce the survey instrument and Mechanical Turk, the online tool used to recruit
participants and collect the perceptual data. Finally, I present information regarding the
participants who took part in the study.

6.2.2.1. Texts

The corpus of texts used in this part of the study is the same as the corpus used in
the text linguistic analyses presented in Chapter 5. A complete description of the corpus
was presented in Chapter 3 and the situational analysis of the registers represented in the
corpus can be found in Chapter 4. The texts were presented to the participants in
electronic format using size 12 Times New Roman font. The participants were instructed
to read the entire text before proceeding to the survey items that began on the next page
of the online survey. The participants were also instructed to pay attention to their
perceptions of the writing quality and readability of the texts they were reading.

6.2.2.2. Mechanical Turk

Mechanical Turk (MTurk) is an Amazon crowdsourcing platform designed to
facilitate the creation of simple Human Intelligence Tasks (HITs) by Requesters and the
completion of these tasks by participants, or Workers. Although MTurk was originally
developed for human computation tasks, it has been used extensively in recent years for
research in the social sciences, including linguistics (Mason & Suri, 2011). Many
researchers have evaluated the quality of data collected using MTurk Workers. Some of
this research has compared the results of MTurk Worker data to expert data collected on
identical tasks. The results have repeatedly shown MTurk Workers to be performing at
the same level of quality as experts in the task (Snow et al., 2008; Marge et al., 2010;
Urbano, Morato, Marrero, & Martin, 2010; Alonso & Mizzaro, 2009). Additional
research has investigated whether results from MTurk Workers are comparable to data
collected from other samples. These studies have revealed no significant differences
between MTurk Workers and participants from other populations (Paolacci et al., 2010;
Suri & Watts, 2011).
In this study, MTurk was used as a venue to recruit, correspond with, and pay
participants. After creating a Requester profile on MTurk, I created a separate HIT for
each of the 150 texts in the corpus. My description of the task was: “Read some writing
and report your perceptions of the writing style. This should take 5-10 minutes”. The
keywords were: “read, perception, attitudes, writing style”. These were both used to
recruit potential participants as they searched or browsed through the available HITs
within MTurk. Workers on this task were required to be residing in the United States.
Additionally, their HIT Approval Rate on previous HITs was required to be greater than
or equal to 90%. I requested that 25 participants complete each of the 150 texts/surveys.
Participants were allowed to complete surveys for multiple texts. However, a setting was
used in MTurk that allowed them to complete only one survey per text.

Participants were paid 60 cents per HIT. However, payment was not disbursed
until after their work was reviewed and approved by me. Before approving the work
completed by the participants, I reviewed the amount of time they spent on the task to
ensure they spent a reasonable amount of time completing the reading and the survey. I
also spot-checked responses to check for patterns in responses that suggested participants
were not actually completing the task as instructed (e.g., a score of ‘4’ for all 38 items).
No problems with the data were discovered using these two methods.
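
The two screening checks described above (time on task and uniform response patterns) could be automated along the following lines. This is only a sketch: the review in this study was done by hand, and the function names and the three-minute cutoff are hypothetical choices, since no specific cutoff is stated here.

```python
def is_straight_lined(responses):
    """True if every item received the same score (e.g., '4' for all items)."""
    return len(set(responses)) == 1


def screen_submission(responses, minutes_spent, min_minutes=3.0, n_items=38):
    """Return a list of reasons a submission might warrant manual review.

    min_minutes is a hypothetical cutoff, not a value reported in the study.
    """
    reasons = []
    if len(responses) != n_items:
        reasons.append("incomplete survey")
    if responses and is_straight_lined(responses):
        reasons.append("identical score on every item")
    if minutes_spent < min_minutes:
        reasons.append("implausibly short completion time")
    return reasons


# Hypothetical submissions.
careful = [1, 3, 4, 2, 6, 5] * 6 + [2, 4]  # 38 varied responses
uniform = [4] * 38                          # the same score on all 38 items

print(screen_submission(careful, minutes_spent=5.4))
print(screen_submission(uniform, minutes_spent=1.2))
```

A submission that returns an empty list passes both checks; anything else would be inspected before payment is approved.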

6.2.2.3. Survey Instrument

While MTurk does provide programs to create surveys, I opted to use Google
Forms to create the survey and collect data because the survey program on MTurk did not
have options for creating semantic differential scales and because data is easier to export
from Google Forms. After accepting the MTurk HIT, participants were presented with a
random text from the corpus. They were instructed to read the entire text before accessing
the survey. The Google survey was accessed through a link below the text. The survey
was titled ‘Text Perceptions’ and the participants were given the following instructions:

After entering some basic personal information, rate the passage you just read on
each of the scales below. There are no right or wrong answers. Try to be as
precise as you can about your judgment. Rating your reaction is done by clicking
on the number that best represents the degree of your opinion. Don’t worry if
some of the adjectives puzzle you. It is your intuitive response I want. So fill in
the scales as you “feel” the items should be judged.

Each time a participant began a survey, they were asked to enter their Mechanical
Turk Worker Identification Number and the unique Text Identification Number for the
text they read. They were then asked to enter basic demographic information (age,
gender, and educational background). Finally, in random order, they were presented with
the 38 perceptual differential items, each of which was on a 6-point scale, in the
following format:

Participants were required to complete each of the perceptual differential items before
submitting the survey. After submitting the Google Forms survey, participants were
given a completion code to be entered into a box in the MTurk HIT before it could be
submitted.

6.2.2.4. Participants

Data collection from the participants in this study was approved by the Institutional
Review Board for the Protection of Human Subjects in Research at Northern Arizona
University (Project #12.0219).
As mentioned above, the participants in this study were recruited and paid
through MTurk. The participants were residing in the United States at the time they
participated, and each participant had a high (≥ 90%) approval rating on MTurk.
The 3,750 required surveys were completed by a total of 708 unique participants.
The number of surveys completed by each rater ranged from 1 to 98 (M = 5.38, SD =
9.69). The majority of the participants completed only one survey, and 54% of the
surveys were completed by just 10% of the participants. The time taken to complete each
survey ranged from 3.5 minutes to 7.5 minutes (M = 5.42, SD = .76).
An analysis of the demographic information reported by the participants revealed
that the participant pool was composed of 66% males and 34% females. The reasons for
the unbalanced gender distribution are not clear. The ages of the participants ranged from
18 to 78 years old (M = 29.76, SD = 10.20). Figure 6.1 displays the distribution of
participant ages. It can be seen that over half of the participants were in their 20s and
more than 75% of them were between 20 and 30 years old.

Figure 6.1. Distribution of participant ages.

[Figure: distribution of participants across the age groups Teens, 20s, 30s, 40s, 50s, 60s, and 70s.]

The education levels of the participants ranged from some high school to
advanced college degree. Figure 6.2 shows the distribution of the participants’
educational backgrounds. The individual perceptual items were analyzed to determine
whether there were statistically significant differences between genders, age groups, and
educational levels. Although there were some minor differences between educational
levels and genders, these differences were small and not systematic. The results of
a sample of these analyses will be presented in the next section.

Figure 6.2. Distribution of participant educational background.

[Figure: distribution of participants across the education levels Some HS, HS Grad, Some College, Bachelors, and Masters+.]

6.2.3. Perceptual variation across demographic groups

In this section I measure the effect of three demographic variables (gender,
education, and age) on two of the perceptual items: readable – unreadable and biased –
unbiased. The main reason for performing these comparisons is to determine whether the
participant sample is homogeneous enough in their perceptions of writing quality to be
treated as representative of a single population of readers. The mean perceptual scores for
each group will be displayed in a bar graph (see Figures 6.3-6.5), and one-way ANOVAs
will be used to test for statistical differences across the various groups for each variable.
For the purposes of these analyses, the levels of educational background are: some high
school, high school graduate, some college, Bachelor’s degree, and Master’s degree or
higher. The variable of age is divided into the following six groups: 18-19, 20-29, 30-39,
40-49, 50-59, 60-79.
The results for the gender comparison can be seen in Figure 6.3. The scores on
both items are slightly higher for males than for females. Using an a priori alpha of 0.05,
the gender difference was non-significant for the first item (F(1, 3805) = 2.783, p =
.095). However, the difference between genders was significant for the second item (F(1,
3805) = 11.85, p = .001). This revealed that males perceived texts as being significantly
more biased than females did. Considering the extremely large sample size in these
comparisons, it is also useful to look at the estimated effect size in order to determine the
proportion of variance accounted for by the difference. The effect size for the difference
between genders on the second item is extremely small (adjusted R2 = .003), showing that
only about one-third of one percent of the variability in item scores is accounted for by
the gender difference.
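The link between the reported F statistic and the effect size can be checked with a short sketch. It assumes the standard one-way ANOVA identities (R2 = SS_between / SS_total, with the usual adjustment for the number of groups); the function names are illustrative and are not taken from the software used in this study.

```python
def r_squared_from_f(f_stat, df_between, df_within):
    """Unadjusted R^2 (eta squared) implied by a one-way ANOVA F statistic."""
    return (f_stat * df_between) / (f_stat * df_between + df_within)


def adjusted_r_squared(r_squared, df_between, df_within):
    """Adjust R^2 for the number of groups; N - 1 = df_between + df_within."""
    n_minus_1 = df_between + df_within
    return 1 - (1 - r_squared) * n_minus_1 / df_within


# Gender difference on the biased - unbiased item: F(1, 3805) = 11.85.
r2 = r_squared_from_f(11.85, 1, 3805)
adj = adjusted_r_squared(r2, 1, 3805)
print(round(r2, 3), round(adj, 3))
```

Both values round to .003, which agrees with the adjusted R2 reported above and shows how a highly significant F can coexist with a negligible effect when the error degrees of freedom are this large.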

Figure 6.3. Mean Dimension Scores for Males and Females.

[Bar chart: mean scores for males and females on the Readable--Unreadable and Biased--Unbiased items; vertical axis from 0 to 4.5.]

The distribution of mean item scores for the five educational groups is displayed
in Figure 6.4. There were no statistically significant differences between any of the
groups on the first item (F(4, 3802) = 2.05, p = .085) or the second item (F(4, 3802) =
0.502, p = .705).
The final demographic variable I investigate here is that of participant age. The
mean perceptual scores for the six age groups can be compared in Figure 6.5. There was a
significant difference for the age groups on the first item (F(5, 3797) = 3.86, p =.002, R2
= .004). This difference accounted for approximately 0.4% of the variability in scores on
the first item. There was no statistical evidence for differences between groups on
the second item (F(5, 3797) = 1.13, p = .34).

Figure 6.4. Mean Dimension Scores for Five Education Groups.

[Bar chart: mean scores for the five education groups (Some HS, HS Graduate, Some College, Bachelor's Degree, Master's or Higher) on the Readable--Unreadable and Biased--Unbiased items; vertical axis from 0 to 5.]

Figure 6.5. Mean Dimension Scores for Six Age Groups.

[Bar chart: mean scores for the six age groups (18-19, 20s, 30s, 40s, 50s, 60s) on the Readable--Unreadable and Biased--Unbiased items; vertical axis from 0 to 5.]

In summary, these analyses suggest that the three demographic variables
reported by participants contribute somewhat to the variability in perceptual scores.
However, the negligible effect sizes for the two statistical differences suggest that the
differences are largely a product of the massive sample sizes. Despite the
statistical differences that were found between demographic groups, these analyses have
not presented strong arguments to discourage the treatment of this participant sample as a
single, relatively homogeneous population of readers. Therefore, the data from all
participants will be analyzed together, regardless of demographic variation in the sample.

6.3. Reliability

This section will present the results of a series of analyses used to assess the
reliability of the data. Three different types of reliability were measured: item reliability,
rater reliability, and text reliability. Item reliability is defined here as the degree to which
an item is used consistently across raters and across texts. Text reliability is the degree to
which a text can be consistently rated by different participants and across different items.
Finally, rater reliability is the degree to which participants are consistent with each other
in their ratings across items and across texts.

6.3.1. Item reliability across raters and texts

The reliability of the perceptual differential items across raters and texts was
measured using the Intraclass Correlation Coefficient (ICC), which was selected because
it is the most appropriate statistic for measuring interrater reliability among more than
two raters. For purposes of interpretation, the ICC is “algebraically equivalent” to
Cronbach’s Alpha (Tinsley and Weiss, 2000; see also Shrout and Fleiss, 1979). For this
application, the ICC is interpreted as the degree to which the ratings for a particular item
on a particular text resemble each other. More specifically, the ICC can be interpreted
here as the proportion of total variance in responses to a particular item that can be
accounted for by the variance between participants’ ratings on that item.
In order to be included in the final analysis, it was determined that each item must
achieve an average measure ICC that is statistically significant (α = .05) and higher than
.70. The ICC for the 38 perceptual differential items ranged from .45 to .94 (M = .80, SD
= .13) (see Table 6.2). All of the items achieved significance at the α = .05 level.
However, there were eight items that did not reach the ICC threshold of .70. Therefore,
these eight items were dropped from all further analyses, leaving data from a total of 30
items that were included in the MD analysis in Section 6.4. This substantially improved
the mean ICC for items (M = .85, SD = .07).
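For readers unfamiliar with the statistic, the following sketch computes an average-measure ICC from a small set of hypothetical ratings. The specific ICC model is an assumption: a one-way random-effects form, (MSB - MSW) / MSB, is used here because each text was rated by a different set of participants, but the model applied in this study is not specified in this section.

```python
def icc_average_one_way(ratings):
    """Average-measure ICC from a one-way random-effects ANOVA.

    ratings: one row per rated text, each row holding the same number
    of ratings (k). Returns (MSB - MSW) / MSB.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-text and within-text mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / ms_between


# Hypothetical data: four texts, three ratings each, on one item.
example = [[1, 2, 1], [3, 3, 4], [5, 6, 5], [2, 2, 3]]
icc = icc_average_one_way(example)
print(round(icc, 3), icc > 0.70)
```

In this toy example the raters agree closely within each text while the texts differ from one another, so the coefficient clears the .70 threshold easily.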

Table 6.2. Intraclass Correlation Coefficients for the 38 perceptual differential items.

Item ICC
1. effective—ineffective 0.743
2. readable—unreadable 0.894
3. biased—unbiased 0.862
4. free—constrained 0.884
5. abstract—concrete 0.8
6. graceful—awkward 0.829
7. vague—explicit 0.721
8. bad—good 0.773
9. impartial—opinionated 0.882
10. exciting—dull 0.9
11. plain—expressive 0.892
12. detached—interactive 0.871
13. emotional—unemotional 0.919
14. relatable—unrelatable 0.871
15. intimate—distant 0.894
16. humorous—serious 0.738
17. profound—superficial 0.448
18. varied—monotonous 0.886
19. boring—engaging 0.897
20. well-organized—poorly organized 0.727
21. incomprehensible—comprehensible 0.861
22. successful—unsuccessful 0.768
23. dense—not dense 0.848
24. casual—formal 0.89
25. easy to follow—hard to follow 0.896
26. undescriptive—descriptive 0.474
27. conversational—academic 0.908
28. unclear—clear 0.814
29. focused—not focused 0.575
30. entertaining—not entertaining 0.888
31. unimportant—important 0.679
32. informative—not informative 0.584
33. not useful—useful 0.699
34. modern—old-fashioned 0.661
35. relevant—irrelevant 0.696
36. not technical—technical 0.939
37. personal—impersonal 0.91
38. undetailed—detailed 0.719

6.3.2. Text reliability across raters and items

The ICC was also used to measure the reliability for each of the 150 texts across
items and raters. In this case, the ICC for texts can be interpreted as the proportion of
total variance in responses to items for a particular text that can be accounted for by the
variance between participants’ ratings on the items for that text.
As with the reliability analysis for items, in order to be included in the final
analysis, each text must achieve an average measure ICC that is statistically significant (α
= .05) and higher than .70. The ICC for the 150 texts ranged from .55 to .98 (M = .91,
SD = .07). All of the texts achieved significance at the α = .05 level, but four of the texts
did not reach the ICC threshold of .70. These four texts were removed from the analyses,
leaving responses on 146 texts. It is noteworthy that all four of these texts were from the
discipline of history and three of the four texts were history journal articles. It seems that
readers may vary more in their perceptions of history writing than of writing in biology.

Table 6.3. Intraclass Correlation Coefficient Results for Texts by Publication Type
and Discipline.

Pub. Type          Discipline   ICC
Journal Articles   Biology      0.96
Journal Articles   History      0.83
Textbooks          Biology      0.94
Textbooks          History      0.90
Popular Science    Biology      0.92
Popular Science    History      0.90

A closer look at the ICC means for the two disciplines revealed that the mean ICC for
biology (M = .94) was higher than the mean ICC for history (M = .88). The mean ICC
values for texts were highest for textbooks (M = .92), followed by popular academic
books (M = .91) and journal articles (M = .89). These general patterns can be seen in the
results in Table 6.3.
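The discipline and publication type means can be approximated directly from the six cell values in Table 6.3. The sketch below averages the rounded cell means, so it assumes roughly equal numbers of texts per cell and may differ from the reported means in the second decimal place; the helper name is illustrative.

```python
# Cell means from Table 6.3, keyed by (publication type, discipline).
icc_by_cell = {
    ("Journal Articles", "Biology"): 0.96,
    ("Journal Articles", "History"): 0.83,
    ("Textbooks", "Biology"): 0.94,
    ("Textbooks", "History"): 0.90,
    ("Popular Science", "Biology"): 0.92,
    ("Popular Science", "History"): 0.90,
}


def marginal_mean(position, level):
    """Average the cells whose key matches `level` at `position` (0 or 1)."""
    values = [v for key, v in icc_by_cell.items() if key[position] == level]
    return sum(values) / len(values)


for discipline in ("Biology", "History"):
    print(discipline, round(marginal_mean(1, discipline), 3))
for pub_type in ("Textbooks", "Popular Science", "Journal Articles"):
    print(pub_type, round(marginal_mean(0, pub_type), 3))
```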

6.3.3. Rater reliability across items and texts

It was not possible to measure rater reliability using a measure of agreement
because the texts were not each rated by the same 25 participants. Therefore, rater
reliability was indirectly measured through an analysis of outliers in the data. The number
of outliers per participant will help to identify aberrant raters—raters who are consistently
deviant from the other participants.
Outliers were identified by standardizing each item response in relation to the
other responses to that item on a given text. This was done using the z-score formula. For
the purposes of this analysis, an outlier is operationalized as any single participant
response with an absolute z-score of 3.29 or greater (p < .001, two-tailed). While this is a
relatively crude measure for identifying outliers, it will suffice as a tool for identifying
aberrant raters in the dataset.
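The outlier rule can be sketched as follows for the ratings of one item on one text. The example ratings are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which form of the standard deviation was used.

```python
from statistics import mean, stdev


def find_outliers(ratings, cutoff=3.29):
    """Return (index, z) pairs for ratings whose |z| meets the cutoff."""
    m = mean(ratings)
    s = stdev(ratings)  # sample standard deviation (an assumption)
    if s == 0:
        return []  # identical ratings: no response can be an outlier
    z_scores = [(x - m) / s for x in ratings]
    return [(i, z) for i, z in enumerate(z_scores) if abs(z) >= cutoff]


# Hypothetical item: 24 raters choose 5 or 6, one rater chooses 1.
ratings = [5] * 12 + [6] * 12 + [1]
outliers = find_outliers(ratings)
print(outliers)
```

Only the single rating of 1 is flagged in this example. With 25 ratings per item, a response has to be far from an otherwise tight cluster to reach the cutoff, which is consistent with the small number of outliers reported below.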
The analysis revealed 160 outliers from a total of 66 different participants. Of
these 66 participants with outliers, only one-third (n = 22) had more than one response
that was an outlier, and only about 11% (n = 7) had five or more outliers. This was very
encouraging and offers strong evidence in support of the reliability of participant
responses to the items in the survey on each of the texts. However, the analysis did reveal
that some participants had multiple item outliers. In almost every case the participants
with outliers were also among the most prolific raters in the study. Therefore, the percent
of the participants’ total responses that were outliers was calculated in order to correct for
the number of surveys completed by each participant. These percentages were calculated
for each of the nine participants that had more than two deviant responses. The results
ranged from 0.35% to 5.3% (M = 1.70, SD = 1.74). The participant who was deviant
5.3% of the time is the only one who appears to be an aberrant rater. Therefore, all of this
participant’s responses were removed from the dataset. As this particular participant rated
only one text (TB_BI_04), the deletion of this rater leaves 24 ratings on that text. Each of
the other texts retained all 25 ratings.

6.3.4. Summary of reliability results

The reliability of the perceptual differential instrument was measured in three
different ways: item reliability, text reliability, and rater reliability. The Intraclass
Correlation Coefficient was used to measure the reliability of the 38 items and the 150
texts. Standardized z-scores were used to check for outliers in rater responses and
ultimately to identify aberrant raters.
The mean ICC value for the items was quite high at .80. However, the removal of
eight items that did not meet the a priori ICC threshold of .70 increased the mean ICC
value for the remaining 30 items to .85. The mean ICC value for texts was very high at
.91. After removing the data for four texts that did not meet the .70 threshold, the mean
value increased to .92. Finally, the rater reliability analysis revealed a very small number
of outliers in the dataset, requiring the removal of data from only one participant.
In conclusion, the results of the reliability analyses presented here offer strong
evidence to support the use of perceptual differential items to measure reader perceptions
of writing quality. Furthermore, these results strongly support the reliability of the
responses from the participants in this study on a variety of texts. After the necessary
removal of eight perceptual differential items, four texts, and one participant from the
dataset, the final analysis will proceed using the data for 30 items on 146 texts.

6.4. Performing the Multi-Dimensional analysis

In this section I present the methods used to carry out a new MD analysis of
readers’ stylistic perceptions of the texts in this corpus. Traditionally, MD analyses have
been performed on a set of objective linguistic variables. This is the first MD analysis to
include only subjective perceptual variables. As the methods used here are essentially the
same as those used for the MD analysis in Chapter 5, I will only briefly describe each
methodological step before describing the results.

6.4.1. Perceptual Variables

The perceptual variables included in this analysis are the reduced set of thirty
perceptual differential items reported in bold in Table 6.2. For all of the texts except one,
each of these thirty items received 25 ratings. For each text the median of these 25 ratings
was calculated, resulting in a set of thirty median perceptual differential item scores per
text. The median was chosen as a measure of central tendency because it is more robust
to outliers than the mean, especially for ordinal data. Although the analysis in Section
6.3.3 did not reveal substantial concerns with outliers in the dataset, I chose to use the
median as a conservative measure of central tendency.

6.4.2. Factor Analysis and Dimension Scores

The median scores for the thirty perceptual variables described in the preceding
section were included in a factor analysis. This factor analysis procedure was completed
using R, following the same procedure as the one outlined in Section 5.1.3.
The scree plot suggested that a two factor solution was best (see Appendix G). The full
results of this two factor solution can be seen in Appendix H. Both factors were
interpreted. The cumulative percentage of shared variance accounted for by these two
factors was 73%. A minimum factor loading threshold of +/- .30 was used, and each
variable was included in the factor where it had the highest factor loading.
Scores for the two dimensions were calculated for each text by simply reversing
the polarity of the negative-loading items and adding the median perceptual differential
item scores for all of the items. Because the item scores are all on the same scale there
was no reason to standardize them using the formula used in Section 5.1.4. The two
dimension scores for each text were then used for the statistical analyses and figures in
the following sections.
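The scoring procedure described above can be sketched with a few hypothetical items. The sketch assumes that the 6-point scale is coded 1 to 6, so that reversing polarity means replacing a score x with 7 - x; the item names, the ratings, and the choice of which item is reversed are illustrative only.

```python
from statistics import median

SCALE_MAX = 6  # 6-point scale; coding it 1 to 6 is an assumption


def reverse(score):
    """Reverse the polarity of a score on a 1..SCALE_MAX scale."""
    return SCALE_MAX + 1 - score


def dimension_score(item_ratings, negative_items):
    """Sum the per-item medians, reversing items that load negatively."""
    total = 0
    for item, ratings in item_ratings.items():
        item_median = median(ratings)
        if item in negative_items:
            item_median = reverse(item_median)
        total += item_median
    return total


# Hypothetical ratings for one text on three items of one dimension.
text_ratings = {
    "readable-unreadable": [1, 2, 2, 1, 3],
    "unclear-clear": [5, 6, 5, 4, 5],
    "boring-engaging": [4, 5, 5, 6, 4],
}
score = dimension_score(text_ratings, negative_items={"readable-unreadable"})
print(score)
```

Because every item shares the same scale, the medians can be summed directly, which mirrors the decision above not to standardize the item scores.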

6.5. Dimensions of perceived writing quality in academic writing

I begin this section by presenting the results of the factor analysis. I then interpret
the two dimensions based on the underlying stylistic perceptions represented by the co-
occurrence patterns of the perceptual terms. As part of the interpretation of the two
dimensions I include examples that demonstrate the type of language associated with
particular perceptions of writing quality.
In order to interpret the two dimensions, I rely on the co-occurrence patterns of
the perceptual terms and the writing styles of texts that received relatively high or low
dimension scores. Because the perceptual differential items are bipolar, it is possible to
interpret them as either positive or negative by simply reversing the polarity of the terms.
Accordingly, in order to make the interpretations of the factors more transparent, in Table
6.4 I have reversed the polarity of the negative terms and added them to the list of
positive items. The factor loading for each item is reported in parentheses next to
the positive term. The corresponding negative-loading terms are listed in parentheses without loadings.

Table 6.4. Final dimension structure of the two-factor solution.

Dimension 1: Engaging and Easy to Read vs. Boring and Difficult to Read

Positive-loading terms:
successful (.97); good (.92); effective (.87); well-organized (.86); clear (.86); easy to
follow (.80); comprehensible (.79); graceful (.74); exciting (.74); engaging (.73); readable
(.72); entertaining (.68); varied (.66); not dense (.65); relatable (.60)

Negative-loading terms:
(unsuccessful); (bad); (ineffective); (poorly organized); (unclear); (hard to follow);
(incomprehensible); (awkward); (dull); (boring); (unreadable); (not entertaining);
(monotonous); (dense); (unrelatable)

Dimension 2: Interactive Author Interpretation vs. Objective Information Focus

Positive-loading terms:
opinionated (.91); biased (.88); abstract (.87); vague (.85); personal (.76); not technical
(.74); undetailed (.71); conversational (.70); emotional (.68); free (.57); humorous (.57);
interactive (.57); intimate (.56); casual (.55); expressive (.55)

Negative-loading terms:
(impartial); (unbiased); (concrete); (explicit); (impersonal); (technical); (detailed);
(academic); (unemotional); (constrained); (serious); (detached); (distant); (formal);
(plain)

6.5.1. Dimension 1: ‘Engaging and Easy to Read vs. Boring and Difficult to Read’

The terms that loaded positively on the first dimension are all related to one of
two distinct, yet closely related underlying attributes: (1) engaging (good, engaging) or
(2) readable (clear, comprehensible, not dense). Conversely, the negative terms
describe the text as either (1) boring (unsuccessful, ineffective, dull, not entertaining,
monotonous, unrelatable) or (2) difficult to read (poorly organized, hard to follow,
awkward, unreadable). It is not at all surprising to find correlations between reader
perceptions of the degree to which a text is engaging, on one hand, and comprehensible,
on the other. However, it is interesting to observe how closely related those two
perceptions are. A review of the bivariate correlation matrix reveals the strength of these
associations. Strong and statistically significant correlations were found between
participant perceptions of texts that are readable and engaging (r = .84, p < .01), hard to
follow and dull (r = .86, p < .01), and unsuccessful and poorly organized (r = .67, p <
.01). Therefore, Dimension 1 will be labeled ‘Engaging and Easy to Read vs. Boring and
Difficult to Read’.

6.5.2. Dimension 2: ‘Interactive Author Interpretation vs. Objective Information Focus’

The relationships among the items of Dimension 2 are complex and multi-faceted.
The first perceptual aspect of this dimension is the degree to which a text is objective
versus based on author interpretation. There are four positive-loading terms that can be
classified into this category (opinionated, biased, personal, and emotional). The second
aspect is the amount of perceived author involvement and author-reader interaction.
There are five positive-loading terms (interactive, conversational, humorous, intimate,
and casual). The third and final perceptual aspect of Dimension 2, which seems to be at
the core of this dimension, is the type of information included in the text. There are eight
positive-loading terms in this category. These eight terms are related to the degree of
information specialization and formality. The combination of these three aspects
(objectivity, author involvement, and information focus) of this dimension led me to label
it ‘Interactive Author Interpretation vs. Objective Information Focus’.

6.5.3. Relationship between the two dimensions

Although the factor analysis identified two distinct underlying dimensions in the
reader perceptions, it is important to point out that the factor structures are correlated at a
level of 0.52. After the dimension scores were calculated from the median
perceptual ratings, this correlation between the two dimensions increased to 0.78. In
order to interpret these correlations, it is important to first note that the co-occurrence
patterns for the items on these two dimensions show that they measure two distinct
parameters of perceived writing style. However, the strong positive correlation
between the two dimensions demonstrates that they also have a high level of shared
variance.
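The amount of shared variance implied by a correlation is its square. The sketch below applies this to the two correlations reported above and includes a plain Pearson correlation function for reference; the function names and the small test vectors are illustrative and are not taken from the study data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


def shared_variance(r):
    """Proportion of variance two variables share, given their correlation."""
    return r ** 2


# Correlations reported for the factors (0.52) and the dimension scores (0.78).
for r in (0.52, 0.78):
    print(r, round(shared_variance(r), 2))
```

On this reading, the two sets of dimension scores share roughly 61% of their variance, while the factors themselves share about 27%.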
A comparison of the two dimension structures reveals that they both measure
writing quality, just in different ways. The first dimension captures one cline of writing
quality: engaging and easy to read versus boring and difficult to read. The second
dimension captures another parameter of quality: interactive author interpretation versus
objective information focus. There seems to be a general pattern in which texts that are
perceived as (a) engaging and easy to read are also perceived as (b) interactive and
author focused. In summary, the data suggest that these two dimensions represent two
distinct, yet closely related stylistic parameters of perceived writing quality. This
relationship can be seen throughout the results below.

6.6. Stylistic perceptions of published academic writing

In order to describe the variation in reader perceptions of published academic
writing, I use the same two models that were used in Chapter 5: the Register Model and
the Publication Type x Discipline Model. The Register Model uses one-way ANOVAs
and Tukey HSD post hoc comparisons to describe the variability across the six registers.
The Publication Type x Discipline Model describes the variation in terms of two
situational factors, discipline and publication type, and uses 2x3 factorial ANOVAs and
appropriate post hoc analyses for the two dimensions.
The normality of the dependent variables (perceptual dimension scores) was
assessed for each group. Using an a priori z-score cutoff of +/-3.29, the standardized
dimension scores for each of the observations revealed no univariate outliers for either of
the dimensions. All of the groups met the criteria of the Shapiro-Wilk normality statistic
(p > .05). Additionally, all of the groups met the normality criteria of the Q-Q normality
plots, as well as the skewness and kurtosis standardized scores. Therefore, the analyses
were performed without modifying the data. Dimension scores for each individual text in
the corpus are contained in Appendix A. The complete ANOVA results, including
corresponding main or simple effects output, can be found in Appendix I.

6.6.1. Dimension 1: ‘Engaging and Easy to Read vs. Boring and Difficult to Read’

The distribution of the six registers along Dimension 1 can be seen in Figure 6.6.

Figure 6.6. Registers along Dimension 1: ‘Engaging and Easy to Read vs. Boring and Difficult to Read’
F(5, 140) = 46.99, p < .001, R2 = .627

Engaging and Easy to Read
A: University Textbooks, Biology (M = 68.16, SD = 7.54)
A: University Textbooks, History (M = 65.77, SD = 5.21)
AB: Popular Academic Books, History (M = 64.28, SD = 7.48)
BC: Popular Academic Books, Biology (M = 58.06, SD = 8.52)
C: Journal Articles, History (M = 54.34, SD = 8.25)
D: Journal Articles, Biology (M = 39.90, SD = 7.97)
Boring and Difficult to Read

As displayed in that figure, the group with the lowest scores on Dimension 1 is
journal articles in biology. These texts were commonly perceived as being less engaging
and more difficult to read than the other groups. In all cases, university textbooks and
popular academic books were more engaging and easy to read than journal articles.
The factorial ANOVA for the first dimension revealed a statistical interaction
effect between publication type and discipline, F(2, 140) = 14.79, p < .001, R2 = .63.
Figure 6.7 displays the nature of this interaction effect in a marginal means plot. This plot
shows a dramatic difference between the trends of the two disciplines across the three
publication types. Popular academic books and journal articles in biology were perceived
as being less engaging and more difficult to read than their counterparts in history. In
contrast, biology textbooks were perceived as being more engaging and easier to read
than history textbooks.
The simple effects ANOVAs for publication types within disciplines revealed
significant differences across the three publication types in biology, F(2, 140) = 89.41, p
< .001, and history, F(2, 140) = 16.28, p < .001. This shows that readers can perceive
differences in the writing quality and style of publication types within a discipline. There
were significant simple effects of discipline within journal articles, F(1, 140) = 51.79, p <
.001, and popular academic books, F(1, 140) = 8.43, p = .004. However, there was no
statistical simple effect of discipline within textbooks.
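The pattern behind the interaction can be seen by comparing the biology and history means within each publication type, using the register means reported with Figure 6.6. The sketch below is simple arithmetic on those means and does not reproduce the simple effects tests; the helper name is illustrative.

```python
# Dimension 1 register means reported with Figure 6.6.
d1_means = {
    ("University Textbooks", "Biology"): 68.16,
    ("University Textbooks", "History"): 65.77,
    ("Popular Academic Books", "Biology"): 58.06,
    ("Popular Academic Books", "History"): 64.28,
    ("Journal Articles", "Biology"): 39.90,
    ("Journal Articles", "History"): 54.34,
}


def discipline_gap(pub_type):
    """Biology mean minus history mean within one publication type."""
    return d1_means[(pub_type, "Biology")] - d1_means[(pub_type, "History")]


for pub_type in ("University Textbooks", "Popular Academic Books", "Journal Articles"):
    print(pub_type, round(discipline_gap(pub_type), 2))
```

The gap is small and positive for textbooks but negative and progressively larger for popular academic books and journal articles, which is the change in direction and size that the interaction effect captures.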

Figure 6.7. Marginal Means Plot for Publication Type and Discipline along
Dimension 1.

[Marginal means plot: Dimension 1 scores (vertical axis, 35 to 70) for biology and history across popular academic books, textbooks, and journal articles.]

Text excerpt 6.1 comes from the text that received the lowest score on the first dimension. This text
contains a large amount of technical vocabulary (e.g., venation, costal, discal, chorda)
that is likely to be unfamiliar to non-expert readers. The ellipsis of articles (e.g., [the]
subcosta separate from [the] R1 at [the] base) and other function words (e.g., discal cell
[is] half the length of […]) gives the prose an incohesive, list-like feel. These challenging
features, coupled with the fact that this topic is not likely to be relevant to most of the
participants in the sample, may have led to this text being perceived as
boring and difficult to read.

6.1 Wing venation: fore wing with 14 longitudinal veins. Radius with five
branches; subcosta separate from R1 at base, terminating beyond half of
the length of the costal margin; discal cell half the length of the costal
margin, partially closed, triangular, reduced at end cell; chorda ellipsoidal,
extended slightly beyond end discal cell; R1 and R2 arising from one-third
and two-thirds the length of discal cell respectively, both ending at costal
margin […] [JA_BI_07, D1 score: 26.00]

While some of the popular academic texts received very low scores on Dimension
1, this seems to be the result of different factors than those leading to the low Dimension
1 scores for biology journal articles. Text excerpt 6.2 comes from a biology popular
academic text that received the fifteenth lowest Dimension 1 score of all the 146 texts.
However, the language in this text is dramatically different from the dense, technical
prose in excerpt 6.1. The vocabulary in 6.2 is technical in a different sense from 6.1; the
author of this text assumes the reader has knowledge of, or at least an interest in, the
specific details of African geography. This likely contributed to the high ‘difficult
to read’ ratings. Additionally, the specific details (e.g., altitude and average annual
rainfall) may not be particularly engaging to the average reader.

6.2 The Okavango-Kalahari ecosystems are a quilt-work of intersecting
habitats. Flying north from Gaborone, Botswana's capital near the South
African border, at an altitude of 300 meters, the acacia scrublands and
grasslands of the southern and central regions immediately hove into view.
Here, in its easternmost expression, the Kalahari is technically not a true
desert at all, receiving (depending on locality) anywhere from 150 to 500
millimeters of rain a year (true deserts receive less than 100 millimeters).
[PA_BI_11, D1 score: 46.00]

The highest scoring of the six registers was the biology textbooks. Excerpt 6.3
was taken from the text with the highest positive Dimension 1 score. This text excerpt
demonstrates some of the features that are associated with texts that the participants
reported as being engaging and easy to read. Unlike the author of 6.1, the author of this text defines
potentially unfamiliar vocabulary, and all sentences are grammatically complete in the
traditional sense. The author also makes an effort to relate the topic to the reader by
contextualizing discoveries and facts in a historical period relative to the reader.

6.3 A biologist who studies mycology is called a mycologist. Historically,
mycology was a branch of botany (fungi are evolutionarily more closely
related to animals than to plants but this was not recognized until a few
decades ago). Pioneer mycologists included Elias Magnus Fries, Christian
Hendrik Persoon, Anton de Bary and Lewis David von Schweinitz. Today,
the most comprehensively studied and understood fungi are the yeasts and
eukaryotic model organisms Saccharomyces cerevisiae and Schizosaccharomyces
pombe. [TB_BI_07, D1 score: 80.00]

6.6.2. Dimension 2: ‘Interactive Author Interpretation vs. Objective Information Focus’

Figure 6.8 displays the six registers along the second dimension. The distribution
of scores for each register on Dimension 2 was surprisingly similar to that on Dimension 1.
Journal articles in biology received the lowest score on this dimension. Biology journal
articles were generally perceived as objective and informational. As a group, university
textbooks were perceived as being significantly more interactive and author-focused than
the other registers. Interestingly, the popular academic books and journal articles in
history also formed their own group that was significantly different from the other
registers. The factorial ANOVA for Dimension 2 also yielded a significant effect for the
interaction between publication type and discipline, F(2, 140) = 25.53, p < .001, R2 = .70.
The trends for this dimension are very similar to those of Dimension 1. Figure 6.9 reveals
no noticeable difference between the mean dimension scores for textbooks in history and
biology. However, popular academic books and journal articles in biology were
perceived as being much more objective and informational than the same publication
types in history. These visual patterns are strongly supported by the statistical results.
There were significant simple effects of publication type within both biology, F(2,
140) = 111.01, p < .001, and history, F(2, 140) = 10.65, p < .001. These results lend
further support to the finding that readers are sensitive to publication type variation
in writing style. The simple effects results for disciplines within publication types showed
that popular academic books, F(1, 140) = 32.38, p < .001, and journal articles, F(1, 140)
= 105.97, p < .001, in biology are perceived as significantly more objective and
informational than their history counterparts. However, there was no statistically significant difference
between the Dimension 2 scores for university textbooks in biology and history.

[Figure 6.8: registers plotted by mean Dimension 2 score on a vertical scale running from 20 (‘Objective Information Focus’) to 55 (‘Interactive Author Interpretation’). Letters mark the register groupings shown in the figure.]

Positive-loading terms: opinionated (.91); biased (.88); abstract (.87); vague (.85);
personal (.76); not technical (.74); undetailed (.71); conversational (.70);
emotional (.68); free (.57); humorous (.57); interactive (.57); intimate (.56);
casual (.55); expressive (.55)

A: University Textbooks, Biology (M = 49.64, SD = 9.53)
A: University Textbooks, History (M = 49.21, SD = 6.77)
B: Popular Academic Books, History (M = 43.64, SD = 6.09)
B: Journal Articles, History (M = 40.66, SD = 4.47)
C: Popular Academic Books, Biology (M = 32.90, SD = 7.48)
D: Journal Articles, Biology (M = 21.7, SD = 3.82)

Figure 6.8. Registers along Dimension 2: ‘Interactive Author Interpretation vs. Objective Information Focus’
F(5, 140) = 64.23, p < .001, R2 = .696
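
The R2 reported with Figure 6.8 can be cross-checked against the omnibus F ratio. The following is a minimal sketch, assuming the standard identity R2 = (df1 × F) / (df1 × F + df2) for a fixed-effects model. The function name `r_squared_from_f` is hypothetical and is not part of the analysis reported in this study.

```python
def r_squared_from_f(f_value: float, df_model: int, df_error: int) -> float:
    """Recover R-squared from an omnibus F ratio and its degrees of freedom."""
    explained = df_model * f_value
    return explained / (explained + df_error)


# Values reported with Figure 6.8: F(5, 140) = 64.23
r2 = r_squared_from_f(64.23, 5, 140)
print(round(r2, 3))  # 0.696
```

The recovered value agrees with the R2 of .696 given in the figure caption.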

Figure 6.9. Marginal Means Plot for Publication Type and Discipline along
Dimension 2.

[Line plot. Vertical axis: Dimension 2 Scores (20 to 50). Horizontal axis: publication type (Popular Academic, Textbooks, Journal Articles). Separate lines for Biology and History.]

Text excerpt 6.3 received the lowest score on the second perceptual dimension.
It typifies writing that readers perceived as objective and informational. This text is a
description of the materials and methods used during an experiment. It is a concise and
seemingly unbiased informational account of past events. The authors’ effort to distance
themselves from the informational account is so extreme that the procedures are
explained exclusively in passive voice (e.g., ‘samples were taken’).

6.3 The microorganisms were prepared and inoculated according to
manufacturer's recommendations to give cell counts of 8 × 10⁷ CFU ml⁻¹
for the yeast and 1 × 10⁶ CFU ml⁻¹ for the wine LAB, which were
confirmed by viable cell counts on YPD and MRS agars (Difco),
respectively. The fermentation temperature was maintained at 19 to 20°C
in a temperature-controlled room. Samples were taken periodically during
fermentations and centrifuged at 10,000 × g for 5 min, and the supernatant
was transferred into 15-ml screw-cap tubes and frozen at −18°C until
analysis. [JA_BI_04, D2 score: 15.00]

In stark contrast to the information-driven biology article, the author in excerpt 6.4
explicitly interacts with the reader multiple times (e.g., ‘watch the moon’, ‘think about
how you rewind’). The excerpt is also written in active voice (e.g., ‘the galaxies start
moving back together’). It is not entirely clear why this text was perceived as being
‘biased’ and ‘opinionated’. However, it is possible that this is also related to the author’s
reliance on the readers’ perceptions of reality in order to describe the content.

6.4 Some clear evening, watch the moon as it rises from the horizon and think
of the 380,000 kilometers between it and you. Five billion trillion times
the distance between the moon and you are galaxies—systems of stars—at
the boundary of the known universe. Wavelengths traveling through space
move faster than anything else, millions of meters per second. Yet the long
wavelengths that originated from faraway galaxies many billions of years
ago are only now reaching the Earth. By every known measure, all the
near and distant galaxies suspended in the vast space of the universe are
moving away from one another—which means the universe is expanding.
And the prevailing view of how the colossal expansion came about
accounts for every bit of matter in the universe, in every living thing.
Think about how you rewind a videotape on a VCR, then imagine yourself
"rewinding" the universe. As you do, the galaxies start moving back
together. [TB_BI_03, D2 score: 69.50]

History textbooks were also perceived as less informational and more author-centered.
This author-centeredness can be seen in excerpt 6.5 in which the authors make subjective,
qualitative statements (e.g., ‘monumental importance’, ‘remembered and honored’, and
‘sympathetic Italians’). While this historical account certainly transmits information, it
does so through the perspective of the authors.

6.5 World War II brought hardship and disaster to Venice. Even though the
city itself was not bombed out of respect for its monumental importance, it
was heavily impacted in the Jewish quarter, first by the national racial
laws of 1938, which restricted employment, teaching, learning, and
performance for Jews, and then in 1943 when deportation began. Many
Jews lost their lives in the Holocaust, and they are remembered and
honored in Arbit Blatas Holocaust memorials. Others, fortunately,
managed to go underground, hidden by Venetians and other sympathetic
Italians. [TB_HI_13, D2 score: 64.00]

6.7. Summary: Instrument development and data collection

The first part of this chapter described the methods used to develop, pilot, and
assess an instrument for measuring reader perceptions of writing quality and author style.
The perceptual differential items used in this instrument were modified from the well-
established semantic differential item. After constructing a list of potential items based on
related research studies, I set out to increase the comprehensiveness of the survey items
by drawing on two sources: undergraduate students and reader comments on
Amazon.com. Once a sizeable list of potential items was developed, the next step was to
assess the reliability of the items in order to eliminate or modify items that were
ineffective, confusing, or otherwise unreliable. After removing many items and
modifying some others, the instrument was ready to be used in the large-scale study.
Participants in this study were recruited and paid through MTurk. The participants
were assigned a random text and instructed to read it in its entirety. After reading the
assigned text, the participants were asked to report their perceptions of the text on 38
perceptual differential items. Each of the 150 texts was rated by 25 independent raters.
The results for the 150 texts were then subjected to a battery of reliability
assessments in order to measure the reliability of the items, the texts, and the raters.
Modifications based on these assessments resulted in a reduced dataset of 30 items for
146 texts.

6.8. Summary: Dimensions of stylistic perception in published academic writing

6.8.1. Dimensions of stylistic perception in published academic writing

The reduced dataset of 30 items for 146 texts was used to perform a factor
analysis to identify underlying dimensions of variation in readers’ perceptions of writing
quality and author style. This resulted in a two-factor solution. The two factors were
interpreted and labeled:

1. ‘Engaging and Easy to Read vs. Boring and Difficult to Read’
2. ‘Interactive Author Interpretation vs. Objective Information Focus’

Dimension scores were then assigned to each of the texts in the corpus in order to analyze
multi-dimensional patterns of variation in reader perceptions of academic writing. A
2 × 3 (discipline × publication type) factorial ANOVA was performed for each dimension to determine whether
discipline and publication type interacted significantly. Significant interaction effects
were found for both dimensions. Therefore, simple effects were used to investigate
significant differences between disciplines within publication types and between
publication types within disciplines.
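
The interaction test summarized above can be illustrated with a small sketch. It assumes a balanced design (equal numbers of texts per register), which is simpler than the unbalanced corpus analyzed here, and it uses hypothetical scores rather than data from the study. The function name `interaction_f` is likewise hypothetical.

```python
from statistics import mean

# Hypothetical dimension scores (two texts per register), for illustration only.
scores = {
    ("biology", "textbook"): [48, 52],
    ("history", "textbook"): [48, 52],
    ("biology", "popular"): [30, 34],
    ("history", "popular"): [42, 46],
    ("biology", "journal"): [21, 25],
    ("history", "journal"): [39, 43],
}


def interaction_f(cells):
    """Return (F, df_interaction, df_error) for a balanced two-way design."""
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # observations per cell (balanced)
    everything = [x for values in cells.values() for x in values]
    grand = mean(everything)

    # Marginal means for each factor and the mean of every cell.
    mean_a = {a: mean([x for b in levels_b for x in cells[(a, b)]]) for a in levels_a}
    mean_b = {b: mean([x for a in levels_a for x in cells[(a, b)]]) for b in levels_b}
    mean_cell = {key: mean(values) for key, values in cells.items()}

    # Partition the between-cell sum of squares into main effects and interaction.
    ss_a = n * len(levels_b) * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = n * len(levels_a) * sum((m - grand) ** 2 for m in mean_b.values())
    ss_cells = n * sum((m - grand) ** 2 for m in mean_cell.values())
    ss_interaction = ss_cells - ss_a - ss_b
    ss_error = sum(
        (x - mean_cell[key]) ** 2 for key, values in cells.items() for x in values
    )

    df_interaction = (len(levels_a) - 1) * (len(levels_b) - 1)
    df_error = len(everything) - len(cells)
    f_value = (ss_interaction / df_interaction) / (ss_error / df_error)
    return f_value, df_interaction, df_error


f_value, df1, df2 = interaction_f(scores)
print(f"F({df1}, {df2}) = {f_value:.2f}")  # F(2, 6) = 10.50
```

With two disciplines and three publication types, the interaction always has 2 degrees of freedom, and the error degrees of freedom equal the number of texts minus the six cells, which corresponds to the F(2, 140) form reported for the 146 texts in this chapter.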

6.8.2. Interpreting variation between disciplines

The simple effects results for Dimension 1 revealed that popular academic books
and journal articles in biology had significantly lower scores than those in history. No
statistical differences were found between the Dimension 1 scores of university textbooks
in biology and history. This shows that relative to history, journal articles and popular
academic books in biology are perceived as less engaging and more difficult to read.
Reasons for this will be clearer after the perceptual results are correlated with the
linguistic results in the next chapter. There are, however, some non-linguistic factors that
are likely to play a role in these perceptions. As a discipline, history writing is more
likely to be relevant to readers than biology writing. Much of biology writing is focused
on the scientific study of organisms and processes that readers will never observe with
their natural senses. History, on the other hand, deals with past accounts of human
civilizations, events, and social issues. Non-expert readers are more likely to connect
with historical accounts and analysis than with methods and findings from the study of
biology. Interestingly, this pattern was not found with the university textbooks, possibly
reflecting biology textbook authors’ efforts to relate the material to the student readers.
The analysis of the simple effects for Dimension 2 revealed results that are very
similar to those for the first dimension. While no statistical difference was found between
the two disciplines for university textbook writing, the writing in biology popular
academic books and journal articles was perceived as being more objective and
informational and having less interaction and author interpretation. It is not at all
surprising that the readers in this study think that history writing is less objective and less
information-focused. A great deal of history writing is made up of narrative that is
presented from the perspective of the writer. By nature, the content is also much more
subjective because the methods are usually based on qualitative observation rather than
quantitative, empirical data. Again, it is interesting to note the convergence of these two
disciplines in the data for university textbooks.

6.8.3. Interpreting variation among publication types

6.8.3.1. Popular academic books

On both of the dimensions, the popular academic writing samples generally have
dimension scores that are lower than the university textbooks and higher than the journal
articles. The findings for Dimension 1 reveal that the popular academic writing in this
sample was generally perceived to be quite engaging and easy to read. Although popular
academic writing was reported as being slightly less readable and engaging than
university textbooks, it was much easier to read and more engaging than the journal
articles. The Dimension 2 results show that popular academic writing is quite objective
and information-focused. However, the biology popular academic texts received much
lower scores than those in history. Relative to university textbooks, popular academic
writing was perceived to be less interactive and author-focused. Within both disciplines
the popular academic texts were rated as less objective and informational than their
journal article counterparts.
Initially, these findings seemed surprising. Of the three publication types, I
expected popular academic writing to be the most engaging, easy to read, interactive, and
author-focused. It seems, however, that popular academic texts are perceived to be quite
academic to the average reader. One potential contributing factor is the source of the
popular academic texts in this corpus. All of these texts were samples from books that
were reviewed by the New York Times. This suggests that these are high profile books,
written by credible authors who value the accurate transmission of knowledge. The
situational analysis presented in Chapter 4 supports this, revealing that over 82% of the
authors of popular academic books in this corpus hold doctorate degrees, and almost 95%
of the authors have an educational background that is relevant to the topic of the book.
The results of the analyses in Chapters 4, 5, and 6 demonstrate that this particular sample
of popular academic writing is not a representative sample of the full range of popular
academic writing. While a sample of that breadth was beyond the scope of this study, it is
important to emphasize that the popular academic books in this corpus represent only a
subset of the full range of popular academic writing, and the situational, linguistic and
perceptual results from this study should not be generalized beyond that subset.

6.8.3.2. University textbooks

The university textbooks received the highest scores for both dimensions. It is not
at all surprising to find that textbooks have higher scores than journal articles on both
dimensions. However, contrary to my expectations, university textbooks were generally
perceived to be more readable and interactive than popular academic writing. This is a
very encouraging assessment of university textbooks, suggesting that they are written in a
style that is both accessible and student-centered. As with the other two publication types,
there was a substantial range of variation in the dimension scores for the university
textbooks. This variability in reader perceptions, along with the linguistic characteristics
associated with it, will be explored in detail in the next chapter.

6.8.3.3. Journal articles

The findings for reader perceptions of academic journal articles were consistent
with my expectations. Journal article writing is perceived as extremely boring, difficult to
read, objective, and information-focused. However, it is interesting to note that there are
larger discipline differences within the publication type of journal articles than within the
other two publication types. For both dimensions, the dimension scores for the biology
texts were generally much lower. Reasons for this are explored in Section 6.8.2.
It is important to emphasize here that the participant readers in this study, in
almost every case, would not be considered part of the intended audience of these
academic journal articles. However, the perceptual data collected for these journal
articles is interesting for at least two reasons. First, participants were sensitive to the
differences between journal articles and the other publication types. Second, the large
range of variability among journal articles was quite systematic in that the participants
reported large differences between journal articles from different disciplines. Another
factor that will be discussed further in Chapter 7 is variability across text passages from
different sections of biology articles.

6.9. Conclusion

This chapter has described the process used to develop and pilot a new survey
instrument designed to assess reader perceptions of academic writing quality and style.
The 38-item instrument was used to assess reader perceptions of 150 academic texts from
three publication types and two disciplines. A series of analyses offered strong evidence
in support of the instrument’s reliability. An exploratory factor analysis of the data
resulted in two underlying dimensions of perceived stylistic variation, which were given
the following labels: ‘Engaging and Easy to Read versus Boring and Difficult to Read’
and ‘Interactive Author Interpretation versus Objective Information Focus’. A series of
ANOVAs was run in order to determine whether discipline and publication type interact
along these two dimensions, and to pinpoint significant differences among disciplines and
publication types. The results were then interpreted and discussed.
Chapters 5 and 6 have focused on analyzing variance in the linguistics and
perceived stylistics of academic prose. The next chapter will focus on the correlations
between them within and between disciplines and publication types.

CHAPTER 7. CORRELATING THE TEXT-LINGUISTICS AND STYLISTIC
PERCEPTIONS OF PUBLISHED ACADEMIC WRITING

7.1. Introduction

The goal of this chapter is to triangulate the text-linguistic results in Chapter 5 and
the perceptual results in Chapter 6. This is accomplished through a series of correlational
analyses. Since the factorial ANOVAs in Chapters 5 and 6 showed strong interaction
effects between discipline and publication type for the linguistic and perceptual variables,
we can hypothesize that the strength (and possibly the direction) of these relationships
will not be the same for each discipline, publication type, and register. Therefore, Section
7.2 contains the results of these analyses for the full corpus, as well as for the two
disciplines, the three publication types, and the six registers. In addition to the
quantitative results, Section 7.2 includes qualitative descriptions of the discourse patterns
used in texts from a range of scores on the stylistic perception instrument. Finally,
Section 7.3 concludes this chapter with a brief summary of the main findings.

7.2. Linguistic predictors of perceived writing quality

In this section, I present the results of two statistical procedures, bivariate
correlation and multiple regression, in an effort to explore relationships between the
linguistic and perceptual results presented in the two previous chapters. After reporting
these results for the entire corpus, I investigate the same relationships within the
individual registers, disciplines, and publication types. As mentioned in the previous two
chapters, the data for the linguistic and perceptual dimensions for each of the groups met
the underlying assumptions associated with parametric statistical analyses, such as
Pearson’s correlation and multiple linear regression.
In the sections below I present and explore the results of several case studies in
order to qualitatively interpret important quantitative relationships. In each of these case
studies I begin by analyzing texts that received notably high or low scores on the two
perceptual dimensions relative to other texts. These analyses are approached
comparatively in that each text analysis case study includes two texts that are interpreted
relative to each other. Each text excerpt is approximately 100 words and will be
accompanied by its score on the two perceptual dimensions (PD) as well as its scores on
each of the five linguistic dimensions (LD) presented in the form of a text-linguistic
profile.

7.2.1. Complete corpus

This section contains the results for the entire corpus of published academic
writing.

7.2.1.1. Complete corpus: quantitative results

The overall correlation results for the full corpus are presented in Table 7.1. In
order to present the correlation results in an economical way throughout this chapter, the
full list of dimension labels is only included with this first correlation matrix. However,
the labels for the most noteworthy relationships are reported in the prose explanations.

Table 7.1. Correlations between the two perceptual dimensions and the five
linguistic dimensions for the entire corpus.

       LD1     LD2     LD3     LD4     LD5
PD1    .45**   .32**   .24**   .23**   .15
PD2    .40**   .24*    .11     .28**   .10

*significant at the 0.05 level; **significant at the 0.01 level

LD1 – ‘Non-technical Synthesis vs. Specialized Information Density’
LD2 – ‘Definition and Evaluation of New Concepts’
LD3 – ‘Author-centered Stance’
LD4 – ‘Colloquial Narrative’
LD5 – ‘Abstract Observation and Description’

PD1 – ‘Engaging and Easy to Read vs. Boring and Difficult to Read’
PD2 – ‘Interactive Author Interpretation vs. Objective Information Focus’
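
Each cell of Table 7.1 is a Pearson product-moment correlation between one perceptual dimension and one linguistic dimension across texts. The sketch below shows how such a coefficient is computed. The scores are hypothetical and the function name `pearson_r` is an assumption for illustration; it is not the software used in this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical dimension scores for five texts (not taken from the corpus).
ld1_scores = [1, 2, 3, 4, 5]
pd1_scores = [2, 4, 5, 4, 5]
print(round(pearson_r(ld1_scores, pd1_scores), 2))  # 0.77
```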

The results presented in Table 7.1 reveal a number of statistically significant
relationships between the perceptual and linguistic variables. These relationships are
extremely important because they offer empirical evidence to support long-standing
beliefs about the impact of author style on the reader. Moreover, these results show that
writing style is multi-faceted and that the facets that comprise it have different
relationships with reader perceptions. It is important to note the large range of variability
in the correlation strengths reported above. For example, the relationship between (a)
perceived readability and engagement and (b) specialized information density is almost
twice as strong as the relationship between the same perceptual variable and the linguistic
dimension of colloquial narrative. It is surprising to find that the linguistic features
associated with Dimension 5, ‘Abstract Observation and Description’ were not strongly
related to either of the perceptual dimensions in the full corpus. This will be explored
further below. It is also quite surprising to find that reader perceptions of ‘Interactive
Author Interpretation’ [PD2] do not correlate significantly with the linguistic dimension of
‘Author-centered Stance’ [LD3], given that PD2 includes the items biased – unbiased and
impartial – opinionated. However, an investigation of the relationships between these two
items and LD3 revealed moderate and statistically significant correlations of .36 with
biased – unbiased and .40 with impartial – opinionated. It seems that these relationships
are overshadowed by other perceptual variables that are not related to stance.
One limitation of summarizing the data in terms of a correlation matrix is that
there is no way of determining the degree of overlap between the linguistic dimensions in
terms of their relationship with the perceptual dimensions. It is true that the dimensions
resulted from a factor structure in which the correlations among the factors were limited.
Therefore, we would expect each dimension to account for at least some unique variance
in the full set of relationships. However, this is an empirical question that cannot be
answered without a more sophisticated statistical procedure. In order to answer this
question, multiple regression was used to determine (a) how much unique variance each
linguistic dimension accounts for and (b) how much shared variance can be accounted for
by the sum of the dimensions included in a given model. These regression procedures
make it possible to measure the extent to which the use of linguistic features can predict
the stylistic perceptions of readers.
Multiple regression analysis was first used to test if the linguistic variables
significantly predicted reader perceptions of readability and engageability [PD1]. The
results of the regression indicated that four of the linguistic dimensions [LD1, LD2, LD4,
and LD5] explained 30.5% of the variance (R2=.38, F(4,141) = 16.91, p < .001) (See
Appendix J for the full regression results). It was found that ‘Non-technical Synthesis’
[LD1] significantly predicted readability and engageability [PD1] (β = .30, p < .001), as
did ‘Definition and Evaluation of New Concepts’ [LD2] (β = .26, p = .001), ‘Colloquial
Narrative’ [LD4] (β = .32, p < .001), and ‘Abstract Observation and Description’ [LD5]
(β = .22, p = .002). Together, these results show that writing is perceived as being more
readable and engaging when it contains linguistic features associated with:

• More Colloquial Narrative [LD4] (β = .32)
• Less Specialized Information Density [LD1] (β = .30)
• More Definition and Evaluation of New Concepts [LD2] (β = .26)
• More Abstract Observation and Description [LD5] (β = .24)

Another multiple regression procedure was performed in order to test if the
linguistic variables significantly predicted reader perceptions of objectivity and
interactivity [PD2]. The results revealed that four of the linguistic dimensions [LD1,
LD2, LD4, and LD5] accounted for 25.6% of the variance (R2=.28, F(4,141) = 13.47, p <
.001). To summarize, ‘Non-technical Synthesis’ [LD1] (β =.27, p = .001), ‘Definition
and Evaluation of New Concepts’ [LD2] (β = .21, p < .05), ‘Colloquial Narrative’ [LD4]
(β = .35, p < .001) and ‘Abstract Observation and Description’ [LD5] (β = .19, p < .05)
significantly predicted perceived objectivity and interactivity [PD2]. These results reveal
that readers tend to perceive writing as more interactive and less objective when it
contains features associated with:

• More Colloquial Narrative [LD4] (β = .35)
• Less Specialized Information Density [LD1] (β = .27)
• More Definition and Evaluation of New Concepts [LD2] (β = .21)
• More Abstract Observation and Description [LD5] (β = .19)
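
The two variance figures quoted for the PD2 model (25.6% and R2 = .28) can be related to each other. The sketch below assumes that 25.6% is an adjusted R2; this is an inference from the reported numbers, since the text does not say so. It also assumes the 4 predictors and 141 error degrees of freedom reported above. Both function names are hypothetical.

```python
def r_squared_from_f(f_value, n_predictors, df_error):
    """Recover R-squared from the model F ratio and its degrees of freedom."""
    explained = n_predictors * f_value
    return explained / (explained + df_error)


def adjusted_r_squared(r_squared, n_predictors, df_error):
    """Adjust R-squared for the number of predictors in the model."""
    df_total = n_predictors + df_error  # equals n - 1
    return 1 - (1 - r_squared) * df_total / df_error


# PD2 model as reported: F(4, 141) = 13.47
r2 = r_squared_from_f(13.47, 4, 141)
adjusted = adjusted_r_squared(r2, 4, 141)
print(round(r2, 2), round(adjusted, 3))  # 0.28 0.256
```

Under these assumptions, the unadjusted value rounds to the reported .28 and the adjusted value to the reported 25.6%.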

The similar results of these two regression analyses reveal that linguistic features
associated with non-technical synthesis, definition and evaluation of new concepts,
colloquial narrative, and abstract observation and description impact readers on two
distinct perceptual parameters: (1) readability and engageability and (2) objectivity and
interactiveness. It will be useful here to focus on each of the individual linguistic
dimensions and the relationship it has with reader perceptions. I will discuss each of the
four significant linguistic predictors in order of the proportion of variance they account
for in the model.
The variable with the largest beta weight in both models is colloquial narrative.
This dimension comprises several linguistic characteristics of narration, such as 3rd
person pronouns, past tense verbs, and present progressive aspect verbs, as well as
features often linked to conversational language, such as common phrasal verbs and the
absence of academic words. According to the regression model for PD1, the co-
occurrence patterns of these features account for just over 10% of the variability in reader
perceptions of text readability and engageability. Specifically, texts with more features
associated with colloquial narrative were typically perceived as being more readable and
more engaging. This unsurprising discovery suggests that narrative prose with more
conversational features tends to be perceived as more engaging and readable. The results
of the second regression revealed that this same dimension accounts for over 12% of the
variability in perceptions of text objectivity and interactivity. Texts perceived as
interactive and non-objective were typically written in a conversational, narrative style.
The second largest beta weight for both regression models belonged to the
dimension of ‘Non-technical Synthesis versus Specialized Information Density’. The
negative features on this dimension (pre-modifying nouns, nouns, technical concrete
nouns, and agentless passive voice) are all strongly linked to objective, informationally
dense prose. The positive features, on the other hand, are much more closely tied to a less
technical, more synthetic prose style. The first regression model shows that texts with
fewer features associated with specialized information density were perceived as being
more engaging and readable. This relationship accounted for approximately 9% of the
variability in the model. The second regression adds to our understanding of the potential
impact of specialized, informationally dense prose by revealing that it tends to be
perceived as more objective and less interactive. This relationship accounts for more than
7% of the variability in the second regression model.
The variable with the third strongest beta weight in both models was the second
linguistic dimension, labeled ‘Definition and Evaluation of New Concepts’. This dimension is
composed of linguistic features that are associated with the language of explanatory
definitions (e.g., verb BE, predicative adjectives, present tense) and conceptual
evaluations (clauses marking stance, possibility, permission, and ability modals).
Together, the two regression models show that texts containing these features are
typically perceived as being (a) more engaging and readable and (b) less objective and
more interactive. These two relationships accounted for approximately 7% and 4%,
respectively, of the variability in the models.
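
The percentages quoted for these three predictors are numerically close to the squares of their standardized coefficients, which is what one would expect if the predictors were roughly uncorrelated. This is an inference from the reported figures rather than a procedure stated in the text, and the helper name `squared_beta_percent` is hypothetical.

```python
# Magnitudes of the standardized coefficients reported for LD4, LD1, and LD2.
betas = {
    "PD1": {"LD4": 0.32, "LD1": 0.30, "LD2": 0.26},
    "PD2": {"LD4": 0.35, "LD1": 0.27, "LD2": 0.21},
}


def squared_beta_percent(beta):
    """Percent of variance implied by a squared standardized coefficient."""
    return 100 * beta ** 2


for model, coefficients in betas.items():
    for dimension, beta in coefficients.items():
        print(model, dimension, f"{squared_beta_percent(beta):.1f}%")
```

For PD1 this gives roughly 10%, 9%, and 7%, and for PD2 roughly 12%, 7%, and 4%, in line with the proportions described in the preceding paragraphs.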
The final and weakest predictor in both models is ‘Abstract Observation and
Description’. This dimension is made up of features of grammatical metaphor
(nominalizations), abstract vocabulary (process nouns, abstract nouns), and moderate-frequency
vocabulary. The relationship this dimension has with the two perceptual variables was
unexpected in both cases. Taken together, the two regression models show that texts with
more of these features tend to be perceived as (a) more engaging and readable and (b)
less objective and more interactive. The positive correlation between the use of relatively
frequent vocabulary (501-3,000 on the list of most frequent words) and the two
perceptual dimensions is not surprising. However, the relationship between LD5 and PD1
is surprising considering the traditional assumptions that grammatical metaphor and
abstract nouns hinder comprehension. Reasons for this relationship were not immediately
apparent. However, a closer analysis of patterns in the data suggests that discipline and
topic differences play an important role in these relationships, particularly for journal
articles. This will be discussed further below. Finally, it should be borne in mind that the
relationship between ‘Abstract Observation and Description’ and the two perceptual
dimensions is relatively weak. The qualitative analyses in later sections will also be
useful for illuminating these patterns.
Correlation analyses were also performed between the perceptual dimensions and
individual linguistic variables on each linguistic dimension (see Appendix K for the full
correlation matrix). The language features with the largest correlations were then added
to a regression model. This was done to further explore the individual linguistic features
that seem to be the most important predictors of perceived writing quality. Correlations
above .30 were found between PD1 and seven of the features on LD1. This list includes
core vocabulary (1-500) (.61), nouns (-.45), noun-noun sequences (-.44), infinitives (.38),
non-finite to-clauses controlled by stance adjectives (.33), pronoun it (.33), and passives
(-.30). Using these seven features as predictor variables, an exploratory multiple
regression procedure revealed core vocabulary (1-500) as the only significant predictor of
PD1, R2=.37, F(1, 144) = 84.89, p < .001. This shows that the use of highly frequent
vocabulary explains more than a third (37%) of the variability in perceived
readability and engageability. The emergence of high-frequency vocabulary as
the only significant predictor in the regression model does not suggest that the other six
variables are unrelated to reader perceptions. Rather, this shows that there is a large
amount of shared variance between core vocabulary (1-500) and the other variables.
Therefore, core vocabulary (1-500), as the most strongly correlated variable, was the only
predictor variable retained in the final model.
Essentially the same pattern was found for PD2. Perceived objectivity and
interactivity correlated with a very similar list of six linguistic variables, including core
vocabulary (1-500) (.50), noun-noun sequences (-.41), nouns (-.41), academic vocabulary
(-.31), pronoun it (.31), and passives (-.29). However, in a multiple regression model only
core vocabulary (1-500) emerged as a significant predictor of PD2, R² = .25, F(1, 144) =
47.15, p < .001. This reveals that the use of highly frequent vocabulary accounts for 25%
of the variability in perceived objectivity and interactivity. These results also suggest that
the use of individual linguistic features can predict the perceptions readers have about
texts. However, regression analyses for the two perceptual dimensions suggest that the
use of highly frequent vocabulary in writing is the strongest predictor of both (a)
perceived readability and engageability and (b) perceived interactivity and objectivity.
This relationship was particularly strong for PD1.
A closer look at correlations between high frequency vocabulary use and the
individual perceptual items suggests that word frequency is most strongly related to
perceived readability. This is apparent from the correlations between core vocabulary (1-
500) and the following items: readable (.74), comprehensible (.63), and easy to follow
(.68). The first correlation shows that the single variable of vocabulary frequency
accounts for almost 55% of the variability in perceived text readability.
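
The step from a correlation to a share of explained variance is simply the square of the
coefficient. A minimal sketch of that arithmetic, using only the three item correlations
reported above (the function name is illustrative and not part of the original analysis):

```python
def shared_variance(r: float) -> float:
    """Return the proportion of variance two variables share (r squared)."""
    return r * r


# Correlations between core vocabulary (1-500) and three PD1 items, as reported above.
item_correlations = {"readable": 0.74, "comprehensible": 0.63, "easy to follow": 0.68}

for item, r in item_correlations.items():
    print(f"{item}: r = {r:.2f}, shared variance = {shared_variance(r):.1%}")
```

For the item 'readable', .74 squared is about .548, which is the "almost 55%" cited above.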
One factor that may contribute to the strong correlation between high-frequency
vocabulary use and perceived readability is the large difference between history and
biology in the use of vocabulary. The texts in the history sub-corpus use significantly
more high-frequency words (p < .001) than their biology counterparts. These stark
differences are likely to play a role in polarizing the perceptions of readers, thereby
strengthening the correlations. This suggests that the strong correlations in the overall
corpus are, at least to some extent, a product of the familiarity of topics and words used
in history writing. This relationship may confound the relationship between linguistic
style and topic familiarity, limiting our ability to determine the exact strength of the
relationship between vocabulary frequency and reader perceptions in the overall corpus.
A thorough investigation of this relationship is beyond the scope of this study. However,
the results presented here suggest that this will be a fruitful area of future research.
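
The way a gap between two disciplines can strengthen a pooled correlation can be
illustrated with a small sketch. The values below are invented toy numbers, not data from
this study, and the `pearson` helper and the group labels are hypothetical names used only
for illustration. Within each toy group the two variables are uncorrelated, yet pooling
the groups produces a strong correlation because the groups differ on both variables.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented toy values, NOT data from this study.
# x stands for a vocabulary-frequency measure, y for a perception score.
group_a_x, group_a_y = [1, 2, 3, 4, 5], [2, 1, 3, 1, 2]
group_b_x, group_b_y = [11, 12, 13, 14, 15], [12, 11, 13, 11, 12]

r_a = pearson(group_a_x, group_a_y)
r_b = pearson(group_b_x, group_b_y)
r_pooled = pearson(group_a_x + group_b_x, group_a_y + group_b_y)

print(f"within group A: {r_a:.2f}")
print(f"within group B: {r_b:.2f}")
print(f"pooled:         {r_pooled:.2f}")
```

This is why correlations are also reported separately for each discipline later in the
chapter: a within-group analysis removes the contribution of the between-group gap.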

7.2.1.2. Complete corpus: qualitative results

Figure 7.1 displays two text excerpts, a passage from a popular academic book in
history and a biology journal article passage, which represent the extreme high and low
ends of the range of PD1 and PD2 scores. Below the text excerpts is a line graph
displaying dimension score profiles for the two texts on each of the five linguistic
dimension scores. In order to appropriately interpret the PD1 and PD2 scores in Figure
7.1 it is important to note that the scores ranged from 26.00 to 80.00 for PD1 and 15.00 to
69.50 for PD2. According to the quantitative findings reported above, only four of the
five linguistic dimensions were significant predictors of reader perceptions (LD1, LD2,
LD4, and LD5). From the plots it can be seen that the history popular academic passage had
higher scores on each of these four linguistic dimensions. A qualitative comparison of the
two texts reveals stark differences in their linguistic characteristics.
One striking difference between these two texts is the comparatively large number
of acronyms in the biology journal article. This short 76-word excerpt contains sixteen
acronyms (e.g., pIRES2-EGFP; siRNAs), whereas the 96-word excerpt from the history
popular academic text contains none. Acronyms seem to be generally
more common in biology than in history, and they are particularly prevalent in the
methods section of biology articles. In biology research articles the series of words that
these acronyms represent is often left unexplained based on the assumption that the
reader has the background knowledge necessary to unpackage their meanings. In cases
where an acronym is explained, it is typically only spelled out in the first instance, with
the expectation that the reader will remember its meaning in subsequent encounters.
Within the journal article sub-corpus, there is a significant positive correlation between
word length and PD1 (r = .44, p < .001) and PD2 (r = .49, p < .001). This finding was
surprising at first because previous research has suggested that longer words, especially
nominalizations, make writing less comprehensible. However, these results suggest that
for lay readers long words are a welcome alternative to densely packaged acronyms.
Furthermore, it is not at all surprising that acronyms contribute to reduced readability and
engageability as they are extreme examples of technical, specialized lexical items.

Figure 7.1. Case study 1: PD scores, sample excerpts, and LD profiles for two texts
from the overall corpus.

Text: PA_HI_15
PD1 Score: 71.00
PD2 Score: 51.00

Whereas today many scholars hail the “end of philosophy,” Lincoln hoped that politics
could be elevated to the status of philosophy in order to address the most crucial moral
issues facing the nation. Without a philosophical sensibility, all politics would be power
and America would be unable to recognize the goals that it is failing to achieve. To some
historians today, the outbreak of the Civil War proves once and for all that American
history can be neither consensual nor exceptional. A people who war among themselves can
hardly be said to subscribe to a synthesis.

Text: JA_BI_25
PD1 Score: 34.00
PD2 Score: 16.50

FAK was amplified from a rat brain cDNA library and inserted into pEGFP-C1 or SFB vector
(pIRES2-EGFP with S peptide, Flag tag, and streptavidin-binding peptide) as described
previously (Wang et al., 2011). p130Cas constructs were subcloned from pEBG-p130cas
(Addgene). Constructs were transfected with Lipofectamine 2000 (Invitrogen) according to
the manufacturer’s instructions. siRNAs targeting MT1-MMP (D-004145-02), FAK
(M-003146-02), and p130Cas (M-020465-01) were purchased from Thermo Fisher Scientific and
transfected with Lipofectamine RNAiMAX (Invitrogen) following the standard protocol.

[Line graph: dimension scores on LD1 to LD5 for PA_HI_15 and JA_BI_25; vertical axis
from -25 to 15]

The high frequency of pre-modifying nouns is another distinguishing
characteristic of the biology journal article. The journal article passage above was taken
from a text that has almost nine times more noun-noun sequences than the source text for
the popular academic excerpt above. Pre-modifying nouns, which are common in biology
journal articles (e.g., rat brain cDNA library; Lipofectamine RNAiMAX), are much less
frequent in many of the other registers. Overall, noun-noun sequences had significant
negative correlations with PD1 (r = -.44, p < .001) and PD2 (r = -.40, p < .001), showing
that they contribute to perceptions of lower readability, engageability, and interactivity. It
can be seen in the biology journal article excerpt that the complexity of dense noun
phrases is compounded by the presence of the acronyms within those phrases. Together,
these features seem to contribute to the reader perceptions of this text.
Another difference between these two passages is the frequency of the passive
voice. While passive voice is used in every clause in the biology journal article passage
(e.g., p130Cas constructs were subcloned), the history popular academic passage is
written almost entirely in active voice (e.g., America would be unable to recognize).
Overall, there is a statistically significant negative correlation between the use of the
agentless passive voice and PD1 (r = -.30, p < .001) and PD2 (r = -.29, p < .001). These
moderate correlations offer evidence that the use of passive voice contributes to reader
perceptions of lower readability, engageability, and interactiveness. Agentless passive
voice and pre-modifying nouns are two of the strongest loading linguistic features on
LD1, which is strongly related to lower PD1 and PD2 scores.
A final qualitative observation about the two texts above is the stark difference
between the familiarity of the content in the two passages for most lay readers. Although
this is not directly quantified in the linguistic analysis, it is an important difference
between these two text excerpts. The lay readers in the sample of participant readers in
this study are much more likely to be familiar with Abraham Lincoln and the Civil War
than with rat brain cDNA and Lipofectamine transfection. The ‘average’ participant in
this sample is a U.S. citizen with some college education. Readers in this group have
probably taken multiple courses (K-12 and/or university-level) in U.S. history. However,
their background in research methods in microbiology is probably limited or non-
existent. Although the content covered by popular academic books and university
textbooks in biology is likely to be more familiar than the methods described in
JA_BI_25 above, it is still less familiar, in most cases, than the content of history writing
in the same publication types.
In addition to receiving less attention in U.S. K-12 and university general
education courses, biology writing is generally less relatable than history writing. This is
supported by the results of the individual PD1 item ‘relatable – unrelatable’, for which
the biology texts were rated as significantly less relatable than the history texts, t(148) =
3.20, p = .002. While history writing may address different cultures and time periods,
biology often addresses different species and microscopic levels.
In summary, the results of the qualitative analyses described in this section have
demonstrated and discussed several important differences between two texts which
received scores at the extreme opposite ends of the range of scores for PD1 and PD2.
Acronyms, noun-noun sequences, and passive voice constructions were shown to be
important features that are negatively correlated with perceived readability,
engageability, and interactivity. The familiarity and relevance of the content of biology
and history writing was also discussed in terms of their effects on reader perceptions.
Exploring the results of the correlation and regression analyses for the entire
corpus has proven to be useful for understanding the multi-faceted relationships between
text-linguistics and reader perceptions. However, the factorial ANOVAs in Chapters 5
and 6 clearly showed large differences between the patterns in the various publication
types and disciplines included in the corpus. Therefore, as mentioned above, we can
expect the strength, and possibly the direction, of the relationships between text-
linguistics and reader perceptions to vary across the disciplines and publication types.
Accordingly, in Sections 7.2.2 and 7.2.3, I will report the results of a series of analyses to
investigate these relationships in the different disciplines and publication types.

7.2.2. Discipline results

This section contains correlation and regression results for the biology and history
discipline sub-corpora.

7.2.2.1. Biology sub-corpus: quantitative results

Table 7.2 displays correlations between the two perceptual dimensions and the
five linguistic dimensions within a sub-corpus containing only the texts within the
discipline of biology. It can be seen that there are no statistically significant relationships
between reader perceptions and linguistic Dimensions 4 and 5. However, the correlations
reveal noteworthy relationships between reader perceptions and the language used in
these texts.

Table 7.2. Correlations between the two perceptual dimensions and the five
linguistic dimensions for the biology sub-corpus.

        LD1      LD2      LD3      LD4      LD5
PD1     .56**    .51**    .31**    .20      .13
PD2     .44**    .51**    .15      .12      .09

*significant at the 0.05 level; **significant at the 0.01 level

A multiple regression procedure was used to determine whether the linguistic
variables significantly predict reader perceptions of readability and engageability in
biology writing. The results show that three of the linguistic dimensions [LD1, LD2, and
LD4] account for 41.1% of the variance (R² = .44, F(1,71) = 18.23, p < .001). To
summarize these results, ‘Definition and Evaluation of New Concepts’ [LD2] (β = .40, p
< .001), ‘Non-technical Synthesis’ [LD1] (β = .34, p < .01), and ‘Colloquial Narrative’
[LD4] (β = .23, p < .05) significantly predicted perceived readability and engageability
[PD1]. These results suggest that readers tend to perceive writing as more engaging and
readable when it contains linguistic features associated with:

• More Definition and Evaluation of New Concepts [LD2] (β = .40)
• Less Specialized Information Density [LD1] (β = .34)
• More Colloquial Narrative [LD4] (β = .23)

A comparison between these results and the results for the entire corpus will help
determine which linguistic dimensions most strongly predict reader perceptions of
readability and engageability in biology writing. It is interesting to note that the definition
and evaluation of new concepts is a much stronger predictor of readability and
engageability in biology writing (r = .51) than in the overall corpus (r = .32). This is not
surprising considering how unfamiliar many biology concepts and terms are likely to be
for non-expert readers. It is clear that the passages that explicitly define and comment on
unfamiliar concepts are the most engaging and easy to read. While the amount of dense,
specialized information is a slightly stronger predictor of readability and engageability in
the biology texts, the amount of colloquial narrative is a weaker predictor. One reason for
this is that there is very little variation in the use of the features of colloquial narrative in
the biology texts. On the other hand, there was much more variation across publication
types within biology than in history in the use of features associated with specialized
information density (see Section 5.4.1).
A second regression was performed in which the linguistic dimensions were used
as predictors of perceived objectivity and interactivity. The results revealed that only two
of the linguistic dimensions [LD1 and LD2] are significant predictors, accounting for just
over 29% of the variance (R² = .31, F(1,72) = 16.20, p < .001). In other words, texts with
more ‘Definition and Evaluation of New Concepts’ [LD2] (β = .39, p = .001) and ‘Non-
technical Synthesis’ [LD1] (β = .26, p < .05) were typically rated as being less objective
and more interactive [PD2]. Once again, the extent to which an author defines and
evaluates new terms and concepts is much more strongly related to text scores on the
second perceptual dimension within biology than in the results for the entire corpus.
However, the strength of the relationship between specialized, informationally-dense
prose and scores on the second perceptual dimension remained about the same.
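
Both biology regressions above report a percentage of variance that is slightly lower than
the accompanying R² value (41.1% beside .44, and just over 29% beside .31). One plausible
reading, offered here as an assumption rather than something the text states, is that the
percentages are adjusted R² values. The sketch below checks that reading. The sample size
of 75 is also an assumption (half of the 150-text corpus); it agrees with the residual
degrees of freedom of 71 and 72 shown in the reported F-tests.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# ASSUMPTION: 75 biology texts (half of the 150-text corpus); not stated in this section.
N_BIOLOGY = 75

pd1_adjusted = adjusted_r2(0.44, N_BIOLOGY, k=3)  # predictors: LD1, LD2, LD4
pd2_adjusted = adjusted_r2(0.31, N_BIOLOGY, k=2)  # predictors: LD1, LD2

print(f"PD1 model: adjusted R2 = {pd1_adjusted:.3f}")
print(f"PD2 model: adjusted R2 = {pd2_adjusted:.3f}")
```

Under these assumptions the adjusted values come out near .42 and .29. The small remaining
gap for the PD1 model would be expected if the reported R² had been rounded to two
decimals before printing.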
Overall, the results for the biology sub-corpus emphasize the importance of
defining and elaborating on unfamiliar material within biology texts. Although the
participants in this sample had no assumed background in the biological sciences, these
results offer a clear picture of the impact this practice has on reader perceptions of text
quality. Additionally, these results show that the amount of information specialization
and density contributes to the difficulty and interactivity of a text. A comparison between
these relationships within the biology popular academic sub-corpus and those in the
journal article sub-corpus makes it even more apparent that stylistic and linguistic
variability in biology writing is a very important predictor of perceived writing quality.

7.2.2.2. Biology sub-corpus: qualitative results

The regression results showed that LD2, LD1, and LD4 were significant
predictors of PD1, and LD1 and LD2 were significant predictors of PD2. The distribution
of these scores can be seen in the second case study in Figure 7.2. One of the most
striking differences between the popular academic text and journal article text is the use
of adjectives. Although both excerpts use several attributive adjectives, the adjectives
used in JA_BI_14 are almost exclusively attributive (e.g., bacterial, initial, insufficient,
different), contributing to its relatively high ‘Abstract Observation and Description’
[LD5] score. On the other hand, PA_BI_03 uses a higher volume of predicative
adjectives (e.g., irresistible, exciting, accessible), adding to this text’s higher score on
LD2, ‘Definition and Evaluation of New Concepts’.

Figure 7.2. Case study 2: PD scores, sample excerpts, and LD profiles for two texts
from the biology sub-corpus.

Text: PA_BI_03
PD1 Score: 69.00
PD2 Score: 53.50

I had marked the presence of fossils on a local map. They were described as the oldest
fossils in the British Isles. What could be more irresistible? There was something
extraordinarily exciting about tapping into a vein of such prehistory. The top dressing of
the landscape of human tenancy was stripped away to reveal some deeper reality, layer
after layer of geological time unpeeled in my imagination. While my long-suffering mother
knitted or read, I beat the rocks at Nine Wells and Porth-y-rhaw. These were places where
the rocks were accessible by foot and could be broken by sheer effort. I did not even have
a proper geological hammer. The fever of discovery was upon me.

Text: JA_BI_14
PD1 Score: 37.00
PD2 Score: 19.00

The collected periphyton from all three sites was homogenized and filtered through a 5 µm
Teflon filter (Micron Separations Inc.) with the filtrate containing the bacterial
consortia preserved at -85 °C in a 15 % (v/v) glycerol solution. Initial experiments
indicated insufficient bacterial mass for the measurement of denitrification. Therefore,
prior to the start of the incubations with different carbon substrates, the bacteria were
pre-fed M9 Minimal Media with glucose (G) as the carbon source until exponential growth
was observed. The bacteria were then washed ten times in M9 Minimal Media without a carbon
source.

[Line graph: dimension scores on LD1 to LD5 for PA_BI_03 and JA_BI_14; vertical axis
from -25 to 15]

It seems that the author of JA_BI_14 is using attributive adjectives to be precise
and specific about the procedures and entities that are being described. In contrast, the
predicative adjectives used by the author of PA_BI_03 seem to have a more affective
function. Some of the attributive adjectives used in PA_BI_03 have a similar, emotional
function (e.g., long-suffering, sheer). These results suggest that the journal article is
written for the purpose of describing information precisely and objectively, whereas the
popular academic text is written to entertain and interact emotionally with the audience.
This interpretation aligns well with the author purposes established for popular academic
writing in Table 4.1, the first of which was to ‘entertain’.
Additional features that play a role in the degree of affect in PA_BI_03 are
emphatics (e.g., more), intensifiers (e.g., such), adverbs (e.g., extraordinarily, even,
sheer), comparatives (e.g., deeper), and superlatives (e.g., oldest). All of these
features are examples of an effort on the part of the author to add intensity and emotional
energy to the descriptions. General adverbs and emphatics both load positively on LD1,
which is strongly related to reader perceptions of engageability and interactivity. Within
the biology sub-corpus, emphatics are quite strongly correlated with PD1 (r = .40, p <
.001) and PD2 (r = .34, p < .001). Likewise, general adverbs are also positively correlated
with PD1 (r = .39, p < .001) and PD2 (r = .39, p < .001). This offers additional support
for the view that emotionally charged language is related to reader perceptions of writing quality,
especially within biology writing.
Another difference is the way authors make reference to themselves. In both texts
the authors are the agents of most actions. Despite the use of passive voice, we can infer
that the authors of JA_BI_14 carried out each of the methodological steps they describe
(was homogenized; were pre-fed). The only time these authors use active voice is when
the agent is non-human (e.g., initial experiments indicated). The choice to use passive
rather than active voice seems to be a deliberate effort to shift the focus of each of the
sentences away from the agents and onto the actions and the objects. Contrastingly, in
PA_BI_03, the author chooses to refer directly to herself throughout (e.g., I had marked;
I beat the rocks; was upon me). Instances of the passive voice in PA_BI_03 are cases
where the agent is either irrelevant (they were described) or generalized (accessible by
foot; broken by sheer effort). In these cases, the passive does not function to deemphasize
the author. On the contrary, its purpose is to shift the focus away from less important
details in order to continue focusing on the narrative about the author.
In conclusion, the results of the qualitative analysis of these two biology texts
serve to highlight and exemplify meaningful patterns from the quantitative results. The
frequency of predicative and attributive adjectives, as well as their functions in the text
samples, reveals meaningful findings that contrast affective prose with writing that is
objective, specific, and precise. Similarly, the use of other descriptive features such as
emphatics and adverbs also seems to augment the emotional tone of the writing. Finally,
the qualitative results revealed that the authors of journal articles in biology tend to use
the passive voice to deemphasize their role in the research. In more author-centered
prose, the passive seems to function as a means of deemphasizing irrelevant agents in
order to have the opposite effect of maintaining a focus on the author’s message.

7.2.2.3. History sub-corpus: quantitative results

Table 7.3 contains the results of a series of correlation analyses between each of
the five linguistic dimensions and the two perceptual dimensions within the history sub-
corpus. It can be seen that in every case the correlations are small and non-significant.
Multiple regression analyses revealed that none of the linguistic dimensions significantly
predicts reader perceptions of writing quality. These low correlations show that variation
in reader perceptions is not directly attributable to the linguistic characteristics included
in this analysis. This is not particularly surprising considering the lack of variability in
reader perceptions across the history publication types (see, e.g., Figures 6.4 – 6.8).

Table 7.3. Correlations between the two perceptual dimensions and the five
linguistic dimensions for the history sub-corpus.

        LD1      LD2      LD3      LD4      LD5
PD1     -.03     .08      .01      .06      .09
PD2     -.02     -.11     -.15     .22      -.08

*significant at the 0.05 level; **significant at the 0.01 level

One possible explanation for this is that the content of the history passages is, in many
cases, more familiar and relatable to the readers. Therefore, it is conceivable that reader
perceptions of history texts are more strongly influenced by the content or topic of the
passage than by the linguistic features in the texts. However, because the content of many
of the biology texts is relatively unfamiliar and unrelatable, the readers’ judgments of the
biology texts may be based more on their linguistic characteristics than their content or
topic.

7.2.3. Publication type results

This section contains correlation and regression results for the three publication
types: journal articles, university textbooks, and popular academic books. It also contains
correlation results for the two disciplines within each of the three publication types.

7.2.3.1. Journal Article sub-corpus: quantitative results

The results in Table 7.4 show the correlations between the linguistic and
perceptual dimensions for the journal article sub-corpus. These findings reveal several
statistically significant and moderate to strong correlations between the language features
used by journal article authors and reader perceptions of writing quality.

Table 7.4. Correlations between the two perceptual dimensions and the five
linguistic dimensions for the journal article sub-corpus.

        LD1      LD2      LD3      LD4      LD5
PD1     .49**    .17      .45**    .49**    .54**
PD2     .73**    .19      .43**    .52**    .53**

*significant at the 0.05 level; **significant at the 0.01 level

Multiple regression was used to determine if the linguistic variables significantly
predicted reader perceptions of readability and engageability in journal articles. Three of
the linguistic dimensions [LD3, LD4, and LD5] accounted for 56.2% of the variance
(R² = .59, F(1,43) = 20.68, p < .001). In summary, ‘Colloquial Narrative’ [LD4] (β = .49,
p < .05), ‘Abstract Observation and Description’ [LD5] (β = .44, p < .001), and ‘Author-
centered Stance’ [LD3] (β = .25, p < .05) are significant predictors of text readability and
engageability [PD1]. These results suggest that readers tend to perceive writing as more
engaging and readable when it contains linguistic features associated with:

• More Colloquial Narrative [LD4] (β = .49)
• More Abstract Observation and Description [LD5] (β = .44)
• More Author-centered Stance [LD3] (β = .25)

Another regression procedure was used to measure the extent to which the
linguistic dimensions could predict the dimension scores of texts on the second
perceptual dimension. These results showed that less ‘Specialized Information Density’
[LD1] (β = .50, p < .001), more ‘Abstract Observation and Description’ [LD5] (β = .41, p
< .001), and more ‘Colloquial Narrative’ [LD4] (β = .34, p < .001) accounted for almost
75% of the variability in perceived text objectivity and interactiveness [PD2] (R² = .77,
F(1,43) = 46.73, p < .001).
Based on the results of the regression analyses reported thus far in this section, it
is apparent that linguistic variability is strongly related to reader perceptions of writing
quality in the journal article sub-corpus. However, it should be emphasized that the mean
linguistic dimension scores reported in Chapter 5 revealed very large differences between
journal articles in biology and history, highlighting the importance of the variable of
discipline within the journal article sub-corpus. This raises questions about the extent to
which discipline influenced the outcome of these statistical tests. This discipline
influence can be estimated by looking at the correlations for the biology and history
journal article sub-corpora in Tables 7.5 and 7.6, respectively. It can be seen that with the
exception of the correlations between ‘Author-centered Stance’ and reader perceptions of
objectivity and interactiveness, none of the other correlations are strong across both
dimensions. This suggests, unsurprisingly, that discipline is a moderator variable in this
case. In other words, the data strongly suggest that the relationship between several of
these linguistic dimensions and the two perceptual variables depends, to at least some
degree, on the variable of discipline. The potential role of discipline, along with other
important patterns, is investigated in more detail in the next section.

7.2.3.2. Journal article sub-corpus: qualitative results

Figure 7.3 displays case study 3, which is a comparison of two journal articles,
one in history and one in biology. This biology journal article was perceived to be much
less readable, engaging, interactive, and interpretative than the history article. One of the
most noticeable differences between these two excerpts is the vocabulary used by the
authors. Whereas only 40% of the words in JA_BI_12 are among the 500 most frequent
words in English, 59% of the words in JA_HI_05 come from this list. Additionally, 47%
of the words in JA_BI_12, compared to only 26% for JA_HI_05, are not found on the list
of the 3,000 most frequent words in English. Examples of low-frequency words in the
biology article include: nominotypical, testaceus, and localities. It should be noted that an
additional reason for the extremely low word frequency percentages is the large number
of names used in the citations in this text. Based on the multivariate findings for LD1,
there seems to be a relationship between word frequency and the two perceptual
dimensions. An analysis of high-frequency words (1-500) in the journal article sub-
corpus reveals strong and statistically significant positive correlations between high
frequency words and PD1 (r = .70, p < .001) and PD2 (r = .74, p < .001), showing that
readers’ perceptions are clearly related to the familiarity of the words used by the authors.
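
The percentages in this paragraph are frequency-band coverage figures: the share of a
text's tokens that fall within a ranked list of the most frequent words. A minimal sketch
of that calculation follows. The ten-word list is a tiny stand-in rather than the word
list used in this study, the function name is hypothetical, and the ASCII-only tokenizer
is a simplifying assumption.

```python
import re


def band_coverage(text: str, ranked_words: list[str], top_n: int = 500) -> float:
    """Share of tokens in `text` found among the first `top_n` ranked words."""
    band = {word.lower() for word in ranked_words[:top_n]}
    # Simplified tokenizer: runs of ASCII letters, with an optional apostrophe part.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    return sum(token in band for token in tokens) / len(tokens)


# Tiny stand-in list; a real analysis would load a full ranked frequency list.
toy_ranked = ["the", "of", "and", "a", "in", "is", "it", "was", "only", "from"]

# One sentence from the JA_BI_12 excerpt shown in Figure 7.3.
sample = "It is distributed only in southern and south-western Turkey."

coverage = band_coverage(sample, toy_ranked)
print(f"coverage: {coverage:.0%}")
```

Computing the same ratio against the first 3,000 ranked words and subtracting it from 1
gives the share of off-list words, which is the second kind of figure reported above.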
The quantitative results in Section 7.2.3.1 revealed that LD3, Author-centered
Stance, is positively related to text readability and engageability. It is interesting to note
that most of the stance-related features counted as part of LD3 were clausal and are not
seen in either of these texts. However, in JA_HI_05 it can be seen that the author’s stance
is made abundantly clear through the use of adverbs (e.g., really, always, easily) and
adjectives (e.g., inferior, inevitable, exploitable, expendable).
This author uses adverbs in order to intensify his language (e.g., easily
exploitable) and even uses absolutes (e.g., always kept on the margins). Some of the
verbs used in JA_HI_05 also reveal the author’s stance (e.g., did not really want them;
denying full institutional integration; othering them in discourse). These features are
contrasted with the language used in JA_BI_12 which is relatively ‘faceless’ and
objective.
Another important characteristic that distinguishes history journal articles from
biology journal articles is the amount of narrative prose. The authors of journal articles in
history consistently use more features associated with ‘Colloquial Narrative’. Linguistic
features such as 3rd person pronouns, past tense verbs, and progressive aspect verbs are
relatively more common in the history journal articles. These features can be seen in the
excerpt from JA_HI_05. Although the authors of JA_BI_12 use the past tense, they do
not use progressive aspect or 3rd person pronouns. The narrative style of history writing is
clearly contrasted with the procedural discourse of biology writing. These linguistic
patterns reveal important differences in not only the stylistics of these two disciplines but
also in their rhetorical approaches.

Figure 7.3. Case study 3: PD scores, sample excerpts, and LD profiles for two texts
from the journal article sub-corpus.

Text: JA_HI_05
PD1 Score: 68.00
PD2 Score: 43.50

Work made Apaches and Pawnees members of an organization and community that did not really
want them, but still needed their expertise to cement the organization's own power. […]
The army offered a semi-incorporated status for the indigenous workers, but nothing
permanent or stable. By denying full institutional integration, and by othering them in
discourses, the army made its Apache and Pawnee soldiers inferior and thus, their
exclusion logical, even inevitable. Valued by some, but othered as savages by many,
indigenous workers were always kept on the margins of the army community as a subaltern
workforce, randomly employed, socially excluded, and easily exploitable and expendable.

Text: JA_BI_12
PD1 Score: 28.50
PD2 Score: 21.00

The nominotypical subspecies, which was described from Caucasus, is distributed only in
Caucasus and Iran. P. (s. str.) testaceus (Linnaeus, 1758) has been reported by various
authors from different localities in Turkey as cited in Özdikmen 2007, 2008. It is
distributed rather widely in Turkey. P. (Phymatodellus) magnanii Sama and Rapuzzi, 1999 is
known only from the type localities of Antalya and ?Içel provinces. So it is distributed
only in southern and south-western Turkey. P. (Phymatodellus) rufipes (Fabricius, 1777)
has been reported in Turkey by a few authors, from various localities in both the north
and south.

[Line graph: dimension scores on LD1 to LD5 for JA_HI_05 and JA_BI_12; vertical axis
from -15 to 10]

According to Gray (2011), this is, at least in part, a function of the research
methods typically used in these two disciplines. Gray (2011) states:

In the qualitative research paradigm, the purpose is to describe the natural course
of events or actions, and then build interpretations upon those observations. Thus,
there is a focus on establishing a narrative that sets up a reconstructed event to
serve as evidence for the writers’ claims and interpretations (p. 148).

History writing clearly takes this qualitative, narrative approach to its arguments.
Biology writing, on the other hand, includes “description of the methodological steps
carried out,” and “relies on quantitative displays of evidence” (Gray, 2011, p. 150). Based
on this interpretation, we would expect history journal articles to generally receive higher
perceptual ratings on both PD1 and PD2. Not only is history content more likely to be
familiar to lay readers, but it also adopts a narrative style that readers have encountered
before in registers such as fiction, and it reports qualitative data that does not require as
much technical expertise to decode.
A final relationship within the journal article sub-corpus worth discussing here is
the positive correlation between features associated with ‘Abstract Observation and
Description’ and the two perceptual dimensions, particularly PD1. As discussed above,
linguistic features such as nominalizations, longer words, and abstract nouns have
traditionally been associated with lower comprehensibility. However, the correlations
within the journal article sub-corpus, as well as the complete corpus, show the opposite
pattern. One explanation for this is that although elaborated, abstract language may
hinder comprehensibility to a degree, readers seem to prefer that to dense, specialized
language. A comparison of the history and biology journal article excerpts in Figure 7.3
clearly demonstrates the stark differences between these two styles of expository prose.
Linguistically, the biology article may be more concrete. However, the concrete entities
and processes being discussed, as well as the dense presentation of them, are difficult to
unpack and interpret without a specialization in that particular area of biology. The
history excerpt, on the other hand, while much more linguistically abstract, contains
much more elaborated descriptions that can be interpreted by non-experts with relative
ease. The correlation results show that readers prefer the linguistic style of ‘Abstract
Observation and Description’ typically found in history journal articles over the
‘Specialized Information Density’ common in biology journal articles.
In conclusion, this qualitative analysis of journal article prose has identified
several important variables, both linguistic and non-linguistic, that contribute to the large
differences in reader perceptions of biology and history. Word frequency seems to play
an important role in text readability and engageability, especially among journal articles.
Author stance is an important characteristic of history journal articles that is much less
common in biology writing, which tends to be more objective and ‘faceless’. Finally, one
of the most important differences between journal articles in history and biology is the
argument structure and type of evidence used, with history being narrative and qualitative
and biology being procedural and quantitative.

7.2.3.1.1. Journal articles in biology: quantitative analysis

The correlations between reader perceptions and linguistic dimensions within the
narrow sub-corpus of journal articles in biology are displayed in Table 7.5, below. These
results show large and statistically significant correlations between the third linguistic
dimension, ‘Author-centered Stance’ [LD3], and both of the perceptual dimensions. This
means that as the frequency of features associated with authorial stance and
author-centeredness increases, readers tend to perceive the prose as (a) more readable and
engaging and (b) more interactive and less objective. These relationships are especially
intriguing when we consider that journal articles in biology contain fewer features
associated with ‘Author-centered Stance’ than any other register (see Figure 5.7).
Apparently, while author-centeredness and stance are not key defining characteristics of
journal articles in biology, the extent to which those linguistic features are used is an
important predictor of reader perceptions of writing quality.

Table 7.5. Correlations between the two perceptual dimensions and the five
linguistic dimensions for the journal articles in biology sub-corpus.

        LD1     LD2     LD3     LD4     LD5
PD1     .17     .17     .51**   .15     .32
PD2     .21     .21     .52**   .10     .34


*significant at the 0.05 level; **significant at the 0.01 level

Another notable relationship seen in the table above occurs between linguistic
dimension 5, ‘Abstract Observation and Description’ [LD5] and the two perceptual
dimensions. While these relationships are not statistically significant (likely because of
the small sample size), they are moderate and worth discussing and exploring further. As
with the other correlations previously mentioned, these relationships are somewhat
puzzling considering the relative scarcity of features associated with ‘Abstract
Observation and Description’ in biology journal articles. These relationships are explored
qualitatively in the next section.
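
The point made above, that a moderate correlation can fail to reach significance in a small sample, can be illustrated with a short sketch. The Python code below is only an illustration under stated assumptions, not the procedure used in this study: it computes a Pearson correlation and the corresponding t statistic by hand. The function names and the six pairs of scores are hypothetical and serve only as an example.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical scores for six texts (illustration only, not study data).
ld_scores = [-4.0, -2.5, -1.0, 0.5, 2.0, 3.5]
pd_scores = [30.0, 41.0, 35.0, 52.0, 48.0, 60.0]

r = pearson_r(ld_scores, pd_scores)
t = t_statistic(r, len(ld_scores))
print(f"r = {r:.2f}, t({len(ld_scores) - 2}) = {t:.2f}")
```

Because the t statistic scales with the square root of n - 2, a correlation of a given size corresponds to a larger t value, and therefore to stronger evidence against a zero correlation, as the number of texts increases.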

7.2.3.1.2. Journal articles in biology: qualitative analysis

The two sets of linguistic features that merit further investigation in this section
are those associated with LD3 and LD5. According to the quantitative results, the
features of Author-centered Stance are positively correlated with both perceptual
dimensions. In other words, texts with more features of author-centeredness and stance
were perceived as more readable, engaging, interactive, and interpretive. Similar
relationships have been observed in the overall corpus and several of the sub-corpora, but
not with this degree of strength. Unlike in JA_HI_05 in the previous section, the stance
features in the excerpt from JA_BI_13 are both grammatical and lexical. Text JA_BI_13 contains
examples of stance verb + that complement clauses (e.g., asserts that), stance verb + to
clause (e.g., allow corals to; been shown to), stance noun + that clause (e.g., observation
that), modals of possibility (e.g., this shift may aid; communities may allow), and stance
adverbs (e.g., indeed). This noteworthy conglomeration of various grammatical and
lexical stance features occurs in an article introduction. Text JA_BI_25, on the other
hand, was taken from a methods section.

Figure 7.4. Case study 4: PD scores, sample excerpts, and LD profiles for two texts
from the biology journal article sub-corpus.

Text: JA_BI_13
PD1 Score: 48.00
PD2 Score: 29.00
Excerpt: The recently proposed “Coral Probiotic Hypothesis” asserts that changes in
environmental conditions result in changes in the diverse, metabolically active microbial
population found in and on the coral hosts. This shift may aid the coral holobiont in
adapting to the new conditions. This hypothesis is supported by observations that corals
harbor a wide and diverse bacterial population and that these coral-associated bacteria
undergo a rapid change in population in response to alterations in environmental
conditions. These changes in associated microbial communities may allow corals to overcome
disease and perhaps develop resistance to certain microbial-driven diseases. Indeed, coral
mucus-associated bacteria have been shown to possess antibacterial activity.

Text: JA_BI_25
PD1 Score: 34.00
PD2 Score: 16.50
Excerpt: MT1-MMP antibody (MAB3328) was purchased from Millipore. The antibodies against
FAK, paxillin, and p130Cas were from BD. Anti-Src antibody (sc-18) and anti-GST (sc-138)
were purchased from Santa Cruz Biotechnology, Inc. The phospho-Src family antibody pY416
was from Cell Signaling Technology, the monoclonal vinculin antibody was from
Sigma-Aldrich, anti-Flag antibody from Cell Signaling Technology, and anti-GFP antibody
from Roche. Goat anti–rabbit or goat anti–mouse secondary antibodies conjugated to either
Alexa 488 or 594 were from Invitrogen. BB-94 was purchased from Tocris Bioscience. All the
other chemicals and reagents, unless otherwise stated, were from Sigma-Aldrich.

[LD profile plot: scores on LD1 through LD5 for JA_BI_13 and JA_BI_25; vertical axis
spanning roughly -25 to 10]

This passage has no stance-related features. It is essentially a description of the sources
for each of the authors' materials and instruments that is organized into an objective and
somewhat formulaic list. It seems that the amount of Author-centered Stance is an
important component of writing that is perceived as engaging and interactive. Two
possible interpretations for these positive relationships between the perceptual
dimensions and authorial stance are (a) the logical nature of the arguments and (b) the
cohesive nature of the organizational structure.
The organization in JA_BI_13 is a logical succession of related propositions, each
of which builds on previous assertions and lays the foundation for subsequent statements.
This passage is both cohesive and coherent. The strongest evidence of cohesion in
JA_BI_13 is the cohesive devices used at the beginning of sentences 2-5.
For example, in the second sentence, ‘This shift’ is an explicit anaphoric reference to the
‘changes’ mentioned in the first sentence, and ‘Indeed’ in the fifth sentence is an adverb
linking a new assertion with the claim made in the previous sentence. The cohesive
nature of JA_BI_13 is in stark contrast with the list-like structure of JA_BI_25, which is
composed of descriptive statements that could conceivably occur in any order without
altering their meaning. Although this passage is coherent, it lacks explicit cohesion
markers. The presence of cohesive devices and a logical organizational structure are
clearly related to reader perceptions of readability and engageability.
A final set of features worth investigating in these two texts is those related to
‘Abstract Observation and Description’. Although previous research has often shown that
features such as nominalizations and long words tend to hinder comprehensibility, as
discussed above, they seem to be having the opposite effect in this study. It should be
noted, first of all, that both of the text excerpts in Figure 7.4 received low perceptual scores and
neither has many of the features associated with ‘Abstract Observation and Description’.
However, in relation to each other, JA_BI_13 is clearly (a) more readable and engaging
and (b) more interactive and less objective, and it contains many more features of
‘Abstract Observation and Description’. An analysis of these two texts suggests that
journal article section may act as a moderator variable. Support for this can be seen
in Figure 7.5 below, which shows the mean LD5 scores for the four major journal article
sections. The mean LD5 score for Introduction texts is substantially higher than that of any
other section. The introduction sections of articles are also generally rated higher on
both perceptual dimensions. However, according to the quantitative results, this is more
strongly related to features such as the amount of Author-centered Stance and the degree
of textual cohesion. The smaller, non-significant relationships between ‘Abstract
Observation and Description’ [LD5] and the two perceptual dimensions may simply be a
by-product of those relationships. Finally, as mentioned above, the size of the dimension
scores for these texts is relative to the other texts. Accordingly, it is likely that lay readers
would prefer to read texts with nominalizations, abstract nouns, and longer words than
texts with technical, low-frequency vocabulary and dense noun phrases.
In summary, the two texts compared here are dramatically different in their use of
‘Author-centered Stance’ [LD3], textual cohesion, and ‘Abstract Observation and
Description’ [LD5]. The abundance of stance features in the first text seems to be a
function of the introductory and theoretical nature of this passage, whereas the lack of
stance features in the second text is likely to be a result of its objective, methodological
focus. The textual organization of the first text is both explicit and cohesive, and this
logical and cohesive organization seems to be related to perceptions of readability and
engageability [PD1]. On the other hand, the list-like nature of the second text seems to
function as a concise and objective overview of the methods used in that study. Finally,
the data suggest that the positive correlation between ‘Abstract Observation and
Description’ and reader perceptions on both dimensions is a product of the different
purposes of the four journal article sections.

Figure 7.5. Profile plot of mean ‘Abstract Observation and Description’ scores for
journal articles in biology across article sections.

[Profile plot: mean LD5 scores for the Introduction, Methods, Results, and Discussion
sections; vertical axis spanning -5 to 0]

7.2.3.1.3. Journal articles in history: quantitative analysis

The correlation results displayed in Table 7.6 reveal that two of the linguistic
dimensions, ‘Author-centered Stance’ [LD3] and ‘Colloquial Narrative’ [LD4], are related
to reader perceptions of objectivity and interactiveness [PD2]. The direction of the
relationship shows that readers tend to perceive texts as being less objective and more
interactive when the author uses more language features associated with (a) author-
centeredness and stance and (b) a conversational, narrative style. It is not surprising that
relationships exist between perceptions of less objectivity and the language of author
stance, on the one hand, and between perceptions of interactivity and colloquial narrative,
on the other. These relationships have been discussed and qualitatively investigated in
previous sections.

Table 7.6. Correlations between the two perceptual dimensions and the five
linguistic dimensions for the journal articles in history sub-corpus.

        LD1     LD2     LD3     LD4     LD5
PD1     -.26    -.15    -.14    .34     .27
PD2     .12     -.19    .42     .46*    .07


*significant at the 0.05 level; **significant at the 0.01 level

There was also a moderate but non-significant relationship between LD4 and
PD1, showing that readers tend to perceive texts as more readable and engaging when
authors narrate with a colloquial style. These relationships will be investigated more
thoroughly from a qualitative perspective below.

7.2.3.2. University textbooks

From the table below, it can be seen that there are no strong or statistically
significant relationships between reader perceptions and language use in the university
textbook sub-corpus. These results suggest that other factors contribute more to the
perceptions readers have about university textbook passages than the linguistic choices
of authors. In Section 7.3, I explore additional variables that may influence reader
perceptions more than language use.

Table 7.7. Correlations between the two perceptual dimensions and the five
linguistic dimensions for the university textbook sub-corpus.

        LD1     LD2     LD3     LD4     LD5
PD1     .12     .11     .20     -.05    .03
PD2     .12     .01     -.001   .14     -.22


*significant at the 0.05 level; **significant at the 0.01 level

7.2.3.2.1. University textbooks in biology: quantitative analysis

Within the university textbooks in biology sub-corpus, there are two moderate,
albeit non-significant, relationships worth discussing further. The first of these
relationships is between the first linguistic dimension and the first perceptual dimension.
The direction of this relationship shows that texts with more features associated with
‘Specialized Information Density’ [LD1] tend to be perceived as less readable and less
engaging. This is a relationship that was seen multiple times in the various correlational
analyses performed in previous sections.

Table 7.8. Correlations between the two perceptual dimensions and the five
linguistic dimensions for the university textbooks in biology sub-corpus.

        LD1     LD2     LD3     LD4     LD5
PD1     .39     .04     .23     .19     -.25
PD2     .26     .06     .02     .22     -.31


*significant at the 0.05 level; **significant at the 0.01 level

One relationship that has not been previously seen is a negative correlation
between linguistic dimension 5 and the second perceptual dimension. Although this
relationship is non-significant and only moderate, it demonstrates the variable nature of
reader perceptions of the ‘Abstract Observation and Description’ [LD5] style. Whereas in
previous analyses ‘Abstract Observation and Description’ has been related to reader
perceptions of author interpretation and interactiveness [PD2] in texts, the opposite seems
to be true in this sub-corpus. Possible causes for this will be considered in the qualitative
analysis in the next section.

7.2.3.2.2. University textbooks in biology: qualitative analysis

Case study 5, in Figure 7.6, contains descriptive information and text excerpts
from two biology textbook passages. Text TB_BI_24 received higher
perceptual ratings than TB_BI_16 on both PD1 and PD2. TB_BI_24 also has a higher
score on LD1, ‘Non-technical Synthesis vs. Specialized Information Density’. This
text has more adverbs (e.g., much; simply), adverbial conjuncts (e.g.,
similarly), verb HAVE (e.g., have a set; have more of), and relative clauses (e.g.,
probability that it rains). Additionally, 69% of the words in TB_BI_24 are among the 500
most frequent words in English, whereas only 52% of the words in TB_BI_16 are on that
list. Examples of lower frequency content words in TB_BI_16 are microbial, inocula,
and bioremediation. TB_BI_16 also uses more of the negative features from
LD1, such as noun-noun sequences (e.g., resident microorganisms; risk assessments).
Taken together, these LD1 features clearly distinguish these texts and seem to be closely
related to reader perceptions of readability and engageability.
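
The vocabulary comparison above rests on a simple coverage measure: the share of running words in a text that appear on a high-frequency word list. The following Python sketch illustrates that calculation under stated assumptions. The tokenizer, the function name, and the tiny stand-in word list are hypothetical; the actual 500-word list and tokenization used in this study are not reproduced here.

```python
import re

def high_frequency_coverage(text, high_freq_words):
    """Return the share of word tokens in `text` that appear in `high_freq_words`."""
    # Simple lowercase tokenizer: runs of letters, optionally with an apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in high_freq_words)
    return hits / len(tokens)

# Toy stand-in for a 500-word frequency list (hypothetical and far too short).
HIGH_FREQ = {"the", "of", "a", "in", "it", "on", "or", "not", "may", "be",
             "only", "two", "such", "as", "day", "some", "there", "given"}

sample = ("In some cases there may be only two possibilities, "
          "such as whether or not it rains on a given day.")
print(f"{high_frequency_coverage(sample, HIGH_FREQ):.0%}")
```

With a full frequency list in place of the toy set, the same function could be applied to each text sample to obtain percentages comparable to those reported for TB_BI_24 and TB_BI_16.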
As mentioned above, the negative relationship between LD5, ‘Abstract
Observation and Description’, and the two perceptual dimensions is unique to the biology
textbook sub-corpus. Although these correlations are not statistically significant, they are
moderately strong and therefore worth further qualitative discussion. In order to measure
this difference from another perspective, I compared the correlations between LD1 and
LD5 in the overall corpus and the biology textbook sub-corpus. Whereas the correlation
between these two linguistic dimensions is essentially zero in the overall corpus, there is
a moderate negative correlation in the biology textbook sub-corpus. This reveals that
within biology textbooks the negative features of LD1 and the positive features of LD5
tend to co-occur.

Figure 7.6. Case study 5: PD scores, sample excerpts, and LD profiles for two texts
from the biology textbook sub-corpus.

Text: TB_BI_24
PD1 Score: 73.50
PD2 Score: 55.50
Excerpt: Many unpredictable phenomena have a set of possible outcomes. In some cases there
may be only two possibilities, such as whether or not it rains on a given day. Similarly,
we may consider whether or not a species will go extinct in a given time period. Other
phenomena will have more than two outcomes. The probability of a particular outcome can be
determined based on considerations of different temporal or spatial scales. The probability
that it rains tomorrow could be judged on how many days it has rained in the last month;
for example, 28 out of 30 days. We might wish to contrast this probability (28/30) with
that of equal probability (1/2) that it rains or does not. The much higher probability of
rain during that month may indicate that we are in a wet season or simply an area of high
rainfall.

Text: TB_BI_16
PD1 Score: 53.00
PD2 Score: 48.00
Excerpt: Microorganisms suitable for the remediation are generally present in groundwater
or soil, and when nutrients are added to the environment, many physiological types of
microbes in the soil will grow, including those that are appropriate for bioremediation.
Alternately, microbial augmentation can be used, which is the addition of microbial inocula
containing indigenous environmental bacteria that may be selectively grown in the
laboratory. Under appropriate conditions, natural attenuation of polluted environments
occurs as resident microorganisms conduct bioremediation over a period of time. As in the
case of all environmental issues, risk assessments must be conducted to ensure that
microbial treatment of organic pollution does not produce harmful effects.

[LD profile plot: scores on LD1 through LD5 for TB_BI_24 and TB_BI_16; vertical axis
spanning roughly -10 to 20]

A perusal of the two text samples in Figure 7.6 reveals that there is indeed a
marked difference between the two texts in their use of the features associated with LD5.
The normed rates of occurrence for the features also support this stark difference between
the two texts. TB_BI_16 has nearly twice as many nominalizations and more than twice
as many attributive adjectives. The combination of nominalizations, attributive
adjectives, and the noun-noun sequences mentioned above makes the noun phrases in
TB_BI_16 exceptionally dense (e.g., microbial augmentation; indigenous environmental
bacteria; natural attenuation of polluted environments).
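
The comparison above relies on normed rates of occurrence, which make feature counts comparable across texts of different lengths. The sketch below shows the arithmetic in Python. It assumes a norming basis of 1,000 words, which is an assumption for illustration rather than a detail confirmed in this section, and the function name and counts are hypothetical rather than values from the study.

```python
def normed_rate(raw_count, total_words, basis=1000):
    """Convert a raw feature count to a rate per `basis` words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return raw_count * basis / total_words

# Hypothetical counts (illustration only, not the values from the study).
texts = {
    "text_A": {"words": 480, "nominalizations": 12},
    "text_B": {"words": 520, "nominalizations": 26},
}
for name, info in texts.items():
    rate = normed_rate(info["nominalizations"], info["words"])
    print(f"{name}: {rate:.1f} nominalizations per 1,000 words")
```

Norming matters here because a longer text will tend to contain more instances of any feature simply by virtue of its length; dividing by the word count removes that effect before two texts are compared.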
This qualitative analysis has revealed that although biology textbooks do not
typically have high frequencies for the features associated with ‘Abstract Observation
and Description’ and ‘Specialized Information Density’, these features seem to co-occur
when they are present. This compounds the challenges presented by both sets of features
and contributes to the negative relationships between LD5 and the two perceptual
dimensions.

7.2.3.2.3. University textbooks in history: quantitative analysis

Within the history university textbook sub-corpus there was only one notable
relationship between language use and reader perceptions. The strong and statistically
significant relationship between the fifth linguistic dimension and the first perceptual dimension
shows that increased ‘Abstract Observation and Description’ is related to reader
perceptions of higher readability and engageability in texts.

Table 7.9. Correlations between the two perceptual dimensions and the five
linguistic dimensions for the university textbooks in history sub-corpus.

        LD1     LD2     LD3     LD4     LD5
PD1     -.19    -.04    .20     -.16    .47*
PD2     -.08    -.17    -.03    .13     -.09


*significant at the 0.05 level; **significant at the 0.01 level

7.2.3.3. Popular academic books: quantitative analysis

As with the university textbook sub-corpus, there were no notable correlations
between reader perceptions and language use in the popular academic sub-corpus (see
Table 7.10).

Table 7.10. Correlations between the two perceptual dimensions and the five
linguistic dimensions for the popular academic book sub-corpus.

        LD1     LD2     LD3     LD4     LD5
PD1     -.20    -.01    .06     -.12    .17
PD2     -.24    -.21    .07     .19     -.06


*significant at the 0.05 level; **significant at the 0.01 level

7.2.3.3.1. Popular academic books in biology: quantitative analysis

Similarly, the correlation analysis for the biology popular academic books sub-
corpus did not reveal any meaningful correlations (see Table 7.11). While there are many
possible reasons for the lack of relationships found here, it is probable that there are
additional factors, linguistic or otherwise, within biology popular academic books that are
influencing reader perceptions more than the limited set of linguistic features accounted
for in this study.

Table 7.11. Correlations between the two perceptual dimensions and the five
linguistic dimensions for the popular academic books in biology sub-corpus.

        LD1     LD2     LD3     LD4     LD5
PD1     -.24    .08     .18     .03     -.12
PD2     -.13    -.01    .20     .08     -.26


*significant at the 0.05 level; **significant at the 0.01 level

7.2.3.3.2. Popular academic books in history: quantitative analysis

There were strong and statistically significant relationships between the first
perceptual dimension and linguistic dimensions 4 and 5. Surprisingly, for this particular
sub-corpus, a higher frequency of features associated with ‘Colloquial Narrative’ was related to
readers perceiving texts as being less engaging and less readable. The direction of this
relationship is the opposite of every previous statistical relationship on this linguistic
dimension. I attempt to explain this in the qualitative analysis below by looking
closely at texts within this sub-corpus. The other notable relationship shows that a
higher use of features associated with ‘Abstract Observation and Description’ [LD5] has
a relationship with more readable and engaging texts, according to reader perceptions.

Table 7.12. Correlations between the two perceptual dimensions and the five
linguistic dimensions for the popular academic books in history sub-corpus.

        LD1     LD2     LD3     LD4     LD5
PD1     -.03    .09     -.08    -.54**  .50*
PD2     -.17    -.16    -.05    -.04    .11


*significant at the 0.05 level; **significant at the 0.01 level

7.2.3.3.3. Popular academic books in history: qualitative analysis

The quantitative results reported above showed that, within history popular
academic books, readers tend to rate a text as more readable and engaging when it
contains more linguistic features associated with ‘Abstract Observation and Description’
[LD5] and fewer features associated with ‘Colloquial Narrative’ [LD4]. Initially, these
results seem counterintuitive. Case study 6 in Figure 7.7 contains the information for two
texts, PA_HI_25, which received a PD1 score of 73.50, and PA_HI_04, which has a PD1
score of 44.00. While both of these texts are historical overviews, they are inherently
different in several ways. The overview in PA_HI_25 is much more general and
expository, whereas PA_HI_04 is a more detailed and descriptive narrative. The author of
PA_HI_04 includes relatively precise dates (e.g., summer of 1918; By the 1970s) and
specific geographical information (e.g., 630 kilometers southeast of Moscow; for hundreds of
kilometers around). PA_HI_25, on the other hand, includes no dates and only very
general geographical references (e.g., everywhere north of Maryland; in half of the new
nation). The narrative nature of PA_HI_04 resulted in a high score on LD4, and the time
references, combined with the lack of long words, nominalizations, and attributive
adjectives, contributed to its relatively low score on LD5. It is possible that readers found
the prose in PA_HI_04 to be overwhelmingly detailed, making it difficult to follow.
In addition to being a much more specific and detailed narrative, PA_HI_04, we
can presume, contains content that is much less familiar to the readers in this study (i.e.,
lay readers in the U.S.). Although PA_HI_25 contains very few features associated with
Colloquial Narrative, the concepts are related to a period in American history that most of
the readers would have studied in school. A simple comparison of the proper nouns in
both texts suffices to demonstrate this difference. PA_HI_25 uses terms such as the
American Revolution, the United States, Republican, Christian, and Maryland. On the
other hand, PA_HI_04 focuses on Penza, Moscow, Bolshevik rule, Lenin, and the Soviet
regime. Considering that the education system in the U.S. places much more emphasis on
18th century American history than on 20th century Soviet history, the lower PD1
rating for PA_HI_04 is probably more closely related to content than to linguistic style.

Figure 7.7. Case study 6: PD scores, sample excerpts, and LD profiles for two texts
from the history popular academic books sub-corpus.

Text: PA_HI_25
PD1 Score: 73.50
PD2 Score: 51.00
Excerpt: The American Revolution launched the debate over the future of blacks in the
United States. As long as most blacks had been slaves, law and custom had fixed their
place, fastening the institution on colonial America without confronting conscience, save
for isolated exceptions. The American Revolution, however, inspired the first wave of
emancipations. Republican natural-rights ideology reinforced latent Christian benevolence
to abolish slavery everywhere north of Maryland, thereby endowing the new revolutionary
order with a glow of moral legitimacy and ideological consistency. The vanguard republic
became the vanguard emancipator, at least in half of the new nation.

Text: PA_HI_04
PD1 Score: 44.00
PD2 Score: 42.00
Excerpt: In the summer of 1918 Penza, 630 kilometers southeast of Moscow, had been the
site of one of the first peasant risings against Bolshevik rule. Lenin blamed the revolt on
the kulaks (better-off peasants) and furiously instructed the local Party leaders to hang
in public at least one hundred of them so that "for hundreds of kilometers around the
people may see and tremble ..." By the 1970s, however, Penza's counter-revolutionary past
was long forgotten, and Lenin's bloodthirsty orders for mass executions were kept from
public view in the secret section of the Lenin archive. One of the most striking
characteristics of the best literature produced under the Soviet regime is how much of it
was written in secret.

[LD profile plot: scores on LD1 through LD5 for PA_HI_25 and PA_HI_04; vertical axis
spanning roughly -10 to 15]

In summary, it appears that in some cases the specific and highly detailed
descriptions contained in the narrative of some of the history popular academic passages
are difficult for readers to understand and engage with. Additionally, it seems that the
topic of these passages plays an important role in their readability. Understandably, less
familiar topics seem to receive lower readability and engageability ratings than those that
are more familiar.

7.3. Summary of findings

The results of the correlation and regression analyses reported in this chapter have
revealed a number of meaningful patterns in the data. Many of these patterns have
offered evidence in support of traditional beliefs about the linguistic components of
quality writing. The general trend showed that writing is perceived to be (a) more
readable and engaging and (b) less objective and more interactive when it (1) is
written in narrative format with a colloquial style, (2) is non-technical and evaluative
rather than specialized and informationally dense, and (3) contains definitions of and
elaborations on new terms and concepts. These patterns were found in the overall corpus
and in several of the other sub-corpora analyses.
Some of the results presented here have also identified relationships that
challenge traditional assumptions about the relationships between language use and
writing quality. For example, one relationship that consistently emerged in the data
showed that texts with more linguistic features associated with ‘Abstract Observation and
Description’ (e.g., nominalizations, abstract nouns, moderate frequency words) were
typically rated as being more readable, engaging, interpretive, and interactive. The
qualitative analyses were helpful in interpreting these puzzling relationships.
The results reported in Section 7.2.1.1 revealed many notable correlations
between individual linguistic features and reader perceptions. This finding is important
because it demonstrates the profound impact that individual linguistic choices can have
on reader perceptions. The results of regression analyses for both PD1 and PD2 showed
that the most important predictor of both perceptual dimensions was the percent of the
text that is composed of high frequency vocabulary. This finding suggests that although
grammatical choices are clearly important, vocabulary use seems to have the biggest
impact on reader perceptions, particularly with regard to perceived text readability. The
results of this study have shown that vocabulary frequency can account for 55% of the
variability in perceived text readability. This finding has important implications for
research on vocabulary and text readability.
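To make the notion of "variability accounted for" concrete, the sketch below shows how the share of variance explained by a single predictor is obtained in a simple linear regression. The data values and the names `high_freq_pct` and `readability` are invented for illustration only; they are not the feature counts or ratings from this study, and the regression models reported above may have been specified differently.

```python
def r_squared(x, y):
    """Share of variance in y explained by a least-squares line fitted on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # With one predictor, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


# Hypothetical data: percent of high frequency words and a readability score per text.
high_freq_pct = [62.0, 68.5, 71.0, 74.5, 80.0, 83.5]
readability = [-1.1, -0.2, -0.6, 0.4, 0.3, 1.2]

print(f"R^2 = {r_squared(high_freq_pct, readability):.2f}")
```

An R^2 of 0.55 on real data would correspond to the 55% figure reported above: just over half of the differences in perceived readability between texts would track differences in the predictor.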
Finally, there were a handful of relationships within the specific sub-corpora
analyses that contradicted the general patterns listed here. One example of this was the
negative correlation between ‘Abstract Observation and Description’ and the two
perceptual dimensions within the biology textbook sub-corpus. Another example was the
strong negative correlation between ‘Colloquial Narrative’ and perceived readability and
engageability in the popular academic books in history sub-corpus. Although these
relationships were explored qualitatively in the sections above, the conclusions were
tentative and exploratory. Future research will be needed in order to determine (a)
whether these relationships resurface in similar studies and (b) the underlying reasons for
these relationships.

Overall, the analyses presented here have offered unprecedented insights into the
relationships between the linguistic styles of writers and reader perceptions of writing
quality. In the next chapter, I endeavor to present a synthesis of the findings presented in
this dissertation study. I also discuss a range of practical implications and potential
applications of the research presented here. Finally, I discuss important limitations of the
research presented here and directions for future work in this area.

CHAPTER 8. SYNTHESIS AND CONCLUSION

8.1. Introduction

The overarching goal of this dissertation study has been to investigate
relationships between the linguistic choices of academic writers and reader perceptions of
writing quality across disciplines and publication types. In order to accomplish this goal,
I constructed a corpus of writing samples balanced across three publication types (journal
articles, university textbooks, and popular academic books) and two disciplines (history
and biology) (see Chapter 3). I then developed and applied a comprehensive framework
to analyze and compare the situational characteristics of these publication types and
disciplines (see Chapter 4). The results of a MD analysis of linguistic variation,
including variation across publication types, disciplines, and individual authors, were
described in Chapter 5. In Chapter 6, I described the development and use of a new
method called Stylistic Perception analysis, and reported reader perceptions of writing
quality across disciplines, publication types, and authors. Finally, in Chapter 7, I
described the results of a series of correlational analyses performed in an attempt to
identify and interpret linguistic predictors of perceived writing quality.
The results presented and discussed in these previous chapters have revealed that
the linguistics of published academic writing and the stylistic perceptions of readers are
complex, multi-faceted, and interrelated phenomena. While this study has not presented
a complete picture of these phenomena or their relationships with each other, it has
contributed a wealth of previously unknown information and proposed novel methods for
further research in this area. The goals of this chapter are to summarize and further
discuss the findings reported in previous chapters (Section 8.2), highlight key
methodological decisions and their advantages in this study (Section 8.3), discuss some
of the implications of these results (Section 8.4), and conclude with an overview of the
limitations of this study and needed future research in this area (Section 8.5).
The discussion in this chapter is based on the summary of results displayed in
Table 8.1. This table displays a very general overview of the characteristic linguistic and
perceptual dimensions for each of the six registers. The dimension labels for the
linguistic and perceptual dimensions are only included when the mean score for a given
group falls above or below the mean by at least one-quarter of a standard deviation. This
section will conclude with a discussion of the main linguistic predictors of writing
quality.
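The inclusion threshold described above, together with the legend of Table 8.1, can be sketched as a small mapping from a group's mean dimension score to a marker. This is a minimal illustration that assumes the dimension scores are standardized, so that a group mean is already expressed in standard deviations from the overall mean; the function name `dimension_marker` and the treatment of boundary values as inclusive are assumptions made for this sketch.

```python
def dimension_marker(z_mean):
    """Map a group's mean z-score to the marker used in the Table 8.1 legend."""
    magnitude = abs(z_mean)
    if magnitude >= 1.0:
        count = 4   # ++++ or ----: at least 1 SD from the mean
    elif magnitude >= 0.5:
        count = 2   # ++ or --: at least 1/2 SD from the mean
    elif magnitude >= 0.25:
        count = 1   # + or -: at least 1/4 SD from the mean
    else:
        return ""   # below the inclusion threshold, so no label is shown
    sign = "+" if z_mean > 0 else "-"
    return sign * count


# Hypothetical group means, not values taken from the study.
for score in (1.3, -0.6, 0.3, 0.1):
    print(score, repr(dimension_marker(score)))
```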

Table 8.1. Summary: Distinctive linguistic and perceptual characteristics by register.

Journal Articles: Biology
  Situational characteristics: high level of shared technical knowledge between author and target audience; specific topic with narrow focus; extensive description of procedures and explanation of empirical findings; interpretations based on data
  Characteristic linguistic dimensions:
    ++++ Specialized Information Density
    -- Definition/Evaluation of New Concepts
    -- Author-centered Stance
    -- Colloquial Narrative
    - Abstract Observation and Description
  Characteristic perceptual dimensions:
    ++++ Boring and Difficult to Read
    ++++ Objective Information Focus

Journal Articles: History
  Situational characteristics: high level of shared technical knowledge between author and target audience; report of author’s argument or observational findings; interpretations based on author claims
  Characteristic linguistic dimensions:
    - Definition/Evaluation of New Concepts
    + Author-centered Stance
    ++ Abstract Observation and Description
  Characteristic perceptual dimensions:
    + Boring and Difficult to Read

Textbooks: Biology
  Situational characteristics: limited shared technical knowledge between author and target audience; purpose of introducing important principles, concepts, and ideas; based on secondary evidence
  Characteristic linguistic dimensions:
    ++++ Definition/Evaluation of New Concepts
    - Author-centered Stance
    -- Colloquial Narrative
  Characteristic perceptual dimensions:
    ++ Engaging and Easy to Read
    ++ Interactive Author Interpretation

Textbooks: History
  Situational characteristics: narrative accounts of historical events, cultures, peoples, and artifacts; past research often applied to current issues
  Characteristic linguistic dimensions:
    + Non-technical Synthesis
    - Definition/Evaluation of New Concepts
    ++ Colloquial Narrative
  Characteristic perceptual dimensions:
    ++ Engaging and Easy to Read
    ++ Interactive Author Interpretation

Popular Academic Books: Biology
  Situational characteristics: very little shared technical knowledge between author and target readers; newsworthy, rare, or interesting topics or issues; purpose of entertaining, praising, and synthesizing ideas and findings
  Characteristic linguistic dimensions:
    ++ Non-technical Synthesis
    + Definition/Evaluation of New Concepts
    ++ Author-centered Stance
    - Abstract Observation and Description
  Characteristic perceptual dimensions:
    ++ Objective Information Focus

Popular Academic Books: History
  Situational characteristics: very little shared technical knowledge between author and target readers; past discoveries explained in terms of current relevance; newsworthy, rare, or interesting topics or issues
  Characteristic linguistic dimensions:
    + Non-technical Synthesis
    ++ Author-centered Stance
    ++ Colloquial Narrative
    - Abstract Observation and Description
  Characteristic perceptual dimensions:
    + Engaging and Easy to Read
    + Interactive Author Interpretation

+ | - : plus or minus ¼ standard deviation from the mean; ++ | -- : plus or minus ½ standard deviation from the mean;
++++ | ---- : plus or minus 1 standard deviation from the mean

8.2. Linguistic variation and perceived quality of published academic writing

This section begins by providing a summary of the main linguistic and perceptual
findings from Chapters 5-7 and discusses them in terms of the situational characteristics
described in Chapter 4. This summary is organized according to publication type, which
is the preeminent independent variable in this study. Throughout this section I refer
frequently to the information included in Table 8.1.

8.2.1. Journal articles in biology and history

The analyses in Chapters 5 and 6 revealed more variation between disciplines
within journal articles than the other two publication types on three of the linguistic
dimensions (LD1, LD3, LD5) and both of the perceptual dimensions. However, LD2
was the only dimension for which there was no statistically significant difference
between biology and history. Therefore this section begins with a discussion of the
second linguistic dimension. I then summarize the key linguistic and perceptual
differences between journal articles in history and biology. In both cases, I compare the
dimension scores to the other publication types and disciplines and interpret them based
on the relevant situational characteristics discussed in Chapter 4 and summarized in Table
8.1.
Based on the results in Chapters 5 and 6, journal articles in history and biology
seem to be similar in only a couple of respects: (a) they were perceived as ‘Boring and
Difficult to Read’, although to different degrees and (b) they lack ‘Definition and
Evaluation of New Concepts’. While the large disciplinary differences on the other
dimensions were unexpected, this particular similarity is not surprising. Journal articles,
regardless of discipline, have a high level of technical and specialized knowledge that is
shared between the authors and their target audiences. Unlike the scholar-to-general
public and scholar-to-student natures of popular academic books and university
textbooks, respectively, journal articles are a clear example of scholar-to-scholar writing.
Thus, defining and elaborating on new concepts is not one of the major goals of this
publication type.
As mentioned above, there was a surprising number of large differences between
biology and history journal articles. For the linguistic dimensions, biology writing was
characterized by (a) more specialized and densely packaged information, (b) less author-
centeredness and stance, (c) less colloquial narrative, and (d) less Abstract Observation
and Description than history writing. With regard to LD1, it seems that the difference
between biology and history is being driven, at least to some extent, by the situational
characteristic of topic. The topic of history journal articles is the author’s interpretation
of or argument about historical observations. Biology articles, on the other hand, are
focused on objectively reporting experimental or observational findings. The quality of
biology articles is based, in large part, on the quality and the amount of empirical
evidence included. This results in precise descriptions of methodological procedures and
dense descriptions of research findings (see 8.1).

8.1 Biology Journal Article
Particularly, pH and dissolved oxygen strongly increased in summer due
to photosynthetic activities (pH 8.0–9.0 (annual mean 8.41 ± 0.23), O2 9–
12 mg L-1 (annual mean 10.49 ± 1.67) and Chl a 1–10 µg L-1 (annual
mean 4.1 ± 1.7)). [JA_BI_15]

In contrast, history articles tend to include many different sources of evidence to build a
case for a particular perspective on peoples, events, and artifacts from the past (see 8.2).

8.2 History Journal Article


The late Professor Wu Yujin, general editor of the above-mentioned
general world history textbook, who consistently occupied himself with
theoretical issues of world history, came to the conclusion that, according
to Marxism, human history became world history, that is, an
interconnected whole, only when human society had reached a certain
stage. [JA_HI_13]

A comparison of these two texts also illustrates the much higher use of features
associated with Colloquial Narrative and Abstract Observation and Description within
journal articles in history. Moreover, the combination of these features makes it
unsurprising that biology journal articles are perceived by non-expert readers as being (a)
more boring and difficult to read and (b) more objective and information-focused than
history articles.
A final notable difference between the journal articles in history and biology is
the much higher use of features of Author-centered Stance in history articles. A
comparison between 8.1, above, and 8.3, below, demonstrates this difference clearly.

8.3 History Journal Article


It is interesting that Hearn seems to presume that
(hetero)sexuality/marriage ("a man and a woman") implies a transparency
of intention, a kind of emotional knowledge. This supports the theory that
intimacy was an important site for knowledge production. [JA_HI_15]

While the biology author provides an objective description of experimental results, the
history excerpt is focused on the author’s stance regarding evidence that (a) “is
interesting” and (b) “supports” a particular theory.
Together these findings strongly support Gray’s (2011) findings about the large
disciplinary differences within the publication type of academic journal articles, and the
conclusion that there is a “complex interplay between various situational characteristics, and these
interactions have clear links to the patterns of linguistic variation that we see across
disciplines” (p. 166).

8.2.2. University textbooks in biology and history

In contrast with the large disciplinary differences found in the journal article sub-
corpus, the effect of discipline within the publication type of university textbooks was
relatively small with respect to all but two of the linguistic and perceptual dimensions.
Therefore, I begin this section with a discussion of the linguistic and perceptual
dimensions that revealed similarities between disciplines, and interpret them in terms of
relevant situational characteristics. I then address the two linguistic dimensions that
revealed disciplinary differences in university textbooks.
The university textbooks in this corpus, regardless of discipline, can be generally
characterized by their use of language features associated with relatively (a) high Non-
technical Synthesis, (b) low Author-centered Stance, and (c) high Abstract Observation
and Description. As pedagogical manuals, textbooks are designed to introduce important
concepts and principles in a general topic area. Therefore, one of the primary goals of a
textbook author is to objectively synthesize large amounts of information (see 8.4).

8.4 Biology University Textbook


The complement system has two major effects. It can act directly on
invading microbes or it can act in association with antibody to cause cell
lysis. It does so by puncturing holes in the microbial cell membrane. The
complement system also binds to the outside of microbes, making them
much easier for phagocytes to engulf. [TB_BI_15]

This information, in most cases, comprises large bodies of scientific research. However,
authors tend to focus more on synthesizing what we know rather than describing the
details of how we know. Not surprisingly, this type of writing was generally perceived as
engaging, easy to read, and interactive. It was somewhat surprising to observe that
university textbooks were also generally rated as being the least objective of the three
publication types. However, this is clearly related to the interactive nature of pedagogical
texts.
Regarding discipline differences, there were statistically significant differences
between history and biology textbooks on two of the linguistic dimensions: LD2 and
LD4. Essentially, these differences reveal that biology textbooks contain more
‘Definition and Evaluation of New Concepts’, and history textbooks contain more
‘Colloquial Narrative’. As discussed in previous chapters, readers are less likely to be
familiar with technical terms and concepts in biology texts than with the cultures,
peoples, and time periods in history textbooks. The higher rates of occurrence for
features associated with defining and evaluating new concepts suggest that the authors of
biology texts recognize and try to compensate for this (see 8.5). In addition, biology
textbooks do not contain many features of colloquial narrative. On the contrary, they tend
to be written in a technical, expository style.

8.5 Biology University Textbook


Several pathogenic microorganisms and parasites are commonly found in
domestic wastewater as well as in effluents from wastewater treatment
plants. The three categories of pathogens encountered in the environment
are as follows (Leclerc et al., 2002): Bacterial pathogens. Some of these
pathogens (e.g., Salmonella, Shigella) are enteric bacteria. Others (e.g.,
Legionella, Mycobacterium avium, Aeromonas) are indigenous aquatic
bacteria. [TB_BI_06]

Unlike textbooks on biology, history textbooks tend to be better characterized by
a colloquial, narrative style rather than an expository style. Excerpt 8.6, below, illustrates
this style with its description of past events, which are linked to current events and issues.

8.6 History University Textbook


Little did the German friar Martin Luther know, when he nailed his
protests against Catholic doctrines to the door of Wittenberg's cathedral in
1517, that he was shaping the destiny of a yet unknown nation.
Denouncing the authority of priests and popes, Luther declared that the
Bible alone was the source of God's word. [TB_HI_01]

The findings summarized here show that university textbooks are generally
perceived as having an engaging, easy to read, and interactive style. This is largely
related to the general use of features associated with Non-technical Synthesis. However,
the particular choices made by the authors of textbooks differ, to some extent, according
to discipline. Whereas biology textbook authors define and elaborate on new concepts,
history textbook authors tend to narrate past events in a colloquial manner. Once again,
these findings shed light on the complex relationships between the situational
characteristics of different publication types and the practices of different disciplines.
These results have been particularly insightful in that they demonstrate that authors seem
to adapt their writing styles to address the challenges of producing a pedagogical manual
in their respective disciplines.

8.2.3. Popular academic books in biology and history

Unlike journal articles and university textbooks, there were no significant
differences between popular academic books in biology and history on any of the
linguistic dimensions. Interestingly, however, there were significant differences between
the two disciplines in terms of the perceptual ratings they received. This section begins
with a descriptive summary of the linguistic characteristics of popular academic writing,
highlighting the relevant situational characteristics of this publication type. I then discuss
the differences between the two disciplines in terms of reader perceptions.
According to the results of this study, popular academic writing in biology and
history can be characterized by its (a) Non-technical Synthesis, (b) Author-centered
Stance, and (c) lack of Abstract Observation and Description. The high rates of linguistic
features associated with Non-technical Synthesis and the low rates of features associated
with Abstract Observation and Description can be attributed, to some degree, to the small
amount of technical knowledge shared by the authors and their readers (see 8.7).

8.7 History Popular Academic Book
No event in American history which was so improbable at the time has
seemed so inevitable in retrospect as the American Revolution. On the
inevitability side, it is true there were voices back then urging prospective
patriots to regard American independence as an early version of manifest
destiny. Tom Paine, for example, claimed that it was simply a matter of
common sense that an island could not rule a continent. [PA_HI_18]

This is also related to the topics included in popular academic writing, which are
typically newsworthy, rare, interesting, or otherwise relevant to a general audience. This
is in stark contrast with the scholarly and pedagogical focus of journal articles and
textbooks.
The large amount of Author-centered Stance is related to the purpose of praising
ideas and findings while entertaining the audience (see 8.8).

8.8 Biology Popular Academic Book


Despite being a committed Darwinian, I share these doubts. I do not think
that natural selection for survival can explain the human mind. Our minds
are entertaining, intelligent, creative, and articulate far beyond the
demands of surviving on the plains of Pleistocene Africa. To me, this
points to the work of some intelligent force and some active designer.
However, I think the active designers were our ancestors, using their
powers of sexual choice to influence—unconsciously—what kind of
offspring they produced. [PA_BI_24]

These purposes are clearly illustrated in Excerpt 8.8, in which the author uses an
interactive writing style to praise and promote Darwin’s theories.
Despite the lack of linguistic differences between history and biology popular
academic books, there were large and statistically significant disciplinary differences on
PD1 and PD2. These differences reveal that popular academic books in biology are
perceived as (a) more boring and difficult to read and (b) more objective and information-
focused. This supports discussions in previous chapters in which I have posited that the
subject matter of biology is generally less familiar to lay readers than that of history.
While there are almost certainly other disciplinary factors, linguistic and non-linguistic,
these results show that readers react quite differently to biology and history popular
academic writing despite the lack of linguistic differences between them.
To sum up, this section has shown that popular academic writing in biology and
history is characterized by a non-technical, synthetic style comprised of author-
centeredness and a lack of Abstract Observation and Description. These characteristics are
clearly related to the situational characteristics of popular academic writing, such as the
lack of shared technical knowledge between reader and author and the focus on
entertaining the audience and praising science. Despite the lack of disciplinary
differences found on the linguistic dimensions, readers reacted quite differently to the
writing of the two disciplines. This might be explained by the relative familiarity of the
content of history writing over that of biology writing, a finding that was made apparent
by the results of the perceptual analysis.

8.2.4. Linguistic predictors of perceived writing quality

This section briefly summarizes and interprets the results of Chapter 7 in which I
measured the relationships between language use in and reader perceptions of published
academic writing. Using correlation and regression techniques, I measured the extent to
which language use can predict reader perceptions of writing style and quality. Overall,
the results strongly supported the use of these techniques in order to gain a more
complete understanding of how language use is related to reader perceptions in published
academic writing, generally, as well as within specific publication types and disciplines.
These more specific findings painted a complex picture of the interplay between the
independent variables of discipline and publication type and the dependent variables of
language use and reader perceptions.
In the overall corpus, it was discovered that, together, more Colloquial Narrative,
less Specialized Informational Density, more Definition and Evaluation of New
Concepts, and more Abstract Observation and Description account for more than 30% of
the variability in reader perceptions of readability and engageability and almost 26% of
the variability in reader perceptions of interactivity and objectivity. In order to illustrate
these findings, Excerpts 8.9 and 8.10 contain samples taken from the texts with the lowest
and highest perceptual scores, respectively. Specifically, Excerpt 8.9 was taken from the text rated
as the least readable, engaging, interactive, and interpretive text in the corpus.

8.9 Biology Journal Article


Fore wing triangular with outer margin straight. Dorsal surface
ferruginous brown; oblique, postmedian band from costal margin to inner
margin, two dorsal white spots on band: one between R3 and R4, the other
between R4 and R5, the latter not visible in males; whitish spot across the
distal end discal cell and between M1 and M2, more evident in females.
[JA_BI_07]

Excerpt 8.10, on the other hand, was taken from the text rated as the most readable,
engaging, interactive, and interpretive.

8.10 Biology University Textbook


And the prevailing view of how the colossal expansion came about
accounts for every bit of matter in the universe, in every living thing.
Think about how you rewind a videotape on a VCR, then imagine yourself
"rewinding" the universe. As you do, the galaxies start moving back
together. [TB_BI_03]

A comparison of these two texts reveals stark differences in a wide range of linguistic
features, many of which contribute to the extreme perceptual ratings.
Within the data for the discipline, publication type, and register sub-corpora, some
of these results remained consistent, some became more marked, and some disappeared.
This variability was not surprising considering the results of Chapters 5 and 6 which
showed a large amount of publication type and discipline variability.

The correlational results from Chapter 7 suggest a number of important
conclusions about the nature of published academic writing. First, these findings suggest
that lay readers are sensitive enough to language use to consciously or sub-consciously
perceive and rate their perceptions of linguistic variation. Second, there is strong
evidence that there are underlying dimensions of reader perceptions that can be measured
systematically using a survey instrument. Third, these results show that reader
perceptions are a relatively reliable measure of writing style and quality. Fourth, there is
a moderate to strong relationship between the linguistic choices of authors and the
perceptions of readers. Fifth, these findings seem to show that the relationships between
writing style and stylistic perception are moderated by variables such as discipline,
publication type, topic, and sub-section. Finally, the results show that there is a great
deal of variability in reader perceptions that is not accounted for by language use,
suggesting the need for further research to investigate additional linguistic and non-
linguistic variables. These conclusions are discussed in more detail in the implications
and future research sections below.

8.3. Methodological advantages of this study

The goal of this section is to draw attention to a number of key methodological
decisions made during the course of this study. Each of these decisions distinguishes this
study from other related studies in some way. Therefore, I briefly discuss the reasons for
these choices, their drawbacks and benefits, and their potential usefulness in other studies
of this nature.

8.3.1. Corpus design

The nature of the research goals of this study required me to develop a corpus that
could simultaneously satisfy many competing requirements. These requirements
included the following:

1. Length of texts
a. Long enough to provide stable measurements of linguistic features
b. Short enough to be read and assessed by readers in a reasonable amount of
time
2. Number of texts
a. Large enough sample of texts to adequately represent the three publication
types and two disciplines
b. Small enough sample of texts to make it practical for many readers to read
and assess each text
3. Number of publication types and disciplines
a. Large enough sample of publication types and disciplines to represent
variability in published academic writing
b. Small enough sample of publication types and disciplines to make it
practical for many readers to read and assess each text in each variety

The distinctive nature of this study introduced specific challenges that may not
exist in other corpus-based studies. However, most corpus compilers must deal with
practical constraints and limitations in order to build a principled and representative
corpus. In essence, corpus compilers must (a) establish a target variety and linguistic
features they would like to study, (b) sample from that target variety, trying to represent
the range of its situational and linguistic variation, (c) assess the representativeness
(situational and linguistic) of their sample, and (d) avoid generalizing their findings
beyond the population that the actual corpus sample represents.
In order to follow those guidelines in this study, I:

a. established that I wanted to study the lexico-grammatical characteristics of
published academic writing in three publication types and two disciplines (see
Chapters 1 and 2)
b. sampled twenty-five 500-600 word texts from each discipline and each
publication type, trying to represent a range of text sections and topics (see
Chapter 3)
c. assessed the representativeness of the corpus in terms of its linguistic features (see
Chapter 3 and Section 5.2.2) and its situational characteristics (see Chapter 4)
d. limited the generalizations made to the acknowledged limitations of the corpus
sample (see Chapters 5-8)
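The balanced design described in these steps can be sketched by enumerating every text in the corpus. The identifier format mirrors the labels attached to the excerpts in this chapter (e.g., JA_BI_15); the assumption that numbering runs from 01 to 25 within each register, and the constant and function names, are introduced here only for illustration.

```python
from itertools import product

# Codes follow the excerpt labels used in this chapter.
PUBLICATION_TYPES = {
    "JA": "journal article",
    "TB": "university textbook",
    "PA": "popular academic book",
}
DISCIPLINES = {"BI": "biology", "HI": "history"}
TEXTS_PER_REGISTER = 25          # texts sampled per publication type x discipline
MIN_WORDS, MAX_WORDS = 500, 600  # target length of each text sample


def text_ids():
    """List one identifier per text, assuming numbering 01-25 in each register."""
    return [
        f"{pub}_{disc}_{n:02d}"
        for pub, disc in product(PUBLICATION_TYPES, DISCIPLINES)
        for n in range(1, TEXTS_PER_REGISTER + 1)
    ]


ids = text_ids()
print(len(ids))                                    # total number of texts
print(len(ids) * MIN_WORDS, len(ids) * MAX_WORDS)  # implied corpus size bounds in words
```

Six registers of 25 texts each yield 150 texts, which is what makes it practical for every text to be read and rated by many readers.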

Adherence to this model helped me to avoid pitfalls such as failing to assess the
representativeness of the corpus, measuring linguistic characteristics that are not well-
represented in the corpus, and generalizing the findings beyond the language variety
actually represented in the corpus. Necessarily, this resulted in a corpus with a number of
limitations, including relatively short text samples (500-600 words), a limited number of
disciplines and publication types, and a relatively small sample size for each register
(25 texts). The tradeoff, however, was a principled, balanced corpus that seems to
represent the target varieties quite well (see, e.g., Section 3.4).

8.3.2. Instrument development

The goal of the analysis in Chapter 6 was to measure reader perceptions of
academic writing quality and style. In order to achieve this aim, I set out to develop an
instrument capable of reliably assessing reader perceptions of academic writing. The
development and use of this instrument proceeded in the following stages: (a) review
relevant literature to identify the best item type and possible items for inclusion in the
study, (b) supplement that list of items with additional relevant items and develop the list
into a pilot instrument, (c) administer the pilot instrument, (d) assess the reliability of the
pilot instrument both quantitatively (e.g., reliability analysis) and qualitatively (e.g.,
think-aloud interviews), (e) revise the items to improve the reliability of the instrument,
(f) administer the final instrument to 25 raters per text, (g) quantitatively assess the
reliability of the instrument, and (h) remove unreliable items, texts, and raters from the
dataset.
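As an illustration of the quantitative reliability assessment in stages (d) and (g), the sketch below computes Cronbach's alpha, one widely used index of internal consistency. The choice of this statistic is an assumption made for the example rather than a description of the exact procedure followed in Chapter 6, and the item names and ratings are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list with one score per respondent."""
    k = len(item_scores)
    # Total score per respondent, summed across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_variance_sum = sum(pvariance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / pvariance(totals))


# Hypothetical ratings: three survey items, each rated by the same five readers.
ratings = [
    [5, 4, 2, 3, 1],  # hypothetical item "readable"
    [4, 4, 2, 3, 2],  # hypothetical item "engaging"
    [5, 3, 1, 3, 1],  # hypothetical item "clear"
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.2f}")
```

Items that lower alpha when included are natural candidates for revision in stage (e) or removal in stage (h).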
The development of this instrument and its administration to lay readers is a novel
approach to research on the linguistic and stylistic choices of published authors. This
study, as well as other related studies (e.g., Carroll, 1960; Egbert, 2013), has offered
strong evidence for the reliability of reader perceptions and the usefulness of these
perceptions in helping to interpret linguistic variation in written texts. While the items on
this instrument were developed for the specific purpose of assessing reader perceptions of
published academic prose, it is likely that many of these items could be usefully applied
in future research to assess reader perceptions of other registers. In any case, the stages
listed above could be followed in order to develop new instruments to measure
perceptions of other registers or written domains.

8.3.3. Interaction effects

One of the most important methodological choices made in this study was testing
for and interpreting the effects of interactions between discipline and publication type.
Based on the small number of previous studies that have investigated discipline ×
publication type interactions, it was hypothesized at the beginning of this study that
discipline would interact with publication type in meaningful ways. However, the exact
nature and magnitude of those interactions were difficult to predict. The results reported
in Chapters 5 and 6 showed significant interaction effects for four of the five linguistic
dimensions and for both of the perceptual dimensions. These results are important for
two reasons. First, measuring register variation in academic prose without regard to
discipline and publication type could lead to misleading or inaccurate conclusions. For
example, if discipline differences were disregarded in the analysis of linguistic
Dimension 5, we would probably conclude that journal articles and university textbooks
use similar amounts of the linguistic features associated with ‘Abstract Observation and
Description’. However, Figure 5.11 shows that journal articles in biology are
dramatically different from those in history. Therefore, averaging these two would result
in a finding that represents neither discipline. In cases such as this, a post hoc
investigation of the simple effects rather than main effects was more appropriate.
Another related finding revealed by the analysis of statistical interaction is that
some language features vary more within publication types than across them. In other
words, there were cases in Chapter 5 in which discipline was a stronger predictor of
language variation than publication type. A final finding from the analysis of interaction
was that lay readers are sensitive to linguistic variation due to both discipline and
publication type variation, despite not being explicitly told what publication type and
discipline category a given text falls in. These findings suggest that potential interaction
effects, which are frequently ignored in linguistic research with multiple independent
variables, should be considered, at the very least. This will ensure that data are analyzed
and interpreted as accurately and completely as possible.
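The risk of averaging across disciplines can be illustrated with a small sketch. The scores below are invented for illustration and are not taken from the corpus, and the function names are hypothetical. The sketch contrasts the main-effect (marginal) means for publication type with the simple effects of publication type within each discipline.

```python
from statistics import mean

# Hypothetical dimension scores (NOT the study's data), keyed by
# (discipline, publication type).
scores = {
    ("biology", "journal article"): [6.0, 7.0, 8.0],
    ("biology", "textbook"): [1.0, 2.0, 3.0],
    ("history", "journal article"): [-4.0, -3.0, -2.0],
    ("history", "textbook"): [1.0, 2.0, 3.0],
}


def cell_means(data):
    """Mean score for every discipline x publication type cell."""
    return {cell: mean(values) for cell, values in data.items()}


def marginal_means(data, position):
    """Means for one factor, collapsing over the other (main-effect view)."""
    pooled = {}
    for cell, values in data.items():
        pooled.setdefault(cell[position], []).extend(values)
    return {level: mean(values) for level, values in pooled.items()}


def simple_effects(data):
    """Journal article minus textbook difference within each discipline."""
    cells = cell_means(data)
    disciplines = sorted({discipline for discipline, _ in cells})
    return {
        d: cells[(d, "journal article")] - cells[(d, "textbook")]
        for d in disciplines
    }


by_publication_type = marginal_means(scores, position=1)
by_discipline_gap = simple_effects(scores)

print(by_publication_type)  # marginal means hide the pattern
print(by_discipline_gap)    # simple effects point in opposite directions
```

With these invented numbers, the two publication types have identical marginal means even though the difference between them is large and opposite in sign within each discipline, which is the situation in which simple effects are more informative than main effects.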

8.3.4. Methodological triangulation

This dissertation study was based on the assumption that more could be learned
about the nature of published academic writing by triangulating the results of more than
one method (see Section 2.3.5). Traditionally, text-linguistic research is based on the
measurement of objective linguistic variability across text varieties and the interpretation
of language patterns based on a researcher’s understanding of previous literature and the
language variety under study. This study is innovative in that it also considers subjective
reader perceptions of textual variation. This approach proved to be even more useful
than I had anticipated. In addition to providing insights into how readers respond to
different texts, the results of the Stylistic Perception analysis were instrumental in
revealing linguistic predictors of writing style in published academic writing.
The quantitative methods in Chapters 5 and 6 also relied on the triangulation of
two different analytical models, the Register and Publication Type x Discipline Models.
Approaching the linguistic and perceptual data from these two perspectives produced a
more complete description of the variability in the corpus than either of the individual
methods could have offered on its own.
A third form of methodological triangulation in this study was the mixed methods
approach taken throughout. Mixed methods approaches include both quantitative and
qualitative analyses in order to achieve a more complete understanding of data. In this
study I have made an effort to adopt a mixed methods approach that is cyclical rather
than unidirectional by (a) qualitatively interpreting quantitative patterns and (b)
quantitatively investigating interesting qualitative findings. The case studies throughout
Chapter 7 are a clear example of this process. The specific methods adopted in this study
represent a handful of ways that multiple methods can be usefully triangulated in corpus
linguistic research. However, there are certainly many other methodological approaches
that could be effectively triangulated to achieve a more complete understanding of corpus
data.

8.4. Implications

The methodology developed for this dissertation research has provided insights
into the relationships between language use by published academic authors and reader
perceptions of writing style and quality. These findings have demonstrated that many of
the linguistic choices made by authors are not only salient to readers but can also
influence the perceptions a reader has about a text. Based on these and other important
findings from this study, this section will discuss some of the implications of this
dissertation for several groups: teachers (8.4.1), authors, publishers, and editors (8.4.2),
researchers (8.4.3), and EAP instructors and administrators (8.4.4).

8.4.1. For teachers

As mentioned above, the results presented here have revealed that the writing
styles of authors are highly variable, and that their stylistic choices have an influence on
reader perceptions of the quality of published academic writing. University faculty and
committees are often required to select textbooks and other course reading materials for
students. These teachers and committees might consider including student perceptions of
the quality and style of the writing in various textbook options in order to make a more
informed decision. This could be as simple as using a think-aloud interview to get some
qualitative feedback on a number of potential course textbooks or as sophisticated as
following the procedures laid out in this study. As the target audience of university
textbooks, students can be a valuable source of information regarding the writing quality
of potential textbooks (see also Egbert, 2013).

8.4.2. For authors, publishers, and editors

The findings of this study suggest that writers, editors, and publishers of academic
books and articles will benefit from placing more emphasis on the results of descriptive
research that relates textual features to the perceptions of readers. It is quite likely that
traditional style guidelines do not actually impede or enhance the reader’s experience in
the same ways or to the same extent as they claim. Many of these guidelines have never
been empirically tested to determine their influence on readers. Investigations like the
ones presented in this study could be used to improve writing style guides and editorial
practices. Novice and seasoned authors could certainly benefit from knowing the
potential effects that their linguistic and stylistic choices may have on their target
audiences. As demonstrated by this study, these results are not an effort to distinguish
good writing from bad writing. Rather, they could be used to help authors match the
content and stylistic intent of their writing with linguistic choices that are most likely to
help them achieve their goals.
It should be emphasized here that these findings do not suggest that reader
perceptions and language use are the only considerations for authors and publishers.
Most, if not all, of the authors included in this study are accomplished scholars in their
respective fields, and publishing companies have spent decades pursuing the highest
quality of writing in their publications. Reader perceptions are not the only measure of
writing quality, but most would agree they are important to consider, at the very least.

8.4.3. For researchers

For researchers interested in register and style variation, this study has
demonstrated the utility of a dual methodology for the analysis of written language which
includes looking beyond data and researcher intuitions in order to interpret the results of
research. The methodology developed in Chapter 6 of this study, including the
subjection of perceptual data to a battery of reliability measures, has attempted to address
this gap. Stylistics researchers often measure statistical significance (is the difference
big?), but they rarely consider practical significance (does the difference matter and to
whom?). This approach to measuring reader perceptions of stylistic variation in the
language of published academic writing has measured both statistical and practical
significance by bringing the focus back to the reader. Based on the results of the study, it
seems to be a step in the right direction.
One of the central themes of this study has been the importance of
investigating and appropriately interpreting interaction between independent variables.
The results of this study have shown that these interaction effects, when they exist, have a
profound impact on the interpretation of our research findings and the conclusions we can
draw from our data. This is not to say that the use of factorial ANOVAs to measure
interaction effects is necessary or appropriate for every study. Rather, it is to say that
when two or more independent variables or factors exist in our corpus sample, we should
at least test whether they interact with each other. This will help us ensure we are
appropriately and accurately interpreting patterns in our data.
Representativeness in corpus design has been an area of emphasis among corpus
linguists for at least two decades (see, e.g., Biber, 1993), but the specific implications of
representativeness for corpora in studies of multiple factors have not received adequate
attention. This study has shown that issues of representativeness must be considered at
each stage of a corpus study. After establishing the research questions for a corpus
study, we should design a corpus sample so that it (1) includes the full range of
variability within the factor(s) (e.g., publication type, discipline) and factor levels (e.g.,
journal articles, textbooks) that we are interested in measuring, and (2) excludes, to the
extent possible, extraneous variables from the corpus sample. We should then perform
appropriate analyses that account for variability within each of the factors in the corpus.
In cases where more than one factor exists in a corpus sample, a statistical technique,
such as a factorial ANOVA, should be used to test for interactions. The presence or
absence of an interaction effect will determine how the researcher should analyze and
interpret patterns. Finally, patterns should be interpreted based on the corpus sample
used in the study and generalized only to the factors and factor levels included in that
corpus sample.
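The interaction test described above can be sketched as follows for a balanced design. This is a minimal illustration rather than the procedure used in this study: the scores are invented, the function name is hypothetical, and the sketch stops at the F ratio because obtaining a p-value requires an F distribution from a statistics package.

```python
from statistics import mean


def interaction_f(data):
    """F ratio for the A x B interaction in a balanced two-way design.

    `data` maps (level_a, level_b) to a list of scores. Every combination
    of levels must be present and hold the same number of scores.
    """
    levels_a = sorted({a for a, _ in data})
    levels_b = sorted({b for _, b in data})
    n = len(next(iter(data.values())))
    if len(data) != len(levels_a) * len(levels_b):
        raise ValueError("every factor combination must be present")
    if n < 2 or any(len(values) != n for values in data.values()):
        raise ValueError("balanced design with at least 2 scores per cell required")

    grand = mean([x for values in data.values() for x in values])
    cell = {key: mean(values) for key, values in data.items()}
    mean_a = {a: mean([cell[(a, b)] for b in levels_b]) for a in levels_a}
    mean_b = {b: mean([cell[(a, b)] for a in levels_a]) for b in levels_b}

    # Sums of squares for the two main effects and for all cells combined.
    ss_a = n * len(levels_b) * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = n * len(levels_a) * sum((m - grand) ** 2 for m in mean_b.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell.values())

    # The interaction is what remains of the between-cell variation
    # after both main effects are removed.
    ss_ab = ss_cells - ss_a - ss_b
    ss_within = sum((x - cell[key]) ** 2 for key, values in data.items() for x in values)

    df_ab = (len(levels_a) - 1) * (len(levels_b) - 1)
    df_within = len(data) * (n - 1)
    return {
        "df_interaction": df_ab,
        "df_within": df_within,
        "f": (ss_ab / df_ab) / (ss_within / df_within),
    }


# Hypothetical scores (NOT the study's data): 2 disciplines x 3 publication types.
scores = {
    ("biology", "journal article"): [6.0, 7.0, 8.0],
    ("biology", "textbook"): [1.0, 2.0, 3.0],
    ("biology", "popular"): [0.0, 1.0, 2.0],
    ("history", "journal article"): [-4.0, -3.0, -2.0],
    ("history", "textbook"): [1.0, 2.0, 3.0],
    ("history", "popular"): [0.0, 1.0, 2.0],
}

result = interaction_f(scores)
print(result)
```

A large F ratio for the interaction term would suggest that simple effects, rather than main effects, should be interpreted. Unbalanced designs require a different computation and are outside the scope of this sketch.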
In cases where researchers are not interested in possible interactions between
independent factors, they should design the corpus in such a way that it represents the
range of variability along a single factor (e.g., academic journal articles), without
introducing additional factors (e.g., disciplines, time periods). By controlling for
variables that are deemed extraneous to the research questions of interest, the researcher
will reduce the noise, or unexplained variance, in the sample, allowing for a more
accurate description and/or comparison of the levels within the factor of interest.
Although this will necessarily limit the scope and generalizability of a study, it will
increase the interpretability of the findings. In other words, it is the researcher’s
responsibility to create a representative corpus by including the full range of variability
for the language variety of interest and limiting additional variability due to factors and
factor levels that are beyond the scope of the study.
A final implication of this study for researchers is the potential benefit of
methodological triangulation. This study has clearly illustrated the value of approaching
a data set from more than one perspective by using more than one methodological
approach. Many of the questions raised during the course of this study could not have
been answered by simply using a linguistic or perceptual description of the data. The use
of both Multi-Dimensional analysis and Stylistic Perception analysis has resulted in a
much more comprehensive understanding of the nature of published academic writing
from the perspective of the text and the reader. There is great need for researchers to gain
a better understanding of the data they analyze. The limitations of one methodological
approach can often be overcome, at least to some extent, through the use of one or more
additional, complementary approaches.

8.4.4. For EAP instructors and administrators

A final area that can benefit from the findings in this study is that of English for
Academic Purposes (EAP) teaching, assessment, and curriculum and materials
development. Research on EAP is founded on the assumption that language varies based
on the purpose for which it is used. This study has demonstrated both the reality and the
complexity of linguistic variation based on the specific purpose for which English is
used. With regard to registers, this study has shown that intraspecialist registers such as
published journal articles are written quite differently than pedagogical and popular
academic texts. Furthermore, the research presented here has demonstrated that authors
write very differently within different disciplines, and that these patterns vary based on
publication type. The results from this study could be used to prepare students to read
within different publication types and disciplines. Finally, the results of this study have
shown that variability exists among authors, even within narrow publication types
and disciplines. These findings underscore the importance of additional research on
specific publication types and disciplines, which should then be used as the basis for
authentic EAP materials, teaching, and tests.

8.5. Limitations and future research

8.5.1. Limitations

In this study, I have made every effort to measure relationships between linguistic
variation and reader perceptions. In order to gain some understanding of these variables,
it was necessary to control for as many extraneous variables as possible. This made the
scope of this study quite narrow. The data were gathered based on a relatively small
corpus of 150 samples of published academic writing taken from just three of the many
written academic publication types and only two of the many academic disciplines.
Therefore, the findings reported here should not be generalized beyond that scope, and
even then it should be stressed that the results of this study are exploratory and in need of
future investigation and replication.
Another limitation is that each of the texts in the corpus contains only 500-600
words. Therefore, many linguistic features were excluded from this study because they
do not typically occur with rates that are consistently frequent enough across texts to be
reliably measured. Furthermore, because the texts are quite short, they are limited in the
extent to which they represent the full range of potential variability within the writing of
particular authors and across sections and chapters of the full texts.
A final limitation of the corpus is the limited range of popular academic writing
represented in the popular academic sub-corpus. The popular academic writing in the
corpus used in this study represents only a narrow sub-domain of popular academic
prose. While this does not affect the validity of the results, it does limit the
generalizability of the findings. A corpus composed of a broader range of popular
academic writing, or a corpus designed to represent a different sub-domain of popular
academic prose, would be likely to yield different results from this study.
Two additional limitations are the lack of information about the participant
readers and the heterogeneous sample of readers used for this study. A more thorough
background survey would have provided additional demographic information in order to
better understand the nature of the sample and the potential influences of participant
variables. The ideal participant readers for studies of this nature are people within the
target audience of the particular writing under study. Although the use of a single lay
audience presented some advantages for an exploratory study such as this one, future
applications would likely benefit from an effort to match the writing under study with an
appropriate target audience.

8.5.2. Need for replication studies

The findings presented in this study have opened up a wide range of possibilities
and applications for future research. MD analysis of linguistic variation can be applied to
a representative corpus of any register or domain. Stylistic Perception analysis can be
used to measure the perceptions of almost any target audience. As in this study, these
two sets of results can then be brought together through correlational analyses to
determine whether any multivariate relationships exist between the linguistic and
perceptual variables. The success of the items in the survey used in this study also
suggests that other perceptual differential scales could be developed to measure
additional reader perceptions of interest.
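As a simplified illustration of how the two sets of results might be brought together, the sketch below computes bivariate Pearson correlations between per-text linguistic dimension scores and per-text mean perception ratings. This is a stand-in for the multivariate analyses mentioned above, not a reproduction of them. All variable names and values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-text values (NOT the study's data); position i in every
# list refers to the same text.
linguistic = {"dimension_a": [1.0, 2.0, 3.0, 4.0, 5.0]}
perceptual = {
    "perception_x": [2.0, 1.0, 4.0, 3.0, 5.0],
    "perception_y": [5.0, 4.0, 3.0, 2.0, 1.0],
}

# One correlation per (linguistic variable, perceptual variable) pair.
correlations = {
    (ling_name, perc_name): pearson_r(ling_values, perc_values)
    for ling_name, ling_values in linguistic.items()
    for perc_name, perc_values in perceptual.items()
}

for pair, r in correlations.items():
    print(pair, round(r, 3))
```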
Additional issues that may be explored in future research include text length and
the number of participant readers. It would be interesting to explore the effects of longer or
shorter text samples on reader perceptions and linguistic analyses. While shorter texts
would certainly be easier to analyze perceptually, they would present additional obstacles to
achieving corpus representativeness, both situational and linguistic. On the other hand,
while longer texts would increase corpus representativeness, they might make it more
difficult to achieve reliable stylistic ratings from participant readers.
Regarding the number of participant raters, in this study I chose to include a large
number of raters for each text sample. This aided in the reliability analyses performed in
Chapter 6 and increased the generalizability of these results. Although the reliability
results presented in Chapter 6 ultimately show that reader perceptions are, for the most
part, quite reliable, there is variation in the reader responses that is not accounted for in a
simple measure of central tendency. While an in-depth investigation of the variability
among these 25 raters was beyond the scope of this study, there is a real need for future
research that focuses on variability in readers’ perceptions of texts, including the extent
of that variation as well as possible variables that contribute to it.
For practical reasons, the use of 25 raters per text may not be feasible in future
research. The ideal number of raters per text could also be explored in the future to make
the approach proposed here more reasonable for future applications while still allowing
researchers to generalize their results beyond the sample included in the study.
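One possible starting point for exploring the number of raters is the Spearman-Brown prediction formula, which estimates the reliability of a mean rating across k raters from the reliability of a single rater. This formula was not part of the analyses in this study, and the single-rater reliability of 0.2 used below is an invented value for illustration only. The function names are hypothetical.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Predicted reliability of the mean of `n_raters` ratings."""
    if not 0 < single_rater_reliability < 1:
        raise ValueError("single-rater reliability must be between 0 and 1")
    if n_raters < 1:
        raise ValueError("at least one rater is required")
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


def raters_needed(single_rater_reliability, target):
    """Smallest number of raters whose mean rating reaches `target`."""
    if not 0 < target < 1:
        raise ValueError("target reliability must be between 0 and 1")
    k = 1
    # Predicted reliability rises toward 1 as k grows, so the loop ends.
    while spearman_brown(single_rater_reliability, k) < target:
        k += 1
    return k


# Hypothetical single-rater reliability (NOT estimated from the study).
assumed_r = 0.2
reliability_25 = spearman_brown(assumed_r, 25)
minimum_raters = raters_needed(assumed_r, 0.85)

print(round(reliability_25, 3))
print(minimum_raters)
```

Under these assumed values, the predicted reliability with 25 raters and the minimum number of raters for a chosen target can be compared directly. The formula assumes that raters are interchangeable, which may not hold for a heterogeneous sample of readers.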
This area of research would also benefit from future studies that investigate
linguistic and perceptual variation in different disciplines. We would expect to find
additional patterns of variability in the linguistic features and reader perceptions of other
disciplines. Discipline-specific investigations of this nature would be an important step
towards a comprehensive description of the linguistic characteristics and perceived styles
of academic writing.
Another direction for future research would be to go beyond perceptions of a
variable such as comprehensibility by actually measuring it in a more direct way. It
would be interesting to investigate possible associations between dimensions of linguistic
variation and the actual comprehensibility of a passage based on scores from a reading
comprehension test.
A final area for potentially fruitful future research is the use of cluster analysis as
a method of grouping texts according to their similarities, linguistic, perceptual, or both.
The results presented in this study revealed that the grouping variables used here (i.e.,
publication type and discipline) may not be the only variables at play. Cluster analysis
would allow groups to emerge in the data using a bottom-up approach. This may also
reveal other predictor variables that are useful in explaining patterns in the data.
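A bottom-up grouping of this kind can be sketched with a small agglomerative clustering routine using average linkage. The sketch is illustrative only: the text labels and scores are invented, the function names are hypothetical, and other clustering methods and linkage criteria could be used instead.

```python
from math import dist


def average_linkage(cluster_a, cluster_b, points):
    """Mean Euclidean distance between all pairs across two clusters."""
    total = sum(dist(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical texts (NOT the study's data), each described by one linguistic
# dimension score and one perceptual score. In practice the variables would
# normally be standardized first so that no single scale dominates.
texts = {
    "text_01": (5.0, 4.0),
    "text_02": (5.5, 4.2),
    "text_03": (-3.0, 1.0),
    "text_04": (-2.5, 1.3),
    "text_05": (0.0, 6.0),
    "text_06": (0.3, 5.8),
}

names = list(texts)
points = [texts[name] for name in names]
groups = [
    sorted(names[i] for i in cluster)
    for cluster in agglomerate(points, n_clusters=3)
]

print(groups)
```

The resulting groups could then be compared with publication type and discipline to see whether the emergent clusters align with those categories or point to other predictors.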
This corpus-based study, although relatively small-scale and limited in scope, has
contributed valuable insights into the nature of register, publication type, and discipline
variation in published academic writing. More importantly, these findings have raised
exciting new questions about reader perceptions of linguistic variation in
published academic writing.

REFERENCES

Atkinson, D., and D. Biber. 1994. Register: A review of empirical research. In D. Biber
& E. Finegan (Eds.), Sociolinguistic Perspectives on Register. Oxford: Oxford
University Press, 351-385.
Adams-Smith, D.E. 1987. The process of popularization – rewriting medical research
papers for the layman: Discussion paper. Journal of the Royal Society of
Medicine, 80, 634-636.
Alonso, O., & Mizzaro, S. 2009. Can we get rid of TREC assessors? Using Mechanical
Turk for relevance assessment. In S. Geva, J. Kamps, C. Peters, T. Sakai, A.
Trotman, & E. Voorhees (Eds.), Proceedings of the SIGIR 2009 Workshop on the
Future of IR Evaluation (pp. 15–16). Amsterdam: IR Publications.
Bailin, A., & Grafstein, A. 2001. The linguistic assumptions underlying readability
formulae: A critique. Language and Communication, 21, 285-301.
Benjamin, R.G. 2012. Reconstructing readability: Recent developments and
recommendations in the analysis of text difficulty. Educational Psychology
Review, 24, 63-88.
Bertilson, H. & Fierke, K.M. 1980. Distribution of gender usage as pronouns, examples
and pictures in introductory college textbooks. Encyclia: The Journal of the Utah
Academy of Sciences, Arts and Letters, 57, 149-161.
Biber, D. 1988. Variation across Speech and Writing. Cambridge: Cambridge University
Press.
Biber, D. 1993. Representativeness in corpus design. Literary and Linguistic Computing,
8, 243-257.
Biber, D. 2004. Modal use across registers and time. In Anne Curzan and Kimberly
Emmons (eds.), Studies in the history of the English language II: Unfolding
conversations, 189-216. Berlin: Mouton de Gruyter.
Biber, D. 2006a. Stance in spoken and written university registers. Journal of English for
Academic Purposes, 5, 97-116.
Biber, D. 2006b. University Language: A Corpus-based Study of Spoken and Written
Registers. Philadelphia: John Benjamins.
Biber, D. 2011. Corpus linguistics and the study of literature: Back to the future?
Scientific Study of Literature, 1, 15-23.
Biber, D. 2012. Register as a predictor of linguistic variation. Corpus Linguistics and
Linguistic Theory, 8, 9-37.
Biber, D., & Conrad, S. 2009. Register, genre and style. Cambridge: Cambridge
University Press.
Biber, D., Conrad, S., & Cortes, V. 2004. If you look at...: lexical bundles in university
teaching and textbooks. Applied Linguistics, 25(3), 371-405.
Biber, D., Conrad, S., Reppen, R., Byrd, P., & Helt, M. 2002. Speaking and
writing in the university: A multidimensional comparison. TESOL
Quarterly, 36, 9-48.
Biber, D., Csomay, E., Jones, J., & Keck, C. 2004. A corpus linguistic investigation of
vocabulary-based discourse units in university registers. In U. Connor & T. Upton
(eds.), Applied corpus linguistics: A multi-dimensional perspective (pp. 53-72).
Amsterdam: Rodopi.
Biber, D., Egbert, J., Gray, B., Oppliger, R., & Szmrecsanyi, B. (to appear). Variation
versus text-linguistic approaches to grammatical change in English: Nominal
modifiers of head nouns. In Kytö, M. & Pahta, P. (Eds.), Handbook of English
historical linguistics, Cambridge: Cambridge University Press.
Biber, D., & Finegan, E. 1994. Intra-textual variation within medical research articles. In
N. Oostdijk and P. de Haan (Eds.), Corpus-based Research into Language,
Amsterdam: Rodopi, 201-222.
Biber, D., and Finegan, E. 1997. Diachronic relations among speech-based and written
registers in English. In To explain the present: Studies in the changing English
language in honour of Matti Rissanen, ed. by T. Nevalainen and L. Kahlas-
Tarkka, 253-275. Helsinki: Societe Neophilologique.
Biber, D., & Gray, B. 2010. Challenging stereotypes about academic writing:
Complexity, elaboration, explicitness. Journal of English for Academic Purposes,
9, 2-20.
Biber, D. & Gray, B. 2013. Being specific about historical change: The influence of sub-
register. Journal of English Linguistics, 41(2), 104-134.
Biber, D., Gray, B., & Poonpon, K. 2011. Should we use characteristics of conversation
to measure grammatical complexity in L2 writing development? TESOL
Quarterly, 45, 5-35.
Biber, D., Gray, B., & Staples, S. (under review). Do more complex tasks facilitate
greater grammatical complexity? An investigation of task type variation on the
TOEFL iBT.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. 1999. Longman
Grammar of Spoken and Written English. Harlow, UK: Pearson Education.
Britton, B.K., Gulgoz, S., & Glynn, S. 1993. Impact of good and poor writing on
learners: Research and theory. In B.K. Britton, A. Woodward, & M. Binkley
(Eds.), Learning from Textbooks: Theory and Practice (1-46), Hillsdale, NJ:
Lawrence Erlbaum Associates.
Bruce, B., Rubin, A., & Starr, K. 1981. Why readability formulas fail. Reading Education
Report No. 28. Urbana: University of Illinois Center for the Study of Reading.
Byrd, P. 1997. Naming practices in academic writing: Another thought. English for
Specific Purposes, 16(4), 339-343.
Carkin, S. 2001. Pedagogic discourse in introductory classes: Multi-dimensional analysis
of textbooks and lectures in biology and macroeconomics. Doctoral dissertation,
Northern Arizona University.
Carroll, J. 1960. Vectors of prose style. In T.A. Sebeok (Ed.) Style in language (pp. 283-
292), Cambridge: Cambridge University Press.
Charney, D. 2003. Lone geniuses in popular science: The devaluation of scientific
consensus. Written Communication, 20(3), 215-241.
Charney, D. 2004. Introduction: The rhetoric of science. Written Communication, 21(3),
3-5.
Chen, L. 2008. An Investigation of Lexical Bundles in Electrical Engineering
Introductory Textbooks and ESP Textbooks. Master’s thesis, Carleton University.
Cline, T.A. 1972. Readability of community college textbooks. Journal of Reading,
16(1), 33-37.
Cloitre, M. and Shinn, T. 1985. Expository practice: Social, cognitive and
epistemological linkages. In T. Shinn and R. Whitley (Eds.), Expository Science,
31-60.
Cohen, L., & Manion, L. 2000. Research Methods in Education. London: Routledge.
Conrad, S. 1996a. Academic Discourse in Two Disciplines: Professional Writing and
Student Development in Biology and History. Doctoral dissertation, Northern
Arizona University.
Conrad, S. 1996b. “Investigating academic texts with corpus-based techniques: An
example from biology”. Linguistics and Education, 8, 299-326.
Conrad, S., and D. Biber (Eds.). 2001. Variation in English: Multi-Dimensional Studies.
London: Longman.
Conrad, S. & D. Biber 2001. Multi-dimensional methodology and the dimensions of
register variation in English. In S. Conrad & D. Biber (Eds.) Multi-dimensional
Studies of Register Variation in English (p. 13-42). Harlow: Pearson Education.
Crossley, S. A., Dufty, D. F., McCarthy, P. M., & McNamara, D. S. 2007. Toward a new
readability: A mixed model approach. In D.S. McNamara and G. Trafton (Eds.),
Proceedings of the 29th annual conference of the Cognitive Science Society (pp.
197–202). Austin, TX: Cognitive Science Society.
Crossley, S.A., Allen, D.B., & McNamara, D.S. 2011. Text readability and intuitive
simplification: A comparison of readability formulas. Reading in a Foreign
Language, 23(1), 84-101.
Crossley, S.A., Greenfield, J., & McNamara, D.S. 2008. Assessing text readability using
cognitively based indices. TESOL Quarterly, 42(3), 475-493.
Crystal, D. 1972. Objective and subjective in stylistic analysis. In B.B. Kachru & H.
Stahlke (Eds), Current trends in stylistics (p. 103-114). Edmonton: Linguistic
Research Inc.
Csomay, E. 2007. A corpus-based look at linguistic variation in classroom interaction:
Teacher talk versus student talk in American university classes. Journal of
English for Academic Purposes, 6, 336-355.
Dahl, T. 2008. Contributing to the academic conversation: A study of new knowledge
claims in economics and linguistics. Journal of Pragmatics, 40, 1184-1201.
Davies, M. 2008-. The Corpus of Contemporary American English: 450 million words,
1990-present. Available at http://corpus.byu.edu/coca/.
Davison, A., & Kantor, R.N. 1982. On the failure of readability formulas to define
readable texts: A case study from adaptations. Reading Research Quarterly,
17(2), 187-209.
Diani, G. 2008. Emphasizers in spoken and written academic discourse: The case of
really. International Journal of Corpus Linguistics, 13(3), 296-321.
DiMarco, C. & Hirst, G. 1993. A computational theory of goal-directed style in syntax.
Computational Linguistics, 19(3), 451-499.
Egbert, J. 2012. Style in nineteenth century fiction: A multi-dimensional analysis. The
Scientific Study of Literature, 2(2), 167-198.
Egbert, J. 2013. Student perceptions of stylistic variation in introductory university
textbooks. Linguistics and Education.
Fahnestock, J. 1986. Accommodating science: The rhetorical life of scientific facts.
Written Communication, 3, 275-96.
Flesch, R. 1948. A new readability yardstick. Journal of Applied Psychology, 32(3), 221-
233.
Fulcher, G. 1997. Text difficulty and accessibility: Reading formulae and expert
judgement. System, 25(4), 497-513.
Gardner, D., & Davies, M. 2007. Pointing out frequent phrasal verbs: A corpus‐based
analysis. TESOL Quarterly, 41(2), 339-359.
Green, A., Unaldi, A., & Weir, C. 2010. Empiricism versus connoisseurship: Establishing
the appropriacy of texts in tests of academic reading. Language Testing, 27(2),
191-211.
Gopen, G., & Swan, J. 1990. The science of scientific writing. American Scientist, 78,
550-558.
Grabe, W. 1984. Towards Defining Expository Prose within a Theory of Text
Construction. Doctoral dissertation, University of Southern California.
Gray, B. 2011. Exploring Academic Writing through Corpus Linguistics: When
Discipline Tells Only Part of the Story. Doctoral dissertation, Northern Arizona
University.
Gries, S. Th. 2006. “Exploring variability within and between corpora: Some
methodological considerations”. Corpora, 1(2), 109-151.
Halliday, M.A.K., & Hasan, R. 1976. Cohesion in English. London: Longman.
Harwood, N. 2005. 'I hoped to counteract the memory problem, but I made no
impact whatsoever': Discussing methods in computing science using I.
English for Specific Purposes, 24, 243-267.
Heilman, M.J., Collins-Thompson, K., Callan, J., & Eskenazi, M. 2007. Combining
lexical and grammatical features to improve readability measures for first and
second language texts. Proceedings of the NAACL-HLT 2007 Conference.
Rochester, U.S.A. pp. 460-467.
Hendricks, W.O. 1976. Grammars of style and styles of grammar. New York: North-
Holland Publishing Company.
Hewings, M., & Hewings, A. 2002. "It is interesting to note that...": A comparative
study of anticipatory 'it' in student and published writing. English for
Specific Purposes, 21, 367-383.
Hidi, S. 2001. Interest, reading, and learning: Theoretical and practical considerations.
Educational Psychology Review, 13(3), 191-209.
Hidi, S., & Baird, W. 1986. Interestingness—A neglected variable in discourse
processing. Cognitive Science, 10, 179–194.
Hidi, S., & Baird, W. 1988. Strategies for increasing text-based interest and students’
recall of expository texts. Reading Research Quarterly, 23, 465–483.
Hilton, C.B., Motes, W.H., & Fielden, J.S. 1989. An experimental study of the effects of
style and organization on reader perceptions of text. The Journal of Business
Communication, 26(3), 255-270.
Hirst, G. 2007. Views of text-meaning in computational linguistics: Past, present, and
future. In G. Dodig-Crnkovic & S. Stuart (Eds.), Computation, information,
cognition: The nexus and the liminal (pp. 270–279). Cambridge Scholars
Publishing.
Hirst, G. 2008. The future of text‐meaning in computational linguistics. In: P. Sojka; A.
Horák; I. Kopeček; & K. Pala (Eds.), Proceedings, 11th International Conference
on Text, Speech and Dialogue (pp. 1-9). Brno, Czech Republic: Springer‐Verlag.
Howell, D.C. 2007. Statistical methods for psychology. Belmont, CA: Thomson
Wadsworth.
Hsu, W. 2011. The vocabulary thresholds of business textbooks and business research
articles for EFL learners. English for Specific Purposes, 30, 247-257.
Hu, G., & Cao, F. 2011. Hedging and boosting in abstracts of applied linguistics articles:
A comparative study of English and Chinese medium journals. Journal of
Pragmatics, 43, 1-15.
Hunston, S. 2011. Corpus Approaches to Evaluation: Phraseology and evaluative
language. New York: Routledge.
Hyland, K. 1994. Hedging in academic writing and EAP textbooks. English for Specific
Purposes, 13(3), 239-256.
Hyland, K. 1996. Writing without conviction? Hedging in science research articles.
Applied Linguistics, 17(4), 433-454.
Hyland, K. 1998. Hedging in Scientific Research Articles. Amsterdam: John Benjamins.
Hyland, K. 1999. Talking to students: Metadiscourse in introductory coursebooks.
English for Specific Purposes, 18(1), 3-26.
Hyland, K. 2001. Humble servants of the discipline? Self-mention in research articles.
English for Specific Purposes, 20, 207-226.
Hyland, K. 2006. Disciplinary differences: Language variation in academic discourses. In
Hyland, K. & Bondi, M. (Eds.) Academic Discourse across Disciplines.
Frankfurt: Peter Lang, 17-45.
Hyland, K. 2008. Genre and academic writing in the disciplines. Language Teaching,
41(4), 543-562.
Hyland, K. 2009. Academic Discourse. London: Continuum.
Hyland, K. 2010. Constructing proximity: Relating to readers in popular and professional
science. Journal of English for Academic Purposes, 9, 116-127.
Hyland, K. 2012. Bundles in academic discourse. Annual Review of Applied Linguistics,
32, 150–169.
Hyland, K., & Tse, P. 2007. Is there an "academic vocabulary"? TESOL Quarterly, 41(2),
235-253.
Johns, A.M. 1986. Coherence and academic writing: Some definitions and suggestions
for teaching. TESOL Quarterly, 20(2), 247-265.
Kapon, S. 2013. Bridging the knowledge gap: An analysis of Albert Einstein’s
popularized presentation of the equivalence of mass and energy. Public
Understanding of Science, 0(0), 1-12.
Kelly, J., Knight, J., Peck, L.A., & Reel, G. 2003. Straight/narrative? Writing style
changes readers’ perceptions of story quality. Newspaper Research Journal,
24(4), 118-122.
Kidd, J.S. 1988. The popularizations of science: Some basic measurements.
Scientometrics, 14, 127-142.
Koutsantoni, D. 2006. Rhetorical strategies in engineering research articles and research
theses: Advanced academic literacy and relations of power. Journal of English
for Academic Purposes, 5, 19-36.

Koyalan, A. & Mumford, S. 2011. Changes to English as an additional language writers’
research articles: From spoken to written register. English for Specific Purposes, 30(2), 113–123.
Kranich, S. 2011. To hedge or not to hedge: the use of epistemic modal expressions in
popular science in English texts, English–German translations, and German
original texts. Text & Talk-An Interdisciplinary Journal of Language, Discourse
& Communication Studies, 31(1), 77-99.
Kronick, D.A. 1976. History of scientific and technical periodicals: The origins and
development of the scientific and technical press, 1665-1790. Lanham, MD:
Rowman and Littlefield.
Kuehl, R. O. 2000. Design of Experiments: Statistical Principles of Research Design and
Analysis. Pacific Grove, CA: Brooks/Cole.
Kuo, C. 1999. The use of personal pronouns: Role relationships in scientific journal
articles. English for Specific Purposes, 18, 121-138.
Kurzman, M. 1974. The reading ability of college freshmen compared to the
readability of their textbooks. Reading Improvement, 11(2), 13-25.
Leroy, G., Helmreich, S., & Cowie, J.R. 2010. The influence of text characteristics on
perceived and actual difficulty of health information. International Journal of
Medical Informatics, 30, 1-12.
Lin, L. & Evans, S. 2012. Structural patterns in empirical research articles: A cross-
disciplinary study. English for Specific Purposes, 31, 150-160.
Lischinsky, A. 2008a. The Construction of Expert Knowledge in Popular Management
Literature. Doctoral dissertation, Universitat Pompeu Fabra.
Lischinsky, A. 2008b. Examples as persuasive argument in popular management
literature. Discourse & Communication, 2(3), 243-269.
Liu, D. 2011. The most frequently used English phrasal verbs in American and British
English: A multicorpus examination. TESOL Quarterly, 45(4), 661-688.
Love, A. 1991. Process and product in geology: An investigation of some discourse
features of two introductory textbooks. English for Specific Purposes, 10(2), 89-103.
Love, A. 1993. Lexico-grammatical features of geology textbooks: Process and product
revisited. English for Specific Purposes, 12(3), 197-218.
Mann, W.C. & Thompson, S.A. 1988. Rhetorical structure theory: Toward a functional
theory of text organization. Text, 8(3), 243-281.
Mannes, S.M., & Kintsch, W. 1987. Knowledge organization and text organization.
Cognition and Instruction, 4(2), 91-115.
Marco, M.J.L. 2000. The construction of novelty in computer science papers. Revista
Alicantina de Estudios Ingleses, 13, 123-140.
Marge, M., Banerjee, S., & Rudnicky, A. I. 2010. Using the Amazon Mechanical Turk
for transcription of spoken language. In Hansen, J. (Ed.), Proceedings of the 2010
IEEE Conference on Acoustics, Speech and Signal Processing (pp. 5270–5273).
IEEE.
Martínez, I. 2005. Native and non-native writers' use of first person pronouns in the
different sections of biology research articles in English. Journal of Second
Language Writing, 14, 174-190.
Martínez, I., Beck, S., & Panza, C. 2009. Academic vocabulary in agriculture research
articles: A corpus-based study. English for Specific Purposes, 28, 183-198.

Mason, W., & Suri, S. 2012. Conducting behavioral research on Amazon’s Mechanical
Turk. Behavior Research Methods, 44(1), 1-23.
McGrath, L. & Kuteeva, M. 2012. Stance and engagement in pure mathematics research
articles: Linking discourse features to disciplinary practices. English for Specific
Purposes, 31, 161-173.
McNamara, D.S., Crossley, S.A. & McCarthy, P.M. 2009. Linguistic features of writing
quality. Written Communication, 27(1), 57-86.
McNamara, D.S., Kintsch, E., Songer, N.B., & Kintsch, W. 1996. Are good texts always
better? Interactiveness of text coherence, background knowledge, and levels of
understanding in learning from text. Cognition and Instruction, 14(1), 1-43.
Meadows, A.J. 1981. Development of science publishing in Europe. New York: Elsevier
Science Ltd.
Milic, L.T. 1967. Against the typologies of styles. In S. Chatman & S. Levin (Eds.)
Essays on the language of literature. (p. 450). Boston: Houghton Mifflin.
Miller, D. 2011. ESL reading textbooks vs. university textbooks: Are we giving our
students the input they may need? Journal of English for Academic Purposes, 10,
32-46.
Mueller, D.J. 1986. Measuring social attitudes: A handbook for researchers and
practitioners. New York: Teachers College Press.
Murillo, S. 2012. The use of reformulation markers in Business Management research
articles: An intercultural analysis. International Journal of Corpus Linguistics,
17(1), 64-90.
Myers, G. 2003. Discourse studies of scientific popularization: questioning the
boundaries. Discourse Studies, 5(2), 265-279.
Oakey, D. J. 2002. Formulaic language in English academic writing: A corpus-based
study of the formal and functional variation of a lexical phrase in different
academic disciplines in English. In R. Reppen, S. Fitzmaurice, & D. Biber (Eds.),
Using Corpora to Explore Linguistic Variation. Amsterdam: John Benjamins,
111-130.
Oliveira, J.M.D. & Pagano, A.S. 2006. The research article and the science
popularization article: A probabilistic functional grammar perspective on direct
discourse representation. Discourse Studies, 8(5), 627-646.
Osgood, C.E., Suci, G., & Tannenbaum, P. 1957. The measurement of meaning. Urbana,
IL: University of Illinois Press.
Paolacci, G., Chandler, J., & Ipeirotis, P. G. 2010. Running experiments on Amazon
Mechanical Turk. Judgment and Decision Making, 5, 411–419.
Parkinson, J. & Adendorff, R. 2004. The use of popular science articles in teaching
scientific literacy. English for Specific Purposes, 23, 379-396.
Parkinson, J. & Adendorff, R. 2005. Variable discursive constructions of three genres of
science. Southern African Linguistics and Applied Language Studies, 23(3), 281-303.
Peacock, M. 2011. A comparative study of introductory it in research articles across eight
disciplines. International Journal of Corpus Linguistics, 16(1), 72-100.
Pitcher, B., & Fang, Z. 2007. Can we trust levelled texts? An examination of their
reliability and quality from a linguistic perspective. Literacy, 41(1), 43-51.

Pride, J. 1975. The Readability of Selected Textbooks and the Reading Abilities of
Freshman Students at a Community College. Doctoral dissertation, Jackson State
University.
R Development Core Team 2012. R: A Language and Environment for Statistical
Computing, Reference Index Version 2.2.1. Vienna: R Foundation for Statistical
Computing. Available at: http://www.R-project.org.
Reinhart, T. 1980. Conditions for text coherence. Poetics Today, 1(4), 161-180.
Revelle, W. 2012. psych: Procedures for Personality and Psychological Research.
Evanston, IL: Northwestern University. Available at
http://personality-project.org/r/psych.manual.pdf.
Riffaterre, M. 1967. Criteria for style analysis. In S. Chatman & S. Levin (Eds.) Essays
on the language of literature (pp. 154-174). Boston: Houghton Mifflin.
Sandell, R. 1977. Linguistic Style and Persuasion. London: Academic Press.
Schiffrin, D. 1987. Discourse Markers. London: Cambridge University Press.
Schraw, G., Bruning, R., & Svoboda, C. 1995. Sources of situational interest. Journal of
Reading Behavior, 27, 1–17.
Shrout, P. E., & Fleiss, J. L. 1979. Intraclass correlations: Uses in assessing rater
reliability. Psychological Bulletin, 86, 420-428.
Simpson-Vlach, R. & Ellis, N. 2010. An academic formulas list: New methods in
phraseology research. Applied Linguistics, 31(4), 487-512.
Snider, J. G., & Osgood, C. E. 1969. Semantic Differential Technique: A Sourcebook.
Chicago: Aldine.
Snow, R., O’Connor, B., Jurafsky, D., & Ng, A. Y. 2008. Cheap and fast—but is it good?
Evaluating non-expert annotations for natural language tasks. In Lapata, M. &
Ng, H.T. (Eds.), Proceedings of the Conference on Empirical Methods in Natural
Language Processing (pp. 254–263). New York: ACM.
Stubbs, N.A. 2001. Technical text: Improving comprehensibility through the application
of psycholinguistic research. Doctoral dissertation, University of Southern
California.
Suri, S., & Watts, D. J. 2011. Cooperation and contagion in Web-based, networked
public goods experiments. PLoS One, 6(3), e16836.
Swenson, K.L. 2007. Predicting reading comprehension from authentic text. Doctoral
dissertation, University of Utah.
Szmrecsanyi, B., & Hinrichs, L. 2008. Probabilistic determinants of genitive
variation in spoken and written English: A multivariate comparison across time,
space, and genres. In T. Nevalainen, I. Taavitsainen, P. Pahta, & M. Korhonen
(Eds.), The dynamics of linguistic variation: Corpus evidence on English past and
present (pp. 291–309). Amsterdam, Philadelphia: Benjamins.
Tadros, A. 1989. Predictive categories in university textbooks. English for Specific
Purposes, 8(1), 17-31.
Tinsley, H. & Weiss, D. 2000. Interrater reliability and agreement. In H. Tinsley & S.
Brown (Eds.) Handbook of applied multivariate statistics and mathematical
modelling (pp. 95-124). London: Academic Press.
Tuldava, J. 2004. The development of statistical stylistics: A survey. Journal of
Quantitative Linguistics, 11, 141–151.

Urbano, J., Morato, J., Marrero, M., & Martín, D. 2010. Crowdsourcing preference
judgments for evaluation of music similarity tasks. In M. Lease, V. Carvalho, &
E. Yilmaz (Eds.), Proceedings of the ACM SIGIR 2010 Workshop on
Crowdsourcing for Search Evaluation (CSE 2010) (pp. 9–16). Geneva,
Switzerland.
van Peer, W. 1986. Stylistics and psychology: Investigations of foregrounding. London:
Croom.
Varttala, T. 1999. Remarks on the communicative functions of hedging in popular
science and specialist research articles on medicine. English for Specific
Purposes, 18(2), 177-200.
Varttala, T. 2001. Hedging in Scientifically Oriented Discourse: Exploring Variation
According to Discipline and Intended Audience. Doctoral dissertation, University
of Tampere.
Vázquez, I. & Giner, D. 2009. Writing with conviction: The use of boosters in modelling
persuasion in academic discourse. Revista Alicantina de Estudios Ingleses, 22,
219-237.
Vogel, R. 2010. Lexical cohesion in popular vs. theoretical scientific texts. In:
Jančaříková, R. (ed.) Interpretation of Meaning Across Discourses. Brno:
Masaryk University. 61-74.
Vongpumivitch, V., Huang, J., & Chang, Y-C. 2009. Frequency analysis of the
words in the Academic Word List (AWL) and non-AWL content words in
applied linguistics research papers. English for Specific Purposes, 28, 33-41.
Warchal, K. 2010. Moulding interpersonal relations through conditional clauses:
Consensus-building strategies in written academic discourse. Journal of
English for Academic Purposes.
Whissell, C. 1997. Content, style, and emotional tone of texts in introductory
psychology. Perceptual and Motor Skills, 84, 115-125.
Witte, S.P., & Faigley, L. 1981. Coherence, cohesion, and writing quality. College
Composition and Communication, 32(2), 189-204.

APPENDIX A. DETAILED INFORMATION FOR EACH TEXT IN THE CORPUS

# | ID | Author | Publication | Year | Page | LD_1 | LD_2 | LD_3 | LD_4 | LD_5 | PD_1 | PD_2
1 | JA_BI_01 | Dorchies, O.M. et al. | American Journal of Physiology | 2006 | 618 | -5.78 | -4.79 | -8.69 | -4.19 | -1.37 | 32.00 | 22.00
2 | JA_BI_02 | Trivieri, M.G. et al. | Proceedings of the National Academy of Sciences | 2006 | 6043 | -9.41 | -1.52 | -4.81 | -5.35 | 9.42 | 45.00 | 23.00
3 | JA_BI_03 | Menezes, V.A. et al. | Journal of Natural History | 2008 | 2575 | -7.11 | -2.61 | -.51 | -5.26 | -.30 | 45.00 | 27.00
4 | JA_BI_04 | Jussier, D. et al. | Applied and Environmental Microbiology | 2006 | 222 | -15.22 | -6.11 | -7.81 | -.21 | -2.91 | 31.00 | 15.00
5 | JA_BI_05 | Danielsson, J. et al. | Microbial Ecology | 2007 | 135 | -11.80 | -9.19 | -5.20 | 1.88 | -6.85 | 53.00 | 18.50
6 | JA_BI_06 | Nepstad, D. et al. | Conservation Biology | 2005 | 69 | -6.56 | -6.20 | -6.64 | -4.74 | 2.06 | 34.50 | 18.50
7 | JA_BI_07 | Moraes, S.S. et al. | Journal of Natural History | 2011 | 1514 | -16.44 | -9.15 | -9.97 | -1.04 | -.27 | 26.00 | 22.00
8 | JA_BI_08 | Mroz, E. & Wojciechowski, W. | Journal of Natural History | 2011 | 1578 | -6.65 | .56 | -9.54 | -3.17 | -5.72 | 42.00 | 21.00
9 | JA_BI_09 | Tachi | Journal of Natural History | 2011 | 1165-1166 | -18.04 | -5.30 | -1.15 | -2.38 | -.30 | 37.00 | 20.00
10 | JA_BI_10 | Ortuno, V.M. & Gilgado, J.D. | Journal of Natural History | 2011 | 1250-1251 | -1.25 | 11.45 | -1.04 | -3.33 | -2.76 | 36.00 | 20.00
11 | JA_BI_11 | Alves-Silva, E. & Del-Claro, K. | Journal of Natural History | 2011 | 394-395 | -9.04 | -6.16 | -.47 | 2.44 | -5.29 | 46.00 | 28.00
12 | JA_BI_12 | Cebeci, H. et al. | Journal of Natural History | 2011 | 479-480 | -11.98 | -1.10 | -5.42 | -7.61 | -7.34 | 28.50 | 21.00
13 | JA_BI_13 | Shnit-Orland, M. et al. | Microbial Ecology | 2012 | 851-852 | -10.86 | -.76 | 8.49 | -3.76 | 1.20 | 48.00 | 29.00
14 | JA_BI_14 | Kalscheur, K.N. et al. | Microbial Ecology | 2012 | 882 | -20.95 | -8.65 | -7.54 | -3.86 | -.79 | 37.00 | 19.00
15 | JA_BI_15 | Rosel, S. et al. | Microbial Ecology | 2012 | 575 | -9.62 | -7.27 | -8.10 | -2.71 | -.77 | 42.50 | 19.50
16 | JA_BI_16 | Mulec, J. et al. | Microbial Ecology | 2012 | 655 | -10.67 | -4.73 | -4.15 | -3.08 | .07 | 53.00 | 31.00
17 | JA_BI_17 | Oakes, P.W. et al. | Journal of Cell Biology | 2012 | 373 | -20.55 | -5.41 | -.82 | -1.12 | -.48 | 48.00 | 23.00
18 | JA_BI_18 | Perez, J. et al. | Microbial Ecology | 2012 | 284-285 | -5.58 | 2.92 | -.29 | -2.30 | .11 | 46.00 | 22.50
19 | JA_BI_19 | Sherman, C. & Steinberger, Y. | Microbial Ecology | 2012 | 399 | -11.51 | -3.81 | 4.75 | -6.24 | 7.77 | 54.00 | 23.00
20 | JA_BI_20 | Hayakawa, A. et al. | Journal of Cell Biology | 2012 | 196 | -11.64 | -6.82 | -5.50 | -6.30 | 1.67 | 35.00 | 23.00
21 | JA_BI_21 | Zylbersztejn, K. et al. | Journal of Cell Biology | 2012 | 38 | -5.62 | -7.48 | .03 | -6.10 | -4.02 | 34.50 | 19.50
22 | JA_BI_22 | Haberman, A. et al. | Journal of Cell Biology | 2012 | 275 | -11.87 | -8.75 | -8.94 | 1.68 | -6.69 | 33.50 | 17.50
23 | JA_BI_23 | Rainero, E. et al. | Journal of Cell Biology | 2012 | 277 | -8.90 | 6.72 | -3.78 | -5.12 | -2.46 | 43.00 | 22.00
24 | JA_BI_24 | Nechipurenko, I.V. & Broihier, H.T. | Journal of Cell Biology | 2012 | 346 | -21.67 | -6.74 | .57 | -5.37 | -.67 | 33.00 | 21.00
25 | JA_BI_25 | Wang, Y. & McNiven, M.A. | Journal of Cell Biology | 2012 | 384 | -19.20 | -8.76 | -7.05 | -3.58 | -7.99 | 34.00 | 16.50
26 | JA_HI_01 | Sauter, M.J. | American Historical Review | 2007 | 685-686 | 1.85 | -5.22 | 3.47 | -.54 | .88 | 55.00 | 36.50
27 | JA_HI_02 | Levine, M. | Journal of World History | 2007 | 171-172 | 6.61 | -3.61 | -7.88 | -1.13 | 2.53 | 52.00 | 38.00
28 | JA_HI_03 | Karn, N. | Historical Research | 2007 | 305 | 4.35 | -2.20 | 2.05 | .55 | -.34 | 38.00 | 32.00
29 | JA_HI_04 | Pritchett, W.E. | Journal of Urban History | 2008 | 281 | 2.00 | -1.35 | .90 | -2.32 | 8.15 | 59.00 | 39.00
30 | JA_HI_05 | Lahti, J. | The Western Historical Quarterly | 2008 | 301-302 | 6.24 | -3.07 | -3.07 | 6.04 | 2.51 | 68.00 | 43.50
31 | JA_HI_06 | Worth, A. | Journal of Colonialism and Colonial History | 2008 | 1 | 2.60 | 1.07 | 5.06 | -3.66 | .70 | 40.00 | 40.00
32 | JA_HI_07 | Rosenberg, C. | American Historical Review | 2012 | 688-689 | -7.65 | -2.47 | .09 | 6.45 | 2.52 | 54.00 | 40.00
33 | JA_HI_08 | Cook, J.W. | American Historical Review | 2012 | 768 | 9.56 | -2.43 | -4.28 | 3.32 | -1.24 | 45.00 | 50.00
34 | JA_HI_09 | Downs, G.P. | American Historical Review | 2012 | 387 | 2.13 | -.94 | -1.68 | -2.17 | 1.33 | 57.00 | 39.00
35 | JA_HI_10 | Monroe, J.W. | American Historical Review | 2012 | 468 | -4.53 | -4.06 | -2.32 | -1.55 | 4.15 | 59.00 | 39.00
36 | JA_HI_11 | Grafton, A. | American Historical Review | 2012 | 24-25 | 2.09 | -2.80 | .92 | .10 | -1.82 | 53.00 | 40.00
37 | JA_HI_12 | Grandin, G. | American Historical Review | 2012 | 82-83 | -1.41 | -5.25 | 1.00 | -3.09 | 3.05 | 53.00 | 37.00
38 | JA_HI_13 | Xincheng, L. | Journal of World History | 2012 | 499-500 | .33 | 1.79 | 3.09 | -2.80 | 6.49 | 50.50 | 38.00
39 | JA_HI_14 | Cao, S. et al. | Journal of World History | 2012 | 605-606 | .55 | -.95 | 3.05 | -1.30 | 7.77 | 57.00 | 38.50
40 | JA_HI_15 | Jankiewicz, S. | Journal of World History | 2012 | 364-365 | 6.79 | 1.98 | 20.90 | 5.90 | -.61 | * | *
41 | JA_HI_16 | Woodard, J.P. | Journal of World History | 2012 | 375-376 | 9.90 | -3.33 | -2.99 | .74 | -2.51 | 45.00 | 47.00
42 | JA_HI_17 | Zhao, B. | Journal of World History | 2012 | 70-71 | -4.01 | 4.18 | 1.69 | -1.93 | 7.06 | 53.00 | 40.00
43 | JA_HI_18 | Huang, E.C. | Journal of World History | 2012 | 142-143 | -4.34 | -4.75 | .47 | -3.78 | 9.76 | 44.00 | 35.00
44 | JA_HI_19 | Olcott, J. | Journal of Women’s History | 2012 | 25 | -8.29 | -8.08 | .71 | .94 | 9.20 | 57.00 | 47.00
45 | JA_HI_20 | Desmazieres, A. | Journal of Women’s History | 2012 | 87 | -9.03 | -4.97 | -2.13 | .59 | 6.04 | 61.00 | 42.00
46 | JA_HI_21 | Yang, B. | Journal of Women’s History | 2012 | 79-80 | 4.01 | -1.50 | 5.89 | -.57 | -1.53 | * | *
47 | JA_HI_22 | Van Ingen, V. | Journal of Women’s History | 2012 | 140-141 | -.95 | -4.30 | -3.64 | 2.18 | 6.47 | 67.00 | 49.00
48 | JA_HI_23 | Aderinto, S. | Journal of Colonialism and Colonial History | 2012 | 1 | -3.18 | -2.99 | -1.24 | -2.48 | 4.70 | 62.00 | 43.00
49 | JA_HI_24 | Shellam, T. | Journal of Colonialism and Colonial History | 2012 | 1 | 4.45 | .53 | 5.94 | 3.40 | -1.56 | 66.00 | 41.00
50 | JA_HI_25 | Alessio, D. & Meredith, K. | Journal of Colonialism and Colonial History | 2012 | 1 | -4.47 | 2.60 | 5.45 | -.05 | .19 | * | *
51 | TB_BI_01 | Harlow, W.M. et al. | The Woody Plant Seed Manual | 2008 | 60-61 | 10.04 | 2.72 | -4.68 | -6.39 | -.77 | 75.00 | 58.00
52 | TB_BI_02 | Alexopoulos, C.J. et al. | Introductory Mycology | 2012 | 508 | 5.52 | -2.43 | .51 | -4.16 | -2.94 | 71.00 | 60.00
53 | TB_BI_03 | Starr, C. | Human Biology | 2013 | 262-263 | .16 | -6.28 | -6.29 | 11.67 | -3.28 | 76.50 | 69.50
54 | TB_BI_04 | Wilson, B.A. et al. | Bacterial Pathogenesis | 2012 | 131 | -3.43 | 2.97 | 1.89 | 1.23 | 3.64 | 66.00 | 42.00
55 | TB_BI_05 | King, J. | Reaching for the Sun: How Plants Work | 2011 | 246-247 | -3.76 | -6.29 | -6.65 | 3.57 | -1.06 | 68.50 | 51.50
56 | TB_BI_06 | Bitton, G. | Wastewater Microbiology | 2011 | 132-133 | -7.30 | 6.80 | -5.99 | -5.63 | -1.04 | 44.50 | 28.50
57 | TB_BI_07 | Singh, U.S. & Kapoor, K. | Introductory Microbiology | 2010 | 196 | -5.25 | -1.79 | -4.89 | -2.30 | -1.99 | 80.00 | 50.00
58 | TB_BI_08 | Waites, M.J. et al. | Industrial Microbiology: An Introduction | 2009 | 75 | -5.19 | 6.48 | .73 | -.60 | 4.70 | 72.00 | 49.00
59 | TB_BI_09 | Gaston, K.J. & Spicer, J.I. | Biodiversity: An Introduction | 2009 | 10-12 | 10.45 | 24.97 | 2.35 | -5.59 | -2.84 | 70.00 | 52.00
60 | TB_BI_10 | Wilson, M. | Bacteriology of Humans: An Ecological Perspective | 2009 | 18-20 | 10.06 | 9.03 | -5.94 | -6.59 | .50 | 73.00 | 49.50
61 | TB_BI_11 | Lee, R.E. | Phycology | 1999 | 322 | -3.91 | -2.65 | -8.29 | -5.29 | -4.39 | 62.00 | 49.00
62 | TB_BI_12 | Montville, T.J. | Food Microbiology: An Introduction | 2012 | 5-6 | .23 | .60 | 8.50 | 1.22 | -2.42 | 70.50 | 42.50
63 | TB_BI_13 | Frankham, R. et al. | A Primer of Conservation Genetics | 2004 | 56-57 | -1.35 | -1.99 | -7.71 | -1.90 | 9.07 | 67.00 | 45.00
64 | TB_BI_14 | Dickison, W.C. | Integrative Plant Anatomy | 2000 | 464 | -3.80 | 11.36 | -2.20 | -2.65 | 3.93 | 68.00 | 32.00
65 | TB_BI_15 | Heritage, J. et al. | Microbiology in Action | 1999 | 134-135 | -.42 | 16.32 | -1.13 | -2.27 | -.56 | 78.00 | 56.50
66 | TB_BI_16 | Barton, L.L. & Northrup, D.E. | Microbial Ecology | 2011 | 362-363 | -3.99 | 12.20 | -4.55 | -3.83 | 8.84 | 53.00 | 48.00
67 | TB_BI_17 | Draelos, Z. & Pugliese, P.T. | Physiology of the Skin | 2011 | 228 | -4.20 | 9.87 | -1.35 | -.57 | -4.40 | 66.00 | 57.00
68 | TB_BI_18 | Watson, C. et al. | Brain: An Introduction to Functional Neuroanatomy | 2010 | 161 | -5.13 | 7.66 | -4.79 | -.61 | 3.44 | 64.00 | 58.00
69 | TB_BI_19 | Sanders, E.R. & Miller, J.H. | I, Microbiologist | 2010 | 10-12 | -2.66 | 3.21 | 3.35 | -5.68 | 8.82 | 64.00 | 48.00
70 | TB_BI_20 | Schneider, D.C. | Quantitative Ecology: Measurement, Models and Scaling | 2009 | 49-50 | -3.83 | 16.73 | 1.76 | -1.82 | 4.08 | 70.00 | 43.50
71 | TB_BI_21 | Jorgensen, S.E. | Ecological Modelling: An Introduction | 2009 | 133 | 1.83 | 27.87 | -1.73 | -6.54 | 4.92 | 74.00 | 65.00
72 | TB_BI_22 | Helfman, G. et al. | The Diversity of Fishes: Biology, Evolution, and Ecology | 2009 | 75 | 7.68 | 3.60 | -2.30 | -.80 | 7.07 | 65.50 | 37.00
73 | TB_BI_23 | Closs, G. et al. | Freshwater Ecology: A Scientific Introduction | 2009 | 93-94 | 2.52 | 3.08 | -4.75 | .74 | -3.97 | 66.00 | 52.00
74 | TB_BI_24 | Gillman, M. | An Introduction to Mathematical Models in Ecology and Evolution: Time and Space | 2009 | 45-46 | 6.39 | 16.64 | 14.04 | -6.08 | -7.00 | 73.50 | 55.50
75 | TB_BI_25 | Miller, C.B. et al. | Biological Oceanography | 2012 | 247-248 | 3.50 | 4.12 | -7.20 | -4.40 | -1.39 | 66.00 | 42.00
76 | TB_HI_01 | Kennedy, D.M. & Cohen, L. | The American Pageant, Volume I | 2011 | 41 | -3.62 | -.49 | -1.54 | 6.34 | -1.04 | 66.00 | 45.00
77 | TB_HI_02 | Brinkley, A. | American History: A Survey | 2003 | 836-837 | 1.18 | -5.27 | -3.03 | 2.63 | 2.40 | 71.00 | 41.00
78 | TB_HI_03 | Boyer, P. & Nissenbaum, S. | Salem Possessed: The Social Origins of Witchcraft | 2003 | 37-38 | 5.40 | -2.81 | -3.83 | 2.41 | -8.87 | 49.50 | 49.00
79 | TB_HI_04 | Novick, P. | That Noble Dream: The ‘Objectivity Question’ and the American Historical Profession | 1999 | 206 | 7.97 | 8.45 | -2.08 | 7.77 | 2.83 | 65.00 | 55.00
80 | TB_HI_05 | Dull, J.R. | American Naval History, 1607-1865: Overcoming the Colonial Legacy | 2012 | 35-36 | -2.40 | 2.06 | -2.76 | 5.04 | 1.63 | 67.00 | 50.00
81 | TB_HI_06 | Holden, R.H. & Villars, R. | Contemporary Latin America | 2012 | 101 | 5.76 | -2.36 | -1.52 | -2.14 | 8.84 | 69.00 | 49.00
82 | TB_HI_07 | Thornton, J.K. | A Cultural History of the Atlantic World, 1250-1820 | 2012 | 159-160 | 8.60 | -1.19 | -4.75 | -.75 | .55 | 60.00 | 50.00
83 | TB_HI_08 | Martin, T.B. | Ancient Greece | 2013 | 36-337 | -.70 | -5.95 | 11.30 | 8.10 | -1.90 | 61.00 | 58.00
84 | TB_HI_09 | Galinsky, K. | Augustus: Introduction to the Life of an Emperor | 2012 | 105 | 12.56 | 2.19 | -.17 | 1.73 | -4.11 | * | *
85 | TB_HI_10 | Jackson, B. & Saunders, R. | Margaret Thatcher’s Britain | 2012 | 171 | 5.69 | 5.46 | -.59 | .14 | 4.21 | 70.00 | 51.00
86 | TB_HI_11 | Magness, J. | The Archaeology of the Holy Land: From the Destruction of Solomon’s Temple to the Muslim Conquest | 2012 | 350-352 | -7.54 | -7.95 | -2.17 | 6.43 | -3.77 | 69.00 | 48.00
87 | TB_HI_12 | Payne, S.G. | The Spanish Civil War | 2012 | 104-105 | 5.42 | -3.29 | -5.34 | 2.75 | 2.11 | 65.00 | 42.00
88 | TB_HI_13 | Ferraro, J.M. | Venice: History of the Floating City | 2012 | 201-211 | -1.41 | -5.55 | -6.98 | 5.05 | -3.78 | 69.00 | 64.00
89 | TB_HI_14 | McDonald, R.M.S. | Sons of the Father: George Washington and His Proteges | 2013 | 139-140 | .65 | -.56 | -4.09 | 7.26 | -4.42 | 61.00 | 53.00
90 | TB_HI_15 | Schroeder, R.A. | Africa after Apartheid: South Africa, Race, and Nation in Tanzania | 2012 | 162 | -1.12 | 1.72 | -1.24 | -.17 | 5.73 | 65.00 | 44.00
91 | TB_HI_16 | Kingston, J. | Contemporary Japan: History, Politics, and Social Change since the 1980s | 2011 | 120-121 | .12 | -1.93 | -1.75 | 1.18 | -.67 | 66.00 | 45.00
92 | TB_HI_17 | Tolan, J. et al. | Europe and the Islamic World: A History | 2012 | 388 | -5.41 | -6.78 | -3.63 | 4.94 | 1.34 | 75.00 | 65.00
93 | TB_HI_18 | Ismael, T.Y. & Ismael, J.S. | Government and Politics of the Middle East: Continuity and Change | 2011 | 1 | 2.40 | -4.58 | -6.79 | -5.47 | 2.00 | 63.00 | 55.00
94 | TB_HI_19 | Janesick, V.J. | Oral History for the Qualitative Researcher: Choreographing the Story | 2010 | 179-180 | 12.89 | -.93 | 9.21 | -5.88 | 6.44 | 74.00 | 53.00
95 | TB_HI_20 | Crothers, L. | Globalization and American Popular Culture | 2012 | 182-184 | 6.19 | 13.00 | 4.93 | -1.84 | 11.25 | 66.00 | 44.00
96 | TB_HI_21 | Keating, A.D. | Rising Up from Indian Country: The Battle of Fort Dearborn and the Birth of Chicago | 2012 | 236 | -.60 | -.12 | .91 | 4.70 | -4.53 | 65.00 | 38.00
97 | TB_HI_22 | Candlin, K. | The Last Caribbean Frontier, 1795-1815 | 2012 | 98-100 | 3.71 | -4.97 | -4.06 | 3.80 | 1.91 | 60.50 | 44.00
98 | TB_HI_23 | Lehoucq, F. | The Politics of Modern Central America: Civil War, Democratization, and Underdevelopment | 2012 | 86 | -6.36 | -3.68 | 7.30 | .54 | 1.32 | 69.00 | 49.00
99 | TB_HI_24 | Kim, J. | A History of Korea | 2012 | 84-85 | -3.99 | -6.32 | -.84 | 2.11 | -2.44 | 67.00 | 44.00
100 | TB_HI_25 | Confer, C.W. | Daily Life during the Indian Wars | 2011 | 204 | 8.30 | -3.13 | -.46 | 3.81 | -.95 | 65.50 | 45.00
101 | PA_BI_01 | Alvarez, W. | T. Rex and the Crater of Doom | 1997 | 3-4 | 12.04 | .61 | -1.50 | 1.30 | -6.90 | 50.00 | 30.50
102 | PA_BI_02 | Andrews, L. | The Clone Age: Adventures in the New World of Reproductive Technology | 2000 | 1-2 | -1.86 | -2.72 | 5.40 | 9.65 | -7.57 | 59.00 | 32.00
103 | PA_BI_03 | Fortey, R. | Trilobite! Eyewitness to Evolution | 2001 | 24-25 | 6.31 | 5.79 | 9.04 | 2.31 | -6.89 | 69.00 | 53.50
104 | PA_BI_04 | Goudsmit, J. | Viral Sex: The Nature of AIDS | 1998 | 8-9 | 3.00 | -.03 | 2.01 | -2.16 | -2.82 | 68.50 | 41.50
105 | PA_BI_05 | Jolly, A. | Lucy's Legacy: Sex and Intelligence in Human Evolution | 2001 | 12 | -1.62 | -3.78 | 3.39 | .30 | .84 | 69.00 | 39.00
106 | PA_BI_06 | Loewenstein, W.R. | The Touchstone of Life: Molecular Information, Cell Communication, and the Foundations of Life | 1998 | 9-10 | 6.72 | 5.79 | -3.57 | -5.75 | -1.91 | 49.00 | 22.00
107 | PA_BI_07 | Clark, W.R. | A Means to an End: The Biological Basis of Aging and Death | 1999 | 3-4 | 12.95 | .80 | .20 | -2.03 | .21 | 55.50 | 30.00
108 | PA_BI_08 | De Waal, F. | Bonobo: The Forgotten Ape | 1998 | 3 | 15.22 | 7.78 | 5.16 | -4.19 | 1.99 | 53.00 | 29.00
109 | PA_BI_09 | Dingus, L. | The Mistaken Extinction: Dinosaur Evolution and the Origin of Birds | 1998 | 7-8 | 3.50 | 6.43 | 4.32 | .13 | -1.62 | 60.00 | 39.00
110 | PA_BI_10 | Eisenberg, E. | The Ecology of Eden | 1999 | 6-7 | 3.97 | .39 | -2.92 | 5.46 | -4.20 | 59.00 | 31.50
111 | PA_BI_11 | Eldredge, N. | Life in the Balance: Humanity and the Biodiversity Crisis | 2000 | 9-10 | 7.41 | -2.35 | -4.73 | .40 | -2.62 | 40.00 | 21.00
112 | PA_BI_12 | Emsley, J. | Molecules at an Exhibition: Portraits of Intriguing Materials in Everyday Life | 1999 | 3-4 | -3.50 | 3.49 | -1.24 | .98 | -8.85 | 70.00 | 42.00
113 | PA_BI_13 | Greene, J.C. | Debating Darwin: Adventures of a Scholar | 1999 | 34-35 | 2.68 | 1.42 | 6.08 | 1.63 | 6.55 | 58.00 | 28.50
114 | PA_BI_14 | Greene, H. | Snakes: The Evolution of Mystery in Nature | 1997 | 20 | .37 | -2.61 | -4.38 | -3.19 | -1.22 | 55.00 | 27.00
115 | PA_BI_15 | Hauser, M.D. | Wild Minds: What Animals Really Think | 2001 | 6-7 | 8.27 | 3.98 | 8.61 | 6.62 | -5.55 | 66.00 | 33.00
116 | PA_BI_16 | Holub, M. | Shedding Life: Disease, Politics, and Other Human Conditions | 1997 | 5-6 | 9.07 | -1.20 | 2.23 | -2.53 | -.12 | 59.00 | 32.00
117 | PA_BI_17 | Hrdy, S.B. | Mother Nature: A History of Mothers, Infants, and Natural Selection | 1999 | 4-5 | 10.05 | 2.27 | 3.97 | 3.65 | .72 | 65.50 | 33.50
118 | PA_BI_18 | Jones, S. | Darwin's Ghost: The Origin of Species Updated | 2001 | 2-3 | 2.43 | 10.96 | 3.15 | -2.87 | -1.44 | 61.50 | 23.50
119 | PA_BI_19 | Tattersall, I. & Schwartz, J. | Extinct Humans | 2001 | 17 | 2.51 | -.51 | 1.88 | -1.36 | -1.26 | 49.00 | 29.00
120 | PA_BI_20 | Kurlansky, M. | Cod: A Biography of the Fish That Changed the World | 1997 | 21-22 | 9.98 | 2.35 | -4.00 | 5.12 | -6.39 | 45.00 | 31.00
121 | PA_BI_21 | Lowman, M.D. | Life in the Treetops: Adventures of a Woman in Field Biology | 2000 | 14-15 | -1.68 | .79 | 12.87 | -.11 | -.48 | 47.00 | 29.00
122 | PA_BI_22 | Lieberman, P. | Eve Spoke: Human Language and Human Evolution | 1998 | 6-7 | 8.54 | 3.03 | 5.48 | -4.29 | 7.60 | 61.50 | 36.00
123 | PA_BI_23 | Matthiessen, P. | The Birds of Heaven: Travels With Cranes | 2001 | 4-5 | 5.84 | -2.70 | -2.47 | -4.54 | -5.51 | 72.00 | 47.00
124 | PA_BI_24 | Miller, G. | The Mating Mind: How Sexual Choice Shaped the Evolution of Human Nature | 2000 | 3 | 6.58 | .06 | 18.99 | 1.36 | 9.43 | 55.00 | 32.00
125 | PA_BI_25 | Ryan, F. | Virus X: Tracking the New Killer Plagues Out of the Present and Into the Future | 1998 | 1-2 | 1.48 | -5.79 | -3.73 | 2.87 | -4.91 | 55.00 | 30.00
126 | PA_HI_01 | Amar, A.R. | The Bill of Rights: Creation and Reconstruction | 1998 | 3-4 | 1.71 | -3.76 | 3.75 | -3.40 | 4.92 | 68.00 | 47.00
127 | PA_HI_02 | Oldstone, M.B.A. | Viruses, Plagues, and History | 2010 | 7-8 | -4.62 | 2.28 | -1.66 | -.73 | -5.00 | 73.50 | 50.00
128 | PA_HI_03 | Akenson, D.H. | Surpassing Wonder: The Invention of the Bible and the Talmuds | 2001 | 28-29 | 11.04 | 3.01 | -.90 | 3.97 | -6.69 | 69.00 | 39.00
129 | PA_HI_04 | Andrew, C. & Mitrokhin, V. | The Sword and the Shield: The Mitrokhin Archive and the Secret History of the KGB | 1999 | 10-12 | .23 | -3.25 | 7.01 | 10.19 | -8.00 | 44.00 | 42.00
130 | PA_HI_05 | Barkan, E. | The Guilt of Nations: Restitution and Negotiating Historical Injustices | 2001 | 10-12 | 1.19 | -2.25 | 9.13 | -2.37 | 4.49 | 69.00 | 40.00
131 | PA_HI_06 | Black, E. | IBM and the Holocaust: The Strategic Alliance Between Nazi Germany and America's Most Powerful Corporation | 2002 | 7-8 | .23 | -2.49 | -5.00 | 2.90 | -.24 | 60.00 | 40.00
132 | PA_HI_07 | Callahan, D. | Unwinnable Wars: American Power and Ethnic Conflict | 1998 | 26-27 | 7.44 | 13.32 | -3.98 | 2.17 | 5.14 | 70.00 | 43.00
133 | PA_HI_08 | Carroll, J. | Constantine's Sword: The Church and the Jews: A History | 2002 | 5-6 | -.42 | 1.30 | 12.43 | -1.62 | -7.28 | 66.00 | 47.50
134 | PA_HI_09 | Chang, G.G. | The Coming Collapse of China | 2001 | 1-2 | 3.15 | .53 | -3.76 | 5.45 | -5.89 | 51.50 | 45.50
135 | PA_HI_10 | Dahl, R.A. | On Democracy | 2000 | 3-4 | 15.82 | 10.22 | 7.39 | .10 | -3.75 | 57.00 | 38.00
136 | PA_HI_11 | Cohen, S.F. | Failed Crusade: America and the Tragedy of Post-Communist Russia | 2001 | 8-9 | -.98 | -1.91 | 7.41 | -.48 | 5.94 | 70.00 | 41.00
137 | PA_HI_12 | Davies, N. | The Isles: A History | 1999 | 7-8 | -1.84 | -4.02 | -6.11 | 5.81 | -4.71 | 70.00 | 42.00
138 | PA_HI_13 | Davis, K.S. | FDR: The War President, 1940-1943. A History | 2000 | 3-4 | 9.76 | 1.59 | -.60 | 6.34 | -7.88 | 61.00 | 41.00
139 | PA_HI_14 | Robert, N. De Lange, M. | The Illustrated History of the Jewish People | 2009 | 6-7 | 4.64 | -2.09 | 1.30 | -.92 | -.56 | 67.00 | 43.00
140 | PA_HI_15 | Diggins, J.P. | On Hallowed Ground: Abraham Lincoln and the Foundations of American History | 2000 | 19-20 | 8.65 | 3.82 | 11.05 | -1.12 | .38 | 71.00 | 51.00
141 | PA_HI_16 | Eban, A.S. | Diplomacy for the Next Century | 1998 | 8-9 | -3.10 | -4.11 | -2.01 | .14 | 4.94 | 67.00 | 44.00
142 | PA_HI_17 | Eisenhower, J. | Yanks: The Epic Story of the American Army in World War I | 2001 | 1 | 3.47 | .61 | 6.32 | 1.81 | -6.00 | 64.00 | 37.00
143 | PA_HI_18 | Ellis, J.J. | Founding Brothers: The Revolutionary Generation | 2003 | 3-4 | 8.00 | 3.35 | 13.27 | 2.62 | -.89 | 68.00 | 45.00
144 | PA_HI_19 | Farmer, S.B. | Martyred Village: Commemorating the 1944 Massacre at Oradour-sur-Glane | 1999 | 20-21 | -2.69 | -6.13 | 3.02 | 9.70 | -5.47 | 62.00 | 59.00
145 | PA_HI_20 | Freeman, J.B. | Working-Class New York: Life and Labor Since World War II | 2001 | 6-8 | -1.12 | -5.04 | 3.20 | 9.21 | .07 | 69.00 | 49.00
146 | PA_HI_21 | Gaddis, J.L. | We Now Know: Rethinking Cold War History | 1997 | 2 | 12.19 | -5.32 | -3.59 | .28 | -.83 | 67.00 | 49.50
147 | PA_HI_22 | Gallagher, G.W. | The Confederate War | 1999 | 8-9 | .24 | -3.69 | 3.14 | -1.45 | 3.37 | 63.00 | 42.00
148 | PA_HI_23 | Glendon, M.A. | A World Made New: Eleanor Roosevelt and the Universal Declaration of Human Rights | 2001 | 4-6 | -.30 | -3.10 | 8.75 | 6.76 | -1.67 | 54.50 | 33.50
149 | PA_HI_24 | Gonzalez, J. | Harvest of Empire: A History of Latinos in America | 2011 | 1-3 | 2.43 | -4.60 | -.61 | 4.96 | -5.79 | 52.00 | 31.00
150 | PA_HI_25 | Goodman, P. | Of One Blood: Abolitionism and the Origins of Racial Equality | 2000 | 5-6 | 2.39 | -5.85 | -3.78 | .15 | 3.77 | 73.50 | 51.00

*Perceptual scores for these texts were found to be unreliable and were therefore eliminated from the study.

APPENDIX B. SEMANTIC CLASSES OF NOUNS, VERBS, AND ADJECTIVES
AND FORMULAIC LANGUAGE LISTS

Table B1. Semantic classes of nouns (see Biber, 2006b).

Cognition Nouns
ability, analysis, assessment, assumption, attention, attitude, belief, calculation,
concentration, concept, concern, conclusion, consciousness, consequence,
consideration, decision, desire, emotion, evaluation, examination, expectation,
experience, fact, feeling, hypothesis, idea, judgment, knowledge, look, memory,
need, notion, observation, opinion, perception, perspective, possibility,
probability, reason, recognition, relation, responsibility, sense, theory, thought,
understanding, view
Animate Nouns
American, Indian, accountant, adult, adviser, adviser, agent, aide, ancestor,
animal, anthropologist, applicant, archaeologist, artist, artiste, assistant, associate,
attorney, audience, auditor, author, baby, bachelor, bird, boss, boy, brother,
Buddha, buyer, candidate, cat, child, citizen, client, colleague, collector,
competitor, consumer, counselor, couple, critic, customer, daughter, dean, deer,
defendant, designer, developer, director, doctor, dog, dr., driver, economist,
employee, employer, engineer, engineer, executive, expert, faculty, family,
farmer, father, female, feminist, freshman, friend, geologist, girl, god, graduate,
guy, hero, historian, host, hunter, husband, immigrant, individual, infant,
instructor, investor, Jew, judge, kid, king, lady, lawyer, leader, learner, listener,
maker, male, man, manager, manufacturer, member, miller, minister, mom,
monitor, monkey, mother, Mr., neighbor, observer, officer, official, owner, parent,
participant, partner, patient, peer, people, person, personnel, physician, plaintiff,
player, poet, police, president, processor, professional, professor, provider,
psychologist, reader, researcher, resident, respondent, schizophrenic, scholar,
scientist, secretary, server, shareholder, Sikh, sister, slave, son, speaker, species,
spouse, student, supervisor, supplier, teacher, theorist, tourist, undergraduate,
user, victim, wife, woman, worker, writer
Technical Nouns
angle, atom, bacteria, bill, carbon, cell, center, chapter, chromosome, circle,
cloud, component, compound, data, diagram, DNA, electron, element, equation,
exam, fire, formula, gene, graph, hydrogen, internet, ion, iron, isotope, jury, layer,
lead, letter, light, list, margin, mark, matter, message, mineral, mineral, molecule,
neuron, nuclei, nucleus, organism, oxygen, page, paragraph, particle, play, poem,
proton, ray, sample, schedule, sentence, software, solution, square, star, statement,
thesis, unit, unit, virus, wave, web, word
Other Abstract Nouns
absence, account, action, address, advantage, aid, alternative, aspect, authority,
axis, background, balance, base, beginning, benefit, bias, bond, capital, care,
career, cause, characteristic, charge, check, choice, circuit, circumstance, climate,
code, color, column, combination, complex, condition, connection, constant,
constraint, contact, content, context, contract, contrast, crime, criteria, cross,
culture, current, curriculum, curve, debt, density, design, detail, dimension,
direction, disorder, diversity, economy, emergency, emphasis, employment, end,
equilibrium, equity, error, expense, facility, factor, failure, fallacy, feature,
format, freedom, fun, gender, goal, grade, grammar, health, heat, help, identity,
image, impact, importance, influence, information, input, interest, issue, job, kind,
labor, language, law, leadership, level, life, link, manner, math, matrix, meaning,
model, music, name, nature, network, objective, opportunity, option, order, origin,
output, past, pattern, phase, philosophy, plan, policy, position, potential, power,
prerequisite, presence, pressure, principle, profile, profit, proposal, psychology,
quality, quiz, race, reality, relationship, religion, requirement, resource, respect,
rest, return, right, risk, role, rule, scene, science, security, series, set, setting, sex,
shape, share, show, side, sign, signal, situation, skill, sort, sound, source, spring,
stage, standard, start, state, stimulus, strength, stress, structure, style, subject,
substance, success, support, survey, symbol, system, topic, track, trait, trouble,
truth, type, value, variation, variety, velocity, version, way, whole
Process Nouns
accounting, achievement, act, action, activity, addition, administration, admission,
agreement, answer, application, approach, argument, arrangement, assignment,
attempt, attendance, birth, break, change, claim, comment, comparison,
competition, conflict, construction, consumption, contribution, control,
counseling, criticism, deal, death, debate, definition, demand, description,
development, discrimination, discussion, distribution, division, education, effect,
eruption, evolution, exchange, exercise, experiment, explanation, expression,
flow, formation, function, generation, graduation, management, marketing,
marriage, mechanism, meeting, method, operation, orientation, performance,
practice, presentation, procedure, process, production, progress, question,
reaction, registration, regulation, research, result, revolution, selection, service,
session, strategy, study, talk, task, teaching, technique, test, trade, tradition,
training, transfer, transition, treatment, trial, use, war, work
Concrete Nouns
acid, alcohol, aluminum, arm, artifact, asteroid, automobile, award, bag, ball,
banana, band, bar, basin, bed, bell, belt, block, board, boat, body, bone, book,
box, brain, branch, bubble, bud, bulb, bulletin, button, cake, camera, cap, car,
card, case, cent, chain, chair, chart, clay, clock, clothing, club, comet, computer,
copper, copy, counter, cover, crop, crystal, cylinder, deposit, desk, device, dinner,
disk, document, dollar, door, dot, drain, drawing, drink, drop, drug, dust, edge,
engine, envelope, equipment, eye, face, fiber, fig, file, film, filter, finger, fish,
flower, food, foot, frame, fruit, furniture, game, gap, gate, gel, gift, glacier, grain,
gun, hair, hand, handbook, handout, head, heart, ice, instrument, item, journal,
key, knot, lava, leaf, leg, lemon, liquid, load, machine, magazine, magnet, mail,
manual, map, marker, match, metal, mixture, modem, mole, motor, mound,
mouth, movie, mud, muscle, mushroom, nail, newspaper, node, note, notice,
novel, oak, object, package, page, paper, peak, pen, pencil, phone, picture, pie,
piece, pipe, plant, plate, pole, portrait, post, pot, pottery, radio, rain, reactor,
resistor, retina, ridge, ring, ripple, rock, root, salt, sand, score, screen, sculpture,
seat, seawater, sediment, seed, sheet, shell, ship, silica, slide, slope, snow, sodium,
soil, solid, solution, space, sphere, spot, statue, steam, steel, stem, step, stick,
stone, strata, string, sugar, syllabus, table, tank, tape, target, telephone, telescope,
textbook, ticket, tip, tissue, tool, tooth, train, transcript, transistor, tree, truck,
tube, vehicle, vessel, video, visa, wall, water, water, wheel, window, wire, wood

Table B2. Semantic classes of verbs.

Activity Verbs (see Biber, 2006b)


accompany, acquire, add, advance, apply, arrange, beat, behave, borrow,
bring, burn, buy, carry, catch, check, clear, climb, combine, come, control,
cover, defend, deliver, dig, divide, earn, eat, encounter, engage, exercise,
expand, explore, fix, form, get, give, go, hang, hold, left, lie, lose, made, meet,
move, obtain, obtain, open, pay, pick, play, produce, provide, pull, put, react,
receive, reduce, repeat, run, save, sell, send, shake, share, show, sit, smile,
smile, spend, stare, take, throw, try, turn, use, visit, wait, walk, watch, wear,
win, work
Aspectual Verbs (see Biber, 2006b)
begin, cease, complete, continue, end, finish, keep, start, stop
Communication Verbs (see Biber, 2006b)
accuse, acknowledge, address, advise, announce, answer, appeal, argue, ask,
assure, challenge, claim, complain, consult, convince, declare, demand, deny,
describe, discuss, emphasize, encourage, excuse, explain, express, inform,
insist, invite, mention, offer, offer, persuade, phone, pray, promise, propose,
question, quote, recommend, remark, reply, respond, say, shout, sign, sing,
speak, specify, state, suggest, swear, teach, tell, thank, threaten, urge, warn,
welcome, whisper, write
Mental Verbs (see Biber, 2006b)
accept, afford, agree, appreciate, approve, assess, assume, bear, believe, blame,
bother, calculate, conclude, care, celebrate, compare, confirm, consider, count,
dare, decide, deserve, detect, determine, discover, dismiss, distinguish, doubt,
enjoy, examine, expect, experience, face, fear, feel, find, forget, forgive, guess,
hate, hear, hope, identify, ignore, imagine, impress, intend, interpret, judge,
justify, know, learn, like, listen, love, mean, mind, miss, need, notice, observe,
perceive, plan, predict, pretend, prove, read, realize, recall, reckon, recognize,
regard, remember, remind, satisfy, see, solve, study, suffer, suppose, suspect,
think, trust, understand, want, wonder, worry
Suasive Verbs (see Quirk et al., 1985: 1182-3)
agree, allow, arrange, ask, beg, command, concede, decide, decree, demand,
desire, determine, enjoin, ensure, entreat, grant, insist, instruct, intend, move,
ordain, order, pledge, pray, prefer, pronounce, propose, recommend, request,
resolve, rule, stipulate, suggest, urge, vote
Common Phrasal Verbs (see Biber et al., 1999; Gardner & Davies, 2007)
go on, carry out, set up, pick up, go back, come back, go out, point out, find out,
come up, make up, take over, come out, come on, come in, go down, work out, set
out, take up, get back, sit down, turn out, take on, give up, get up, look up, carry
on, go up, get out, take out, come down, put down, put up, turn up, get on, bring
up, bring in, look back, look down, bring back, break down, take off, go off, bring
about, go in, set off, put out, look out, take back, hold up, get down, hold out, put
on, bring out, move on, turn back, put back, go round, break up, come along, sit
up, turn round, get in, come round, make out, get off, turn down, bring down,
come over, break out, go over, turn over, go through, hold on, pick out, sit back,
hold back, put in, move in, look around, take down, put off, come about, go along,
look round, set about, turn off, give in, move out, come through, move back,
break off, get through, give out, come off, take in, give back, set down, move up,
turn around, stand up, shut up, go ahead, run out, step up, walk in, build up, fill in,
keep up, pull up, pull down, sort out, take away, turn on, wake up, call in, grow
up, set in, hang on, keep on, made

Table B3. Semantic classes of adjectives (see Biber 2006b).

Topical Adjectives
economic, human, international, local, national, natural, normal, oral, physical,
political, public, public, sexual, social

Table B4. Formulaic Language.

Common Lexical Bundles in University Textbooks (see Biber, Conrad, & Cortes,
2004)
as a result of, in the form of, in the united states, on the basis of, the nature of the,
the size of the, a large number of, a result of the, a wide range of, a wide variety
of, as shown in figure, in a number of, in addition to the, in the absence of, in this
chapter we, on the one hand, the basis of the, the center of the, the extent to
which, the magnitude of the, the sum of the, can be used to, are more likely to, is
based on the, is likely to be, it is important to, it is possible to, shown in figure 1,
as well as the, more likely to be, one or more of
Academic Formulas List (Simpson-Vlach & Ellis, 2010)
on the other hand, as a result of the, due to the fact that, on the other hand the, it
should be noted, it is not possible to, a wide range of, there are a number of, in
such a way that, take into account the, as can be seen, it is clear that, take into
account, can be used to, in this paper we, are likely to, in the next section, a large
number of, the united kingdom, on the basis of the, that there is no, over a period
of, can be seen in, a wide range, there are a number, it is interesting to, it is
impossible to, it is obvious that, it is possible to, it is not possible, been carried
out, can be found in, it is important to, was carried out, is likely to be, wide range
of, the same way as, due to the fact, in accordance with the, it is necessary to, the
other hand, can be seen, it is likely that, such a way that, to carry out, it is possible
that, with respect to the, give rise to, carried out by, whether or not the, in the
present study, should be noted, be carried out, the other hand the, does not appear,
his or her, is not possible to, shown in figure, be used as a, for the purposes of, be
regarded as, to ensure that the, allows us to, it has been, little or no, carried out in,
to distinguish between, in accordance with, they do not, at this stage, is based on
the, shown in table, in the absence of, we have seen, to determine whether, in the
context of, a high degree, the difference between the, an increase in the, it is
possible, can be achieved, insight into the, can be expressed, we assume that, they
did not, there has been, on the part of, in this paper, the purpose of this, less likely
to, a large number, can easily be, with regard to, there are several, over a period,
in this case the, in conjunction with, at the time of, we do not, has been used,
appears to be, to do so, there are no, on the other, has also been, it is worth, can be
found, the next section, are a number of, this paper we, be seen as, be related to
the, to ensure that, it is important, be explained by, same way as, see for example,
the presence of a, that it is not, in some cases, to the fact that, high levels of, most
likely to, it appears that, it follows that, can also be, it is clear, by virtue of, the
most important, an attempt to, it is impossible, factors such as, is consistent with,
total number of, similar to those, as part of the, can be considered, at the outset, in
more detail, should not be, could be used, appear to be, as a consequence, in this
article, assumed to be, as a whole, important role in, it is interesting, does not
have, none of these, as shown in, is likely to, this means that, be noted that, be
achieved by, depends on the, at least in, a small number, in table 1, in most cases,
depending on the, in both cases, the validity of the, small number of, their ability
to, need not be, needs to be, have shown that, it is necessary, been shown to, such
as those, are as follows, for this purpose, is determined by, it is difficult, even
though the, this does not, was based on, in the course of, degree to which, be
argued that, in terms of a, for this reason, are based on, two types of, the total
number, is more likely, which can be, are able to, be considered as, be used to, b
and c, depend on the, is that it is, is affected by, should also be, if they are

APPENDIX C. SCREE PLOT OF THE SIX-FACTOR SOLUTION FOR THE
LINGUISTIC DATA

Figure C1. Scree plot of the six-factor solution for the linguistic data.

[Scree plot: eigenvalues of factors (y-axis, 0 to 6) plotted against factor number (x-axis, 0 to 60).]

APPENDIX D. FULL FACTORIAL STRUCTURE MATRIX OF THE SIX-FACTOR SOLUTION FOR THE LINGUISTIC DATA

Table D1. Factorial structure matrix of the six-factor solution for the linguistic data.

Feature Factor 1 Factor 2 Factor 3 Factor 4 Factor 5 Factor 6
phrasal 0.16 -0.04 -0.02 0.47 -0.06 -0.05
lex_bun -0.08 0.58 -0.03 -0.3 0.28 0
length -0.15 -0.04 -0.08 -0.33 0.65 -0.11
core_1.500 0.61 0.13 0.41 0.45 0.02 0.08
core__501.3000 -0.13 0.03 -0.08 -0.08 0.42 -0.09
acad -0.36 0.19 -0.15 -0.64 0.5 -0.08
NN -0.73 -0.14 -0.15 -0.25 0.01 -0.1
pres 0.21 0.73 0.05 -0.37 -0.12 0.08
pdem 0.15 0.4 -0.01 -0.07 0.03 0.05
gen_emph 0.36 0.09 0.23 0.1 0.06 -0.05
pro1 0.21 0.04 0.36 -0.03 -0.08 -0.15
it 0.24 0.34 0.17 0.27 -0.15 0.15
be_state 0.08 0.59 0.12 -0.16 -0.11 0.12
pany 0.06 0.08 0.29 0.16 -0.23 0.08
amplifr 0.43 0.11 -0.13 -0.3 -0.12 0.05
pos_mod 0.11 0.67 0.16 -0.15 -0.06 0.08
o_and 0.39 -0.11 -0.11 0.03 -0.13 -0.19
n -0.73 -0.25 -0.23 -0.33 0.1 -0.06
prep 0.13 -0.29 -0.15 0.06 0.21 0.44
adj_attr 0.17 -0.15 -0.17 -0.25 0.53 0.05
pasttnse -0.15 -0.54 0.31 0.59 -0.09 -0.19
pro3 0.19 -0.15 0.36 0.39 -0.21 0.02
rel_obj 0.12 -0.12 0.04 -0.02 -0.28 0.45
rel_subj -0.24 0.14 0.24 -0.01 -0.19 0.54
rel_pipe 0.09 0.09 -0.06 -0.03 0.07 0.44
n_nom -0.11 0.11 0.17 -0.06 0.71 -0.18
tm_adv 0.01 0.03 0.05 0.3 -0.32 0.12
advs 0.59 0.12 -0.16 0.12 -0.06 -0.04
inf 0.01 0.21 0.53 0.46 0.13 0.03
prd_mod 0.29 0.38 0.15 0.14 0.06 -0.05
sua_vb -0.12 -0.07 0.42 0.15 0.03 0.07
spl_aux 0.06 0.27 -0.07 0.02 -0.06 -0.1
conjncts 0.34 0.17 -0.06 -0.02 0.19 0.14
agls_psv -0.42 0.17 -0.21 -0.18 -0.19 -0.16
sub_othr 0.3 -0.05 0.09 0.27 -0.1 -0.33
downtone 0.27 0.06 -0.18 -0.13 -0.1 -0.1
pred_adj 0.22 0.53 -0.07 -0.26 -0.07 -0.05
allconj 0.51 -0.07 -0.04 0.08 -0.12 -0.28
allwhrel -0.1 0.13 0.15 -0.06 -0.26 0.97
have 0.36 -0.05 0.01 -0.15 0.04 -0.12
vprogrsv 0.12 -0.08 -0.04 0.37 -0.04 -0.1
that_rel 0.36 0.23 -0.03 -0.02 0.1 -0.15
nonf_vth -0.21 -0.03 0.53 -0.04 0.04 0.16
fact_vth -0.17 0.04 0.3 -0.17 0.16 -0.15
lkly_vth 0.13 -0.02 0.46 -0.02 0.15 -0.15
factadvl 0.37 -0.02 0.21 0.05 0 -0.07
lklyadvl 0.27 0.04 0.06 -0.04 -0.16 -0.02
all_jth 0.05 0.57 0.07 0.16 0.2 -0.04
all_nth 0.06 0.04 0.38 -0.1 0.05 0.1
all_jto 0.03 0.68 0.01 0.19 0.21 -0.08
humann 0.07 -0.04 0.41 0.07 -0.2 -0.07
prcessn -0.07 0.03 0.26 -0.07 0.59 -0.08
cognitn 0.1 0.03 0.39 -0.15 0.19 0.08
abstrcn 0.09 0.12 0.16 0.04 0.47 0
concrtn -0.15 0.3 -0.28 -0.07 -0.21 -0.12
tccncrtn -0.31 0.19 -0.3 -0.21 -0.04 -0.06
placen 0.03 -0.06 -0.29 0.15 -0.13 0.19
topicj 0.16 -0.13 0.29 -0.03 0.47 0.07
actv -0.2 0 0.03 0.54 -0.12 -0.18
commv -0.07 -0.07 0.58 0.08 0.04 0.11
mentalv 0 0.17 0.58 0.08 0.01 -0.02
aspectv 0.18 -0.12 -0.2 0.44 0 0.1

APPENDIX E. SIGNIFICANCE TESTING FOR THE LINGUISTIC DATA

Table E1. ANOVA results for linguistic dimension 1.

Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 4359.924a 5 871.985 30.401 .000
Intercept .307 1 .307 .011 .918
Register 2413.024 2 1206.512 42.064 .000
Discipline 601.001 1 601.001 20.953 .000
Register * Discipline 1345.899 2 672.949 23.462 .000
Error 4130.361 144 28.683
Total 8490.592 150
Corrected Total 8490.286 149
a. R Squared = .514 (Adjusted R Squared = .497)
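
As a check on how the columns of Table E1 relate to one another, the following sketch recomputes the mean squares, the F ratio for the corrected model, and the R Squared values from the sums of squares and degrees of freedom reported in the table. It is an illustrative recomputation only, not part of the original analysis. The variable names are hypothetical, and the adjusted R Squared formula is the standard one, which is assumed to match the one used by the statistical software.

```python
# Values copied from Table E1 (ANOVA results for linguistic dimension 1).
ss_model = 4359.924            # Corrected Model, Type III sum of squares
df_model = 5
ss_error = 4130.361            # Error sum of squares
df_error = 144
ss_corrected_total = 8490.286  # Corrected Total sum of squares
df_corrected_total = 149

# A mean square is a sum of squares divided by its degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# The F ratio compares the model mean square with the error mean square.
f_model = ms_model / ms_error

# R Squared is the share of the corrected total variation explained by the model.
r_squared = ss_model / ss_corrected_total

# Standard adjustment for the number of model terms (assumed formula).
adj_r_squared = 1 - (1 - r_squared) * df_corrected_total / df_error

print(f"MS model = {ms_model:.3f}, MS error = {ms_error:.3f}, F = {f_model:.3f}")
print(f"R Squared = {r_squared:.3f}, Adjusted R Squared = {adj_r_squared:.3f}")
```

The same division of an effect's mean square by the error mean square appears to underlie the F values for the Register, Discipline, and interaction rows.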

Table E2. Within Discipline Simple Effects for Linguistic Dimension 1

Source of Variation SS DF MS F Sig of F


Register in Biology 3680.90 2 1840.45 64.17 .000
Register in History 78.02 2 39.01 1.36 .260

Table E3. Within Register Simple Effects for Linguistic Dimension 1.

Source of Variation SS DF MS F Sig of F


Discipline in Articles 1842.24 1 1842.24 64.23 .000
Discipline in Textbooks 49.01 1 49.01 1.71 .193
Discipline in Pop. Sci. 55.65 1 55.65 1.94 .166

Table E4. ANOVA results for linguistic dimension 2.

Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 2029.922a 5 405.984 13.417 .000
Intercept .191 1 .191 .006 .937
Register 842.513 2 421.257 13.922 .000
Discipline 505.266 1 505.266 16.698 .000
Register * Discipline 682.143 2 341.072 11.272 .000
Error 4357.191 144 30.258
Total 6387.304 150
Corrected Total 6387.114 149
a. R Squared = .318 (Adjusted R Squared = .294)

Table E5. Within Discipline Simple Effects for Linguistic Dimension 2.

Source of Variation SS DF MS F Sig of F


Register in Biology 1500.24 2 750.12 24.79 .000
Register in History 24.42 2 12.21 .40 .669

Table E6. Within Register Simple Effects for Linguistic Dimension 2.

Source of Variation SS DF MS F Sig of F


Discipline in Articles 11.52 1 11.52 .38 .538
Discipline in Textbooks 1088.86 1 1088.86 35.99 .000
Discipline in Pop. Sci. 87.03 1 87.03 2.88 .092

Table E7. ANOVA results for linguistic dimension 3.

Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 837.351a 5 167.470 6.020 .000
Intercept .107 1 .107 .004 .951
Register 513.676 2 256.838 9.233 .000
Discipline 148.631 1 148.631 5.343 .022
Register * Discipline 175.044 2 87.522 3.146 .046
Error 4005.874 144 27.819
Total 4843.332 150
Corrected Total 4843.225 149
a. R Squared = .173 (Adjusted R Squared = .144)

Table E8. Within Discipline Simple Effects for Linguistic Dimension 3.

Source of Variation SS DF MS F Sig of F


Register in Biology 526.75 2 263.37 9.47 .000
Register in History 161.97 2 80.99 2.91 .058

Table E9. Within Register Simple Effects for Linguistic Dimension 3.

Source of Variation SS DF MS F Sig of F


Discipline in Articles 312.74 1 312.74 11.24 .001
Discipline in Textbooks 10.92 1 10.92 .39 .532
Discipline in Pop. Sci. .02 1 .02 .00 .981

Table E10. ANOVA results for linguistic dimension 4.

Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 839.128a 5 167.826 15.183 .000
Intercept .102 1 .102 .009 .924
Register 358.623 2 179.312 16.223 .000
Discipline 438.520 1 438.520 39.674 .000
Register * Discipline 41.985 2 20.993 1.899 .153
Error 1591.659 144 11.053
Total 2430.889 150
Corrected Total 2430.787 149
a. R Squared = .345 (Adjusted R Squared = .322)

Table E11. Tukey’s post-hoc results for linguistic dimension 4.

Tukey HSD
(I) Register (J) Register Mean Difference (I-J) Std. Error Sig. 95% Confidence Interval Lower Bound Upper Bound
JA TB -1.8424* .73373 .035 -3.5805 -.1043
JA PS -3.1565* .73012 .000 -4.8861 -1.4269
TB JA 1.8424* .73373 .035 .1043 3.5805
TB PS -1.3140 .72240 .167 -3.0254 .3973
PS JA 3.1565* .73012 .000 1.4269 4.8861
PS TB 1.3140 .72240 .167 -.3973 3.0254
Based on observed means.
The error term is Mean Square(Error) = 12.915.
*. The mean difference is significant at the 0.05 level.

Table E12. ANOVA results for linguistic dimension 5.

Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 424.669a 5 84.934 4.325 .001
Intercept .120 1 .120 .006 .938
Register 187.856 2 93.928 4.783 .010
Discipline 79.254 1 79.254 4.036 .046
Register * Discipline 157.558 2 78.779 4.012 .020
Error 2827.833 144 19.638
Total 3252.622 150
Corrected Total 3252.502 149
a. R Squared = .131 (Adjusted R Squared = .100)

Table E13. Within Discipline Simple Effects for Linguistic Dimension 5.

Source of Variation SS DF MS F Sig of F


Register in Biology 96.60 2 48.30 2.46 .089
Register in History 248.82 2 124.41 6.34 .002

Table E14. Within Register Simple Effects for Linguistic Dimension 5.

Source of Variation SS DF MS F Sig of F


Discipline in Articles 235.77 1 235.77 12.01 .001
Discipline in Textbooks .47 1 .47 .02 .877
Discipline in Pop. Sci. .57 1 .57 .03 .865

APPENDIX F. STYLISTIC PERCEPTION SURVEY

Text Perceptions

After entering some basic personal information, rate the passage you just read on each of
the scales below. There are no right or wrong answers. Try to be as precise as you can
about your judgment. Rating your reaction is done by clicking on the number that best
represents the degree of your opinion. Don’t worry if some of the adjectives puzzle you.
It is your intuitive response I want. So fill in the scales as you “feel” the items should be
judged.
* Required

Please enter your Worker ID *

Please enter the Text ID from the passage you just read *
This can be found in the parentheses after the title

How old are you? *

What is your gender? *


Female
Male

What is the highest level of education you have achieved? *


Some high school
High school graduate
Some college
Bachelors degree
Masters degree or higher

1. effective o o o o o o ineffective

2. readable o o o o o o unreadable

3. biased o o o o o o unbiased

4. free o o o o o o constrained

5. abstract o o o o o o concrete

6. graceful o o o o o o awkward

7. vague o o o o o o explicit

8. bad o o o o o o good

9. impartial o o o o o o opinionated

10. exciting o o o o o o dull

11. plain o o o o o o expressive

12. detached o o o o o o interactive

13. emotional o o o o o o unemotional

14. relatable o o o o o o unrelatable

15. intimate o o o o o o distant

16. humorous o o o o o o serious

17. profound o o o o o o superficial

18. varied o o o o o o monotonous

19. boring o o o o o o engaging

20. well-organized o o o o o o poorly organized

21. incomprehensible o o o o o o comprehensible

22. successful o o o o o o unsuccessful

23. dense o o o o o o not dense

24. casual o o o o o o formal

25. easy to follow o o o o o o hard to follow

26. undescriptive o o o o o o descriptive

27. conversational o o o o o o academic

28. unclear o o o o o o clear

29. focused o o o o o o not focused

30. entertaining o o o o o o not entertaining

31. unimportant o o o o o o important

32. informative o o o o o o not informative

33. not useful o o o o o o useful

34. modern o o o o o o old-fashioned

35. relevant o o o o o o irrelevant

36. not technical o o o o o o technical

37. personal o o o o o o impersonal

38. undetailed o o o o o o detailed

APPENDIX G. SCREE PLOT OF THE TWO-FACTOR SOLUTION FOR THE
PERCEPTUAL DATA

Figure G1. Scree plot of the two-factor solution for the perceptual data.

[Scree plot: eigenvalues of factors (y-axis, 0 to 15) plotted against factor number (x-axis, 0 to 30).]

APPENDIX H. FULL FACTORIAL STRUCTURE MATRIX OF THE TWO-FACTOR SOLUTION FOR THE PERCEPTUAL DATA

Table H1. Full factorial structure matrix for the perceptual data.

Item Factor 1 Factor 2
1. effective—ineffective 0.36 0.68
2. readable—unreadable 0.71 0.48
3. biased—unbiased 0.79 -0.26
4. free—constrained 0.85 0.21
5. abstract—concrete 0.7 -0.39
6. graceful—awkward 0.54 0.54
7. vague—explicit 0.66 -0.51
8. bad—good -0.36 -0.68
9. impartial—opinionated -0.82 0.23
10. exciting—dull 0.74 0.44
11. plain—expressive -0.83 -0.22
12. detached—interactive -0.83 -0.24
13. emotional—unemotional 0.88 0
14. relatable—unrelatable 0.73 0.36
15. intimate—distant 0.86 0.19
16. humorous—serious 0.68 -0.08
17. profound—superficial 0.03 0.37
18. varied—monotonous 0.77 0.38
19. boring—engaging -0.7 -0.48
20. well-organized—poorly organized -0.03 0.75
21. incomprehensible—comprehensible -0.54 -0.58
22. successful—unsuccessful 0.29 0.76
23. dense—not dense -0.76 -0.35
24. casual—formal 0.83 0.07
25. easy to follow—hard to follow 0.64 0.56
26. undescriptive—descriptive 0.33 -0.49
27. conversational—academic 0.88 0.05
28. unclear—clear -0.4 -0.64
29. focused—not focused -0.34 0.58
30. entertaining—not entertaining 0.71 0.4
31. unimportant—important 0.1 -0.65
32. informative—not informative -0.46 0.59
33. not useful—useful 0.1 -0.77
34. modern—old-fashioned 0.04 0.38
35. relevant—irrelevant 0.15 0.57
36. not technical—technical 0.89 -0.05
37. personal—impersonal 0.9 -0.01
38. undetailed—detailed 0.71 -0.24

APPENDIX I. SIGNIFICANCE TESTING FOR THE PERCEPTUAL DATA

Table I1. ANOVA results for perceptual dimension 1.

Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 13470.922a 5 2694.184 46.988 .000
Intercept 19539.472 1 19539.472 340.780 .000
Register 9926.265 2 4963.132 86.560 .000
Discipline 1350.983 1 1350.983 23.562 .000
Register * Discipline 1696.103 2 848.051 14.791 .000
Error 8027.243 140 57.337
Total 40968.000 146
Corrected Total 21498.164 145
a. R Squared = .627 (Adjusted R Squared = .613)

Table I2. Within discipline simple effects for perceptual dimension 1.

Source of Variation SS DF MS F Sig of F


Register in Biology 10253.53 2 5126.76 89.41 .000
Register in History 1866.41 2 933.21 16.28 .000

Table I3. Within register simple effects for perceptual dimension 1.

Source of Variation SS DF MS F Sig of F


Discipline in Articles 2969.30 1 2969.30 51.79 .000
Discipline in Textbooks 91.31 1 91.31 1.59 .209
Discipline in Pop. Sci. 483.61 1 483.61 8.43 .004

Table I4. ANOVA results for perceptual dimension 2.

Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 14300.371a 5 2860.074 64.233 .000
Intercept 286865.031 1 286865.031 6442.608 .000
Register 8131.694 2 4065.847 91.314 .000
Discipline 3466.243 1 3466.243 77.847 .000
Register * Discipline 2273.498 2 1136.749 25.530 .000
Error 6233.672 140 44.526
Total 309161.250 146
Corrected Total 20534.043 145
a. R Squared = .696 (Adjusted R Squared = .686)

Table I5. Within discipline simple effects for perceptual dimension 2.

Source of Variation SS DF MS F Sig of F


Register in Biology 9885.93 2 4942.96 111.01 .000
Register in History 948.20 2 474.10 10.65 .000

Table I6. Within register simple effects for perceptual dimension 2.

Source of Variation SS DF MS F Sig of F


Discipline in Articles 4718.29 1 4718.29 105.97 .000
Discipline in Textbooks 8.38 1 8.38 .19 .665
Discipline in Pop. Sci. 1441.85 1 1441.85 32.38 .000

APPENDIX J. MULTIPLE REGRESSION OUTPUT

Tables J1 – J2. Multiple regression output for linguistic dimension predictors of
perceptual dimension 1 in the complete corpus.

Model Summary

Model R R Square Adjusted R Square Std. Error of the Estimate Change Statistics: R Square Change, F Change, df1
1 .448a .201 .195 10.92492 .201 36.121 1
2 .477b .227 .217 10.77719 .027 4.975 1
3 .528c .279 .263 10.45084 .051 10.071 1
4 .569d .324 .305 10.15129 .046 9.504 1
a. Predictors: (Constant), LD1
b. Predictors: (Constant), LD1, LD4
c. Predictors: (Constant), LD1, LD4, LD2
d. Predictors: (Constant), LD1, LD4, LD2, LD5
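
To illustrate how the columns of the Model Summary above relate, the following sketch recomputes the adjusted R Square and the R Square change for the four stepwise models from the reported R Square values. It is a hedged illustration rather than the original computation: the sample size of 146 is an assumption inferred from the Total df in the perceptual ANOVA tables of Appendix I, the function and variable names are hypothetical, and because the inputs are already rounded to three decimals the results can differ from the reported values by about .001.

```python
# R Square values copied from the Model Summary for models 1-4.
r_squared = [0.201, 0.227, 0.279, 0.324]
reported_adjusted = [0.195, 0.217, 0.263, 0.305]

# Assumption: 146 texts had usable perceptual scores (Total df in Appendix I).
n_texts = 146


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Standard adjusted R Square for a model with n_predictors predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Model k in the stepwise sequence contains k predictors.
adjusted = [
    adjusted_r_squared(r2, n_texts, k)
    for k, r2 in enumerate(r_squared, start=1)
]

# R Square change is the gain in R Square over the previous model.
r2_change = [r_squared[0]] + [
    later - earlier for earlier, later in zip(r_squared, r_squared[1:])
]

for k, (adj, rep, chg) in enumerate(
    zip(adjusted, reported_adjusted, r2_change), start=1
):
    print(f"model {k}: adjusted = {adj:.3f} (reported {rep:.3f}), change = {chg:.3f}")
```

Under these assumptions the recomputed adjusted values agree with the reported column to within rounding, which is consistent with each added linguistic dimension contributing a small increment in explained variance.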

Tables J3 – J4. Multiple regression output for linguistic dimension predictors of
perceptual dimension 2 in the complete corpus.

Model Summary

Model R R Square Adjusted R Square Std. Error of the Estimate Change Statistics: R Square Change, F Change, df1
1 .401a .161 .155 10.94013 .161 27.565 1
2 .460b .211 .200 10.64119 .051 9.205 1
3 .494c .244 .228 10.45493 .033 6.141 1
4 .526d .276 .256 10.26520 .032 6.298 1
a. Predictors: (Constant), LD1
b. Predictors: (Constant), LD1, LD4
c. Predictors: (Constant), LD1, LD4, LD2
d. Predictors: (Constant), LD1, LD4, LD2, LD5

Tables J5 – J6. Multiple regression output for linguistic feature predictors of
perceptual dimension 1 in the complete corpus.

Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate Change Statistics: R Square Change, F Change, df1
1 .609a .371 .366 9.69151 .371 84.886 1
a. Predictors: (Constant), core_1500

Tables J7 – J8. Multiple regression output for linguistic feature predictors of
perceptual dimension 2 in the complete corpus.

Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate Change Statistics: R Square Change, F Change, df1
1 .497a .247 .241 10.36462 .247 47.147 1
a. Predictors: (Constant), core_1500

Tables J9 – J10. Multiple regression output for linguistic dimension predictors of
perceptual dimension 1 in the biology sub-corpus.

Model Summary

Model R R Square Adjusted R Square Std. Error of the Estimate Change Statistics: R Square Change, F Change, df1
1 .562a .315 .306 11.81430 .315 33.626 1
2 .625b .390 .373 11.22595 .075 8.852 1
3 .660c .435 .411 10.88185 .045 5.625 1
a. Predictors: (Constant), LD1
b. Predictors: (Constant), LD1, LD2
c. Predictors: (Constant), LD1, LD2, LD4

Tables J11 – J12. Multiple regression output for linguistic dimension predictors of
perceptual dimension 2 in the biology sub-corpus.

Model Summary

Model R R Square Adjusted R Square Std. Error of the Estimate Change Statistics: R Square Change, F Change, df1
1 .508a .258 .248 11.82158 .258 25.445 1
2 .557b .310 .291 11.47944 .052 5.416 1
a. Predictors: (Constant), LD2
b. Predictors: (Constant), LD2, LD1

Tables J13 – J14. Multiple regression output for linguistic dimension predictors of
perceptual dimension 1 in the journal article sub-corpus.

Model Summary

Model R R Square Adjusted R Square Std. Error of the Estimate Change Statistics: R Square Change, F Change, df1
1 .535a .286 .270 9.25035 .286 18.038 1
2 .732b .536 .515 7.53972 .250 23.736 1
3 .769c .591 .562 7.16595 .054 5.710 1
a. Predictors: (Constant), LD5
b. Predictors: (Constant), LD5, LD4
c. Predictors: (Constant), LD5, LD4, LD3

Tables J15 – J16. Multiple regression output for linguistic dimension predictors of
perceptual dimension 2 in the journal article sub-corpus.

Model Summary

Model R R Square Adjusted R Square Std. Error of the Estimate Change Statistics: R Square Change, F Change, df1
1 .730a .533 .523 7.18232 .533 51.441 1
2 .816b .666 .651 6.14694 .132 17.436 1
3 .875c .765 .749 5.21112 .099 18.222 1
a. Predictors: (Constant), LD1
b. Predictors: (Constant), LD1, LD5
c. Predictors: (Constant), LD1, LD5, LD4

APPENDIX K. FULL CORRELATION MATRIX OF ALL LINGUISTIC
FEATURES AND DIMENSIONS AND PERCEPTUAL DIMENSIONS AND
ITEMS
LD1 LD2 LD3 LD4 LD5 PD1 PD2
LD1 1 .381** .289** .161 -.007 .448** .401**
LD2 .381** 1 .227** -.244** .096 .317** .240**
LD3 .289** .227** 1 .090 .104 .237** .113
LD4 .161 -.244** .090 1 -.304** .234** .287**
LD5 -.007 .096 .104 -.304** 1 .151 .100
PD1 -.448** -.317** -.237** -.234** -.151 1 .780**
PD2 -.401** -.240** -.113 -.287** -.100 .780** 1
phrasal .123 -.028 -.016 .583** -.108 -.118 -.172*
lex_bun .033 .603** .040 -.413** .294** -.073 -.054
length -.185* -.022 -.103 -.322** .686** .023 -.026
vocab_high .766** .358** .480** .362** -.038 -.609** -.497**
vocab_mod -.176* .001 -.044 -.136 .563** -.048 -.029
vocab_acad -.387** .103 -.182* -.724** .484** .277** .306**
NN -.764** -.359** -.286** -.226** .047 .437** .408**
prv_vb .209* .153 .546** .186* -.125 -.216** -.112
pres .333** .737** .188* -.389** .016 -.226** -.109
pdem .215** .525** .064 -.098 .056 -.173* -.145
emph .513** .243** .275** .065 .047 -.254** -.202*
pro1 .282** .168* .520** -.036 -.100 -.054 .058
it .349** .458** .247** .109 -.121 -.329** -.305**
be_state .205* .689** .198* -.185* -.069 -.210* -.160
pany .150 .132 .309** .184* -.199* -.109 -.082
amplifr .427** .223** -.078 -.241** -.067 .038 .041
pos_mod .294** .743** .295** -.180* -.001 -.243** -.204*
o_and .454** .021 -.074 .087 -.107 -.123 -.114
n -.787** -.469** -.358** -.279** .103 .449** .409**
prep .064 -.287** -.139 .089 .150 .027 -.059
adj_attr .045 -.088 -.145 -.226** .624** -.045 -.050
pasttnse -.123 -.489** .133 .621** -.180* .013 -.011
pro3 .278** -.035 .260** .596** -.191* -.206* -.155
pub_vb .058 .043 .457** .077 .120 -.107 -.050
rel_obj .209* -.111 .041 .065 -.226** -.132 -.157
rel_subj -.023 .058 .150 .089 -.056 -.106 -.116
rel_pipe .125 .066 .040 .021 .057 -.007 -.038
n_nom -.027 .186* .136 -.169* .691** -.181* -.102
tm_adv .099 -.027 .077 .289** -.476** -.048 -.015
advs .547** .217** -.036 .079 -.027 -.255** -.267**
inf .247** .274** .549** .367** .114 -.384** -.351**

LD1 LD2 LD3 LD4 LD5 PD1 PD2
prd_mod .357** .574** .207* .078 .043 -.238** -.172*
sua_vb .001 -.024 .423** .145 .028 -.152 -.090
spl_aux .076 .231** .012 -.033 -.066 -.031 .034
conjncts .403** .260** .030 -.120 .142 -.235** -.230**
agls_psv -.469** -.015 -.246** -.227** -.159 .304** .293**
sub_othr .279** .086 .114 .211* -.096 -.110 -.092
vcmp .037 .078 .576** -.053 .159 -.030 .017
downtone .174* .099 -.111 -.102 -.074 -.038 -.044
pred_adj .265** .643** .026 -.320** .038 -.073 .001
allconj .544** .128 .022 .126 -.077 -.190* -.132
allwhrel .138 .028 .137 .095 -.103 -.130 -.165*
allpro .409** .078 .535** .482** -.230** -.206* -.106
have .424** .099 .043 -.144 -.002 -.062 -.067
vprogrsv .066 -.109 .000 .513** -.061 -.133 -.172*
that_rel .445** .338** .073 -.022 .047 -.208* -.171*
nonf_vth .028 -.003 .564** .018 .067 -.072 -.060
fact_vth -.034 .077 .382** -.119 .176* .041 .137
lkly_vth .218** .129 .526** .031 .161 -.055 -.044
factadvl .510** .141 .276** .085 .005 -.044 -.084
lklyadvl .196* .112 .097 -.039 -.175* .044 .032
all_jth .182* .639** .098 -.073 .173* -.182* -.186*
all_nth .162 .147 .480** -.075 .044 -.194* -.137
all_th .178* .322** .800** -.068 .232** -.149 -.081
all_jto .174* .702** .072 -.065 .169* -.255** -.187*
all_to .235** .426** .481** .078 .217** -.326** -.294**
all_advl .461** .229** .254** -.033 -.064 -.002 -.064
humann .175* .048 .408** .148 -.161 -.034 .017
prcessn .003 .130 .232** -.148 .604** -.054 .033
cognitn .185* .155 .468** -.139 .139 -.057 .052
abstrcn .154 .199* .175* -.119 .519** -.234** -.163*
concrtn -.164* .162 -.195* -.160 -.138 .075 .132
tccncrtn -.476** .012 -.277** -.207* -.016 .224** .174*
placen -.025 -.144 -.216** .085 -.122 -.032 -.130
topicj .240** .012 .304** -.034 .555** -.210* -.146
actv -.136 -.114 .005 .589** -.142 -.132 -.172*
commv .138 .022 .589** .106 .052 -.114 -.117

LD1 LD2 LD3 LD4 LD5 PD1 PD2
mentalv .200* .256** .623** .066 .056 -.238** -.099
aspectv .077 -.127 -.184* .516** -.089 -.118 -.198*
ineffective -.281 -.181 -.205 -.340* -.346* .743** .345*
unreadable -.541** -.141 -.449** -.484** -.595** .934** .814**
unbiased -.608** -.181 -.358* -.239 -.610** .604** .830**
constrained -.662** -.194 -.409** -.515** -.518** .777** .898**
concrete -.496** -.195 -.352* -.145 -.306* .246 .686**
awkward -.340* -.155 -.376** -.324* -.498** .848** .584**
explicit -.630** -.326* -.409** -.248 -.258 .345* .718**
good .069 -.007 .194 .223 .374** -.734** -.249
opinionated .677** .274 .403** .321* .570** -.629** -.888**
dull -.569** -.201 -.448** -.512** -.442** .883** .801**
expressive .647** .147 .419** .470** .502** -.774** -.868**
interactive .502** .085 .449** .455** .563** -.782** -.836**
unemotional -.655** -.088 -.335* -.642** -.422** .732** .933**
unrelatable -.600** -.166 -.362* -.484** -.481** .824** .753**
distant -.517** -.182 -.425** -.552** -.507** .859** .815**
serious -.508** -.172 -.154 -.383** -.166 .299* .611**
superficial .204 .013 -.028 -.057 -.316* .319* -.165
monotonous -.588** -.265 -.469** -.499** -.477** .846** .810**
engaging .530** .139 .403** .452** .454** -.890** -.705**
poorly_organized .151 .124 -.110 -.064 -.125 .476** -.043
comprehensible .386** .180 .407** .368* .521** -.894** -.619**
unsuccessful -.101 -.063 -.257 -.364* -.242 .741** .286
not_dense .559** .194 .445** .448** .551** -.856** -.870**
formal -.584** -.042 -.323* -.504** -.466** .637** .880**
hard_to_follow -.458** -.169 -.417** -.539** -.494** .911** .728**
descriptive -.549** -.252 -.307* -.284 -.221 .248 .533**
academic -.659** -.058 -.236 -.607** -.416** .656** .909**
clear .256 .062 .409** .319* .386** -.808** -.490**
not_focused .314* .166 .171 -.122 .053 .041 -.324*
not_entertaining -.558** -.163 -.421** -.500** -.436** .800** .773**
important -.176 -.090 .022 -.104 .310* -.309* .086
not_informative .317* .024 .153 .148 .039 -.020 -.408**
useful -.188 -.060 -.073 -.027 .156 -.385** .070
old_fashioned .276 .154 .262 .150 .007 -.017 -.277
irrelevant .008 .104 -.063 -.007 -.313* .459** .067
technical -.715** -.137 -.324* -.543** -.471** .656** .932**
impersonal -.674** -.126 -.401** -.551** -.415** .739** .933**
detailed -.668** -.180 -.411** -.469** -.323* .545** .851**

phrasal lex_bun length vocab_high vocab_mod vocab_acad NN
LD1 .123 .033 -.185* .766** -.176* -.387** -.764**
LD2 -.028 .603** -.022 .358** .001 .103 -.359**
LD3 -.016 .040 -.103 .480** -.044 -.182* -.286**
LD4 .583** -.413** -.322** .362** -.136 -.724** -.226**
LD5 -.108 .294** .686** -.038 .563** .484** .047
PD1 -.118 -.073 .023 -.609** -.048 .277** .437**
PD2 -.172* -.054 -.026 -.497** -.029 .306** .408**
phrasal 1 -.159 -.104 .193* -.071 -.313** -.095
lex_bun -.159 1 .271** .041 .167* .468** -.060
length -.104 .271** 1 -.339** .267** .507** .231**
vocab_high .193* .041 -.339** 1 -.160 -.473** -.711**
vocab_mod -.071 .167* .267** -.160 1 .260** .200*
vocab_acad -.313** .468** .507** -.473** .260** 1 .390**
NN -.095 -.060 .231** -.711** .200* .390** 1
prv_vb -.033 -.093 -.227** .341** -.173* -.247** -.268**
pres -.092 .417** -.028 .233** -.041 .146 -.277**
pdem -.025 .268** -.017 .176* .008 -.029 -.173*
emph .137 .095 -.060 .418** -.111 -.190* -.318**
pro1 .003 -.027 -.315** .341** -.136 -.153 -.253**
it .059 -.015 -.316** .473** -.099 -.309** -.357**
be_state .031 .296** .000 .178* -.158 .082 -.204*
pany .102 -.085 -.287** .284** -.151 -.205* -.168*
amplifr -.113 .072 -.061 .118 -.122 .008 -.203*
pos_mod -.097 .462** -.047 .278** .010 .040 -.319**
o_and -.005 -.009 -.021 .161 -.126 -.180* -.152
n -.132 -.058 .315** -.788** .204* .435** .847**
prep .016 -.155 -.010 .120 .068 -.032 -.193*
adj_attr -.042 .090 .519** -.125 .257** .274** -.075
pasttnse .150 -.440** -.158 .078 -.118 -.404** .084
pro3 .232** -.172* -.265** .361** -.128 -.482** -.280**
pub_vb .011 .028 .023 .253** .047 -.058 -.014
rel_obj -.039 -.147 -.181* .130 -.233** -.257** -.187*
rel_subj -.020 -.065 -.014 .110 -.067 -.129 -.017
rel_pipe .056 .164* -.011 .122 .059 -.010 -.172*
n_nom -.028 .321** .560** .053 .353** .458** .093
tm_adv .172* -.162 -.299** .241** -.170* -.314** -.120
advs .101 -.074 -.051 .309** -.035 -.231** -.378**
inf .237** .016 -.042 .469** -.024 -.289** -.277**

phrasal lex_bun length vocab_high vocab_mod vocab_acad NN
prd_mod .145 .214** -.096 .381** -.042 -.116 -.350**
sua_vb .164* -.111 .024 .167* -.053 -.133 -.008
spl_aux .053 .160 .103 .038 .014 .035 .005
conjncts -.043 .108 .030 .251** .045 .043 -.311**
agls_psv -.161 .171* .090 -.460** -.013 .306** .411**
sub_othr .172* -.087 -.141 .220** .030 -.179* -.185*
vcmp -.128 .087 .087 .071 -.036 .065 -.047
downtone -.024 .042 .014 .016 .019 -.058 -.179*
pred_adj -.087 .294** .016 .121 .030 .157 -.210*
allconj .027 .003 -.026 .262** -.071 -.204* -.275**
allwhrel -.002 -.017 -.091 .190* -.112 -.198* -.179*
allpro .210* -.170* -.423** .519** -.181* -.502** -.393**
have -.099 .050 -.086 .218** -.040 -.062 -.255**
vprogrsv .165* -.178* -.062 .116 .120 -.252** -.121
that_rel .037 .163* .053 .285** -.061 -.111 -.322**
nonf_vth -.056 -.001 .006 .154 -.022 -.097 -.041
fact_vth -.145 .206* .181* -.054 .102 .211* .080
lkly_vth -.124 .086 .020 .213** .008 -.030 -.190*
factadvl .153 -.002 -.091 .313** -.126 -.124 -.265**
lklyadvl -.006 -.061 -.148 .140 -.055 -.072 -.259**
all_jth -.012 .367** .000 .195* .054 .079 -.140
all_nth .035 .014 -.050 .254** -.022 -.028 -.142
all_th -.129 .229** .079 .259** .054 .040 -.146
all_jto .065 .491** .073 .190* .072 .123 -.180*
all_to .093 .295** .072 .367** .042 .001 -.243**
all_advl .159 .061 -.076 .263** -.091 -.063 -.302**
humann -.065 .004 -.180* .231** -.141 -.224** -.212*
prcessn -.063 .237** .279** .088 .257** .305** .000
cognitn -.099 .027 -.006 .220** .035 .004 -.217**
abstrcn -.009 .101 .098 .242** .171* .143 -.165*
concrtn -.138 .200* -.186* -.219** .189* .145 .133
tccncrtn -.155 .152 .104 -.463** .126 .256** .292**
placen .021 -.071 -.178* .041 .129 -.113 .034
topicj -.023 .043 .253** .279** .171* .036 -.168*
actv .237** -.090 -.126 .094 -.011 -.286** -.018
commv .007 -.058 -.027 .296** -.019 -.168* -.090

phrasal lex_bun length vocab_high vocab_mod vocab_acad NN
mentalv -.037 .057 -.179* .370** .023 -.112 -.237**
aspectv .334** -.197* -.111 .185* -.093 -.255** -.117
ineffective -.187 -.132 -.197 -.413** -.015 .044 .161
unreadable -.283 -.095 -.500** -.736** -.082 .240 .339*
unbiased -.319* .052 -.566** -.575** -.041 .249 .484**
constrained -.333* .015 -.431** -.687** -.002 .387** .474**
concrete -.099 .054 -.409** -.348* .044 .310* .322*
awkward -.287 -.115 -.377** -.540** -.111 .010 .066
explicit -.303* -.128 -.303* -.532** .077 .202 .394**
good .168 .075 .193 .348* .076 .120 -.005
opinionated .297* .043 .577** .648** .008 -.337* -.459**
dull -.310* -.148 -.404** -.680** .113 .351* .432**
expressive .280 -.029 .461** .701** -.019 -.299* -.467**
interactive .265 .046 .500** .611** .084 -.200 -.279
unemotional -.400** .048 -.330* -.731** .070 .518** .458**
unrelatable -.286 -.063 -.514** -.695** -.015 .261 .450**
distant -.279 -.120 -.409** -.650** -.057 .330* .353*
serious -.210 .013 -.210 -.401** .083 .412** .334*
superficial -.109 -.163 -.173 .071 -.216 -.326* -.232
monotonous -.365* -.044 -.414** -.678** .139 .356* .399**
engaging .174 -.002 .463** .629** -.130 -.232 -.413**
poorly_organized .125 -.094 -.005 -.121 -.102 -.279 -.226
comprehensible .155 .097 .328* .627** .087 -.154 -.242
unsuccessful -.098 -.166 -.265 -.338* .044 .057 .040
not_dense .259 .067 .516** .697** .064 -.364* -.340*
formal -.278 .122 -.395** -.560** -.028 .524** .427**
hard_to_follow -.229 -.079 -.360* -.678** -.070 .268 .289*
descriptive -.330* -.167 -.218 -.406** .026 .226 .284
academic -.365* .056 -.371* -.649** -.008 .507** .389**
clear .140 .010 .291* .523** .064 -.109 -.073
not_focused .001 -.042 .289* .116 -.162 -.094 -.394**
not_entertaining -.225 -.055 -.367* -.635** .005 .356* .469**
important -.022 .087 .097 .039 .248 .196 .071
not_informative .062 -.076 .106 .254 .013 -.242 -.212
useful -.079 .061 .019 -.041 .139 .190 .206
old_fashioned .050 -.046 .169 .230 -.212 -.359* -.509**
irrelevant -.059 .060 -.237 -.105 -.230 -.286 -.214
technical -.369* .080 -.364* -.722** .047 .494** .487**
impersonal -.310* -.005 -.406** -.711** .073 .457** .458**
detailed -.382** -.020 -.358* -.675** .150 .495** .503**

prv_vb pres pdem emph pro1 it be_state
LD1 .209* .333** .215** .513** .282** .349** .205*
LD2 .153 .737** .525** .243** .168* .458** .689**
LD3 .546** .188* .064 .275** .520** .247** .198*
LD4 .186* -.389** -.098 .065 -.036 .109 -.185*
LD5 -.125 .016 .056 .047 -.100 -.121 -.069
PD1 -.216** -.226** -.173* -.254** -.054 -.329** -.210*
PD2 -.112 -.109 -.145 -.202* .058 -.305** -.160
phrasal -.033 -.092 -.025 .137 .003 .059 .031
lex_bun -.093 .417** .268** .095 -.027 -.015 .296**
length -.227** -.028 -.017 -.060 -.315** -.316** .000
vocab_high .341** .233** .176* .418** .341** .473** .178*
vocab_mod -.173* -.041 .008 -.111 -.136 -.099 -.158
vocab_acad -.247** .146 -.029 -.190* -.153 -.309** .082
NN -.268** -.277** -.173* -.318** -.253** -.357** -.204*
prv_vb 1 .173* .058 .115 .387** .164* .109
pres .173* 1 .364** .243** .259** .250** .508**
pdem .058 .364** 1 .071 .102 .108 .280**
emph .115 .243** .071 1 .241** .126 .172*
pro1 .387** .259** .102 .241** 1 .114 .093
it .164* .250** .108 .126 .114 1 .198*
be_state .109 .508** .280** .172* .093 .198* 1
pany .269** .090 .017 .000 .229** .201* .189*
amplifr -.117 .302** .195* .122 .087 .030 .151
pos_mod .216** .593** .362** .135 .184* .284** .607**
o_and -.047 .098 -.004 .205* .032 .019 .044
n -.304** -.331** -.268** -.381** -.296** -.491** -.261**
prep -.082 -.340** -.079 -.083 -.072 -.146 -.110
adj_attr -.235** -.064 .003 -.020 -.135 -.116 -.167*
pasttnse .150 -.779** -.186* -.081 -.066 -.051 -.343**
pro3 .262** -.121 .035 .090 -.051 .094 -.017
pub_vb .082 .009 -.001 .164* .003 .113 .086
rel_obj .089 -.096 -.064 .086 .036 .112 -.060
rel_subj .186* .112 .064 .045 -.107 .080 .095
rel_pipe -.038 .022 .014 .006 -.060 .085 .049
n_nom .025 .027 .030 .011 -.046 -.083 .107
tm_adv .120 -.059 -.013 .116 .048 .172* .031
advs .045 .268** .170* .185* .038 .215** .099
inf .316** .038 .193* .265** .161 .297** .159

prv_vb pres pdem emph pro1 it be_state
prd_mod .060 .328** .128 .348** .148 .298** .450**
sua_vb .100 -.044 -.069 .058 .104 .182* .055
spl_aux -.009 .222** .202* .068 -.036 .143 .097
conjncts .033 .175* .108 .017 -.033 .176* .117
agls_psv -.199* .041 .005 -.125 -.193* -.222** .024
sub_othr .082 -.028 -.080 .056 .103 .247** .047
vcmp .331** .109 -.039 .114 .182* .034 .099
downtone -.076 .193* .050 .138 -.012 .050 .114
pred_adj .006 .551** .293** .001 .127 .199* .426**
allconj .036 .140 .064 .197* .095 .100 .074
allwhrel .143 .046 .024 .069 -.089 .145 .065
allpro .452** .075 .079 .243** .579** .144 .059
have .016 .080 .055 .182* .126 -.046 .101
vprogrsv .068 -.104 -.164* -.040 .007 .085 -.101
that_rel .099 .355** .303** .189* .142 .115 .183*
nonf_vth .186* .042 -.037 .163* .088 .080 .062
fact_vth .304** .097 -.020 -.010 .112 -.067 -.020
lkly_vth .274** .137 .130 .231** .357** -.037 .077
factadvl .165* .132 .041 .341** .244** .108 .047
lklyadvl .067 .211* .077 .060 .073 .071 .149
all_jth .141 .306** .274** .186* .061 .292** .272**
all_nth .099 .078 .043 .090 .111 .211* .195*
all_th .417** .228** .098 .240** .283** .150 .208*
all_jto .136 .341** .236** .154 .001 .290** .321**
all_to .262** .195* .182* .256** .100 .218** .242**
all_advl .144 .249** .087 .287** .206* .082 .234**
humann .262** .088 -.030 .103 .188* .062 .051
prcessn -.016 .031 .060 .109 -.004 -.045 .005
cognitn .164* .085 .095 .113 .275** .151 .144
abstrcn .001 .082 .033 .148 .065 .165* .031
concrtn -.060 .209* .025 -.093 .080 .001 .044
tccncrtn -.121 .085 -.003 -.180* -.072 -.236** .070
placen -.092 -.125 -.176* .002 -.067 .058 -.110
topicj .145 .005 .131 .252** .130 .079 -.111
actv .137 -.171* -.095 -.067 -.055 -.043 -.188*
commv .274** .066 -.119 .174* .102 .144 .056

prv_vb pres pdem emph pro1 it be_state
mentalv .643** .187* .066 .101 .372** .233** .227**
aspectv -.061 -.191* .001 .044 -.140 -.003 -.060
ineffective -.346* .047 .063 -.043 .032 -.111 -.168
unreadable -.345* .057 .017 -.412** .216 -.359* -.029
unbiased -.049 -.118 -.032 -.343* .146 -.462** -.104
constrained -.262 -.034 -.070 -.436** .188 -.373** -.034
concrete -.188 -.167 -.072 -.162 .071 -.469** -.072
awkward -.453** .060 .012 -.270 .089 -.242 -.164
explicit -.316* -.264 -.255 -.351* -.145 -.366* -.171
good .281 -.120 -.278 .054 .011 .071 .010
opinionated .173 .203 .095 .370* -.038 .548** .028
dull -.353* .084 .019 -.281 .167 -.388** -.103
expressive .295* -.104 .076 .334* -.093 .385** .070
interactive .315* -.060 .002 .301* .001 .184 .011
unemotional -.186 .082 -.016 -.452** .098 -.447** .042
unrelatable -.194 .015 .011 -.377** .281 -.361* -.024
distant -.341* .074 .065 -.278 .066 -.247 -.120
serious -.066 -.129 .094 -.095 .053 -.243 -.088
superficial -.072 .191 .128 .143 -.004 .281 -.005
monotonous -.371* -.049 -.107 -.243 .132 -.445** -.139
engaging .390** -.057 -.038 .309* -.170 .340* .071
poorly_organized -.177 .283 .306* .205 .182 .125 .133
comprehensible .381** -.077 .102 .320* -.254 .270 .167
unsuccessful -.261 .169 .092 -.015 .268 .059 -.114
not_dense .356* .056 -.011 .452** -.168 .402** .056
formal -.282 .064 -.121 -.313* .106 -.273 .040
hard_to_follow -.306* .061 .035 -.400** .142 -.279 -.138
descriptive -.076 -.250 -.091 -.384** -.166 -.277 -.038
academic -.153 .048 -.048 -.377** .075 -.261 .087
clear .306* -.065 -.207 .225 -.107 .206 .092
not_focused .133 .337* .062 .074 .109 .452** .117
not_entertaining -.234 .011 -.018 -.308* .080 -.284 -.020
important -.033 -.238 -.198 -.111 -.125 .104 -.081
not_informative .156 .180 -.111 .154 .230 .088 .038
useful .018 -.226 -.157 -.300* -.093 -.150 .053
old_fashioned .204 .213 .146 .218 .155 .258 .264
irrelevant -.063 .132 .161 .017 .151 .098 .060
technical -.172 -.013 -.092 -.550** .098 -.497** -.029
impersonal -.250 .029 -.040 -.460** .114 -.418** -.016
detailed -.108 -.118 -.072 -.468** .017 -.579** -.059

pany amplifr pos_mod o_and n prep adj_attr
LD1 .150 .427** .294** .454** -.787** .064 .045
LD2 .132 .223** .743** .021 -.469** -.287** -.088
LD3 .309** -.078 .295** -.074 -.358** -.139 -.145
LD4 .184* -.241** -.180* .087 -.279** .089 -.226**
LD5 -.199* -.067 -.001 -.107 .103 .150 .624**
PD1 -.109 .038 -.243** -.123 .449** .027 -.045
PD2 -.082 .041 -.204* -.114 .409** -.059 -.050
phrasal .102 -.113 -.097 -.005 -.132 .016 -.042
lex_bun -.085 .072 .462** -.009 -.058 -.155 .090
length -.287** -.061 -.047 -.021 .315** -.010 .519**
vocab_high .284** .118 .278** .161 -.788** .120 -.125
vocab_mod -.151 -.122 .010 -.126 .204* .068 .257**
vocab_acad -.205* .008 .040 -.180* .435** -.032 .274**
NN -.168* -.203* -.319** -.152 .847** -.193* -.075
prv_vb .269** -.117 .216** -.047 -.304** -.082 -.235**
pres .090 .302** .593** .098 -.331** -.340** -.064
pdem .017 .195* .362** -.004 -.268** -.079 .003
emph .000 .122 .135 .205* -.381** -.083 -.020
pro1 .229** .087 .184* .032 -.296** -.072 -.135
it .201* .030 .284** .019 -.491** -.146 -.116
be_state .189* .151 .607** .044 -.261** -.110 -.167*
pany 1 -.066 .123 -.056 -.269** .066 -.163*
amplifr -.066 1 .209* .176* -.300** -.030 .063
pos_mod .123 .209* 1 .059 -.368** -.192* -.166*
o_and -.056 .176* .059 1 -.077 -.076 -.129
n -.269** -.300** -.368** -.077 1 -.025 -.009
prep .066 -.030 -.192* -.076 -.025 1 .104
adj_attr -.163* .063 -.166* -.129 -.009 .104 1
pasttnse .121 -.369** -.390** -.012 .022 .114 -.213*
pro3 .304** .004 .012 .081 -.337** .000 -.153
pub_vb .177* -.138 .089 -.116 -.085 -.026 -.042
rel_obj .003 .120 -.073 .132 -.197* .172* .063
rel_subj .173* -.075 .049 -.068 -.058 .065 .004
rel_pipe .056 .155 .071 .010 -.115 .375** .018
n_nom -.107 -.108 .157 -.050 .095 .102 .200*
tm_adv -.034 -.107 .016 .038 -.171* -.069 -.259**
advs .160 .163* .114 .146 -.475** -.016 .147
inf .144 -.090 .216** -.095 -.390** -.125 -.094

pany amplifr pos_mod o_and n prep adj_attr
prd_mod .042 .139 .358** -.020 -.421** -.106 -.102
sua_vb .235** -.107 .032 -.012 .018 -.021 -.098
spl_aux -.015 .094 .130 -.054 -.159 -.322** .026
conjncts .108 .107 .159 .057 -.245** .253** .064
agls_psv -.123 -.101 -.058 .016 .351** -.284** -.249**
sub_othr .261** .057 .044 .136 -.263** -.115 -.162
vcmp .117 -.088 .188* -.121 -.095 -.146 -.023
downtone -.020 .216** .110 .076 -.166* -.108 .072
pred_adj .083 .345** .401** .047 -.299** -.269** -.003
allconj -.005 .203* .123 .842** -.268** -.130 -.112
allwhrel .146 .081 .042 .014 -.181* .312** .038
allpro .387** .064 .130 .078 -.472** -.038 -.215**
have -.025 .235** .116 .093 -.232** .032 .066
vprogrsv .021 -.031 -.082 .082 -.150 .040 -.020
that_rel -.028 .220** .251** .091 -.316** -.182* .059
nonf_vth .103 -.086 .098 -.074 -.096 -.061 -.117
fact_vth .081 -.007 .125 -.104 .020 -.233** .037
lkly_vth .122 -.065 .130 .022 -.250** -.042 .029
factadvl .127 .165* .053 .267** -.322** -.077 .041
lklyadvl .060 .098 .035 -.037 -.251** -.011 -.098
all_jth .142 .035 .235** -.095 -.233** -.178* .004
all_nth .053 -.066 .186* -.061 -.153 -.091 .051
all_th .211* -.092 .299** -.125 -.257** -.230** -.006
all_jto .032 -.068 .394** -.004 -.235** -.236** -.037
all_to .086 -.039 .301** -.087 -.328** -.188* -.024
all_advl .075 .174* .119 .173* -.338** -.082 -.008
humann .211* .041 .034 -.003 -.190* -.097 -.236**
prcessn -.116 -.113 .073 -.027 .002 .101 .216**
cognitn .053 .019 .216** -.085 -.173* .107 .027
abstrcn -.082 .014 .067 -.067 -.099 .147 .134
concrtn -.117 -.055 .200* .053 .117 -.136 -.141
tccncrtn -.072 -.107 .037 -.181* .319** -.157 .046
placen -.084 -.028 -.058 -.020 .012 .216** -.006
topicj -.078 -.096 -.086 -.043 -.176* .120 .371**
actv .012 -.351** -.051 -.035 -.004 -.122 -.164*
commv .189* -.073 .116 .019 -.121 -.010 -.132

pany amplifr pos_mod o_and n prep adj_attr
mentalv .294** -.091 .287** -.046 -.347** -.115 -.136
aspectv .001 -.125 -.095 .069 -.120 .283** -.066
ineffective -.020 .144 -.146 -.113 .295* .082 -.136
unreadable -.025 .200 -.070 -.240 .401** -.272 -.325*
unbiased -.129 .041 -.132 -.098 .466** -.406** -.545**
constrained -.075 .050 -.202 -.223 .591** -.272 -.367*
concrete .024 -.268 -.198 -.114 .353* -.261 -.297*
awkward .011 .230 -.107 -.254 .199 -.140 -.173
explicit -.125 -.213 -.251 -.286 .402** -.278 -.201
good .059 -.283 -.069 -.043 -.155 -.049 .212
opinionated .056 .023 .131 .207 -.480** .310* .372*
dull -.038 .150 -.156 -.232 .498** -.268 -.241
expressive .101 -.039 .170 .223 -.559** .419** .339*
interactive .107 -.218 .063 .281 -.318* .409** .241
unemotional -.092 .162 -.039 -.331* .476** -.396** -.179
unrelatable -.110 .096 -.102 -.284 .478** -.206 -.261
distant -.087 .083 -.112 -.215 .467** -.265 -.233
serious -.216 -.183 -.151 -.358* .356* -.150 -.167
superficial .013 .403** .003 .034 -.117 .283 -.209
monotonous -.061 .058 -.210 -.181 .476** -.366* -.242
engaging .094 -.148 .142 .163 -.444** .277 .292*
poorly_organized -.011 .236 .121 -.034 -.063 .131 -.005
comprehensible .035 -.126 .140 .127 -.362* .262 .342*
unsuccessful -.108 .281 -.042 -.086 .186 .119 -.153
not_dense .096 -.119 .152 .192 -.442** .248 .292*
formal -.032 .066 -.097 -.274 .442** -.442** -.354*
hard_to_follow .003 .253 -.182 -.254 .396** -.239 -.237
descriptive -.038 -.082 -.250 -.409** .263 -.164 -.052
academic -.043 .120 -.027 -.437** .420** -.339* -.162
clear .008 -.225 .031 .166 -.224 .021 .097
not_focused -.078 .289* -.010 .037 -.290* .193 .108
not_entertaining -.074 .093 -.163 -.313* .541** -.184 -.280
important -.060 -.254 -.239 -.055 .080 -.105 .194
not_informative .083 .237 .061 .236 -.189 .143 -.057
useful .019 -.253 -.157 -.027 .099 -.222 .130
old_fashioned .250 .019 .159 -.012 -.355* .291* .047
irrelevant -.075 .246 .066 -.139 -.069 .104 -.180
technical -.019 .086 -.133 -.376** .504** -.503** -.273
impersonal -.031 .110 -.099 -.337* .486** -.390** -.189
detailed -.091 -.016 -.153 -.332* .485** -.349* -.154

pasttnse pro3 pub_vb rel_obj rel_subj rel_pipe n_nom
LD1 -.123 .278** .058 .209* -.023 .125 -.027
LD2 -.489** -.035 .043 -.111 .058 .066 .186*
LD3 .133 .260** .457** .041 .150 .040 .136
LD4 .621** .596** .077 .065 .089 .021 -.169*
LD5 -.180* -.191* .120 -.226** -.056 .057 .691**
PD1 .013 -.206* -.107 -.132 -.106 -.007 -.181*
PD2 -.011 -.155 -.050 -.157 -.116 -.038 -.102
phrasal .150 .232** .011 -.039 -.020 .056 -.028
lex_bun -.440** -.172* .028 -.147 -.065 .164* .321**
length -.158 -.265** .023 -.181* -.014 -.011 .560**
vocab_high .078 .361** .253** .130 .110 .122 .053
vocab_mod -.118 -.128 .047 -.233** -.067 .059 .353**
vocab_acad -.404** -.482** -.058 -.257** -.129 -.010 .458**
NN .084 -.280** -.014 -.187* -.017 -.172* .093
prv_vb .150 .262** .082 .089 .186* -.038 .025
pres -.779** -.121 .009 -.096 .112 .022 .027
pdem -.186* .035 -.001 -.064 .064 .014 .030
emph -.081 .090 .164* .086 .045 .006 .011
pro1 -.066 -.051 .003 .036 -.107 -.060 -.046
it -.051 .094 .113 .112 .080 .085 -.083
be_state -.343** -.017 .086 -.060 .095 .049 .107
pany .121 .304** .177* .003 .173* .056 -.107
amplifr -.369** .004 -.138 .120 -.075 .155 -.108
pos_mod -.390** .012 .089 -.073 .049 .071 .157
o_and -.012 .081 -.116 .132 -.068 .010 -.050
n .022 -.337** -.085 -.197* -.058 -.115 .095
prep .114 .000 -.026 .172* .065 .375** .102
adj_attr -.213* -.153 -.042 .063 .004 .018 .200*
pasttnse 1 .371** .098 .094 .057 -.155 -.061
pro3 .371** 1 .160 .204* .125 .101 -.088
pub_vb .098 .160 1 .029 .130 .119 .118
rel_obj .094 .204* .029 1 .173* .062 -.258**
rel_subj .057 .125 .130 .173* 1 -.037 -.053
rel_pipe -.155 .101 .119 .062 -.037 1 -.059
n_nom -.061 -.088 .118 -.258** -.053 -.059 1
tm_adv .189* .126 .065 .172* .181* -.047 -.133
advs -.224** .114 -.081 .045 -.054 .065 -.106
inf .244** .294** .369** .034 .150 .029 .177*

pasttnse pro3 pub_vb rel_obj rel_subj rel_pipe n_nom
prd_mod -.148 .155 .131 -.001 -.050 .057 .133
sua_vb .132 .100 .223** -.001 .140 .016 .079
spl_aux -.121 .042 .076 -.135 -.039 .059 -.042
conjncts -.184* -.010 -.017 .049 -.051 .163* .145
agls_psv .002 -.289** -.177* -.144 -.058 -.099 -.060
sub_othr .175* .174* .002 -.023 -.130 -.101 -.037
vcmp .076 .110 .407** -.084 .036 -.007 .136
downtone -.250** -.150 -.096 -.148 -.006 -.043 -.148
pred_adj -.393** -.122 -.141 -.065 .085 -.114 .058
allconj .027 .198* -.095 .139 -.077 -.047 -.028
allwhrel -.004 .221** .161 .571** .723** .525** -.174*
allpro .251** .776** .138 .182* .031 .052 -.104
have -.024 -.030 -.116 .045 -.084 -.109 -.057
vprogrsv .109 .176* .034 -.084 -.037 .099 -.024
that_rel -.237** -.004 .026 -.042 -.156 .006 .017
nonf_vth .110 .113 .515** .037 .237** .005 .038
fact_vth .055 .028 .029 -.125 .014 -.014 .177*
lkly_vth .062 .204* .222** -.023 -.045 .035 .083
factadvl -.028 .126 .098 .119 .001 .088 -.026
lklyadvl -.194* .080 .113 -.025 -.058 .075 -.214**
all_jth -.163* -.043 -.045 -.128 -.003 .067 .151
all_nth -.032 -.003 .318** .092 .073 .101 .030
all_th .037 .131 .402** -.055 .111 .040 .189*
all_jto -.195* -.039 .006 -.176* .003 -.003 .266**
all_to .043 .182* .337** .017 .095 .046 .319**
all_advl -.205* .065 .096 .048 -.075 .088 -.078
humann .101 .439** -.021 .010 .098 -.004 -.062
prcessn -.054 -.146 .164* -.194* -.037 .103 .396**
cognitn -.081 -.015 .136 .035 -.029 .111 .124
abstrcn -.099 -.086 .128 -.145 -.046 .068 .366**
concrtn -.219** -.175* -.196* -.050 -.090 -.139 -.098
tccncrtn -.236** -.256** -.180* -.224** -.102 -.095 -.069
placen -.009 -.082 -.047 .165* -.020 .041 -.132
topicj .040 .082 .187* .054 .132 .038 .257**
actv .350** .208* -.108 -.125 .084 -.170* .028
commv .095 .169* .570** .038 .175* .045 .054

pasttnse pro3 pub_vb rel_obj rel_subj rel_pipe n_nom
mentalv .124 .177* .167* .092 .126 -.039 .098
aspectv .181* .019 .067 -.036 .029 .148 -.062
ineffective -.170 -.302* -.112 .006 -.161 .022 -.469**
unreadable -.163 -.467** -.293* -.146 -.240 -.083 -.612**
unbiased .135 -.399** -.287 -.177 -.194 -.085 -.331*
constrained -.105 -.552** -.287 -.330* -.196 -.066 -.411**
concrete .222 -.254 -.259 -.406** -.216 -.311* -.165
awkward -.220 -.297* -.264 -.103 -.218 .039 -.645**
explicit .238 -.397** -.226 -.282 -.013 -.309* -.173
good .228 .124 .081 -.063 .206 -.259 .570**
opinionated -.138 .458** .383** .310* .215 .167 .448**
dull -.260 -.528** -.320* -.339* -.233 -.070 -.540**
expressive .171 .586** .304* .328* .184 .112 .526**
interactive .148 .474** .188 .104 .110 .194 .524**
unemotional -.223 -.631** -.199 -.219 -.181 -.122 -.423**
unrelatable -.144 -.544** -.279 -.274 -.238 .044 -.562**
distant -.190 -.634** -.254 -.122 -.205 -.109 -.549**
serious .037 -.347* -.197 -.249 -.048 -.056 -.067
superficial -.230 .017 -.028 .215 .004 .451** -.394**
monotonous -.124 -.556** -.174 -.253 -.384** -.141 -.562**
engaging .207 .501** .179 .236 .329* .041 .580**
poorly_organized -.356* .032 .116 .018 .011 .094 -.348*
comprehensible .149 .418** .165 .095 .365* .037 .539**
unsuccessful -.338* -.347* -.070 .016 -.169 .154 -.418**
not_dense .098 .536** .370* .309* .235 .092 .516**
formal -.131 -.483** -.173 -.221 -.248 -.202 -.283
hard_to_follow -.235 -.525** -.235 -.018 -.166 .059 -.597**
descriptive .099 -.288* -.223 -.001 .106 -.369* -.138
academic -.201 -.620** -.179 -.172 -.116 -.104 -.413**
clear .272 .314* .197 .026 .298* -.126 .580**
not_focused -.395** .074 .191 .344* .128 .302* -.104
not_entertaining -.245 -.541** -.237 -.085 -.195 .024 -.383**
important .170 -.120 .035 -.033 .065 -.149 .351*
not_informative -.150 .099 .203 .262 -.037 .032 .037
useful .190 -.033 -.088 -.085 .114 -.280 .276
old_fashioned -.279 .260 .119 .146 .037 .421** -.225
irrelevant -.204 .034 -.041 .025 -.120 .310* -.402**
technical -.080 -.542** -.186 -.281 -.209 -.282 -.354*
impersonal -.166 -.608** -.236 -.206 -.138 -.198 -.432**
detailed .004 -.442** -.329* -.196 -.098 -.269 -.237

tm_adv advs inf prd_mod sua_vb spl_aux conjncts
LD1 .099 .547** .247** .357** .001 .076 .403**
LD2 -.027 .217** .274** .574** -.024 .231** .260**
LD3 .077 -.036 .549** .207* .423** .012 .030
LD4 .289** .079 .367** .078 .145 -.033 -.120
LD5 -.476** -.027 .114 .043 .028 -.066 .142
PD1 -.048 -.255** -.384** -.238** -.152 -.031 -.235**
PD2 -.015 -.267** -.351** -.172* -.090 .034 -.230**
phrasal .172* .101 .237** .145 .164* .053 -.043
lex_bun -.162 -.074 .016 .214** -.111 .160 .108
length -.299** -.051 -.042 -.096 .024 .103 .030
vocab_high .241** .309** .469** .381** .167* .038 .251**
vocab_mod -.170* -.035 -.024 -.042 -.053 .014 .045
vocab_acad -.314** -.231** -.289** -.116 -.133 .035 .043
NN -.120 -.378** -.277** -.350** -.008 .005 -.311**
prv_vb .120 .045 .316** .060 .100 -.009 .033
pres -.059 .268** .038 .328** -.044 .222** .175*
pdem -.013 .170* .193* .128 -.069 .202* .108
emph .116 .185* .265** .348** .058 .068 .017
pro1 .048 .038 .161 .148 .104 -.036 -.033
it .172* .215** .297** .298** .182* .143 .176*
be_state .031 .099 .159 .450** .055 .097 .117
pany -.034 .160 .144 .042 .235** -.015 .108
amplifr -.107 .163* -.090 .139 -.107 .094 .107
pos_mod .016 .114 .216** .358** .032 .130 .159
o_and .038 .146 -.095 -.020 -.012 -.054 .057
n -.171* -.475** -.390** -.421** .018 -.159 -.245**
prep -.069 -.016 -.125 -.106 -.021 -.322** .253**
adj_attr -.259** .147 -.094 -.102 -.098 .026 .064
pasttnse .189* -.224** .244** -.148 .132 -.121 -.184*
pro3 .126 .114 .294** .155 .100 .042 -.010
pub_vb .065 -.081 .369** .131 .223** .076 -.017
rel_obj .172* .045 .034 -.001 -.001 -.135 .049
rel_subj .181* -.054 .150 -.050 .140 -.039 -.051
rel_pipe -.047 .065 .029 .057 .016 .059 .163*
n_nom -.133 -.106 .177* .133 .079 -.042 .145
tm_adv 1 .019 .133 .087 .113 .165* .009
advs .019 1 .068 .094 -.151 .278** .361**
inf .133 .068 1 .208* .285** .180* .005

prd_mod .087 .094 .208* 1 -.042 .161 .144
sua_vb .113 -.151 .285** -.042 1 -.100 -.091
spl_aux .165* .278** .180* .161 -.100 1 .027
conjncts .009 .361** .005 .144 -.091 .027 1
agls_psv .045 -.214** -.182* -.129 -.211* .226** -.217**
sub_othr -.004 .178* .077 .118 .009 .028 .061
vcmp -.111 -.104 .160 .078 .191* -.008 .070
downtone -.042 .235** -.168* -.020 -.075 -.035 -.055
pred_adj -.148 .261** .007 .245** -.098 .085 .210*
allconj .010 .257** -.035 .100 -.104 .031 .059
allwhrel .167* .017 .131 -.004 .102 -.050 .074
allpro .159 .124 .343** .233** .147 .017 -.027
have -.002 .110 -.043 .078 -.073 -.094 .104
vprogrsv .041 .097 .207* .020 .022 .067 -.069
that_rel .013 .165* .111 .228** -.046 .220** .105
nonf_vth .016 -.155 .263** -.025 .328** -.055 -.012
fact_vth -.093 -.091 .007 .020 -.001 .066 .039
lkly_vth -.136 .054 .241** .157 .056 .105 .065
factadvl .114 .240** .108 .157 -.014 .041 .092
lklyadvl .109 .270** .005 .117 -.061 .128 .065
all_jth -.030 .111 .307** .307** -.022 .117 .246**
all_nth .073 .025 .141 .177* .091 -.005 .061
all_th -.066 -.035 .366** .198* .189* .082 .108
all_jto -.059 .118 .292** .298** -.029 .143 .201*
all_to -.024 -.006 .665** .227** .160 .118 .091
all_advl .106 .302** .048 .187* -.048 .110 .055
humann .077 .061 .145 .086 .071 -.058 .034
prcessn -.104 -.236** .192* .157 .083 -.140 -.002
cognitn .088 -.029 .050 .235** .066 -.017 .146
abstrcn -.187* .043 .195* .156 .112 -.136 .242**
concrtn -.025 -.010 -.118 .036 -.184* .118 -.071
tccncrtn -.051 -.112 -.238** -.130 -.179* .008 -.094
placen .059 .106 -.110 -.082 .018 -.059 .101
topicj -.093 .126 .255** .072 .099 .040 .150
actv .188* -.034 .189* -.055 .040 -.073 -.205*
commv .085 -.053 .398** .045 .319** -.066 -.036

mentalv .025 .036 .357** .133 .064 .047 -.016
aspectv .162 .044 .066 .094 .012 -.063 .057
ineffective .011 .018 -.283 -.017 -.080 -.204 -.102
unreadable .188 -.280 -.555** .007 -.148 -.089 -.333*
unbiased .338* -.506** -.347* -.049 -.235 -.022 -.430**
constrained .383** -.563** -.630** -.085 -.036 -.131 -.456**
concrete .189 -.629** -.325* -.203 -.116 -.027 -.619**
awkward .019 -.011 -.361* .005 -.130 -.015 -.090
explicit .247 -.428** -.331* -.131 -.093 -.184 -.560**
good .060 -.133 .295* -.156 .116 .091 -.197
opinionated -.277 .507** .413** .176 .153 .152 .480**
dull .182 -.313* -.598** -.106 -.199 -.045 -.294*
expressive -.171 .431** .469** .135 .228 .048 .488**
interactive -.233 .241 .448** -.017 .186 .136 .390**
unemotional .233 -.490** -.487** -.107 -.277 -.051 -.410**
unrelatable .202 -.330* -.500** -.086 -.104 -.045 -.322*
distant .210 -.300* -.526** -.080 -.224 -.114 -.320*
serious .130 -.436** -.206 -.123 -.167 -.088 -.375**
superficial -.030 .270 -.062 .039 .159 .013 .282
monotonous .213 -.342* -.586** -.209 -.319* -.085 -.417**
engaging -.094 .230 .451** .164 .234 .020 .325*
poorly_organized -.241 .271 -.173 .179 -.152 -.030 .095
comprehensible -.205 .127 .402** .036 .108 .067 .194
unsuccessful -.045 .060 -.276 .148 -.109 -.104 .026
not_dense -.247 .303* .529** .116 .114 .060 .377**
formal .381** -.522** -.411** -.022 -.045 .017 -.540**
hard_to_follow .077 -.226 -.451** -.125 -.115 -.142 -.181
descriptive .197 -.250 -.297* -.130 -.113 -.194 -.357*
academic .275 -.550** -.382** -.042 -.001 -.090 -.424**
clear .142 .053 .472** -.087 .268 .056 .101
not_focused -.232 .332* .077 .064 -.006 .073 .230
not_entertaining .241 -.347* -.377** -.054 -.132 -.111 -.248
important .175 -.283 .166 -.254 .176 -.161 -.320*
not_informative -.066 .272 .106 .032 -.093 -.012 .235
useful .255 -.151 .091 -.316* .080 .034 -.233
old_fashioned -.114 .189 -.013 .114 .234 -.073 .303*
irrelevant -.083 .135 -.228 .258 .021 -.019 .053
technical .275 -.489** -.437** -.102 -.178 .023 -.483**
impersonal .235 -.525** -.458** -.116 -.182 -.138 -.496**
detailed .170 -.469** -.362* -.050 -.366* -.089 -.434**

agls_psv sub_othr vcmp downtone pred_adj allconj allwhrel
LD1 -.469** .279** .037 .174* .265** .544** .138
LD2 -.015 .086 .078 .099 .643** .128 .028
LD3 -.246** .114 .576** -.111 .026 .022 .137
LD4 -.227** .211* -.053 -.102 -.320** .126 .095
LD5 -.159 -.096 .159 -.074 .038 -.077 -.103
PD1 .304** -.110 -.030 -.038 -.073 -.190* -.130
PD2 .293** -.092 .017 -.044 .001 -.132 -.165*
phrasal -.161 .172* -.128 -.024 -.087 .027 -.002
lex_bun .171* -.087 .087 .042 .294** .003 -.017
length .090 -.141 .087 .014 .016 -.026 -.091
vocab_high -.460** .220** .071 .016 .121 .262** .190*
vocab_mod -.013 .030 -.036 .019 .030 -.071 -.112
vocab_acad .306** -.179* .065 -.058 .157 -.204* -.198*
NN .411** -.185* -.047 -.179* -.210* -.275** -.179*
prv_vb -.199* .082 .331** -.076 .006 .036 .143
pres .041 -.028 .109 .193* .551** .140 .046
pdem .005 -.080 -.039 .050 .293** .064 .024
emph -.125 .056 .114 .138 .001 .197* .069
pro1 -.193* .103 .182* -.012 .127 .095 -.089
it -.222** .247** .034 .050 .199* .100 .145
be_state .024 .047 .099 .114 .426** .074 .065
pany -.123 .261** .117 -.020 .083 -.005 .146
amplifr -.101 .057 -.088 .216** .345** .203* .081
pos_mod -.058 .044 .188* .110 .401** .123 .042
o_and .016 .136 -.121 .076 .047 .842** .014
n .351** -.263** -.095 -.166* -.299** -.268** -.181*
prep -.284** -.115 -.146 -.108 -.269** -.130 .312**
adj_attr -.249** -.162 -.023 .072 -.003 -.112 .038
pasttnse .002 .175* .076 -.250** -.393** .027 -.004
pro3 -.289** .174* .110 -.150 -.122 .198* .221**
pub_vb -.177* .002 .407** -.096 -.141 -.095 .161
rel_obj -.144 -.023 -.084 -.148 -.065 .139 .571**
rel_subj -.058 -.130 .036 -.006 .085 -.077 .723**
rel_pipe -.099 -.101 -.007 -.043 -.114 -.047 .525**
n_nom -.060 -.037 .136 -.148 .058 -.028 -.174*
tm_adv .045 -.004 -.111 -.042 -.148 .010 .167*
advs -.214** .178* -.104 .235** .261** .257** .017
inf -.182* .077 .160 -.168* .007 -.035 .131

prd_mod -.129 .118 .078 -.020 .245** .100 -.004
sua_vb -.211* .009 .191* -.075 -.098 -.104 .102
spl_aux .226** .028 -.008 -.035 .085 .031 -.050
conjncts -.217** .061 .070 -.055 .210* .059 .074
agls_psv 1 -.036 -.149 -.041 .038 -.043 -.150
sub_othr -.036 1 .069 .159 .126 .442** -.150
vcmp -.149 .069 1 -.132 -.042 -.087 -.014
downtone -.041 .159 -.132 1 .123 .109 -.089
pred_adj .038 .126 -.042 .123 1 .192* -.030
allconj -.043 .442** -.087 .109 .192* 1 -.019
allwhrel -.150 -.150 -.014 -.089 -.030 -.019 1
allpro -.358** .190* .194* -.114 -.023 .208* .122
have -.146 .079 .009 .210* .105 .085 -.092
vprogrsv -.199* .108 -.068 .075 -.134 .078 -.010
that_rel -.090 .059 -.041 .107 .176* .108 -.119
nonf_vth -.054 -.017 .594** -.103 -.140 -.047 .176*
fact_vth .000 .067 .631** -.114 .011 -.053 -.050
lkly_vth -.062 .110 .392** -.020 .065 .099 -.022
factadvl -.170* .053 .045 -.027 .093 .279** .095
lklyadvl -.031 .081 .092 .217** .147 .075 -.010
all_jth -.002 .062 .010 -.068 .284** -.048 -.019
all_nth -.103 .098 .168* .123 .086 .026 .139
all_th -.076 .096 .738** -.088 .082 -.021 .072
all_jto .032 .099 -.031 .035 .370** .067 -.072
all_to -.085 .014 .123 -.080 .177* -.003 .094
all_advl -.097 .075 .066 .205* .223** .246** .015
humann -.086 .141 .099 -.020 .113 .099 .066
prcessn -.069 -.048 .181* -.157 -.050 -.045 -.054
cognitn -.146 -.081 .140 -.089 -.034 -.044 .052
abstrcn -.196* -.028 .133 -.118 .027 -.068 -.057
concrtn .245** .052 -.126 .013 .157 .057 -.152
tccncrtn .216** -.273** -.116 .102 .025 -.230** -.212*
placen -.039 .045 -.179* -.092 -.099 -.083 .078
topicj -.192* -.076 .160 -.069 -.047 -.001 .131
actv .158 .061 -.038 -.131 -.184* -.070 -.087
commv -.220** .003 .366** -.120 -.109 .033 .154

mentalv -.110 .118 .246** -.017 .112 .057 .103
aspectv -.151 .009 -.104 -.005 -.244** .058 .079
ineffective .247 -.167 -.159 .211 -.119 -.161 -.089
unreadable .433** -.058 -.096 .217 .102 -.277 -.256
unbiased .621** -.109 -.136 -.054 -.024 -.217 -.238
constrained .462** .029 -.134 .092 .030 -.292* -.283
concrete .476** .070 -.156 .102 .064 -.180 -.470**
awkward .310* -.155 -.137 .373** .099 -.260 -.153
explicit .440** -.017 -.097 .191 .102 -.283 -.292*
good -.137 .174 .103 -.162 .130 .004 -.044
opinionated -.610** .148 .130 -.109 -.034 .332* .348*
dull .460** .060 -.195 .144 .121 -.243 -.313*
expressive -.591** .032 .236 -.082 -.164 .301* .300*
interactive -.461** .007 .105 -.179 -.256 .282 .223
unemotional .428** .048 .080 .172 .191 -.349* -.267
unrelatable .404** -.116 -.131 .112 .046 -.354* -.224
distant .422** .063 -.129 .162 .088 -.196 -.241
serious .279 -.078 -.021 .140 .027 -.369* -.151
superficial .020 -.253 -.134 .133 -.153 -.095 .348*
monotonous .509** .044 -.180 .207 .072 -.172 -.422**
engaging -.447** .057 .205 -.165 -.117 .221 .320*
poorly_organized -.144 -.083 -.059 .294* -.100 -.042 .070
comprehensible -.336* -.082 .115 -.162 .000 .140 .291*
unsuccessful .079 .012 -.238 .133 -.041 -.105 -.010
not_dense -.419** .039 .138 -.272 -.105 .268 .316*
formal .465** .205 -.096 .018 .175 -.247 -.358*
hard_to_follow .296* -.087 -.086 .174 .166 -.291* -.078
descriptive .216 -.030 -.090 .335* .106 -.311* -.153
academic .396** .034 .109 .175 .209 -.441** -.197
clear -.152 .251 .120 -.317* .009 .234 .126
not_focused -.238 -.002 .108 .221 -.008 .036 .385**
not_entertaining .308* .003 -.158 .005 .071 -.376** -.142
important -.049 .186 .011 -.005 .111 .022 -.059
not_informative -.161 .046 .045 -.126 -.111 .255 .086
useful -.023 .088 .021 .034 .149 .012 -.124
old_fashioned -.211 .071 .114 .362* -.181 .053 .325*
irrelevant .054 -.250 -.163 .202 -.096 -.159 .118
technical .459** .070 .037 .209 .195 -.387** -.403**
impersonal .445** .057 -.047 .178 .207 -.351* -.280
detailed .402** -.015 -.106 .068 .210 -.324* -.293*

allpro have vprogrsv that_rel nonf_vth fact_vth lkly_vth
LD1 .409** .424** .066 .445** .028 -.034 .218**
LD2 .078 .099 -.109 .338** -.003 .077 .129
LD3 .535** .043 .000 .073 .564** .382** .526**
LD4 .482** -.144 .513** -.022 .018 -.119 .031
LD5 -.230** -.002 -.061 .047 .067 .176* .161
PD1 -.206* -.062 -.133 -.208* -.072 .041 -.055
PD2 -.106 -.067 -.172* -.171* -.060 .137 -.044
phrasal .210* -.099 .165* .037 -.056 -.145 -.124
lex_bun -.170* .050 -.178* .163* -.001 .206* .086
length -.423** -.086 -.062 .053 .006 .181* .020
vocab_high .519** .218** .116 .285** .154 -.054 .213**
vocab_mod -.181* -.040 .120 -.061 -.022 .102 .008
vocab_acad -.502** -.062 -.252** -.111 -.097 .211* -.030
NN -.393** -.255** -.121 -.322** -.041 .080 -.190*
prv_vb .452** .016 .068 .099 .186* .304** .274**
pres .075 .080 -.104 .355** .042 .097 .137
pdem .079 .055 -.164* .303** -.037 -.020 .130
emph .243** .182* -.040 .189* .163* -.010 .231**
pro1 .579** .126 .007 .142 .088 .112 .357**
it .144 -.046 .085 .115 .080 -.067 -.037
be_state .059 .101 -.101 .183* .062 -.020 .077
pany .387** -.025 .021 -.028 .103 .081 .122
amplifr .064 .235** -.031 .220** -.086 -.007 -.065
pos_mod .130 .116 -.082 .251** .098 .125 .130
o_and .078 .093 .082 .091 -.074 -.104 .022
n -.472** -.232** -.150 -.316** -.096 .020 -.250**
prep -.038 .032 .040 -.182* -.061 -.233** -.042
adj_attr -.215** .066 -.020 .059 -.117 .037 .029
pasttnse .251** -.024 .109 -.237** .110 .055 .062
pro3 .776** -.030 .176* -.004 .113 .028 .204*
pub_vb .138 -.116 .034 .026 .515** .029 .222**
rel_obj .182* .045 -.084 -.042 .037 -.125 -.023
rel_subj .031 -.084 -.037 -.156 .237** .014 -.045
rel_pipe .052 -.109 .099 .006 .005 -.014 .035
n_nom -.104 -.057 -.024 .017 .038 .177* .083
tm_adv .159 -.002 .041 .013 .016 -.093 -.136
advs .124 .110 .097 .165* -.155 -.091 .054
inf .343** -.043 .207* .111 .263** .007 .241**

prd_mod .233** .078 .020 .228** -.025 .020 .157
sua_vb .147 -.073 .022 -.046 .328** -.001 .056
spl_aux .017 -.094 .067 .220** -.055 .066 .105
conjncts -.027 .104 -.069 .105 -.012 .039 .065
agls_psv -.358** -.146 -.199* -.090 -.054 .000 -.062
sub_othr .190* .079 .108 .059 -.017 .067 .110
vcmp .194* .009 -.068 -.041 .594** .631** .392**
downtone -.114 .210* .075 .107 -.103 -.114 -.020
pred_adj -.023 .105 -.134 .176* -.140 .011 .065
allconj .208* .085 .078 .108 -.047 -.053 .099
allwhrel .122 -.092 -.010 -.119 .176* -.050 -.022
allpro 1 .057 .168* .087 .137 .081 .385**
have .057 1 -.176* .268** -.003 .068 .066
vprogrsv .168* -.176* 1 .063 -.028 -.064 .088
that_rel .087 .268** .063 1 -.068 -.009 .204*
nonf_vth .137 -.003 -.028 -.068 1 .214** .242**
fact_vth .081 .068 -.064 -.009 .214** 1 .108
lkly_vth .385** .066 .088 .204* .242** .108 1
factadvl .266** .136 -.045 .122 .002 .022 .138
lklyadvl .125 .064 -.022 .094 .057 .022 .174*
all_jth .004 .073 .004 .143 -.048 .112 .053
all_nth .071 .094 -.163* .061 .166* -.005 .206*
all_th .275** .092 -.038 .096 .664** .610** .566**
all_jto -.037 .013 -.035 .217** -.050 .018 .017
all_to .204* -.016 .069 .176* .338** .088 .269**
all_advl .204* .185* -.073 .207* -.003 .011 .216**
humann .465** -.008 .019 -.130 .071 .204* .112
prcessn -.125 -.013 -.049 .028 .130 .174* .130
cognitn .160 .124 -.143 .154 .117 .038 .216**
abstrcn -.035 .027 -.141 .023 .129 .018 .036
concrtn -.094 -.013 .064 -.042 -.183* -.018 -.065
tccncrtn -.242** -.123 .045 -.067 -.135 -.046 -.110
placen -.108 -.029 .054 -.130 -.035 -.124 -.170*
topicj .137 .085 -.072 .118 .167* .053 .311**
actv .145 -.174* .223** -.021 -.043 .067 -.016
commv .217** -.086 .040 -.125 .440** .075 .140

mentalv .384** -.034 .024 .114 .188* .316** .257**
aspectv -.040 -.155 .195* -.037 -.115 -.216** -.114
ineffective -.265 .113 -.057 -.299* -.031 -.057 .023
unreadable -.344* .193 -.189 -.213 -.073 .000 -.106
unbiased -.320* -.002 -.125 -.182 -.091 .116 -.152
constrained -.444** .090 -.186 -.276 -.150 .109 -.275
concrete -.224 .306* -.233 -.152 -.225 .011 -.086
awkward -.240 .137 -.076 -.106 .064 -.138 .047
explicit -.441** .199 -.216 -.359* -.071 -.019 -.141
good .114 -.050 .077 .006 .026 .125 -.093
opinionated .421** -.166 .181 .341* .116 -.095 .172
dull -.427** .011 -.074 -.256 -.089 -.009 -.071
expressive .513** .024 .128 .245 .127 .007 .158
interactive .444** -.082 .237 .279 .004 .010 .208
unemotional -.551** .155 -.172 -.292* -.041 .235 -.213
unrelatable -.386** .059 -.164 -.213 -.083 -.046 -.118
distant -.568** .196 -.231 -.285 -.157 -.012 -.199
serious -.312* .073 -.233 -.302* -.107 .109 -.034
superficial .020 -.049 -.083 .206 .059 -.223 .054
monotonous -.465** .050 -.100 -.248 -.068 -.024 -.118
engaging .394** .025 .139 .183 .082 .057 .126
poorly_organized .113 .001 -.119 .085 -.001 -.109 .014
comprehensible .283 -.137 .043 .113 .120 .094 .030
unsuccessful -.206 .098 .089 -.058 -.152 -.198 .025
not_dense .435** -.187 .117 .327* .129 -.076 .210
formal -.410** .050 -.141 -.222 -.123 .117 -.207
hard_to_follow -.433** .163 -.203 -.158 -.166 -.030 -.147
descriptive -.337* .050 -.290* -.518** -.110 .044 -.327*
academic -.552** .062 -.171 -.261 -.018 .212 -.184
clear .251 -.236 .018 -.024 .225 .045 .086
not_focused .126 -.201 .185 .028 .124 .012 .052
not_entertaining -.478** .021 -.084 -.140 -.288* -.020 -.217
important -.160 -.107 -.099 -.247 -.030 -.016 -.265
not_informative .202 -.224 .188 .143 .143 -.005 -.009
useful -.075 -.051 -.027 -.141 -.153 .068 -.341*
old_fashioned .317* -.096 .006 .094 .286 -.207 .235
irrelevant .111 -.069 -.147 .092 .170 -.210 .053
technical -.475** .157 -.149 -.336* -.090 .249 -.234
impersonal -.522** .141 -.203 -.254 -.094 .128 -.248
detailed -.411** .275 -.100 -.399** -.276 .130 -.168

factadvl lklyadvl all_jth all_nth all_th all_jto all_to
LD1 .510** .196* .182* .162 .178* .174* .235**
LD2 .141 .112 .639** .147 .322** .702** .426**
LD3 .276** .097 .098 .480** .800** .072 .481**
LD4 .085 -.039 -.073 -.075 -.068 -.065 .078
LD5 .005 -.175* .173* .044 .232** .169* .217**
PD1 -.044 .044 -.182* -.194* -.149 -.255** -.326**
PD2 -.084 .032 -.186* -.137 -.081 -.187* -.294**
phrasal .153 -.006 -.012 .035 -.129 .065 .093
lex_bun -.002 -.061 .367** .014 .229** .491** .295**
length -.091 -.148 .000 -.050 .079 .073 .072
vocab_high .313** .140 .195* .254** .259** .190* .367**
vocab_mod -.126 -.055 .054 -.022 .054 .072 .042
vocab_acad -.124 -.072 .079 -.028 .040 .123 .001
NN -.265** -.259** -.140 -.142 -.146 -.180* -.243**
prv_vb .165* .067 .141 .099 .417** .136 .262**
pres .132 .211* .306** .078 .228** .341** .195*
pdem .041 .077 .274** .043 .098 .236** .182*
emph .341** .060 .186* .090 .240** .154 .256**
pro1 .244** .073 .061 .111 .283** .001 .100
it .108 .071 .292** .211* .150 .290** .218**
be_state .047 .149 .272** .195* .208* .321** .242**
pany .127 .060 .142 .053 .211* .032 .086
amplifr .165* .098 .035 -.066 -.092 -.068 -.039
pos_mod .053 .035 .235** .186* .299** .394** .301**
o_and .267** -.037 -.095 -.061 -.125 -.004 -.087
n -.322** -.251** -.233** -.153 -.257** -.235** -.328**
prep -.077 -.011 -.178* -.091 -.230** -.236** -.188*
adj_attr .041 -.098 .004 .051 -.006 -.037 -.024
pasttnse -.028 -.194* -.163* -.032 .037 -.195* .043
pro3 .126 .080 -.043 -.003 .131 -.039 .182*
pub_vb .098 .113 -.045 .318** .402** .006 .337**
rel_obj .119 -.025 -.128 .092 -.055 -.176* .017
rel_subj .001 -.058 -.003 .073 .111 .003 .095
rel_pipe .088 .075 .067 .101 .040 -.003 .046
n_nom -.026 -.214** .151 .030 .189* .266** .319**
tm_adv .114 .109 -.030 .073 -.066 -.059 -.024
advs .240** .270** .111 .025 -.035 .118 -.006
inf .108 .005 .307** .141 .366** .292** .665**

prd_mod .157 .117 .307** .177* .198* .298** .227**
sua_vb -.014 -.061 -.022 .091 .189* -.029 .160
spl_aux .041 .128 .117 -.005 .082 .143 .118
conjncts .092 .065 .246** .061 .108 .201* .091
agls_psv -.170* -.031 -.002 -.103 -.076 .032 -.085
sub_othr .053 .081 .062 .098 .096 .099 .014
vcmp .045 .092 .010 .168* .738** -.031 .123
downtone -.027 .217** -.068 .123 -.088 .035 -.080
pred_adj .093 .147 .284** .086 .082 .370** .177*
allconj .279** .075 -.048 .026 -.021 .067 -.003
allwhrel .095 -.010 -.019 .139 .072 -.072 .094
allpro .266** .125 .004 .071 .275** -.037 .204*
have .136 .064 .073 .094 .092 .013 -.016
vprogrsv -.045 -.022 .004 -.163* -.038 -.035 .069
that_rel .122 .094 .143 .061 .096 .217** .176*
nonf_vth .002 .057 -.048 .166* .664** -.050 .338**
fact_vth .022 .022 .112 -.005 .610** .018 .088
lkly_vth .138 .174* .053 .206* .566** .017 .269**
factadvl 1 .067 .156 .200* .172* .107 .178*
lklyadvl .067 1 -.043 .015 .102 .008 -.019
all_jth .156 -.043 1 -.071 .322** .689** .362**
all_nth .200* .015 -.071 1 .407** .018 .186*
all_th .172* .102 .322** .407** 1 .215** .449**
all_jto .107 .008 .689** .018 .215** 1 .490**
all_to .178* -.019 .362** .186* .449** .490** 1
all_advl .707** .589** .083 .231** .188* .120 .144
humann .247** .190* -.014 .037 .171* -.092 .150
prcessn .059 -.108 .203* .102 .280** .147 .163*
cognitn .135 .029 .037 .421** .310** .025 .002
abstrcn .086 -.075 .327** .080 .173* .260** .222**
concrtn -.119 -.152 .051 -.154 -.154 .098 -.090
tccncrtn -.183* -.039 .008 -.141 -.166* .066 -.192*
placen -.165* -.017 -.135 -.121 -.218** -.112 -.084
topicj .187* -.023 .035 .085 .253** -.049 .199*
actv -.040 -.172* .071 -.088 .004 .087 .085
commv .271** .046 -.001 .248** .370** .000 .322**

mentalv .173* -.011 .126 .231** .474** .194* .385**
aspectv .064 .082 -.078 -.084 -.240** -.027 -.138
ineffective -.241 .101 -.069 -.151 -.126 -.435** -.192
unreadable -.364* .079 -.132 -.296* -.213 -.397** -.515**
unbiased -.410** -.032 -.016 -.283 -.134 -.179 -.336*
constrained -.485** -.024 -.027 -.231 -.202 -.359* -.521**
concrete -.358* -.060 -.086 -.184 -.182 .023 -.239
awkward -.192 .267 -.095 -.291* -.157 -.422** -.326*
explicit -.452** -.094 -.155 -.135 -.176 -.213 -.330*
good .070 -.188 .008 .131 .125 .402** .337*
opinionated .443** .045 .103 .162 .137 .320* .435**
dull -.326* .043 -.086 -.232 -.186 -.481** -.548**
expressive .336* .083 .040 .216 .211 .334* .431**
interactive .254 -.110 .230 .194 .209 .401** .418**
unemotional -.426** -.016 -.013 -.054 -.027 -.351* -.486**
unrelatable -.294* -.112 -.100 -.180 -.200 -.365* -.522**
distant -.287 .009 -.146 -.032 -.245 -.567** -.525**
serious -.340* .007 -.204 .141 -.024 -.277 -.360*
superficial .165 .245 .054 -.193 -.154 -.322* -.012
monotonous -.333* .091 -.060 -.296* -.188 -.380** -.493**
engaging .252 -.067 .056 .260 .230 .339* .471**
poorly_organized .177 .301* -.139 -.204 -.144 -.214 -.142
comprehensible .143 -.139 .036 .323* .248 .274 .367*
unsuccessful -.067 .160 -.110 -.141 -.261 -.417** -.215
not_dense .331* -.092 .079 .157 .191 .380** .490**
formal -.402** .011 .028 -.206 -.130 -.183 -.383**
hard_to_follow -.212 .045 -.090 -.169 -.266 -.435** -.479**
descriptive -.478** -.075 -.055 -.064 -.171 -.273 -.411**
academic -.481** .065 -.121 -.021 -.023 -.328* -.395**
clear .059 -.099 .163 .172 .275 .327* .451**
not_focused .182 .436** -.004 .023 .062 -.006 .137
not_entertaining -.310* -.004 -.033 -.177 -.312* -.440** -.521**
important -.200 -.069 .098 .142 -.048 .198 .093
not_informative .230 -.021 .121 -.207 .034 -.025 .217
useful -.140 -.111 .274 .156 -.013 .213 -.074
old_fashioned .054 .412** -.050 .048 .109 -.012 .099
irrelevant -.026 .118 -.074 -.359* -.122 -.114 -.134
technical -.490** .040 -.012 -.142 -.055 -.239 -.395**
impersonal -.486** -.034 -.059 -.133 -.147 -.291* -.454**
detailed -.478** -.186 .011 -.116 -.144 -.239 -.441**

all_advl humann prcessn cognitn abstrcn concrtn tccncrtn
LD1 .461** .175* .003 .185* .154 -.164* -.476**
LD2 .229** .048 .130 .155 .199* .162 .012
LD3 .254** .408** .232** .468** .175* -.195* -.277**
LD4 -.033 .148 -.148 -.139 -.119 -.160 -.207*
LD5 -.064 -.161 .604** .139 .519** -.138 -.016
PD1 -.002 -.034 -.054 -.057 -.234** .075 .224**
PD2 -.064 .017 .033 .052 -.163* .132 .174*
phrasal .159 -.065 -.063 -.099 -.009 -.138 -.155
lex_bun .061 .004 .237** .027 .101 .200* .152
length -.076 -.180* .279** -.006 .098 -.186* .104
vocab_high .263** .231** .088 .220** .242** -.219** -.463**
vocab_mod -.091 -.141 .257** .035 .171* .189* .126
vocab_acad -.063 -.224** .305** .004 .143 .145 .256**
NN -.302** -.212* .000 -.217** -.165* .133 .292**
prv_vb .144 .262** -.016 .164* .001 -.060 -.121
pres .249** .088 .031 .085 .082 .209* .085
pdem .087 -.030 .060 .095 .033 .025 -.003
emph .287** .103 .109 .113 .148 -.093 -.180*
pro1 .206* .188* -.004 .275** .065 .080 -.072
it .082 .062 -.045 .151 .165* .001 -.236**
be_state .234** .051 .005 .144 .031 .044 .070
pany .075 .211* -.116 .053 -.082 -.117 -.072
amplifr .174* .041 -.113 .019 .014 -.055 -.107
pos_mod .119 .034 .073 .216** .067 .200* .037
o_and .173* -.003 -.027 -.085 -.067 .053 -.181*
n -.338** -.190* .002 -.173* -.099 .117 .319**
prep -.082 -.097 .101 .107 .147 -.136 -.157
adj_attr -.008 -.236** .216** .027 .134 -.141 .046
pasttnse -.205* .101 -.054 -.081 -.099 -.219** -.236**
pro3 .065 .439** -.146 -.015 -.086 -.175* -.256**
pub_vb .096 -.021 .164* .136 .128 -.196* -.180*
rel_obj .048 .010 -.194* .035 -.145 -.050 -.224**
rel_subj -.075 .098 -.037 -.029 -.046 -.090 -.102
rel_pipe .088 -.004 .103 .111 .068 -.139 -.095
n_nom -.078 -.062 .396** .124 .366** -.098 -.069
tm_adv .106 .077 -.104 .088 -.187* -.025 -.051
advs .302** .061 -.236** -.029 .043 -.010 -.112
inf .048 .145 .192* .050 .195* -.118 -.238**

prd_mod .187* .086 .157 .235** .156 .036 -.130
sua_vb -.048 .071 .083 .066 .112 -.184* -.179*
spl_aux .110 -.058 -.140 -.017 -.136 .118 .008
conjncts .055 .034 -.002 .146 .242** -.071 -.094
agls_psv -.097 -.086 -.069 -.146 -.196* .245** .216**
sub_othr .075 .141 -.048 -.081 -.028 .052 -.273**
vcmp .066 .099 .181* .140 .133 -.126 -.116
downtone .205* -.020 -.157 -.089 -.118 .013 .102
pred_adj .223** .113 -.050 -.034 .027 .157 .025
allconj .246** .099 -.045 -.044 -.068 .057 -.230**
allwhrel .015 .066 -.054 .052 -.057 -.152 -.212*
allpro .204* .465** -.125 .160 -.035 -.094 -.242**
have .185* -.008 -.013 .124 .027 -.013 -.123
vprogrsv -.073 .019 -.049 -.143 -.141 .064 .045
that_rel .207* -.130 .028 .154 .023 -.042 -.067
nonf_vth -.003 .071 .130 .117 .129 -.183* -.135
fact_vth .011 .204* .174* .038 .018 -.018 -.046
lkly_vth .216** .112 .130 .216** .036 -.065 -.110
factadvl .707** .247** .059 .135 .086 -.119 -.183*
lklyadvl .589** .190* -.108 .029 -.075 -.152 -.039
all_jth .083 -.014 .203* .037 .327** .051 .008
all_nth .231** .037 .102 .421** .080 -.154 -.141
all_th .188* .171* .280** .310** .173* -.154 -.166*
all_jto .120 -.092 .147 .025 .260** .098 .066
all_to .144 .150 .163* .002 .222** -.090 -.192*
all_advl 1 .269** .001 .151 -.023 -.173* -.170*
humann .269** 1 -.054 .037 -.021 -.019 -.167*
prcessn .001 -.054 1 .222** .299** -.107 -.107
cognitn .151 .037 .222** 1 .151 -.157 -.106
abstrcn -.023 -.021 .299** .151 1 -.112 -.008
concrtn -.173* -.019 -.107 -.157 -.112 1 .230**
tccncrtn -.170* -.167* -.107 -.106 -.008 .230** 1
placen -.143 -.091 -.194* -.241** -.113 .125 -.011
topicj .080 .003 .288** .183* .174* -.230** -.211*
actv -.183* .103 -.043 -.136 -.065 .132 -.018
commv .189* .209* .154 .126 .105 -.251** -.198*

mentalv .135 .191* .153 .243** .098 -.017 -.146
aspectv .042 -.197* .047 -.095 .048 -.183* .014
ineffective -.148 .061 -.404** -.201 -.210 .107 .231
unreadable -.198 -.095 -.267 -.369* -.450** .392** .421**
unbiased -.361* -.134 -.139 -.232 -.550** .330* .351*
constrained -.347* -.074 -.037 -.222 -.534** .361* .338*
concrete -.287 -.089 .079 -.283 -.422** .339* .178
awkward -.045 .017 -.368* -.213 -.358* .263 .466**
explicit -.342* -.221 -.061 -.286 -.196 .333* .409**
good -.031 -.085 .341* -.028 .201 -.038 -.218
opinionated .376** .198 .117 .280 .638** -.331* -.423**
dull -.242 -.080 -.079 -.258 -.478** .436** .421**
expressive .276 .054 .045 .221 .536** -.402** -.322*
interactive .114 .078 .292* .363* .487** -.239 -.389**
unemotional -.313* -.248 -.018 -.251 -.530** .441** .464**
unrelatable -.272 -.169 -.078 -.172 -.393** .427** .481**
distant -.227 -.092 -.199 -.244 -.445** .382** .384**
serious -.229 -.032 .082 .049 -.321* .206 .206
superficial .222 .300* -.323* -.015 -.088 -.192 -.055
monotonous -.229 -.120 -.225 -.271 -.460** .391** .528**
engaging .133 .089 .187 .298* .407** -.395** -.480**
poorly_organized .247 .192 -.293* -.024 -.006 -.142 .015
comprehensible .026 .098 .302* .351* .322* -.214 -.374**
unsuccessful .038 -.029 -.173 -.104 -.073 -.019 .111
not_dense .184 .189 .142 .346* .503** -.373** -.426**
formal -.244 -.031 .062 -.235 -.538** .403** .252
hard_to_follow -.095 -.091 -.263 -.303* -.367* .285 .425**
descriptive -.370* -.062 -.164 -.097 -.196 .378** .442**
academic -.287 -.125 .055 -.279 -.515** .374** .421**
clear -.010 .012 .345* .156 .284 -.150 -.420**
not_focused .463** .260 -.102 .054 .166 -.428** -.278
not_entertaining -.258 -.019 -.034 -.141 -.540** .414** .345*
important -.153 -.122 .210 -.037 .332* .077 .044
not_informative .130 .024 -.090 .001 .178 -.013 -.108
useful -.202 -.225 .170 -.147 .021 .216 .088
old_fashioned .358* .445** .025 .385** .188 -.457** -.035
irrelevant .091 .301* -.280 .296* -.016 -.058 .099
technical -.367* -.166 -.118 -.348* -.567** .481** .478**
impersonal -.372** -.228 -.024 -.341* -.455** .451** .431**
detailed -.462** -.272 -.075 -.133 -.437** .522** .428**

placen topicj actv commv mentalv aspectv ineffective
LD1 -.025 .240** -.136 .138 .200* .077 -.281
LD2 -.144 .012 -.114 .022 .256** -.127 -.181
LD3 -.216** .304** .005 .589** .623** -.184* -.205
LD4 .085 -.034 .589** .106 .066 .516** -.340*
LD5 -.122 .555** -.142 .052 .056 -.089 -.346*
PD1 -.032 -.210* -.132 -.114 -.238** -.118 .743**
PD2 -.130 -.146 -.172* -.117 -.099 -.198* .345*
phrasal .021 -.023 .237** .007 -.037 .334** -.187
lex_bun -.071 .043 -.090 -.058 .057 -.197* -.132
length -.178* .253** -.126 -.027 -.179* -.111 -.197
vocab_high .041 .279** .094 .296** .370** .185* -.413**
vocab_mod .129 .171* -.011 -.019 .023 -.093 -.015
vocab_acad -.113 .036 -.286** -.168* -.112 -.255** .044
NN .034 -.168* -.018 -.090 -.237** -.117 .161
prv_vb -.092 .145 .137 .274** .643** -.061 -.346*
pres -.125 .005 -.171* .066 .187* -.191* .047
pdem -.176* .131 -.095 -.119 .066 .001 .063
emph .002 .252** -.067 .174* .101 .044 -.043
pro1 -.067 .130 -.055 .102 .372** -.140 .032
it .058 .079 -.043 .144 .233** -.003 -.111
be_state -.110 -.111 -.188* .056 .227** -.060 -.168
pany -.084 -.078 .012 .189* .294** .001 -.020
amplifr -.028 -.096 -.351** -.073 -.091 -.125 .144
pos_mod -.058 -.086 -.051 .116 .287** -.095 -.146
o_and -.020 -.043 -.035 .019 -.046 .069 -.113
n .012 -.176* -.004 -.121 -.347** -.120 .295*
prep .216** .120 -.122 -.010 -.115 .283** .082
adj_attr -.006 .371** -.164* -.132 -.136 -.066 -.136
pasttnse -.009 .040 .350** .095 .124 .181* -.170
pro3 -.082 .082 .208* .169* .177* .019 -.302*
pub_vb -.047 .187* -.108 .570** .167* .067 -.112
rel_obj .165* .054 -.125 .038 .092 -.036 .006
rel_subj -.020 .132 .084 .175* .126 .029 -.161
rel_pipe .041 .038 -.170* .045 -.039 .148 .022
n_nom -.132 .257** .028 .054 .098 -.062 -.469**
tm_adv .059 -.093 .188* .085 .025 .162 .011
advs .106 .126 -.034 -.053 .036 .044 .018
inf -.110 .255** .189* .398** .357** .066 -.283

prd_mod -.082 .072 -.055 .045 .133 .094 -.017
sua_vb .018 .099 .040 .319** .064 .012 -.080
spl_aux -.059 .040 -.073 -.066 .047 -.063 -.204
conjncts .101 .150 -.205* -.036 -.016 .057 -.102
agls_psv -.039 -.192* .158 -.220** -.110 -.151 .247
sub_othr .045 -.076 .061 .003 .118 .009 -.167
vcmp -.179* .160 -.038 .366** .246** -.104 -.159
downtone -.092 -.069 -.131 -.120 -.017 -.005 .211
pred_adj -.099 -.047 -.184* -.109 .112 -.244** -.119
allconj -.083 -.001 -.070 .033 .057 .058 -.161
allwhrel .078 .131 -.087 .154 .103 .079 -.089
allpro -.108 .137 .145 .217** .384** -.040 -.265
have -.029 .085 -.174* -.086 -.034 -.155 .113
vprogrsv .054 -.072 .223** .040 .024 .195* -.057
that_rel -.130 .118 -.021 -.125 .114 -.037 -.299*
nonf_vth -.035 .167* -.043 .440** .188* -.115 -.031
fact_vth -.124 .053 .067 .075 .316** -.216** -.057
lkly_vth -.170* .311** -.016 .140 .257** -.114 .023
factadvl -.165* .187* -.040 .271** .173* .064 -.241
lklyadvl -.017 -.023 -.172* .046 -.011 .082 .101
all_jth -.135 .035 .071 -.001 .126 -.078 -.069
all_nth -.121 .085 -.088 .248** .231** -.084 -.151
all_th -.218** .253** .004 .370** .474** -.240** -.126
all_jto -.112 -.049 .087 .000 .194* -.027 -.435**
all_to -.084 .199* .085 .322** .385** -.138 -.192
all_advl -.143 .080 -.183* .189* .135 .042 -.148
humann -.091 .003 .103 .209* .191* -.197* .061
prcessn -.194* .288** -.043 .154 .153 .047 -.404**
cognitn -.241** .183* -.136 .126 .243** -.095 -.201
abstrcn -.113 .174* -.065 .105 .098 .048 -.210
concrtn .125 -.230** .132 -.251** -.017 -.183* .107
tccncrtn -.011 -.211* -.018 -.198* -.146 .014 .231
placen 1 -.027 .097 -.083 -.169* .151 .027
topicj -.027 1 -.105 .187* .216** -.024 -.257
actv .097 -.105 1 -.032 .001 .135 -.130
commv -.083 .187* -.032 1 .204* -.007 -.070

placen topicj actv commv mentalv aspectv ineffective
mentalv -.169
*
.216
**
.001 .204
*
1 -.122 -.225
aspectv .151 -.024 .135 -.007 -.122 1 -.358
*

ineffective .027 -.257 -.130 -.070 -.225 -.358


*
1
unreadable .003 -.612
**
-.161 -.436
**
-.302
*
-.298
*
.617
**

unbiased .087 -.611


**
.188 -.458
**
-.132 -.206 .187
constrained .012 -.543
**
-.018 -.373
**
-.264 -.388
**
.436
**

concrete .210 -.263 .124 -.470


**
-.053 -.062 -.042
awkward -.078 -.447
**
-.096 -.416
**
-.454
**
-.212 .644
**

explicit .046 -.261 .107 -.470


**
-.238 -.205 .060
good .106 .286 .131 .006 .381
**
.181 -.723
**

opinionated -.184 .522


**
-.138 .499
**
.204 .281 -.315
*

dull .034 -.530


**
-.068 -.447
**
-.340
*
-.359
*
.571
**

expressive -.060 .561


**
-.066 .463
**
.197 .446
**
-.425
**

interactive -.125 .521


**
.097 .421
**
.347
*
.299
*
-.448
**

unemotional .015 -.625


**
-.090 -.363
*
-.166 -.400
**
.350
*

unrelatable .057 -.485


**
-.110 -.408
**
-.175 -.327
*
.488
**

distant .129 -.508


**
-.100 -.341
*
-.314
*
-.351
*
.626
**

serious .046 -.178 -.084 -.429


**
-.110 -.189 .038
superficial .039 -.097 .044 .131 -.191 -.171 .460
**

monotonous .136 -.522


**
-.094 -.412
**
-.232 -.294
*
.544
**

engaging -.126 .468


**
.126 .335
*
.185 .339
*
-.590
**

poorly_organized -.023 -.013 -.192 .076 -.185 .022 .543


**

comprehensible .079 .483


**
.195 .252 .379
**
.254 -.661
**

unsuccessful .107 -.234 -.312


*
-.104 -.252 -.252 .759
**

not_dense -.011 .576


**
.027 .482
**
.275 .327
*
-.548
**

formal .022 -.542


**
-.052 -.417
**
-.136 -.311
*
.260
hard_to_follow -.039 -.537
**
-.188 -.313
*
-.356
*
-.373
**
.602
**

descriptive .035 -.216 .051 -.360


*
-.060 -.100 .007
academic -.027 -.550
**
-.107 -.356
*
-.145 -.334
*
.316
*

clear -.014 .409


**
.211 .323
*
.300
*
.115 -.618
**

not_focused -.061 -.072 -.241 .231 .019 -.092 .219


not_entertaining .026 -.439
**
-.097 -.338
*
-.366
*
-.361
*
.453
**

important .045 .205 -.049 .089 .076 -.069 -.350


*

not_informative -.078 -.002 .033 .375


**
.239 .070 .186
useful -.015 .201 .056 -.015 .118 -.023 -.567
**

old_fashioned .016 -.001 -.094 .191 -.148 .305


*
.065
irrelevant -.074 -.225 .006 -.179 -.092 .056 .417
**

technical -.029 -.565


**
-.042 -.345
*
-.111 -.385
**
.275
impersonal .044 -.561
**
-.032 -.421
**
-.178 -.347
*
.348
*

detailed -.036 -.475


**
-.037 -.461
**
-.131 -.310
*
.203

unreadable unbiased constrained concrete awkward explicit good
LD1 -.541
**
-.608
**
-.662
**
-.496
**
-.340
*
-.630
**
.069
LD2 -.141 -.181 -.194 -.195 -.155 -.326
*
-.007
LD3 -.449
**
-.358
*
-.409
**
-.352
*
-.376
**
-.409
**
.194
LD4 -.484
**
-.239 -.515
**
-.145 -.324
*
-.248 .223
LD5 -.595
**
-.610
**
-.518
**
-.306
*
-.498
**
-.258 .374
**

PD1 .934
**
.604
**
.777
**
.246 .848
**
.345
*
-.734
**

PD2 .814
**
.830
**
.898
**
.686
**
.584
**
.718
**
-.249
phrasal -.283 -.319
*
-.333
*
-.099 -.287 -.303
*
.168
lex_bun -.095 .052 .015 .054 -.115 -.128 .075
length -.500
**
-.566
**
-.431
**
-.409
**
-.377
**
-.303
*
.193
vocab_high -.736
**
-.575
**
-.687
**
-.348
*
-.540
**
-.532
**
.348
*

vocab_mod -.082 -.041 -.002 .044 -.111 .077 .076


vocab_acad .240 .249 .387
**
.310
*
.010 .202 .120
NN .339
*
.484
**
.474
**
.322
*
.066 .394
**
-.005
prv_vb -.345
*
-.049 -.262 -.188 -.453
**
-.316
*
.281
pres .057 -.118 -.034 -.167 .060 -.264 -.120
pdem .017 -.032 -.070 -.072 .012 -.255 -.278
emph -.412
**
-.343
*
-.436
**
-.162 -.270 -.351
*
.054
pro1 .216 .146 .188 .071 .089 -.145 .011
it -.359
*
-.462
**
-.373
**
-.469
**
-.242 -.366
*
.071
be_state -.029 -.104 -.034 -.072 -.164 -.171 .010
pany -.025 -.129 -.075 .024 .011 -.125 .059
amplifr .200 .041 .050 -.268 .230 -.213 -.283
pos_mod -.070 -.132 -.202 -.198 -.107 -.251 -.069
o_and -.240 -.098 -.223 -.114 -.254 -.286 -.043
n .401
**
.466
**
.591
**
.353
*
.199 .402
**
-.155
prep -.272 -.406
**
-.272 -.261 -.140 -.278 -.049
adj_attr -.325
*
-.545
**
-.367
*
-.297
*
-.173 -.201 .212
pasttnse -.163 .135 -.105 .222 -.220 .238 .228
pro3 -.467
**
-.399
**
-.552
**
-.254 -.297
*
-.397
**
.124
pub_vb -.293
*
-.287 -.287 -.259 -.264 -.226 .081
rel_obj -.146 -.177 -.330
*
-.406
**
-.103 -.282 -.063
rel_subj -.240 -.194 -.196 -.216 -.218 -.013 .206
rel_pipe -.083 -.085 -.066 -.311
*
.039 -.309
*
-.259
n_nom -.612
**
-.331
*
-.411
**
-.165 -.645
**
-.173 .570
**

tm_adv .188 .338


*
.383
**
.189 .019 .247 .060
advs -.280 -.506
**
-.563
**
-.629
**
-.011 -.428
**
-.133
inf -.555
**
-.347
*
-.630
**
-.325
*
-.361
*
-.331
*
.295
*

unreadable unbiased constrained concrete awkward explicit good
prd_mod .007 -.049 -.085 -.203 .005 -.131 -.156
sua_vb -.148 -.235 -.036 -.116 -.130 -.093 .116
spl_aux -.089 -.022 -.131 -.027 -.015 -.184 .091
conjncts -.333
*
-.430
**
-.456
**
-.619
**
-.090 -.560
**
-.197
agls_psv .433
**
.621
**
.462
**
.476
**
.310
*
.440
**
-.137
sub_othr -.058 -.109 .029 .070 -.155 -.017 .174
vcmp -.096 -.136 -.134 -.156 -.137 -.097 .103
downtone .217 -.054 .092 .102 .373
**
.191 -.162
pred_adj .102 -.024 .030 .064 .099 .102 .130
allconj -.277 -.217 -.292
*
-.180 -.260 -.283 .004
allwhrel -.256 -.238 -.283 -.470
**
-.153 -.292
*
-.044
allpro -.344
*
-.320
*
-.444
**
-.224 -.240 -.441
**
.114
have .193 -.002 .090 .306
*
.137 .199 -.050
vprogrsv -.189 -.125 -.186 -.233 -.076 -.216 .077
that_rel -.213 -.182 -.276 -.152 -.106 -.359
*
.006
nonf_vth -.073 -.091 -.150 -.225 .064 -.071 .026
fact_vth .000 .116 .109 .011 -.138 -.019 .125
lkly_vth -.106 -.152 -.275 -.086 .047 -.141 -.093
factadvl -.364
*
-.410
**
-.485
**
-.358
*
-.192 -.452
**
.070
lklyadvl .079 -.032 -.024 -.060 .267 -.094 -.188
all_jth -.132 -.016 -.027 -.086 -.095 -.155 .008
all_nth -.296
*
-.283 -.231 -.184 -.291
*
-.135 .131
all_th -.213 -.134 -.202 -.182 -.157 -.176 .125
all_jto -.397
**
-.179 -.359
*
.023 -.422
**
-.213 .402
**

all_to -.515
**
-.336
*
-.521
**
-.239 -.326
*
-.330
*
.337
*

all_advl -.198 -.361


*
-.347
*
-.287 -.045 -.342
*
-.031
humann -.095 -.134 -.074 -.089 .017 -.221 -.085
prcessn -.267 -.139 -.037 .079 -.368
*
-.061 .341
*

cognitn -.369
*
-.232 -.222 -.283 -.213 -.286 -.028
abstrcn -.450
**
-.550
**
-.534
**
-.422
**
-.358
*
-.196 .201
concrtn .392
**
.330
*
.361
*
.339
*
.263 .333
*
-.038
tccncrtn .421
**
.351
*
.338
*
.178 .466
**
.409
**
-.218
placen .003 .087 .012 .210 -.078 .046 .106
topicj -.612
**
-.611
**
-.543
**
-.263 -.447
**
-.261 .286
actv -.161 .188 -.018 .124 -.096 .107 .131
commv -.436
**
-.458
**
-.373
**
-.470
**
-.416
**
-.470
**
.006

unreadable unbiased constrained concrete awkward explicit good
mentalv -.302
*
-.132 -.264 -.053 -.454
**
-.238 .381
**

aspectv -.298
*
-.206 -.388
**
-.062 -.212 -.205 .181
ineffective .617** .187 .436** -.042 .644** .060 -.723**
unreadable 1 .690** .809** .389** .763** .475** -.607**
unbiased .690
**
1 .757
**
.558
**
.439
**
.543
**
-.226
constrained .809
**
.757
**
1 .532
**
.569
**
.566
**
-.371
*

concrete .389
**
.558
**
.532
**
1 .160 .629
**
.219
awkward .763** .439** .569** .160 1 .377** -.610**
explicit .475
**
.543
**
.566
**
.629
**
.377
**
1 .131
good -.607
**
-.226 -.371
*
.219 -.610
**
.131 1
opinionated -.709
**
-.800
**
-.759
**
-.700
**
-.481
**
-.595
**
.239
dull .844** .637** .837** .369* .720** .388** -.536**
expressive -.756
**
-.726
**
-.827
**
-.530
**
-.621
**
-.498
**
.384
**

interactive -.795
**
-.681
**
-.731
**
-.384
**
-.696
**
-.610
**
.431
**

unemotional .781
**
.697
**
.835
**
.532
**
.560
**
.610
**
-.288
unrelatable .823** .626** .747** .306* .639** .417** -.469**
distant .818** .568** .775** .380** .670** .445** -.581**
serious .333
*
.469
**
.420
**
.466
**
.197 .558
**
.032
superficial .176 -.075 -.005 -.390** .374** -.347* -.558**
monotonous .821** .668** .795** .467** .727** .461** -.505**
engaging -.812
**
-.621
**
-.714
**
-.272 -.733
**
-.287 .607
**

poorly_organized .300* -.086 .052 -.268 .414** -.245 -.651**
comprehensible -.867
**
-.520
**
-.666
**
-.208 -.770
**
-.347
*
.620
**

unsuccessful .598** .231 .431** -.122 .557** -.016 -.702**
not_dense -.856
**
-.671
**
-.854
**
-.478
**
-.734
**
-.565
**
.456
**

formal .695
**
.692
**
.832
**
.615
**
.462
**
.549
**
-.182
hard_to_follow .884** .589** .738** .218 .770** .391** -.609**
descriptive .354
*
.392
**
.420
**
.304
*
.262 .663
**
.065
academic .715
**
.710
**
.813
**
.539
**
.510
**
.605
**
-.195
clear -.748
**
-.425
**
-.475
**
-.159 -.682
**
-.153 .701
**

not_focused -.051 -.276 -.172 -.483


**
-.010 -.387
**
-.244
not_entertaining .720** .644** .766** .308* .594** .346* -.442**
important -.265 -.034 -.001 .218 -.251 .388
**
.501
**

not_informative -.110 -.201 -.220 -.530


**
-.005 -.532
**
-.185
useful -.260 .002 -.017 .214 -.342* .294* .594**

opinionated dull expressive interactive unemotional unrelatable distant
LD1 .677
**
-.569
**
.647
**
.502
**
-.655
**
-.600
**
-.517
**

LD2 .274 -.201 .147 .085 -.088 -.166 -.182


LD3 .403
**
-.448
**
.419
**
.449
**
-.335
*
-.362
*
-.425
**

LD4 .321
*
-.512
**
.470
**
.455
**
-.642
**
-.484
**
-.552
**

LD5 .570
**
-.442
**
.502
**
.563
**
-.422
**
-.481
**
-.507
**

PD1 -.629
**
.883
**
-.774
**
-.782
**
.732
**
.824
**
.859
**

PD2 -.888
**
.801
**
-.868
**
-.836
**
.933
**
.753
**
.815
**

phrasal .297
*
-.310
*
.280 .265 -.400
**
-.286 -.279
lex_bun .043 -.148 -.029 .046 .048 -.063 -.120
length .577
**
-.404
**
.461
**
.500
**
-.330
*
-.514
**
-.409
**

vocab_high .648
**
-.680
**
.701
**
.611
**
-.731
**
-.695
**
-.650
**

vocab_mod .008 .113 -.019 .084 .070 -.015 -.057


vocab_acad -.337
*
.351
*
-.299
*
-.200 .518
**
.261 .330
*

NN -.459
**
.432
**
-.467
**
-.279 .458
**
.450
**
.353
*

prv_vb .173 -.353


*
.295
*
.315
*
-.186 -.194 -.341
*

pres .203 .084 -.104 -.060 .082 .015 .074


pdem .095 .019 .076 .002 -.016 .011 .065
emph .370
*
-.281 .334
*
.301
*
-.452
**
-.377
**
-.278
pro1 -.038 .167 -.093 .001 .098 .281 .066
it .548
**
-.388
**
.385
**
.184 -.447
**
-.361
*
-.247
be_state .028 -.103 .070 .011 .042 -.024 -.120
pany .056 -.038 .101 .107 -.092 -.110 -.087
amplifr .023 .150 -.039 -.218 .162 .096 .083
pos_mod .131 -.156 .170 .063 -.039 -.102 -.112
o_and .207 -.232 .223 .281 -.331
*
-.284 -.215
n -.480
**
.498
**
-.559
**
-.318
*
.476
**
.478
**
.467
**

prep .310
*
-.268 .419
**
.409
**
-.396
**
-.206 -.265
adj_attr .372
*
-.241 .339
*
.241 -.179 -.261 -.233
pasttnse -.138 -.260 .171 .148 -.223 -.144 -.190
pro3 .458
**
-.528
**
.586
**
.474
**
-.631
**
-.544
**
-.634
**

pub_vb .383
**
-.320
*
.304
*
.188 -.199 -.279 -.254
rel_obj .310
*
-.339
*
.328
*
.104 -.219 -.274 -.122
rel_subj .215 -.233 .184 .110 -.181 -.238 -.205
rel_pipe .167 -.070 .112 .194 -.122 .044 -.109
n_nom .448
**
-.540
**
.526
**
.524
**
-.423
**
-.562
**
-.549
**

tm_adv -.277 .182 -.171 -.233 .233 .202 .210


advs .507
**
-.313
*
.431
**
.241 -.490
**
-.330
*
-.300
*

inf .413
**
-.598
**
.469
**
.448
**
-.487
**
-.500
**
-.526
**

opinionated dull expressive interactive unemotional unrelatable distant
prd_mod .176 -.106 .135 -.017 -.107 -.086 -.080
sua_vb .153 -.199 .228 .186 -.277 -.104 -.224
spl_aux .152 -.045 .048 .136 -.051 -.045 -.114
conjncts .480
**
-.294
*
.488
**
.390
**
-.410
**
-.322
*
-.320
*

agls_psv -.610
**
.460
**
-.591
**
-.461
**
.428
**
.404
**
.422
**

sub_othr .148 .060 .032 .007 .048 -.116 .063


vcmp .130 -.195 .236 .105 .080 -.131 -.129
downtone -.109 .144 -.082 -.179 .172 .112 .162
pred_adj -.034 .121 -.164 -.256 .191 .046 .088
allconj .332
*
-.243 .301
*
.282 -.349
*
-.354
*
-.196
allwhrel .348
*
-.313
*
.300
*
.223 -.267 -.224 -.241
allpro .421
**
-.427
**
.513
**
.444
**
-.551
**
-.386
**
-.568
**

have -.166 .011 .024 -.082 .155 .059 .196


vprogrsv .181 -.074 .128 .237 -.172 -.164 -.231
that_rel .341
*
-.256 .245 .279 -.292
*
-.213 -.285
nonf_vth .116 -.089 .127 .004 -.041 -.083 -.157
fact_vth -.095 -.009 .007 .010 .235 -.046 -.012
lkly_vth .172 -.071 .158 .208 -.213 -.118 -.199
factadvl .443
**
-.326
*
.336
*
.254 -.426
**
-.294
*
-.287
lklyadvl .045 .043 .083 -.110 -.016 -.112 .009
all_jth .103 -.086 .040 .230 -.013 -.100 -.146
all_nth .162 -.232 .216 .194 -.054 -.180 -.032
all_th .137 -.186 .211 .209 -.027 -.200 -.245
all_jto .320
*
-.481
**
.334
*
.401
**
-.351
*
-.365
*
-.567
**

all_to .435
**
-.548
**
.431
**
.418
**
-.486
**
-.522
**
-.525
**

all_advl .376
**
-.242 .276 .114 -.313
*
-.272 -.227
humann .198 -.080 .054 .078 -.248 -.169 -.092
prcessn .117 -.079 .045 .292
*
-.018 -.078 -.199
cognitn .280 -.258 .221 .363
*
-.251 -.172 -.244
abstrcn .638
**
-.478
**
.536
**
.487
**
-.530
**
-.393
**
-.445
**

concrtn -.331
*
.436
**
-.402
**
-.239 .441
**
.427
**
.382
**

tccncrtn -.423
**
.421
**
-.322
*
-.389
**
.464
**
.481
**
.384
**

placen -.184 .034 -.060 -.125 .015 .057 .129


topicj .522
**
-.530
**
.561
**
.521
**
-.625
**
-.485
**
-.508
**

actv -.138 -.068 -.066 .097 -.090 -.110 -.100


commv .499
**
-.447
**
.463
**
.421
**
-.363
*
-.408
**
-.341
*

opinionated dull expressive interactive unemotional unrelatable distant
mentalv .204 -.340
*
.197 .347
*
-.166 -.175 -.314
*

aspectv .281 -.359


*
.446
**
.299
*
-.400
**
-.327
*
-.351
*

ineffective -.315
*
.571
**
-.425
**
-.448
**
.350
*
.488
**
.626
**

unreadable -.709
**
.844
**
-.756
**
-.795
**
.781
**
.823
**
.818
**

unbiased -.800
**
.637
**
-.726
**
-.681
**
.697
**
.626
**
.568
**

constrained -.759
**
.837
**
-.827
**
-.731
**
.835
**
.747
**
.775
**

concrete -.700
**
.369
*
-.530
**
-.384
**
.532
**
.306
*
.380
**

awkward -.481
**
.720
**
-.621
**
-.696
**
.560
**
.639
**
.670
**

explicit -.595
**
.388
**
-.498
**
-.610
**
.610
**
.417
**
.445
**

good .239 -.536


**
.384
**
.431
**
-.288 -.469
**
-.581
**

opinionated 1 -.682
**
.776
**
.733
**
-.783
**
-.683
**
-.680
**

dull -.682
**
1 -.823
**
-.730
**
.792
**
.817
**
.822
**

expressive .776
**
-.823
**
1 .786
**
-.794
**
-.741
**
-.781
**

interactive .733
**
-.730
**
.786
**
1 -.758
**
-.664
**
-.791
**

unemotional -.783
**
.792
**
-.794
**
-.758
**
1 .749
**
.808
**

unrelatable -.683
**
.817
**
-.741
**
-.664
**
.749
**
1 .755
**

distant -.680
**
.822
**
-.781
**
-.791
**
.808
**
.755
**
1
serious -.533
**
.380
**
-.407
**
-.476
**
.536
**
.476
**
.471
**

superficial .083 .144 -.037 -.033 -.123 .144 .111


monotonous -.730
**
.849
**
-.805
**
-.755
**
.772
**
.723
**
.784
**

engaging .626
**
-.852
**
.801
**
.718
**
-.689
**
-.818
**
-.762
**

poorly_organized .102 .245 -.109 -.241 -.010 .149 .262


comprehensible .479
**
-.713
**
.642
**
.675
**
-.571
**
-.664
**
-.694
**

unsuccessful -.200 .533


**
-.415
**
-.457
**
.280 .448
**
.554
**

not_dense .779
**
-.842
**
.778
**
.791
**
-.846
**
-.794
**
-.841
**

formal -.707
**
.738
**
-.782
**
-.714
**
.801
**
.635
**
.711
**

hard_to_follow -.573
**
.776
**
-.714
**
-.760
**
.724
**
.737
**
.803
**

descriptive -.407
**
.354
*
-.271 -.450
**
.529
**
.374
**
.363
*

academic -.789
**
.705
**
-.722
**
-.746
**
.888
**
.709
**
.736
**

clear .402
**
-.552
**
.504
**
.555
**
-.506
**
-.574
**
-.669
**

not_focused .378
**
-.163 .169 .099 -.211 -.241 -.159
not_entertaining -.635
**
.793
**
-.737
**
-.676
**
.763
**
.739
**
.777
**

important .036 -.226 .083 -.017 .056 -.168 -.096


not_informative .437
**
-.082 .211 .243 -.348
*
-.239 -.244
useful -.054 -.213 .138 .116 .080 -.127 -.203
old_fashioned .238 -.085 .247 .166 -.266 -.084 -.116
irrelevant -.023 .266 -.240 -.248 .038 .383
**
.211
technical -.776
**
.703
**
-.747
**
-.737
**
.910
**
.642
**
.710
**

impersonal -.798
**
.775
**
-.815
**
-.757
**
.937
**
.713
**
.763
**

detailed -.702
**
.638
**
-.643
**
-.626
**
.845
**
.626
**
.640
**

serious superficial monotonous engaging poorly_organized comprehensible unsuccessful
LD1 -.508
**
.204 -.588
**
.530
**
.151 .386
**
-.101
LD2 -.172 .013 -.265 .139 .124 .180 -.063
LD3 -.154 -.028 -.469
**
.403
**
-.110 .407
**
-.257
LD4 -.383
**
-.057 -.499
**
.452
**
-.064 .368
*
-.364
*

LD5 -.166 -.316


*
-.477
**
.454
**
-.125 .521
**
-.242
PD1 .299
*
.319
*
.846
**
-.890
**
.476
**
-.894
**
.741
**

PD2 .611
**
-.165 .810
**
-.705
**
-.043 -.619
**
.286
phrasal -.210 -.109 -.365
*
.174 .125 .155 -.098
lex_bun .013 -.163 -.044 -.002 -.094 .097 -.166
length -.210 -.173 -.414
**
.463
**
-.005 .328
*
-.265
vocab_high -.401
**
.071 -.678
**
.629
**
-.121 .627
**
-.338
*

vocab_mod .083 -.216 .139 -.130 -.102 .087 .044


vocab_acad .412
**
-.326
*
.356
*
-.232 -.279 -.154 .057
NN .334
*
-.232 .399
**
-.413
**
-.226 -.242 .040
prv_vb -.066 -.072 -.371
*
.390
**
-.177 .381
**
-.261
pres -.129 .191 -.049 -.057 .283 -.077 .169
pdem .094 .128 -.107 -.038 .306
*
.102 .092
emph -.095 .143 -.243 .309
*
.205 .320
*
-.015
pro1 .053 -.004 .132 -.170 .182 -.254 .268
it -.243 .281 -.445
**
.340
*
.125 .270 .059
be_state -.088 -.005 -.139 .071 .133 .167 -.114
pany -.216 .013 -.061 .094 -.011 .035 -.108
amplifr -.183 .403
**
.058 -.148 .236 -.126 .281
pos_mod -.151 .003 -.210 .142 .121 .140 -.042
o_and -.358
*
.034 -.181 .163 -.034 .127 -.086
n .356
*
-.117 .476
**
-.444
**
-.063 -.362
*
.186
prep -.150 .283 -.366
*
.277 .131 .262 .119
adj_attr -.167 -.209 -.242 .292
*
-.005 .342
*
-.153
pasttnse .037 -.230 -.124 .207 -.356
*
.149 -.338
*

pro3 -.347
*
.017 -.556
**
.501
**
.032 .418
**
-.347
*

pub_vb -.197 -.028 -.174 .179 .116 .165 -.070


rel_obj -.249 .215 -.253 .236 .018 .095 .016
rel_subj -.048 .004 -.384
**
.329
*
.011 .365
*
-.169
rel_pipe -.056 .451
**
-.141 .041 .094 .037 .154
n_nom -.067 -.394
**
-.562
**
.580
**
-.348
*
.539
**
-.418
**

tm_adv .130 -.030 .213 -.094 -.241 -.205 -.045


advs -.436
**
.270 -.342
*
.230 .271 .127 .060
inf -.206 -.062 -.586
**
.451
**
-.173 .402
**
-.276

serious superficial monotonous engaging poorly_organized comprehensible unsuccessful
prd_mod -.123 .039 -.209 .164 .179 .036 .148
sua_vb -.167 .159 -.319
*
.234 -.152 .108 -.109
spl_aux -.088 .013 -.085 .020 -.030 .067 -.104
conjncts -.375
**
.282 -.417
**
.325
*
.095 .194 .026
agls_psv .279 .020 .509
**
-.447
**
-.144 -.336
*
.079
sub_othr -.078 -.253 .044 .057 -.083 -.082 .012
vcmp -.021 -.134 -.180 .205 -.059 .115 -.238
downtone .140 .133 .207 -.165 .294
*
-.162 .133
pred_adj .027 -.153 .072 -.117 -.100 .000 -.041
allconj -.369
*
-.095 -.172 .221 -.042 .140 -.105
allwhrel -.151 .348
*
-.422
**
.320
*
.070 .291
*
-.010
allpro -.312
*
.020 -.465
**
.394
**
.113 .283 -.206
have .073 -.049 .050 .025 .001 -.137 .098
vprogrsv -.233 -.083 -.100 .139 -.119 .043 .089
that_rel -.302
*
.206 -.248 .183 .085 .113 -.058
nonf_vth -.107 .059 -.068 .082 -.001 .120 -.152
fact_vth .109 -.223 -.024 .057 -.109 .094 -.198
lkly_vth -.034 .054 -.118 .126 .014 .030 .025
factadvl -.340
*
.165 -.333
*
.252 .177 .143 -.067
lklyadvl .007 .245 .091 -.067 .301
*
-.139 .160
all_jth -.204 .054 -.060 .056 -.139 .036 -.110
all_nth .141 -.193 -.296
*
.260 -.204 .323
*
-.141
all_th -.024 -.154 -.188 .230 -.144 .248 -.261
all_jto -.277 -.322
*
-.380
**
.339
*
-.214 .274 -.417
**

all_to -.360
*
-.012 -.493
**
.471
**
-.142 .367
*
-.215
all_advl -.229 .222 -.229 .133 .247 .026 .038
humann -.032 .300
*
-.120 .089 .192 .098 -.029
prcessn .082 -.323
*
-.225 .187 -.293
*
.302
*
-.173
cognitn .049 -.015 -.271 .298
*
-.024 .351
*
-.104
abstrcn -.321
*
-.088 -.460
**
.407
**
-.006 .322
*
-.073
concrtn .206 -.192 .391
**
-.395
**
-.142 -.214 -.019
tccncrtn .206 -.055 .528
**
-.480
**
.015 -.374
**
.111
placen .046 .039 .136 -.126 -.023 .079 .107
topicj -.178 -.097 -.522
**
.468
**
-.013 .483
**
-.234
actv -.084 .044 -.094 .126 -.192 .195 -.312
*

commv -.429
**
.131 -.412
**
.335
*
.076 .252 -.104

serious superficial monotonous engaging poorly_organized comprehensible unsuccessful
mentalv -.110 -.191 -.232 .185 -.185 .379
**
-.252
aspectv -.189 -.171 -.294
*
.339
*
.022 .254 -.252
ineffective .038 .460
**
.544
**
-.590
**
.543
**
-.661
**
.759
**

unreadable .333
*
.176 .821
**
-.812
**
.300
*
-.867
**
.598
**

unbiased .469
**
-.075 .668
**
-.621
**
-.086 -.520
**
.231
constrained .420
**
-.005 .795
**
-.714
**
.052 -.666
**
.431
**

concrete .466
**
-.390
**
.467
**
-.272 -.268 -.208 -.122
awkward .197 .374
**
.727
**
-.733
**
.414
**
-.770
**
.557
**

explicit .558
**
-.347
*
.461
**
-.287 -.245 -.347
*
-.016
good .032 -.558
**
-.505
**
.607
**
-.651
**
.620
**
-.702
**

opinionated -.533
**
.083 -.730
**
.626
**
.102 .479
**
-.200
dull .380
**
.144 .849
**
-.852
**
.245 -.713
**
.533
**

expressive -.407
**
-.037 -.805
**
.801
**
-.109 .642
**
-.415
**

interactive -.476
**
-.033 -.755
**
.718
**
-.241 .675
**
-.457
**

unemotional .536
**
-.123 .772
**
-.689
**
-.010 -.571
**
.280
unrelatable .476
**
.144 .723
**
-.818
**
.149 -.664
**
.448
**

distant .471
**
.111 .784
**
-.762
**
.262 -.694
**
.554
**

serious 1 -.373
**
.295
*
-.231 -.077 -.155 -.012
superficial -.373
**
1 .097 -.269 .469
**
-.253 .488
**

monotonous .295
*
.097 1 -.848
**
.208 -.729
**
.472
**

engaging -.231 -.269 -.848


**
1 -.326
*
.729
**
-.551
**

poorly_organized -.077 .469** .208 -.326* 1 -.422** .685**
comprehensible -.155 -.253 -.729
**
.729
**
-.422
**
1 -.723
**

unsuccessful -.012 .488


**
.472
**
-.551
**
.685
**
-.723
**
1
not_dense -.453
**
-.078 -.765
**
.771
**
-.207 .711
**
-.477
**

formal .514
**
-.201 .713
**
-.627
**
-.085 -.578
**
.211
hard_to_follow .323
*
.264 .733
**
-.766
**
.317
*
-.865
**
.635
**

descriptive .449
**
-.213 .360
*
-.265 -.229 -.099 -.148
academic .612
**
-.147 .674
**
-.584
**
-.087 -.520
**
.248
clear -.173 -.316
*
-.597
**
.652
**
-.528
**
.709
**
-.681
**

not_focused -.375
**
.448
**
-.184 .093 .472
**
-.126 .454
**

not_entertaining .493
**
.101 .676
**
-.728
**
.202 -.674
**
.509
**

important .204 -.515


**
-.108 .205 -.419
**
.278 -.404
**

not_informative -.660
**
.423
**
-.080 -.052 .309
*
-.056 .242
useful .153 -.523
**
-.142 .232 -.574
**
.374
**
-.618
**

old_fashioned -.072 .305


*
-.135 .166 .330
*
.018 .151
irrelevant -.054 .545
**
.328
*
-.400
**
.518
**
-.361
*
.429
**

technical .515
**
-.220 .744
**
-.618
**
-.114 -.574
**
.181
impersonal .436
**
-.121 .792
**
-.700
**
.015 -.622
**
.336
*

detailed .644
**
-.373
**
.602
**
-.479
**
-.186 -.421
**
.114

not_dense formal hard_to_follow descriptive academic clear not_focused
LD1 .559
**
-.584
**
-.458
**
-.549
**
-.659
**
.256 .314
*

LD2 .194 -.042 -.169 -.252 -.058 .062 .166


LD3 .445
**
-.323
*
-.417
**
-.307
*
-.236 .409
**
.171
LD4 .448
**
-.504
**
-.539
**
-.284 -.607
**
.319
*
-.122
LD5 .551
**
-.466
**
-.494
**
-.221 -.416
**
.386
**
.053
PD1 -.856
**
.637
**
.911
**
.248 .656
**
-.808
**
.041
PD2 -.870
**
.880
**
.728
**
.533
**
.909
**
-.490
**
-.324
*

phrasal .259 -.278 -.229 -.330


*
-.365
*
.140 .001
lex_bun .067 .122 -.079 -.167 .056 .010 -.042
length .516
**
-.395
**
-.360
*
-.218 -.371
*
.291
*
.289
*

vocab_high .697
**
-.560
**
-.678
**
-.406
**
-.649
**
.523
**
.116
vocab_mod .064 -.028 -.070 .026 -.008 .064 -.162
vocab_acad -.364
*
.524
**
.268 .226 .507
**
-.109 -.094
NN -.340
*
.427
**
.289
*
.284 .389
**
-.073 -.394
**

prv_vb .356
*
-.282 -.306
*
-.076 -.153 .306
*
.133
pres .056 .064 .061 -.250 .048 -.065 .337
*

pdem -.011 -.121 .035 -.091 -.048 -.207 .062


emph .452
**
-.313
*
-.400
**
-.384
**
-.377
**
.225 .074
pro1 -.168 .106 .142 -.166 .075 -.107 .109
it .402
**
-.273 -.279 -.277 -.261 .206 .452
**

be_state .056 .040 -.138 -.038 .087 .092 .117


pany .096 -.032 .003 -.038 -.043 .008 -.078
amplifr -.119 .066 .253 -.082 .120 -.225 .289
*

pos_mod .152 -.097 -.182 -.250 -.027 .031 -.010


o_and .192 -.274 -.254 -.409
**
-.437
**
.166 .037
n -.442
**
.442
**
.396
**
.263 .420
**
-.224 -.290
*

prep .248 -.442


**
-.239 -.164 -.339
*
.021 .193
adj_attr .292
*
-.354
*
-.237 -.052 -.162 .097 .108
pasttnse .098 -.131 -.235 .099 -.201 .272 -.395
**

pro3 .536
**
-.483
**
-.525
**
-.288
*
-.620
**
.314
*
.074
pub_vb .370
*
-.173 -.235 -.223 -.179 .197 .191
rel_obj .309
*
-.221 -.018 -.001 -.172 .026 .344
*

rel_subj .235 -.248 -.166 .106 -.116 .298


*
.128
rel_pipe .092 -.202 .059 -.369
*
-.104 -.126 .302
*

n_nom .516
**
-.283 -.597
**
-.138 -.413
**
.580
**
-.104
tm_adv -.247 .381
**
.077 .197 .275 .142 -.232
advs .303
*
-.522
**
-.226 -.250 -.550
**
.053 .332
*

inf .529
**
-.411
**
-.451
**
-.297
*
-.382
**
.472
**
.077

not_dense formal hard_to_follow descriptive academic clear not_focused
prd_mod .116 -.022 -.125 -.130 -.042 -.087 .064
sua_vb .114 -.045 -.115 -.113 -.001 .268 -.006
spl_aux .060 .017 -.142 -.194 -.090 .056 .073
conjncts .377
**
-.540
**
-.181 -.357
*
-.424
**
.101 .230
agls_psv -.419
**
.465
**
.296
*
.216 .396
**
-.152 -.238
sub_othr .039 .205 -.087 -.030 .034 .251 -.002
vcmp .138 -.096 -.086 -.090 .109 .120 .108
downtone -.272 .018 .174 .335
*
.175 -.317
*
.221
pred_adj -.105 .175 .166 .106 .209 .009 -.008
allconj .268 -.247 -.291
*
-.311
*
-.441
**
.234 .036
allwhrel .316
*
-.358
*
-.078 -.153 -.197 .126 .385
**

allpro .435
**
-.410
**
-.433
**
-.337
*
-.552
**
.251 .126
have -.187 .050 .163 .050 .062 -.236 -.201
vprogrsv .117 -.141 -.203 -.290
*
-.171 .018 .185
that_rel .327
*
-.222 -.158 -.518
**
-.261 -.024 .028
nonf_vth .129 -.123 -.166 -.110 -.018 .225 .124
fact_vth -.076 .117 -.030 .044 .212 .045 .012
lkly_vth .210 -.207 -.147 -.327
*
-.184 .086 .052
factadvl .331
*
-.402
**
-.212 -.478
**
-.481
**
.059 .182
lklyadvl -.092 .011 .045 -.075 .065 -.099 .436
**

all_jth .079 .028 -.090 -.055 -.121 .163 -.004


all_nth .157 -.206 -.169 -.064 -.021 .172 .023
all_th .191 -.130 -.266 -.171 -.023 .275 .062
all_jto .380
**
-.183 -.435
**
-.273 -.328
*
.327
*
-.006
all_to .490
**
-.383
**
-.479
**
-.411
**
-.395
**
.451
**
.137
all_advl .184 -.244 -.095 -.370
*
-.287 -.010 .463
**

humann .189 -.031 -.091 -.062 -.125 .012 .260


prcessn .142 .062 -.263 -.164 .055 .345
*
-.102
cognitn .346
*
-.235 -.303
*
-.097 -.279 .156 .054
abstrcn .503
**
-.538
**
-.367
*
-.196 -.515
**
.284 .166
concrtn -.373
**
.403
**
.285 .378
**
.374
**
-.150 -.428
**

tccncrtn -.426
**
.252 .425
**
.442
**
.421
**
-.420
**
-.278
placen -.011 .022 -.039 .035 -.027 -.014 -.061
topicj .576
**
-.542
**
-.537
**
-.216 -.550
**
.409
**
-.072
actv .027 -.052 -.188 .051 -.107 .211 -.241
commv .482
**
-.417
**
-.313
*
-.360
*
-.356
*
.323
*
.231

not_dense formal hard_to_follow descriptive academic clear not_focused
mentalv .275 -.136 -.356
*
-.060 -.145 .300
*
.019
aspectv .327
*
-.311
*
-.373
**
-.100 -.334
*
.115 -.092
ineffective -.548
**
.260 .602
**
.007 .316
*
-.618
**
.219
unreadable -.856
**
.695
**
.884
**
.354
*
.715
**
-.748
**
-.051
unbiased -.671
**
.692
**
.589
**
.392
**
.710
**
-.425
**
-.276
constrained -.854
**
.832
**
.738
**
.420
**
.813
**
-.475
**
-.172
concrete -.478
**
.615
**
.218 .304
*
.539
**
-.159 -.483
**

awkward -.734
**
.462
**
.770
**
.262 .510
**
-.682
**
-.010
explicit -.565
**
.549
**
.391
**
.663
**
.605
**
-.153 -.387
**

good .456
**
-.182 -.609
**
.065 -.195 .701
**
-.244
opinionated .779
**
-.707
**
-.573
**
-.407
**
-.789
**
.402
**
.378
**

dull -.842
**
.738
**
.776
**
.354
*
.705
**
-.552
**
-.163
expressive .778
**
-.782
**
-.714
**
-.271 -.722
**
.504
**
.169
interactive .791
**
-.714
**
-.760
**
-.450
**
-.746
**
.555
**
.099
unemotional -.846
**
.801
**
.724
**
.529
**
.888
**
-.506
**
-.211
unrelatable -.794
**
.635
**
.737
**
.374
**
.709
**
-.574
**
-.241
distant -.841
**
.711
**
.803
**
.363
*
.736
**
-.669
**
-.159
serious -.453
**
.514
**
.323
*
.449
**
.612
**
-.173 -.375
**

superficial -.078 -.201 .264 -.213 -.147 -.316


*
.448
**

monotonous -.765
**
.713
**
.733
**
.360
*
.674
**
-.597
**
-.184
engaging .771
**
-.627
**
-.766
**
-.265 -.584
**
.652
**
.093
poorly_organized -.207 -.085 .317
*
-.229 -.087 -.528
**
.472
**

comprehensible .711
**
-.578
**
-.865
**
-.099 -.520
**
.709
**
-.126
unsuccessful -.477
**
.211 .635
**
-.148 .248 -.681
**
.454
**

not_dense 1 -.758
**
-.773
**
-.418
**
-.786
**
.611
**
.142
formal -.758
**
1 .612
**
.423
**
.829
**
-.328
*
-.298
*

hard_to_follow -.773
**
.612
**
1 .294
*
.674
**
-.743
**
.066
descriptive -.418
**
.423
**
.294
*
1 .470
**
-.084 -.320
*

academic -.786
**
.829
**
.674
**
.470
**
1 -.412
**
-.261
clear .611
**
-.328
*
-.743
**
-.084 -.412
**
1 -.193
not_focused .142 -.298
*
.066 -.320
*
-.261 -.193 1
not_entertaining -.744
**
.744
**
.775
**
.319
*
.760
**
-.564
**
-.103
important .047 .138 -.235 .314
*
.123 .404
**
-.219
not_informative .293
*
-.309
*
-.038 -.282 -.410
**
.009 .381
**

useful .072 .075 -.287 .413


**
.070 .516
**
-.424
**

old_fashioned .186 -.267 -.017 -.182 -.159 -.058 .325


*

irrelevant -.210 .066 .330


*
.020 .026 -.509
**
.218
technical -.803
**
.826
**
.689
**
.556
**
.872
**
-.427
**
-.299
*

impersonal -.816
**
.801
**
.719
**
.498
**
.872
**
-.517
**
-.206
detailed -.689
**
.717
**
.572
**
.655
**
.749
**
-.383
**
-.411
**

not_entertaining important not_informative useful old_fashioned irrelevant
LD1 -.558
**
-.176 .317
*
-.188 .276 .008
LD2 -.163 -.090 .024 -.060 .154 .104
LD3 -.421
**
.022 .153 -.073 .262 -.063
LD4 -.500
**
-.104 .148 -.027 .150 -.007
LD5 -.436
**
.310
*
.039 .156 .007 -.313
*

PD1 .800
**
-.309
*
-.020 -.385
**
-.017 .459
**

PD2 .773
**
.086 -.408
**
.070 -.277 .067
phrasal -.225 -.022 .062 -.079 .050 -.059
lex_bun -.055 .087 -.076 .061 -.046 .060
length -.367
*
.097 .106 .019 .169 -.237
vocab_high -.635
**
.039 .254 -.041 .230 -.105
vocab_mod .005 .248 .013 .139 -.212 -.230
vocab_acad .356
*
.196 -.242 .190 -.359
*
-.286
NN .469
**
.071 -.212 .206 -.509
**
-.214
prv_vb -.234 -.033 .156 .018 .204 -.063
pres .011 -.238 .180 -.226 .213 .132
pdem -.018 -.198 -.111 -.157 .146 .161
emph -.308
*
-.111 .154 -.300
*
.218 .017
pro1 .080 -.125 .230 -.093 .155 .151
it -.284 .104 .088 -.150 .258 .098
be_state -.020 -.081 .038 .053 .264 .060
pany -.074 -.060 .083 .019 .250 -.075
amplifr .093 -.254 .237 -.253 .019 .246
pos_mod -.163 -.239 .061 -.157 .159 .066
o_and -.313
*
-.055 .236 -.027 -.012 -.139
n .541
**
.080 -.189 .099 -.355
*
-.069
prep -.184 -.105 .143 -.222 .291
*
.104
adj_attr -.280 .194 -.057 .130 .047 -.180
pasttnse -.245 .170 -.150 .190 -.279 -.204
pro3 -.541
**
-.120 .099 -.033 .260 .034
pub_vb -.237 .035 .203 -.088 .119 -.041
rel_obj -.085 -.033 .262 -.085 .146 .025
rel_subj -.195 .065 -.037 .114 .037 -.120
rel_pipe .024 -.149 .032 -.280 .421
**
.310
*

n_nom -.383
**
.351
*
.037 .276 -.225 -.402
**

tm_adv .241 .175 -.066 .255 -.114 -.083


advs -.347
*
-.283 .272 -.151 .189 .135
inf -.377
**
.166 .106 .091 -.013 -.228

not_entertaining important not_informative useful old_fashioned irrelevant
prd_mod -.054 -.254 .032 -.316* .114 .258
sua_vb -.132 .176 -.093 .080 .234 .021
spl_aux -.111 -.161 -.012 .034 -.073 -.019
conjncts -.248 -.320* .235 -.233 .303* .053
agls_psv .308* -.049 -.161 -.023 -.211 .054
sub_othr .003 .186 .046 .088 .071 -.250
vcmp -.158 .011 .045 .021 .114 -.163
downtone .005 -.005 -.126 .034 .362* .202
pred_adj .071 .111 -.111 .149 -.181 -.096
allconj -.376** .022 .255 .012 .053 -.159
allwhrel -.142 -.059 .086 -.124 .325* .118
allpro -.478** -.160 .202 -.075 .317* .111
have .021 -.107 -.224 -.051 -.096 -.069
vprogrsv -.084 -.099 .188 -.027 .006 -.147
that_rel -.140 -.247 .143 -.141 .094 .092
nonf_vth -.288* -.030 .143 -.153 .286 .170
fact_vth -.020 -.016 -.005 .068 -.207 -.210
lkly_vth -.217 -.265 -.009 -.341* .235 .053
factadvl -.310* -.200 .230 -.140 .054 -.026
lklyadvl -.004 -.069 -.021 -.111 .412** .118
all_jth -.033 .098 .121 .274 -.050 -.074
all_nth -.177 .142 -.207 .156 .048 -.359*
all_th -.312* -.048 .034 -.013 .109 -.122
all_jto -.440** .198 -.025 .213 -.012 -.114
all_to -.521** .093 .217 -.074 .099 -.134
all_advl -.258 -.153 .130 -.202 .358* .091
humann -.019 -.122 .024 -.225 .445** .301*
prcessn -.034 .210 -.090 .170 .025 -.280
cognitn -.141 -.037 .001 -.147 .385** .296*
abstrcn -.540** .332* .178 .021 .188 -.016
concrtn .414** .077 -.013 .216 -.457** -.058
tccncrtn .345* .044 -.108 .088 -.035 .099
placen .026 .045 -.078 -.015 .016 -.074
topicj -.439** .205 -.002 .201 -.001 -.225
actv -.097 -.049 .033 .056 -.094 .006
commv -.338* .089 .375** -.015 .191 -.179

mentalv -.366* .076 .239 .118 -.148 -.092
aspectv -.361* -.069 .070 -.023 .305* .056
ineffective .453** -.350* .186 -.567** .065 .417**
unreadable .720** -.265 -.110 -.260 -.067 .371*
unbiased .644** -.034 -.201 .002 -.259 .136
constrained .766** -.001 -.220 -.017 -.147 .178
concrete .308* .218 -.530** .214 -.230 -.200
awkward .594** -.251 -.005 -.342* .068 .547**
explicit .346* .388** -.532** .294* -.298* -.088
good -.442** .501** -.185 .594** -.245 -.600**
opinionated -.635** .036 .437** -.054 .238 -.023
dull .793** -.226 -.082 -.213 -.085 .266
expressive -.737** .083 .211 .138 .247 -.240
interactive -.676** -.017 .243 .116 .166 -.248
unemotional .763** .056 -.348* .080 -.266 .038
unrelatable .739** -.168 -.239 -.127 -.084 .383**
distant .777** -.096 -.244 -.203 -.116 .211
serious .493** .204 -.660** .153 -.072 -.054
superficial .101 -.515** .423** -.523** .305* .545**
monotonous .676** -.108 -.080 -.142 -.135 .328*
engaging -.728** .205 -.052 .232 .166 -.400**
poorly_organized .202 -.419** .309* -.574** .330* .518**
comprehensible -.674** .278 -.056 .374** .018 -.361*
unsuccessful .509** -.404** .242 -.618** .151 .429**
not_dense -.744** .047 .293* .072 .186 -.210
formal .744** .138 -.309* .075 -.267 .066
hard_to_follow .775** -.235 -.038 -.287 -.017 .330*
descriptive .319* .314* -.282 .413** -.182 .020
academic .760** .123 -.410** .070 -.159 .026
clear -.564** .404** .009 .516** -.058 -.509**
not_focused -.103 -.219 .381** -.424** .325* .218
not_entertaining 1 -.103 -.194 -.120 -.164 .148
important -.103 1 -.333* .675** -.227 -.403**
not_informative -.194 -.333* 1 -.294* .082 .201
useful -.120 .675** -.294* 1 -.328* -.606**
old_fashioned -.164 -.227 .082 -.328* 1 .389**
irrelevant .148 -.403** .201 -.606** .389** 1
technical .687** .115 -.310* .195 -.326* -.022
impersonal .760** .087 -.308* .074 -.268 .050
detailed .655** .149 -.448** .210 -.380** -.062

technical impersonal detailed
LD1 -.715** -.674** -.668**
LD2 -.137 -.126 -.180
LD3 -.324* -.401** -.411**
LD4 -.543** -.551** -.469**
LD5 -.471** -.415** -.323*
PD1 .656** .739** .545**
PD2 .932** .933** .851**
phrasal -.369* -.310* -.382**
lex_bun .080 -.005 -.020
length -.364* -.406** -.358*
vocab_high -.722** -.711** -.675**
vocab_mod .047 .073 .150
vocab_acad .494** .457** .495**
NN .487** .458** .503**
prv_vb -.172 -.250 -.108
pres -.013 .029 -.118
pdem -.092 -.040 -.072
emph -.550** -.460** -.468**
pro1 .098 .114 .017
it -.497** -.418** -.579**
be_state -.029 -.016 -.059
pany -.019 -.031 -.091
amplifr .086 .110 -.016
pos_mod -.133 -.099 -.153
o_and -.376** -.337* -.332*
n .504** .486** .485**
prep -.503** -.390** -.349*
adj_attr -.273 -.189 -.154
pasttnse -.080 -.166 .004
pro3 -.542** -.608** -.442**
pub_vb -.186 -.236 -.329*
rel_obj -.281 -.206 -.196
rel_subj -.209 -.138 -.098
rel_pipe -.282 -.198 -.269
n_nom -.354* -.432** -.237
tm_adv .275 .235 .170
advs -.489** -.525** -.469**
inf -.437** -.458** -.362*

prd_mod -.102 -.116 -.050
sua_vb -.178 -.182 -.366*
spl_aux .023 -.138 -.089
conjncts -.483** -.496** -.434**
agls_psv .459** .445** .402**
sub_othr .070 .057 -.015
vcmp .037 -.047 -.106
downtone .209 .178 .068
pred_adj .195 .207 .210
allconj -.387** -.351* -.324*
allwhrel -.403** -.280 -.293*
allpro -.475** -.522** -.411**
have .157 .141 .275
vprogrsv -.149 -.203 -.100
that_rel -.336* -.254 -.399**
nonf_vth -.090 -.094 -.276
fact_vth .249 .128 .130
lkly_vth -.234 -.248 -.168
factadvl -.490** -.486** -.478**
lklyadvl .040 -.034 -.186
all_jth -.012 -.059 .011
all_nth -.142 -.133 -.116
all_th -.055 -.147 -.144
all_jto -.239 -.291* -.239
all_to -.395** -.454** -.441**
all_advl -.367* -.372** -.462**
humann -.166 -.228 -.272
prcessn -.118 -.024 -.075
cognitn -.348* -.341* -.133
abstrcn -.567** -.455** -.437**
concrtn .481** .451** .522**
tccncrtn .478** .431** .428**
placen -.029 .044 -.036
topicj -.565** -.561** -.475**
actv -.042 -.032 -.037
commv -.345* -.421** -.461**

mentalv -.111 -.178 -.131
aspectv -.385** -.347* -.310*
ineffective .275 .348* .203
unreadable .743** .809** .639**
unbiased .749** .754** .686**
constrained .835** .841** .689**
concrete .606** .609** .574**
awkward .547** .556** .415**
explicit .670** .628** .684**
good -.151 -.268 -.084
opinionated -.776** -.798** -.702**
dull .703** .775** .638**
expressive -.747** -.815** -.643**
interactive -.737** -.757** -.626**
unemotional .910** .937** .845**
unrelatable .642** .713** .626**
distant .710** .763** .640**
serious .515** .436** .644**
superficial -.220 -.121 -.373**
monotonous .744** .792** .602**
engaging -.618** -.700** -.479**
poorly_organized -.114 .015 -.186
comprehensible -.574** -.622** -.421**
unsuccessful .181 .336* .114
not_dense -.803** -.816** -.689**
formal .826** .801** .717**
hard_to_follow .689** .719** .572**
descriptive .556** .498** .655**
academic .872** .872** .749**
clear -.427** -.517** -.383**
not_focused -.299* -.206 -.411**

not_entertaining .687** .760** .655**
important .115 .087 .149
not_informative -.310* -.308* -.448**
useful .195 .074 .210
old_fashioned -.326* -.268 -.380**
irrelevant -.022 .050 -.062
technical 1 .879** .839**
impersonal .879** 1 .802**
detailed .839** .802** 1
