A Computerized Statistical Methodology For Linguistic Geography - A Pilot Study

A COMPUTERIZED STATISTICAL METHODOLOGY
FOR LINGUISTIC GEOGRAPHY:

A PILOT STUDY*
CHARLES L. HOUCK
1. INTRODUCTION
The need for a statistical methodology for analyzing linguistic data is,
I believe, vital when these data are a function of either geographical or
sociolinguistic factors. In analyzing these data one is called upon to
draw inferences from a large number of linguistic phenomena from a
large set of informants. Wide experience in other behavioral sciences
has shown, however, that if objective inferences are to be obtained,
quantification of these linguistic events is necessary. And since linguistic
phenomena of this kind usually do not constitute measurement which is
fully reproducible, as it is in the physical sciences, and are generally
subject to error, they are best analyzed statistically. Statistics provides,
first of all, empirically tested formulae for drawing accurate inferences
about the differences or similarities of a given population from a sample,
even though any particular set of data may be very inaccurate. And
secondly, these statistical formulae provide an estimate of the degree of
error involved in making these inferences. Lack of this estimate of error
usually renders statements about a population from a sample unreliable.1
Although David W. Reed and John L. Spicer proposed as early as
1952 correlation methods which provided a rigorous means by which
* An earlier version of this paper was presented at the meeting of the Midwest
Modern Language Association in Chicago on 8 May 1965. I am especially indebted
to John W. Bowers, Associate Professor of Speech, University of Iowa, for his numerous editorial suggestions and help with the statistical design and analysis. I also wish
to express my thanks to Robert Howren, Jr., Associate Professor of English, University
of Iowa, Pavle Ivic, Faculty of Philosophy, Novi Sad University, Yugoslavia, and
Roger Shuy, Assistant Professor of English, Michigan State University, for the advice,
comments, and encouragement they have given me in the pursuit of this study.
1
The statistical concepts expressed in this introduction were obtained from the
introductory material in T. G. Connolly and W. Sluckin, Statistics for the social
sciences (London, 1962), 1-3; George A. Ferguson, Statistical analysis in psychology
and education (New York, 1959), 1-12; and Sidney Siegel, Nonparametric statistics for
the behavioral sciences (New York, 1965), 1-5.
linguistic geographers could test their data for significant relationships
and differences,2 linguistic geographers in general have failed to apply
these correlation methods or other statistical procedures which can
test whether the differences or similarities obtained were due to chance.
Using the Reed and Spicer study as a point of departure, I will propose
and describe a computerized statistical methodology as it was used in a
study of dialects in Johnson County, Iowa.
The correlation method proposed and used in the Reed and Spicer
article was the phi coefficient with the chi-square test for significance.
In addition to the correlation method, Reed and Spicer also used Wilhelm
Milke's method of cartographic representation of correlation coefficients with the phi coefficient as input.3
In their study, Reed and Spicer used data obtained from the three
tables compiled by Alva L. Davis and Raven I. McDavid, Jr.,4 which
showed "the distribution respectively of thirty-nine items of vocabulary,
ten items of pronunciation, and seven items of grammar, among ten
informants, two each in five communities in northwestern Ohio.
Most of these items [i.e. test questions] have several variants [i.e. multiple
response items]; only the pronunciation was not organized directly to
show simple presence or absence of all variants in each item. For statistical purposes, each variant of an item in the vocabulary and grammar
tables was considered to be a separate item [i.e. a separate response
item], thus yielding ninety-one vocabulary items and fourteen grammar
items. The pronunciation table was reorganized to indicate simple presence or absence of each pronunciation characteristic, with a resultant
total of forty-three pronunciation items. After this preliminary organization of the Davis-McDavid material, it was subjected to the normalized phi coefficient with the chi-square test for significance."5
The results of their study, in spite of its obvious limitations of sample
size and test completeness, were impressive, for the method was able
to delineate the dialect patterns of the area concisely and convincingly.
Unfortunately, the implied promise by the authors of future application
of their correlation methods to linguistic atlas materials has not been
fulfilled, to my knowledge, in print.6 I believe it is safe to say that, on the
2
"Correlation methods of comparing idiolects in a transition area", Language, 28
(1952), 348-359.
3
"The quantitative distribution of cultural similarities and their cartographic
representation", American Anthropologist, 51 (1949), 127-152.
4
"Northwestern Ohio: a transition area", Language, 26 (1950), 264-273.
5
Reed and Spicer, Language, 28, 351.
6
Reed and Spicer, Language, 28, 359.
basis of published linguistic atlas materials, the level of statistical sophistication in the field of linguistic geography has remained low and static
since the appearance of their article in 1952.7
The purpose of this pilot study is fivefold: (1) It will apply the concept
of density as an attempt to obtain statistical data which largely eliminates
the chance factor. It will do this by providing the phi coefficient and the
tetrachoric correlation with input which is derived, first, from a larger
informant sample concentrated in a smaller geographical area, and second,
from a much larger sample of lexical test questions and response items.
(2) It will explore the use of fourfold correlation analyses in which the
correlation coefficients are used not only terminally, as in Reed and
Spicer,8 but also instrumentally, as input for factor analyses which
attempt to determine from these intercorrelations whether the "variation
represented can be accounted for adequately by a number of basic
categories smaller than that with which the investigation was started".9
(3) It will provide a simple frequency count analysis which, first, provides
a basis for the rejection or retention of test questions and response items
in the questionnaire, and, second, provides data in an accessible form so
that a statistical test for differences can be executed on the response to
dialect lexical items and a given geographical area can be classified
dialectally. (4) It will provide computer programs which can process
a large amount of linguistic data for correlation, count, and factor
analyses. (5) It will report substantive findings of the pilot study.
2. METHOD
The following sections will describe the methodology used in this study.
2.1. GEOGRAPHICAL AREA. Since the orientation of this study is primarily methodological, no attempt was made to pick a county which
would have important dialectal findings. Johnson County, Iowa, however, falls within the Davenport-Cedar Rapids-Dubuque triangle of
Iowa which is described by Harold B. Allen as showing, in his view,
strong Northern elements, although the contrasts to Midland features
are not as strong as they are at the major boundary.10 The resultant
7
See Glenna Ruth Pickford, "American linguistic geography: a sociological appraisal", Word, 12 (1956), 211-233, for a comment on linguistic geography methodology which is still apropos. For a more recent comment, see Charles A. Ferguson,
Social science research council, vol. 19, no. 1 (1965).
8
Reed and Spicer, Language, 28, 348-359.
9
Benjamin Fruchter, Introduction to factor analysis (Princeton, New Jersey, 1954), 1.
10
Harold B. Allen, "The primary dialect areas of the upper midwest", in Harold
B. Allen (ed.), Readings in applied English linguistics (New York, 1964), 233 and 241.
findings by the proposed methodology should point to a rejection or
retention of Allen's findings in reference only, of course, to lexical usage.
2.2. DENSITY. Density in this study was defined with the township as
the minimal geographical unit rather than the more commonly used
county. In practice, however, this definition turned out to be unfeasible,
and a more realistic unit was devised before analysis took place: the
county was divided arbitrarily into five sections of four to six adjacent townships each, except for the two townships which contained Iowa
City. In each section five to seven informants were questioned.11
2.3. INFORMANTS. Thirty-two informants were used in the study.
In general, the informants were selected if they met the following criteria:
(1) that they were born and reared in Johnson County or were life-long
residents of the county (i.e. had lived in the county since they were five
years old or less); and (2) that they were sixty-five years old or older. Two
informants who failed to meet these criteria were kept in the study because both were native Iowans, and their idiolects could be compared
with those native to the county. The mean age was 72, with the oldest
informant being 87. The one informant who failed to meet the age
criterion was 38 years old. Six of the thirty-two informants were women.
In education they ranged from people who had had only four years at
school to holders of professional and post-graduate college degrees. By occupation they included housewives, farmers, a county extension officer, a
carpenter, a retired lawyer and large landowner, a pharmacist, a machine
shop operator-owner, a streetcar conductor and bodyshop operator-owner, and a bulk-oil dealer.
2.4. THE LEXICAL QUESTIONNAIRE. The lexical questionnaire was
compiled from the Iowa Atlas checklist and workbook. Hans Kurath's
Word Geography,12 and Robert Howren's Ocracoke checklist13 were
also consulted for relevant lexical response items. The resultant form
contained two hundred and thirty lexical test questions composed of
one thousand and eighty lexical response items. The test questions covered the following categories: (1) time; (2) weather; (3) household;
(4) farmstead; (5) farming; (6) farm animal terms; (7) farm animal
11
The township was discarded as the basic geographical unit because it was too small
a unit, especially in the relatively sparsely settled areas, to provide enough informants
meeting the requirements set forth in 2.3; moreover, farmers, at least in Johnson
County, many times move from township to township in quest of better farms and
living conditions, or simply to town to retire, but remain in the county, and, most
of the time, in the same sections of the county as devised for this study.
12
A word geography of the eastern United States (Ann Arbor, U. of Michigan, 1949).
13
"The speech of Ocracoke, North Carolina", American Speech, 37 (1962), 163-175.
A copy of the questionnaire was also made available to me by Robert Howren.
sounds; (8) calls to farm animals; (9) landscape; (10) fishing; (11) roads;
(12) food; (13) nature; (14) kinship terms (primarily parental); (15)
idioms; (16) childhood terms for playthings and games; and (17) miscellaneous. In Table 3 is a sample of fifty-four key questions, their
respective response items, and the frequency with which each was chosen.
The questionnaires were distributed in person to help insure the high
return necessary for a methodological pilot study. The informants were
provided with a stamped envelope for the return of the questionnaire.
2.5. THE PHI COEFFICIENT WITH THE CHI-SQUARE TEST. The phi coefficient, or fourfold point correlation, measures, like other tests I will
describe later, what statistical relationships exist among informants on
the criterion of lexical similarity. It assumes that a given lexical response
item is either present or absent in a given idiolect. Given a phi coefficient, one can determine by referring to appropriate theoretical distributions the likelihood of the apparent relationships having occurred by
chance. Such a test is the chi-square test for significance. By referring
to a chi-square table for the critical value required for significance at an
accepted significance level for the appropriate degrees of freedom, one
can determine whether the values for the differences between the observed
and the expected frequencies are significant and cannot reasonably be
explained by sampling fluctuation or chance.14 The phi coefficient is
used here to provide input for Guttman's Radex Analysis15 and the
cluster analysis.
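The fourfold computation described in 2.5 can be sketched as follows. This is a minimal illustration and not the program used in the study: the function names and the toy response vectors are hypothetical, the plain (unnormalized) phi coefficient is assumed, and the chi-square value is obtained from the standard identity chi-square = N x phi squared for a fourfold table with one degree of freedom.

```python
import math

def fourfold_counts(x, y):
    """Cross-tabulate two 0/1 response vectors into the cells a, b, c, d."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # item present in both idiolects
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # present in x only
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # present in y only
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # absent in both
    return a, b, c, d

def phi_coefficient(a, b, c, d):
    """Phi for a fourfold table; returns 0.0 when a margin is empty."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0
    return (a * d - b * c) / denom

def chi_square_from_phi(phi, n):
    """Chi-square (df = 1) implied by phi over n response items."""
    return n * phi ** 2

# Hypothetical idiolects scored over eight response items (1 = present).
informant_x = [1, 1, 1, 0, 0, 0, 1, 0]
informant_y = [1, 1, 0, 0, 0, 1, 1, 0]

a, b, c, d = fourfold_counts(informant_x, informant_y)
phi = phi_coefficient(a, b, c, d)
chi2 = chi_square_from_phi(phi, a + b + c + d)
print(phi, chi2)
```

With only eight items the chi-square value stays far below any conventional critical value, which illustrates why a large pool of response items matters for the significance test.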
2.6. THE TETRACHORIC CORRELATION. The tetrachoric correlation is
also a fourfold correlation which treats the dichotomy, presence and
absence, as though it is on a continuum; i.e. sometimes present, sometimes
absent, depending, e.g., on the speech situation.16 The tetrachoric is
used here primarily to provide input for the multiple factor analysis.
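For orientation, a tetrachoric coefficient can be approximated from the same fourfold cells. The sketch below uses the cosine-pi approximation, which is an assumption made here for illustration; the text does not say which estimation procedure the study's program used, and the function name is hypothetical.

```python
import math

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    a, d are the agreeing cells (present/present, absent/absent);
    b, c are the disagreeing cells of the fourfold table.
    """
    if b * c == 0:
        # No disagreement in at least one direction: the approximation
        # tends to +1 if there is any agreement, otherwise it is undefined.
        return 1.0 if a * d > 0 else float("nan")
    ratio = math.sqrt((a * d) / (b * c))
    return math.cos(math.pi / (1 + ratio))

# Same hypothetical table as a phi of .50 would come from: a=3, b=1, c=1, d=3.
r_tet = tetrachoric_cos_pi(3, 1, 1, 3)
print(round(r_tet, 3))
```

For this table the approximation is noticeably higher than the phi of .50 computed from the same cells, the same direction of difference that appears between the phi and tetrachoric values reported in the tables.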
2.7. FACTOR ANALYSIS. Three kinds of factor analysis are employed
in this study: (1) Guttman's Radex approach to factor analysis;17
(2) a multiple factor analysis computer program assembled by Professor
Harold Bechtoldt, Department of Psychology, University of Iowa;18 and
14
J. P. Guilford, Fundamental statistics in psychology and education, 3d ed. (New
York, 1965), 311-316. See also George A. Ferguson, Statistical analysis in psychology
and education (New York, 1959), 158-77.
15
L. Guttman, "A new approach to factor analysis: the radex", in P. Lazarsfeld
(ed.), Mathematical thinking in the social sciences (Glencoe, Illinois, 1954), 258-348.
16
Guilford, op. cit., 305-311.
17
Guttman, loc. cit., 258-348; also "A generalized simplex for factor analysis",
Psychometrika, 20 (1955), 173-191.
18
A program write-up (and a description of the type of factor analysis used) is
(3) Robert C. Tryon's cluster analysis.19 All three of these factor analyses
provide "a mathematical model which can be used to describe certain
areas [of linguistic behavior such as the use of lexical items]. A series ...
of measures [e.g. responses to lexical items in a questionnaire] are
intercorrelated to determine the number of dimensions the test space
occupies, and to identify these dimensions in terms [of linguistic or
socio-geographical categories]. The interpretations are done by observing which tests fall on a given dimension and inferring what these
tests have in common [e.g. geography, occupation, age, sex, or education]
that is absent from tests not falling on the dimension. Tests correlate
to the extent that they measure common traits ... [Responses to a checklist questionnaire or to a fieldworker can be studied] to detect possible
common sources of variation or variance, [or factors; and factors
represent] the fundamental underlying sources of variation operating in a
given set of scores or other data observed under a specified set of conditions."20
2.8. FREQUENCY COUNT OF ITEMS ACROSS INFORMANTS. The purpose
of the frequency count is to provide a tabulation of response items
across informants, so that the total number of responses to particular
response items in the questionnaire can be readily determined and analyzed. This is important for editorial purposes, for the count can determine meaningfulness of response items in the questionnaire for a particular geographical area. The frequency count also provides input for
the t-test.21
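The tabulation described in 2.8 can be sketched as below. The data layout (a mapping from informant to the set of response items chosen), the informant labels, and the function names are all assumptions made for illustration; the response items are borrowed from the checklist, but the counts are invented.

```python
from collections import Counter

def frequency_count(responses, all_items):
    """Count, for every response item, how many informants chose it."""
    counts = Counter()
    for chosen in responses.values():
        counts.update(set(chosen))  # each informant counts at most once per item
    return {item: counts.get(item, 0) for item in all_items}

def unused_items(freq):
    """Response items chosen by no informant (candidates for deletion)."""
    return [item for item, n in freq.items() if n == 0]

# Hypothetical responses of three informants to one test question.
all_items = ["corn crib", "corn barn", "corn house", "crib"]
responses = {
    "inf01": {"corn crib"},
    "inf02": {"corn crib", "crib"},
    "inf03": {"corn crib"},
}

freq = frequency_count(responses, all_items)
print(freq)
print(unused_items(freq))
```

The list of unused items is the editorial by-product mentioned above: items no informant chooses can be dropped from a later version of the questionnaire.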
2.9. THE t-TEST FOR THE DIFFERENCE BETWEEN TWO MEANS. The t-test
determines whether an apparent difference between two means can
easily be accounted for by chance.22 In this study, it will be used to
determine whether Johnson County natives employ Northern lexical
items significantly more often than they use Midland lexical items.
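A t-test of this kind can be sketched as follows. The paired design (one Northern and one Midland count per informant) is an assumption made here; the text does not specify whether the study's test was paired or independent, and the counts below are invented.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """t statistic and degrees of freedom for the mean of paired differences."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical counts of Northern and Midland responses for five informants.
northern = [12, 15, 11, 14, 13]
midland = [11, 15, 12, 13, 13]

t, df = paired_t(northern, midland)
print(round(t, 3), df)
```

The resulting t is then compared with the critical value for the chosen significance level and degrees of freedom; a value this small would not support a claim that one dialect's items are used more often than the other's.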
available upon request from the State University of Iowa Computer Center, Iowa
City, Iowa 52240.
19
R. C. Tryon, Cluster analysis: correlation profile and orthometric (factor) analysis for
the isolation of unities in mind and personality (Ann Arbor, 1939), especially 41-48.
20
Fruchter, op. cit., 2-4 (see fn. 9).
21
The frequency count analysis has been expanded to three types. The primary
addition is the tabulation and percentage of response items across informant profile,
which includes sex, age, education, and occupation. The program identifies the
profile, totals the number of informants who belong to each profile, and indicates how
each profile responds in toto to each lexical item in the questionnaire. This output can
then be fed into a Type I analysis of variance which tests whether each profile differs
significantly in relation to each lexical response item.
22
George A. Ferguson, op. cit., 126-128 (see fn. 1).
2.10. THE COMPUTER. The study was designed to make full use of
the computer for two reasons: (1) accuracy, for correlation and factor
analysis studies entail a great amount of intricate mathematical computation and counting which, by their very nature, are greatly error prone
when done by hand; and (2) efficiency, for, since there is a great amount
of mathematical computation and counting, the computer saves time.
It is, of course, in this area that a computer provides the linguistic
geographer with his greatest boon, for it allows him to increase his informant sample for more reliable results. In this study, for example, the
estimated time for manual computation of a 32 × 32 phi coefficient and
tetrachoric matrix was more than one thousand hours. The estimated
time for programming, keypunching, and eliminating program errors
('de-bugging') is around one hundred hours. Although the saving of
time here is large, the real saving comes when data from a new study
are to be analyzed, for all that remains is the preparation of the data,
a minor part of the process.
In this study, then, a computer program was used for each type of
analysis except for the t-test and the cluster analysis. A computer
program for the cluster analysis is now operational.23
The following sequence was used for analysis on the computer: (1) The
data were readied on computer data cards. (2) The frequency count
program was then run. This program not only provided the necessary input for the t-test, but also automatically provided another input
deck for the phi and tetrachoric program in which all the response items
that none of the informants responded to were deleted. This was done so
that the 'D' cell of the fourfold contingency table for the two correlation
computations would not be inflated, thus providing greater correlation
discrimination. (3) The phi and tetrachoric program was then run. This
program also automatically provided an input deck in the form of a
symmetrical tetrachoric intercorrelation matrix for the multiple factor
analysis program. (4) The multiple factor analysis was run in two stages:
(a) exploratory; and (b) confirmatory.
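Steps (2) and (3) of this sequence can be sketched in a few lines. The sketch assumes a 0/1 matrix with one row per informant and one column per response item; the function names and the toy data are hypothetical, and only the phi matrix is shown. It also illustrates why unused items were deleted: columns that no one chose only add to the 'D' (absent/absent) cell and so change the correlation.

```python
import math

def drop_unused_items(rows):
    """Remove response-item columns that no informant chose."""
    keep = [j for j in range(len(rows[0])) if any(row[j] for row in rows)]
    return [[row[j] for j in keep] for row in rows]

def phi(x, y):
    """Phi coefficient between two 0/1 response vectors."""
    a = sum(i * j for i, j in zip(x, y))
    b = sum(i * (1 - j) for i, j in zip(x, y))
    c = sum((1 - i) * j for i, j in zip(x, y))
    d = sum((1 - i) * (1 - j) for i, j in zip(x, y))
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

def phi_matrix(rows):
    """Symmetrical informant-by-informant intercorrelation matrix."""
    return [[phi(p, q) for q in rows] for p in rows]

# Three hypothetical informants, six response items; the last three unused.
rows = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
]

before = phi(rows[0], rows[1])        # 'D' cell inflated by unused items
reduced = drop_unused_items(rows)
after = phi_matrix(reduced)[0][1]     # same pair after deletion
print(before, after)
```

In this toy case the three unused columns turn a negative relationship into an apparently positive one, which is the kind of distortion the deletion step is meant to prevent.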
23
The use of the computer was first made possible through the interest of Garry A.
Flint, a computer programmer at the Indiana University Computing Research Center
in the summer of 1964. He was responsible for the phi and tetrachoric program used
in this pilot study. I am also indebted to him for his help in learning the basics of
computer programming. Since the completion of this pilot study I have expanded the
analyses and have increased the data processing capacity of the various computer
programs through the generous help of the University of Iowa Computer Center.
This expanded methodology has been applied to the Iowa Atlas checklist materials,
and the results will appear in a monograph by Robert Howren, Jr. and myself, to
be published by the Iowa State University Press, Ames, Iowa. The complete methodology will also be described in my doctoral dissertation.
3. RESULTS AND DISCUSSION
The overall results were encouraging, for the degree of density provided
highly reliable data input for the phi, tetrachoric, and count analyses.
The phi and tetrachoric intercorrelation matrices consistently showed
middle to relatively high but homogeneous intercorrelations, indicating
perhaps dialectal homogeneity, while at the same time revealing idiolectal
discrimination. The range for the phi coefficient intercorrelations was .05
to .60; the range for the tetrachoric intercorrelations was .09 to .83.
All the intercorrelations except four were significant (χ² > 6.64, df =
1, p < .01); i.e. if chi-square is greater than 6.64 at one degree of freedom,
the probability is that fewer than one intercorrelation out of 100 would
be due to chance. A randomly selected sample of phi (with their χ²
values) and tetrachoric intercorrelations is shown in Table 1.
TABLE 1
A phi coefficient and tetrachoric intercorrelation matrix of randomly
selected informants from the five county-sections of Johnson County

Informants           2         7        18        23        28
2                 1.00       .50       .49       .47       .47
                            .73*       .72       .70       .70
                        176.59**    168.80    118.42    152.48
7                           1.00       .47       .45       .46
                                       .71       .68       .69
                                    161.72    143.04    146.52
18                                    1.00       .39       .42
                                                 .60       .64
                                              104.63    123.04
23                                              1.00       .37
                                                           .58
                                                         94.43
28                                                        1.00

In each cell the first value is the phi coefficient.
* Tetrachoric intercorrelations.
** Chi-square values.
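The critical value quoted above can be checked without a printed table, since a chi-square variable with one degree of freedom is the square of a standard normal variable. The sketch below is an illustration only; the function name is hypothetical, and the ten chi-square values are those reported in Table 1.

```python
from statistics import NormalDist

def chi_square_critical_df1(alpha):
    """Critical chi-square value for df = 1 at significance level alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    return z * z

critical = chi_square_critical_df1(0.01)

# Chi-square values reported in Table 1 for the five sampled informants.
table1_chi2 = [176.59, 168.80, 118.42, 152.48, 161.72,
               143.04, 146.52, 104.63, 123.04, 94.43]

all_significant = all(value > critical for value in table1_chi2)
print(round(critical, 3), all_significant)
```

The computed critical value agrees with the 6.64 used in the text to within rounding, and every value in the sampled matrix exceeds it by a wide margin.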
The four non-significant intercorrelations were caused by one informant who also showed marked deviation from the rest of the informants,
even though he correlated significantly with them in some respects. No
explanation can be offered for this deviation, for there is nothing in his
biographical data which would indicate even a post hoc explanation for
the deviation. On the criteria set up for the selection of informants in the
Linguistic Atlas of the United States and Canada, he would have been an
ideal informant: he was a native and life-long resident of Johnson
County, Iowa; he was 71 years old; he was a farmer who owned his own
farm; and he had only four years of education. I believe this case of
deviance points up rather concretely the need to exercise care in assuming
that an informant who meets the informant selection criteria of the
Linguistic Atlas of the United States and Canada necessarily represents
the norm of his geographical area, and to note that he may in fact
contribute spurious data to a survey.
TABLE 2
The Guttman 'quasi-simplex covariance structure'

Informants           2         7        18        28        23
2                 1.00       .50       .49       .47       .47
                            .73*       .72       .70       .70
                        176.59**    168.80    152.48    118.42
7                           1.00       .47       .46       .45
                                       .71       .69       .68
                                    161.72    146.52    143.04
18                                    1.00       .42       .39
                                                 .64       .60
                                              123.04    104.63
28                                              1.00       .37
                                                           .58
                                                         94.43
23                                                        1.00

In each cell the first value is the phi coefficient.
* Tetrachoric intercorrelations.
** Chi-square values.
The results gained by using factor analysis were just as encouraging.
The patterning exhibited by the intercorrelation matrix in Table 2
demonstrates the Guttman Quasi-simplex Covariance Structure.24 That
is, the diagonal of the matrix shows a non-equidistant ranking from
high to low. The same phenomenon also occurs in each column. The
theoretical concept underlying the ranking in the matrix is that of
'complexity'. 'Complexity', according to Guttman,25 is that factor of
greater inter-individual difference which is hypothesized in this study
as language behavior in relation to some geographical area. Language
24
Guttman, loc. cit. (1954), 258-348 (see fn. 15).
25
Guttman, loc. cit. (1954), 258-348; Psychometrika, 20, (1955), 173-191; "Empirical
verification of the radex structure of mental abilities and personality traits", Educ.
Psychol. Measm., 17 (1957), 291-407; "What lies ahead for factor analysis?", Educ.
Psychol. Measm., 18 (1958), 497-515.
behavior as it varies among idiolects can be conceived here in terms of
uniqueness and ranked accordingly: from a more simple, i.e. homogeneous, to a more complex, i.e. unique, dimension. This is to say
that each idiolect is ranked in terms of the number of unique items it contains in relation to the other idiolects: the lower the number of unique
terms, the higher the intercorrelation.26 Therefore, in Table 2, idiolect 2
has fewer unique items in relation to idiolect 7 than to other idiolects
in the sample; thus, it correlates more strongly with 7 than with the other
idiolects in this matrix. This interpretation is reinforced by the independent tetrachoric correlation computation whose intercorrelations are
marked with an asterisk in Table 2.
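The ranking pattern described for Table 2 can be checked mechanically. The sketch below encodes one reading of that description, which is an assumption: the entries next to the diagonal should run from high to low, and so should each column above the diagonal. The phi values are those of Table 2; the variable and function names are hypothetical.

```python
# Informant order and upper-triangle phi coefficients as given in Table 2.
order = [2, 7, 18, 28, 23]
phi_upper = {
    (2, 7): .50, (2, 18): .49, (2, 28): .47, (2, 23): .47,
    (7, 18): .47, (7, 28): .46, (7, 23): .45,
    (18, 28): .42, (18, 23): .39,
    (28, 23): .37,
}

def non_increasing(values):
    """True when the values never rise from one position to the next."""
    return all(x >= y for x, y in zip(values, values[1:]))

# Entries immediately next to the main diagonal (adjacent informants).
superdiagonal = [phi_upper[(order[i], order[i + 1])]
                 for i in range(len(order) - 1)]

# Each column of the upper triangle, read from the top row downwards.
columns = [[phi_upper[(order[i], order[j])] for i in range(j)]
           for j in range(1, len(order))]

pattern_holds = (non_increasing(superdiagonal)
                 and all(non_increasing(col) for col in columns))
print(superdiagonal, pattern_holds)
```

A check like this only confirms the ordering in the five-informant sample; it says nothing about what produces the ordering, which is the question the factor and cluster analyses address.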
However, the hypothesis that this uniqueness demonstrated by the
Guttman Quasi-simplex Covariance Structure among the Johnson
County idiolects stems from geographical location is to be rejected, because neither the multiple factor analysis, the cluster analysis, nor the
t-test supported such a hypothesis.
The multiple factor analysis loaded all thirty-two informants into one
factor and by-passed the estimate-of-factor-loadings step because it
could not meet the significance criterion of two factors. This one factor
was confirmed when the estimate-of-factor-loadings step was programmed
to run on those factor-loadings which contained the largest amount of
variance in common. This step rejected all loadings, thus confirming
that none of the informants' intercorrelations could be used as a criterion
for establishing more than one factor. These results do not indicate, then,
that the informants' lexical behavior in Johnson County is anything but
homogeneous. No significant differences occurred due to geography,
occupation, education, age, or sex.
The cluster analysis also justified retaining an assumption of homogeneity, for the two highest intercorrelations in the matrix failed to reach
the ratio (2.00) of the average intercorrelations of the variables in a cluster
to their average correlation with the variables not included in the cluster.
In the t-test, no significant difference (t = .14, df = 32, p > .01) was
found between the incidence-means of Northern and Midland responses.
This again supported the assumption that Johnson County is lexically
a homogeneous dialect area.
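The cluster criterion mentioned above, a ratio of within-cluster to outside-cluster average correlation, can be sketched as follows. The example is illustrative only: it applies the ratio to the five-informant sample of Table 2 rather than to the full 32 × 32 matrix, it treats informants 2 and 7 (the highest phi in that sample) as the candidate cluster, and the function and variable names are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Phi coefficients from Table 2, keyed by unordered informant pair.
phi = {
    frozenset(pair): value for pair, value in {
        (2, 7): .50, (2, 18): .49, (2, 28): .47, (2, 23): .47,
        (7, 18): .47, (7, 28): .46, (7, 23): .45,
        (18, 28): .42, (18, 23): .39, (28, 23): .37,
    }.items()
}

def cluster_ratio(corr, cluster, others):
    """Average correlation inside the cluster divided by the average
    correlation between cluster members and non-members."""
    within = mean(corr[frozenset(p)] for p in combinations(cluster, 2))
    between = mean(corr[frozenset((m, o))] for m in cluster for o in others)
    return within / between

ratio = cluster_ratio(phi, cluster=[2, 7], others=[18, 28, 23])
print(round(ratio, 3), ratio >= 2.00)
```

On this sample the ratio is only slightly above 1, far short of 2.00, which is consistent with the conclusion that no separate cluster of informants stands out.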
The most important revelation of the frequency-count analysis was
the large number of response-items to which no informant responded.
26
This interpretation was offered to me by Bishwa Nath Mukherjee, formerly on
the Psychology faculty at Indiana University and now at Jakkanpur, Patna-1, Behar
State, India.
There were three hundred and forty-two of these. All informants responded to only ten response-items. In terms of the above correlational
analyses, the informants were correlated over 738 response-items rather
than the 1,080 items of the original list. This type of Information is
important for editorial purposes. A questionnaire of 1,080 lexical
response items presents a formidable task for many informants; thus,
a frequency count analysis which indicates that three hundred and
forty-two of these one thousand and eighty response items were responded to by none of the informants means that this questionnaire
contained considerable excess baggage. Since studies in other social
sciences show that short questionnaires obtained a better return percentage than long ones, it seems almost mandatory that those three hundred
and forty-two response items be deleted in this case.
4. SUBSTANTIVE RESULTS
As clearly indicated by the representative sample of key lexical test
questions and their respective response-items in Table 3,27 the frequency
distribution of the response items is predominantly leptokurtic; i.e.
one response-item was generally chosen overwhelmingly more frequently
than the other response-items belonging to the same set of response-items. If this distribution were graphically represented, the curve
marking the central location would be highly peaked. Test questions
4, 9, 10, 11, 23, 33, 49, and 53 are obvious examples of leptokurtic distribution.
These test questions also show that definite dialect mixture occurs between different sets of response-items, since in each instance the response-item picked within a particular set is definitely either Northern or
Midland. In test questions 1, 2, 17, 25, 30, 34, 42, and 43, the response pattern shows dialect mixture within a set of response-items, since the modal
distribution is not so extreme between Northern and Midland response-items and is, in some instances, almost bi-modal, as in 14 and 21. In this
sample, only test questions 47 and 51 show true bi-modal distribution.
These results show, then, that, while dialectal mixture occurs, there is
no central tendency for Northern or Midland response items to occur
overall more frequently, thus indicating once again dialectal homogeneity.
27
The sample of fifty-four key lexical test questions and their respective response
items in Table 3 was chosen on the basis of the findings in H. Kurath's A word geography of the eastern United States, Roger W. Shuy's monograph, The northern-midland dialect boundary in Illinois (Publication of the American Dialect Society,
no. 38) (U. of Alabama, 1962), and the previously cited Davis and McDavid article.
These lexical response items consistently showed dialectal variation as a function of
geography. Words marked N, M, SM, and S are Northern, Midland, South Midland,
and Southern respectively. This dialectal classification is not, however, absolute,
for the drawing of isoglosses tends to be more of an art than a science, and dialectal
overlap is the rule rather than the exception; but there is, for the most part, general
agreement that the classified response items in Table 3 represent that particular
dialect area. The unclassified response items are either nondiscriminate or more
restricted in relation to dialect areas. The frequency with which each response item
was chosen was compiled by the frequency count program.
TABLE 3
A representative sample of key lexical test-questions and their respective
response-items
1. IT IS ____ FIVE:
quarter of (N)
quarter to (N)
quarter till (M)
2. sunrise
sun-up
20*
5
11
16*
15
3. CHANNEL FOR RAINWATER ON
EDGE OF ROOF:
eavetroughs (N)
eavestrough (N)
gutters (M,S)
spouts (M)
13*
8
11
coil
pile
7. POLE TO STEER AND PULL WAGON:
neap (N)
tongue
pole (N)
spear
8. TWIN POLES OF BUGGY
shafts (M)
shavs
thills (N)
4. BUILDING FOR CORN
31*
0
0
4
corn crib (N)

corn barn
corn house (M)
crib (N)
5. LARGE OBLONG STACK OF HAY
hayrick (M)
2
haymow
0
0
Dutch cap
barrack (N)
0
haystack
30*
6. SMALL STACK FOR DRYING HAY
IN FIELD:
haycock (N)
19*
tumble (N)
0
doodle (M)
0
heap (N)
0
cock
4
* Modal response
1
5
drafts
0
31*
1
0
28*
5
1
0
0
9. PIVOTED CROSSBAR FOR ONE HORSE:
whiffletree (N)
whippletree (N)
swingletree
singletree (M)
10. PIVOTED CROSSBAR FOR TWO
HORSES:
evener (N)
doubletree
spreader
double singletree
11.
0
0
1
31*
0
31*
0
1
WOOD IN WAGON:
hauling (M)
drawing (N)
carting (N)
teaming
32*
0
0
0
by name (N)
(N)
12. IMPLEMENT FOR BREAKING CLODS
AFTER PLOWING:
drag (N)
harrow
13. SETTING HEN:
duck (M)
duck hen
setting hen
hatching hen
brooder
0
32*
21.
horse (N)
9*
8
horse (M)
lead horse
3
leader
1
wheel horse
0
saddle horse
1
line horse
15. SOUND MADE BY CALF AT FEEDING
TIME:
blat (N)
1
blare
0
2
29*
16. CALL TO CALVES:
24*
3
3
1
1
14*
0
0
10
18. SOUND A HORSE MAKES DURING
FEEDING TIME:
whinny
whinner
nicker (M)
whicker
/*/-/?
12*
9
9
0
4
19. CALL TO HORSES TO BRING THEM
IN FROM PASTURE:
co-jack
kwope
kope (M)
come up
0
4
8
0
CALL TO SHEEP TO BRING THEM IN
FROM PASTURE:
coo-sheep
coo-nannie
coo-nan
kudack
kuday
fe (M)
7
0
27*
0
1
14. HORSE ON THE LEFT:
sook, calf (M)
sook, sook (M)
come calfy
come, boss
no special call
17. CALL TO COWS:
co, boss (N)
saw
madam
()
20.
10
13*
LARGE OPEN METAL VESSEL FOR
WATER, MILK, ETC.:
pail (N)
bucket (M)
22.
6
25*
FENCE MADE OF SLATS:
picket fence (N)
paling fence (M, S)
apple fence
7 fence
garden fence
shingle garden fence
24. SPLIT WOOD FENCE:
rail fence (N)
worm fence (M)
herring-bone fence
snake fence (M)
stake and rider fence (M)
straight rail
25.
16
20*
GARBAGE FED TO HOGS:
swill (N)
slop (M)
23.
5
1
0
0
0
19*
25*
5
0
1
0
0
28*
0
0
0
3
0
LOAD IN BOTH ARMS:
tarw
0
armful (N)
14
armload (M)
19*
26. "A"-FRAME SUPPORT USED BY
CARPENTERS:
trestle (M)
7
sawhorse
22*
horse
2
sawbuck
1
27.
IRON UTENSIL FOR FRYING:
skillet (M)
(N)
28. PAPER CONTAINER:

paper sack
paper bag
27*
0
8
22*
7
poke (M)
sack
bag
1
5
\
29. HEMP OR BURLAP CONTAINER:
burlap sack
burlap bag
gunny sack (M)
potato sack
grain sack
8
3
25*
0
4
30. WALL OF LOOSE STONE:
stone wall (N)
rock fence (M)
rock wall (S)
20*
7
5
31. SMALL WIND INSTRUMENT PLAYED

WITH THE MOUTH:
harmonica
mouth organ (N)
french harp (SM)
breath harp
mouth harp
harp
juice harp
jew's harp
11
19*
1
0
6
0
0
1
32. VESSEL TO CARRY COAL:
coal hod (N)

scuttle (N)
coal pail
coal bucket (M)
33. PETROLEUM PRODUCT BURNED IN LAMPS:
coal oil (M)
kerosene
lamp
4
5
2
23*
0
31*
34. A TIED, FILLED BEDCOVER:
tied quilt
0
comforter (N)
19*
comfort (M)
11
comfortable (N)
1
/H f
0
35. A SMALL FRESH BODY OF RUNNING
WATER:
creek
28*
stream
4
prong
0
/ ()
fork
0
branch (M)
2
(N)
3
rivulet
riverlet
glitter
0
0
0
36. BREAD MADE OF CORN MEAL IN

LARGE CAKES:
corn bread
johnny cake (N)
cornpone (M)
/X?77
29*
2
1
0
37. SMALL RING-SHAPED CAKE MADE

WITH CAKE DOUGH:
doughnut
30*
fried-cake (N)
2
cruller (M)
1
fat-cake (M)
0
38. SIDE MEAT OF HOGS, SALTED, NOT
SMOKED:
side pork (N)

side meat (M)
sowbelly
fatback
oe//y w^^r
streak-o-lean
39. THICK, SOUR MILK :
curdled milk
bonny-clabber (N)
lobbered milk
thick milk (M)
clabber (M)
loppered milk
clabbered milk (M)
clabber milk
8
19*
3
0
0
0
17*
1
0
1
5
1
6
4
40. A LOOSE, WHITE, LUMPY CHEESE:
pot cheese (N)

Dutch cheese (N)
smear cheese
C/Y/ (SM)
smearcase (M)
clabber cheese
sourmilk cheese (N)
curd cheese
cottage cheese
1
2
1
0
12
1
0
0
25*
41. FOOD EATEN BETWEEN MEALS:
bite (N)
snack (M, S)
piece (M)
lunch
4
15*
5
8
42.
seed (M)
pit (N)
stone
kernel
heart
43.
16
19*
1
0
0
CENTER OF A PEACH :
stone (N)
seed (M)
pit (N)
44.
ear-sewer
CENTER OF A CHERRY:
14
16*
5
GREEN OUTER COVER OF WALNUT:
hull (M)
shuck (N)
25*
2
AWJ:
Shell
45.
- BEANS:
shell (N)
hull (M)
shuck
3
19*
9
0
6
46.
KIND OF WORM:
earthworm (N)
angleworm (N)
zY worm
mudworm
redworm (M)
fishworm (M)
fishing worm (M)
eelworm
rainworm
eaceworm
47. LARGE WINGED INSECT SEEN
AROUND WATER:
darning needle (N)
devil's darning needle (N)
sewing needle
mosquito hawk
snake feeder (M)
dragonfly
7
5
1
1
1
26*
1
0
1
0
2
4
7
0
0
11*
11*
48. INSECT THAT GLOWS AT NIGHT:
firefly (N)
lightning bug
firebug (M)
candlefly
49.
5
25*
4
0
maple
/re^ (M)
wwzpfe (N)
maple (N)
5
0
22*
2
50. PLACE WHERE SAP IS GATHERED:
maple grove
(N)
13*
2
(N)
orchard
maple grove
camp (M)
(M)
51. HE IS SICK
at his stomach (M)
to his stomach (N)
in his stomach (M)
on his stomach (M, S)
0
10
4
2
16*
16*
1
0
52. THE GAME OF
quoits (N)
quates
horseshoes
53. A NOISY BURLESQUE SERENADE

AFTER A WEDDING:
serenade
chivaree (N)
belling (M)
dish-panning
skimmelton (N)
callathump
54. BABY (ON ALL FOURS)

ACROSS THE FLOOR:
creeps (N)
crawls (M)
1
0
30*
1
31*
1
0
0
0
23*
10
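The asterisks in the table above mark the modal response for each item. As a minimal sketch of how such a count could be produced, assuming each informant's answer to an item is recorded as a plain string, the tally might look like the following. The function name `count_analysis` and the data layout are illustrative assumptions and are not taken from the original program; the sample responses only mimic the two non-zero counts reported for item 5.

```python
from collections import Counter


def count_analysis(responses):
    """Tally the responses to one questionnaire item.

    Returns a list of (term, count, is_modal) tuples, most frequent first.
    A term is flagged as modal when its count equals the highest count.
    """
    counts = Counter(responses)
    if not counts:
        return []
    top = max(counts.values())
    return [(term, n, n == top) for term, n in counts.most_common()]


# Illustrative data only: 30 informants answering "haystack", 2 "hayrick".
responses = ["haystack"] * 30 + ["hayrick"] * 2

for term, n, is_modal in count_analysis(responses):
    # Print the count with an asterisk after the modal response.
    print(f"{term:<10} {n}{'*' if is_modal else ''}")
```

Terms that no informant gave do not appear in this tally; a fuller version would start from the questionnaire's list of expected terms so that zero counts are also reported, as they are in the table.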
5. SUMMARY AND CONCLUSION
In summary, the following conclusions may be drawn from the study:
(1) The degree of density applied in this study should be seriously
considered as a critical part of future dialect studies. Although the
sampling techniques used in the study were relatively unsophisticated and
definitely need to be improved, the reliability of the data was apparent
in all of the significance tests. The degree of density in relation to
questionnaire size will have to be revised in the light of the above findings.
(2) Although the phi coefficient and the tetrachoric correlation both
describe the relatedness of the linguistic phenomena under analysis reliably,
they function more crucially as input for either multiple factor analysis
or cluster analysis, for the factors they describe must meet some
significance criterion if they are to be considered valid. (3) Although the
Guttman quasi-simplex covariance structure can show apparent 'complexity'
among idiolects, it is unreliable as an indicator of the statistically
significant factors which underlie the intercorrelations. (4) The count
analysis proved to be editorially informative as well as an instrument for
providing an accurate frequency count and input for the t-test. (5) The
results of the proposed statistical methodology overwhelmingly failed to
support previous assumptions about lexical usage in Johnson County, Iowa, and
demonstrated the need for an analytic methodology which can test for
significant differences. The proposed methodology can also be used to
analyze phonological, morphological, and syntactical dialect materials.
(6) The computer can make the necessary degree of density feasible and can
be an extremely time-saving and powerful tool in counting and computation
for the linguistic geographer.
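As an illustration of conclusion (2), the phi coefficient for a pair of informants can be computed from a fourfold table of agreements and disagreements. The sketch below is a hedged example and not the program used in the study. It assumes each informant's answers are coded 1 when a given term was used and 0 when it was not; the function name `phi_coefficient` and the sample vectors are hypothetical.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient between two equal-length sequences of 0/1 codes.

    Fourfold table cells:
        a = both 1, b = x only, c = y only, d = both 0
    phi = (a*d - b*c) / sqrt((a+b)(c+d)(a+c)(b+d))
    Returns None when a marginal total is zero (phi is undefined).
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return None
    return (a * d - b * c) / denom


# Hypothetical codes for two informants over six terms.
informant_1 = [1, 1, 0, 0, 1, 0]
informant_2 = [1, 0, 0, 0, 1, 1]
print(phi_coefficient(informant_1, informant_2))
```

Computing this value for every pair of informants yields an intercorrelation matrix of the kind that factor analysis or cluster analysis takes as input.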
30 vi 1966
University of Iowa
Iowa City, Iowa 52240
U.S.A.
