David F. Coward
Charles E. Grimes
SIL International
Waxhaw, North Carolina
2000
This book is sold with the software it describes. That software, too, is the copyrighted
property of SIL International. However, in the interest of sharing the fruit of our research
with the broader academic community, the user of the MULTI-DICTIONARY
FORMATTER [MDF] software is granted the right to share copies of the distribution
diskette with friends and associates, provided this is not done for commercial gain. Such
recipients of the software, if they decide to use it in their research, should in turn buy this
book with its latest version of the software.
MDF represents work in progress. In publishing this software, SIL International is
making no commitment to maintain it. It is, however, committed to forwarding user
comments to the software's authors, who may or may not develop the software further.
IBM is a registered trademark of International Business Machines Corporation. Microsoft
Word, Microsoft Windows, Microsoft Word for Windows, and MS-DOS are trademarks of
Microsoft Corporation.
Cover designed by Bud Speck.
The 2000 edition is only available in Portable Document Format (PDF). Only minor
corrections to the 1995 text were made. No new material is introduced in this edition.
Contents
Preface
1. Before you begin
1.1 Installing the MDF program and files
1.1.1 Running MDF
1.1.2 Requirements and limitations
1.1.3 Further information
1.2 Notes on presentation and conventions
1.3 What to work on from the beginning
2. Getting started in lexicography with MDF
2.1 MDF fields used within an entry with the relative order in which they print
2.2 Examples of lexical entries (raw SHOEBOX form and MDF output)
2.3 Understanding the gloss, reversal and definition fields
2.3.1 Additional considerations for interlinearizing, definitions and reversal
2.3.2 Understanding the relationship between the \ge, \re and \de fields
2.4 Understanding the hierarchical structure of an entry
2.5 Direct character formatting within a field
2.6 Punctuation
3. Introduction to the Multi-Dictionary Formatter program
3.1 Familiarizing yourself with the program
3.2 Requirements and limitations
3.3 Overview of menu options
3.3.1 Change Settings
3.3.2 Reset
3.3.3 Format Dictionary
3.3.4 English and national language finderlists
3.3.5 Quit
3.4 Printing
3.5 Modifying the printout
3.5.1 WORD Stylesheets
3.5.2 Character Style codes
3.6 Summary
4. Basic strategies and perspectives
4.1 Terminology
4.2 Identifying the primary audience and purpose
4.3 Monolingual, bilingual, and trilingual dictionaries
4.4 Text-based lexicography and lexical sets of similar words
4.5 Minimal entries vs. expanded entries
4.6 Root-oriented vs. lexeme-oriented databases
4.6.1 Comparing the two approaches
4.6.2 Advantages and disadvantages
4.6.3 A suggested compromise
Preface
This book and the MDF program that accompanies it did not just grow in a vacuum.
Rather the package developed as a positive response to a number of factors. It has been
built on foundations laid by others. We acknowledge and thank them by reviewing the
development process of MDF and this book (hereafter referred to as the Guide), noting
their contributions where they happened.
David Coward worked closely with John Wimbish in the mid to late 1980s on the
original development of the SHOEBOX computer program for data management. During
the drafting of the initial SHOEBOX documentation Wimbish, Coward, and Grimes
discussed the need to eventually rework and expand the chapter on lexicography and
adapt it further as our experience and expertise grew. All three were working on
genetically and geographically diverse languages in the province of Maluku in eastern
Indonesia.
As the number of SHOEBOX users grew, many began to organize their lexical data
and build dictionaries by interlinearizing bodies of vernacular texts. But it soon became
apparent that there was a significant need for an easy way to format and print the
dictionaries being compiled in SHOEBOX, and to produce a good reversed index.
Coward developed a fairly complex CC (Consistent Changes) print table to print an early
draft of his Selaru dictionary. Wyn Laidig and others then asked Coward to adapt similar
tables for their needs, with many asking for refinements and enhancements to the
original tables. It became obvious that one print table flexible enough to handle many
options would be better than repeatedly customizing individual tables for individual users.
Since many users of SHOEBOX were using their lexical database for both
interlinearizing and building a dictionary, it also became apparent that there was a need
for a conditional selection of information rather than a straight find-and-grab approach
for making a reversed finderlist (see §2.3). Because of the nature of the computer tools
used for formatting and printing, these choices required superimposing certain constraints
on the field codes within the lexical database, as undesirable as everyone knows that to
be.
The development of the print tables was enhanced by the standards proposed and the
issues addressed at the 1991 Hasanuddin University-SIL Lexicography Workshop in
Sulawesi, Indonesia, led by Tom Laskowske, Roger Hanna, Barbara Friberg, and
Coward (as a guest). This included useful input from David Anderson and Phil Quick.
The Maluku Linguistics Committee of SIL Indonesia, working at Pattimura University in
Ambon, developed an enhanced set of suggested field codes. Bryan Hinton, Russ Loski,
Howard Shelden, Mark Taber, and Ron Whisler were helpful at that stage, building on
Wimbish (1989), the Sulawesi workshop, and the works of Len Newell (1986) and Marc
Jacobson (1986). The results were made available in Indonesia in September 1992 as the
Maluku Dictionary Formatter [MDF] program (version 0.9, originally limited to feed into
Microsoft WORD 5.0) with its accompanying documentation (Coward 1992). That
version and the later v0.95 (for MS-WORD 5.5) quickly found eager testers in a number
of countries throughout Southeast Asia and the Pacific. Many of these early testers
provided helpful ideas and words of encouragement, and we especially thank Bryan
Hinton, Jock Hughes, Rick Nivens, John Severn, and Ed Travis for theirs.
In the meantime, Grimes' responsibilities were taking him back and forth between
Indonesia and Australia where he was gaining insights into semantics and related issues
with Prof. Anna Wierzbicka, Prof. Bill Foley, and Prof. Bob Dixon, and assisting Prof.
Andrew Pawley with workshops and courses on dictionary-making. MDF v0.9 was
incorporated into a number of SHOEBOX courses taught by Grimes at the Australian
National University while he was a Visitor in the Department of Linguistics at the
Research School of Pacific Studies. The correspondence between Coward and Grimes,
beginning at that time, grew into the collaborative effort you now hold in your hands.
The enhancements of both the program and the documentation since v0.9 have
focused on 1) providing more interactive options for the user; 2) making the field codes
more broadly applicable to users outside Indonesia (hence the original name was changed
from Maluku Dictionary Formatter to Multi-Dictionary Formatter); 3) making the field
codes more systematic and mnemonic; 4) providing additional categories and options
requested by early users working in a wide range of linguistically and geographically
diverse languages; 5) tying MDF into the broader academic world of lexicography;
6) addressing background and methodological issues that are beyond the immediate scope
of the MDF computer program but which are faced by anyone seriously grappling with
cataloging the lexicon of a language, and 7) including around 200 real-language examples
showing how to organize such things as homonyms, citation forms, multiple senses,
various kinds of cross-references, dialectal information, loan words, multiple-language
glossing, and other categories of lexical information, illustrating both the form it should
take in a SHOEBOX-like database and how MDF formats the information for printing.
The idea is that if users can see what an example looks like, they are then more likely to
be able to adapt it to their needs. Over time the documentation expanded to what it is
now, fulfilling the long-term goal of providing a stand-alone field guide that users can
have with them when doing their fieldwork. Also included is a bibliography directing
users to where they can find issues discussed at greater length.
As with the development of the MDF computer program, this Guide has also
benefited greatly from the works of others. General sources in lexicography such as
Zgusta (1971) and Landau (1989) broadened our horizons. Bartholomew and Schoenhals
(1983) was particularly useful on principles for choosing good example sentences. Newell
(1986) provided a helpful summary for, among other things, determining multiple senses.
A lexicography workshop held at Cenderawasih University in Irian Jaya in 1985, run by
Prof. Joseph Grimes of Cornell University provided an introduction to the works of Igor
Mel'čuk and the usefulness of lexical functions. That introduction grew into Chapter 7,
which has also appeared in modified form as C. Grimes (1994). Joseph Grimes has also
given us considerable encouragement and has suggested many useful modifications to
both the MDF program and the Guide toward their latter stages of development. Prof.
Andrew Pawley at the Australian National University, who took C. Grimes under wing in
various workshops and courses on dictionary making, graciously allowed us to adapt
some of his materials for this volume, particularly in Chapter 8. Chapter 9 addresses a
number of issues that users have asked about and was presented in an earlier form at the
1992 Asia International Lexicography Conference (C. Grimes 1992).
From these and many other sources, and from our experience working on
dictionaries, both our own and helping dozens of others, we have gleaned and condensed
much of the information found in this Guide. The ideas have been generalized,
streamlined and formulated into a package we are confident will be useful to many in
both its theoretical and practical applications.
Along the way, John Wimbish and Dan Davis have individually encouraged our
efforts and we are grateful for their support. Wimbish also commented on parts of this
Guide. A number of other people have also given useful feedback including Myron
Bromley, Les Bruce, Barbara Dix Grimes, Len Newell, David Snyder, and Peter Wang.
While the over-all feedback has been overwhelmingly positive, recognizing the practical
service and guidance that MDF provides, not everyone has been in full accord with all of
our recommended approaches because of practices peculiar to their region that we do not
encourage here for principled reasons. The beauty of both MDF and this Guide, however,
is that they are flexible enough to handle a wide range of options even beyond the various
competing approaches and options explicitly discussed or recommended here; it is truly
a Multi-Dictionary Formatter.
Doyle Peterson has given consistent administrative support for this project as it
developed toward its later stages. Jim Albright and Betty Eastman provided helpful
editorial suggestions. Our wives and families have graciously tolerated several
late-night-to-early-morning sessions, simultaneously believing in the usefulness of the MDF project
and hoping we would finish it soon.
by MDF, do not use this program yet. First convert your lexical field codes to this
standard (as explained in chapter 2 of this Guide).
1.1 Installing the MDF program and files
The SETUP program will guide you through installing MDF on your computer. A hard
disk drive is highly recommended. At the DOS prompt type a:setup, then press ENTER.
If you are installing MDF from a different drive use the appropriate designation (e.g.
b:setup). Respond to the screen prompts using the default suggestions if you are
uncertain. We recommend installing MDF in its own subdirectory as suggested by the
SETUP program, e.g. C:\MDF. Consult the README file on the release disk for
additional information.
1.1.1 Running MDF
The MDF program is set up to work with WORD v5.0, v5.5, or v6.0 and WINWORD
(v2.0 or v6.0).1 In order to run, MDF needs to know the filename of your lexical database.
So, if the name of your lexical database is LEXICON.DB, you would type:
C:\MDF>mdf lexicon.db
or, if the database is in a different directory, include its path:
C:\MDF>mdf \sawai\lex\lexicon.db
The MDF program will ask you to specify the version of WORD you are using. (Use the
arrow keys and <ENTER> to select it). If you prefer to specify this from the command line,
the following exemplifies how to do it:
C:\MDF>mdf lexicon.db v5
C:\MDF>mdf lexicon.db v55
C:\MDF>mdf lexicon.db v6
C:\MDF>mdf lexicon.db win2
C:\MDF>mdf lexicon.db win6
1 If the user specifies WINWORD as the word processor, MDF will format, split, and convert the
database files to WORD documents, but makes no attempt to merge them (because MDF cannot access
WINWORD directly). The user will need to then exit MDF and load each document file into
WINWORD manually for merging and printing. For WINWORD, formatted dictionaries are named
DICTN*.DOC; English reversed lists are ENGLS*.DOC; and national reversed lists are NATNL*.DOC.
Some WINWORD 6.0 users will prefer to merge the DICTN*.DOC files together by using the Master
Document View and buttons, and then later remove the section breaks introduced by that process.
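The command-line forms above can be summarized in a small lookup table. The following Python sketch is only an illustration and is not part of MDF: the helper name mdf_command is hypothetical, and the pairing of each code (v5, v55, v6, win2, win6) with a word processor version is an assumption inferred from the order in which the versions are listed in this section.

```python
# Illustrative only: MDF itself is a DOS batch program, not a Python tool.
# The pairing of codes to versions is an assumption inferred from the text.
WORD_CODES = {
    "WORD 5.0": "v5",
    "WORD 5.5": "v55",
    "WORD 6.0": "v6",
    "WINWORD 2.0": "win2",
    "WINWORD 6.0": "win6",
}


def mdf_command(database, word_version=None):
    """Build the text typed at the DOS prompt to start MDF.

    database     -- filename (optionally with path) of the lexical database
    word_version -- a key of WORD_CODES, or None to let MDF ask interactively
    """
    parts = ["mdf", database]
    if word_version is not None:
        parts.append(WORD_CODES[word_version])
    return " ".join(parts)


print(mdf_command("lexicon.db", "WORD 5.5"))
```

Leaving word_version out corresponds to the plain form of the command, in which MDF prompts for the version of WORD.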
The MDF program can have trouble merging documents in WORD v5.5 and WORD v6.0
simply because the glossary files used by those programs assume a default keyboard setup
for each version of WORD. If the user has configured the keyboard in WORD to be
different from the default configuration, MDF may malfunction at the point where
WORD is called. So, test MDF on a small section of your lexicon to see that all is
working well before trying to process your whole lexicon.2 If MDF does not work
properly, exit MDF, reconfigure WORD to its default settings, and try MDF again. A file
named MDFSAMPL.DB is provided with MDF for testing that your system is working
properly.
For Windows users: Drag the MDF.BAT file to a Program Manager group; edit its
properties (ALT+ENTER); and add the name (and path) of your lexical database to the
command line. Also be sure the Working Directory is the same as the directory in which
you copied all of the MDF files.
1.1.2 Requirements and limitations
MDF is not a sophisticated program!3 It requires some user care. Allow plenty of room
for MDF to work: approximately four times the size of your lexical database. Trying this
program on a floppy drive would be unwise. The MDF program reserves the filenames
DICT*.*, ENGL*.*, and NATN*.* for its own use. Do not use these names for your own
files as they are likely to be deleted. MDF must be able to find the MS-DOS program
SORT.EXE (SORT.EXE is supplied with MS-DOS and is usually found in the C:\DOS
subdirectory). If it is unable to find SORT (i.e. if C:\DOS is not in the PATH command in
the AUTOEXEC.BAT file), the MDF program will not be able to run properly. To test if
MDF will be able to find SORT, type DIR | SORT at the DOS prompt:
C:\MDF>dir | sort
If this gives an alphabetized listing of the files on the default directory then all is okay
(the line indicating the amount of free disk space is also sorted to the top). If the files are
not sorted alphabetically, this means that the SORT program is not accessible. You will
need either to specify a path that makes SORT accessible, or to copy SORT to a place
where it can be found (for example, the directory where MDF and its associated files are
located).
2 Testing a small portion of your lexicon before trying the whole thing is important not only for testing
the interaction of the programs, but also for ensuring that the structuring of your lexical information fits
within the parameters set for working with MDF (see chapter 2).
3 That is, computerwise, although what MDF can deliver to the user is very powerful.
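The requirements described in this section (working room of about four times the database size, the reserved filenames, and a reachable SORT program) can be checked before a long run. The Python sketch below is a modern illustration under stated assumptions, not something supplied with MDF: the function names are hypothetical, and the reserved patterns DICT*.*, ENGL*.*, and NATN*.* are approximated by a simple prefix test.

```python
import os
import shutil

# Filenames beginning with these letters are reserved by MDF
# (the DICT*.*, ENGL*.*, and NATN*.* patterns named in the text).
RESERVED_PREFIXES = ("DICT", "ENGL", "NATN")


def reserved_name_conflicts(filenames):
    """Return the filenames that MDF would treat as its own and may delete."""
    return [name for name in filenames
            if name.upper().startswith(RESERVED_PREFIXES)]


def enough_room(database_size, free_bytes, factor=4):
    """True if free space is at least `factor` times the database size."""
    return free_bytes >= factor * database_size


def preflight(database_path, work_dir="."):
    """Collect human-readable warnings; an empty list means no problem found."""
    problems = []
    for name in reserved_name_conflicts(os.listdir(work_dir)):
        problems.append("reserved filename in use: " + name)
    size = os.path.getsize(database_path)
    if not enough_room(size, shutil.disk_usage(work_dir).free):
        problems.append("less than four times the database size is free")
    if shutil.which("sort") is None:  # is a SORT program on the PATH?
        problems.append("SORT not found on the PATH")
    return problems
```

A call such as preflight("lexicon.db") returns a list of warnings to read before processing the whole lexicon. It does not replace testing MDF on a small section of the lexicon first.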
MDF must also be able to find your word processor. MDF assumes your word processor
subdirectory is specified in the PATH command of your AUTOEXEC.BAT file and that
your word processor is named WORD.EXE. If you have more than one version of WORD
installed and have renamed the files (e.g. WORD5.EXE and WORD6.EXE), make sure
the version you want to use with MDF is named (or renamed) to WORD.EXE. Make sure
that particular subdirectory is added to the PATH command in AUTOEXEC.BAT. To
check this, from the MDF subdirectory type:
C:\MDF>word<ENTER>          [check WORD-for-DOS]
C:\MDF>win winword<ENTER>   [check WORD-for-WINDOWS]
Keyboard conventions: Key names connected by a plus sign [+] indicate a combination of
keys (e.g. ALT+F6 indicates press the F6 function key while holding down the ALT key).
Key names separated by a comma [,] indicate a sequence of key strokes (e.g. ALT+F,V
indicates press the F key while holding down the ALT key, then press the V key). Angle
brackets indicate pressing the key named, for example <ENTER>.
Cross-references to more detailed discussion elsewhere in this Guide take two forms. A
cross-reference to an entire chapter is simply "see chapter 7". A cross-reference to a
specific section uses the symbol [§], as in "discussed in §4.6" (meaning chapter 4,
section 6).
Throughout this Guide are found special boxes beginning with CAUTION, TIP, or
NOTE. They alert the user to information that will make the compiling, formatting, and
printing of a dictionary more trouble-free and rewarding.
Many examples are given throughout this Guide to illustrate the accompanying discussion
and show how MDF processes information. Most are real examples from dictionaries in
progress. The few English examples that are found are simply meant to illustrate a basic
idea of how to manage the data and are not meant to portray theoretical tightness in their
definitions; that is not what they are illustrating.
On-line helps: On the MDF release disk is a file called LXFIELDS.DB, which is
designed as an on-line help in SHOEBOX for organizing lexical information to format
and print through MDF. One can ask this file, for example, "What is the \sc field? What is
it for? How do I organize information in that field?" One can also look at this file for
information on recommended order of fields, punctuation appropriate to a particular field,
etc.
Sample database: Another file provided on the MDF release disk is MDFSAMPL.DB.
This provides a SHOEBOX file of a number of lexical entries in the Selaru language of
Indonesia. Some of the entries are simple and some complex, but they illustrate a range of
different possibilities. This file can be called up into SHOEBOX or a word processor and
can be studied as desired. It can also be used to gain familiarity with MDF by processing
MDFSAMPL.DB using the various menu options available in MDF to view the variety of
output options provided for the user. This can be done by typing:
C:\MDF>mdf mdfsampl.db
1.3 What to work on from the beginning
The compiler of a dictionary should plan on doing at least the following things during the
years it takes between starting and finishing the dictionary.
1) When first learning how MDF interacts with your data, make a test file of 50-200
entries, both simple and complex, making sure that every field and record in it is
organized along the lines required for MDF.
Format this test file through MDF with the various options likely to be needed for
your various audiences and purposes.
Make a reversed finderlist through MDF as you will be doing with the final
product.
Copy the appropriate MDF stylesheet for your printer to MDFDICT.STY and print
your test file.
Inspect every detail of the printout. Adjust the way lexical data is organized in your
LEXICON.DB, and make minor adjustments to the stylesheet to get the resulting
printout you desire.
2) Edit or enter the rest of your lexicon to conform to what you have learned from
step one above.
3)
[Figure: diskette rotation. Labels: PREVIOUS SESSION (2), TODAY'S SESSION (1), NEXT SESSION; Diskette A, Diskette B, Diskette C, Diskette A.]
4)
5) We recommend making a hard copy printout of your full lexical database at least
once a year.
6) We recommend that you process your database through MDF after every 100-200
new or newly edited entries. A new printout is not required, just inspection of the
results on the computer. This keeps you mindful of how the field codes interact
with MDF. It also helps you pinpoint a snag if the program should hang for some
reason.
Once the compilers are ready to print the final product, they should plan on at least two
passes:
1) The first pass is a printout of the entire database using the options they want for the
final form. This includes both the dictionary and the finderlist.
These printouts should be carefully inspected entry by entry to see that everything
is as desired. Human experience suggests that it won't be.
Make any corrections on the original lexical database, not on the MDF output (i.e.
make changes in the LEXICON.DB file, not in the DICT.DOC file)!
2) After you have written your introduction to the dictionary (see §10.2), then make
sure the lexical database is consistent with what has been said in the introductory
material and reprocess the corrected database file through MDF. Repeat the steps
above, if necessary.
3) Using WORD, post-edit anything that MDF cannot control directly in the final
DICT.DOC file. For example, a) remove the (dateprint) from the footers; b) make
sure the section dividers that begin a new letter are modified to reflect special
characters and digraphs as appropriate; c) if the national language-vernacular
diglot or triglot option is chosen, replace labels to conform to what is appropriate
for the country in which the national language is spoken (the Indonesian labels to
be replaced are listed in Appendices A and B); d) if the national language-vernacular
diglot option is chosen, replace "Kamus" (meaning "dictionary") in the
footer with whatever is appropriate.
2)
3) Inputting the information (compiling the lexical database), normally over a period
of years. This is best begun in the earliest stages of contact with a language and
continued throughout; much is gained by doing it this way.
4)
5) Manipulating the data for analytic or other purposes, such as extracting semantic
domains, doing reversals, etc.
6)
7) Printing.
8)
A tool like SHOEBOX can very nicely assist with aspects 2-6 above. The Multi-Dictionary
Formatter [MDF] and this Guide are designed to be used in conjunction with
SHOEBOX to beef up 2-7, especially points 2, 5 (reversals), 6, and 7.
Putting dictionary information in a database structure rather than in word processor text
files has significant advantages in the compiling, checking and formatting stages.1
SHOEBOX has brought these advantages to new heights in a 640K DOS environment
with features such as:
1)
2) Easy comparison of non-adjacent entries and copying information from one to the
other with the JUMP feature.
3)
4) The ability to search across separate databases (e.g. comparing different dictionaries
of the same language, lexicons of different languages, and different domains
of the same language).
5) The ability to check for consistency against a master list using the SHOEBOX
RANGE SETS (e.g. parts of speech, semantic domains). This provides a quality
control in the compiling stage.
6)
7)
8)
9)
10) The lexical database is interactive with a text corpus (e.g. for interlinearizing,
spell-checking, dictionary-building, or searching for example sentences). Text-based
linguistics and lexicography provide a very sound foundation for mapping
out a language and culture.
TEXT
/Language learning
//Phonology
///Morphology
////Clause-level syntax
/////Interclausal syntax
\\\\\Discourse
\\\\Lexical database
\\\Anthropology
\\Literacy
\Translation
2With MDF the user will do best to stick with the suggested codes. Nearly 100 field codes are provided,
[Figure: Multi-Dictionary Formatter overview. Labels: (e.g. SHOEBOX); Format dictionary, English finderlist, National finderlist, Change settings, Reset.]
\lx dapan
\ps n
\ge spear
\de three-pronged spear with barbs, used for eels
\ee This is similar to the unbarbed fv:nasel used for crayfish. [footnote 3]
\mr dapa-n
\dt 14/Apr/93

\lx flawan
\ps n
\sn 1
\ge gold
\et *bulaw-an
\eg gold
\dt 13/Dec/93

\lx akal
\ps n
\ge idea
\re idea ; notion ; conspiracy
\de idea, notion, conspiracy
\ee Has overtones of evil or mischievous intent.
\bw Arabic
\dt 20/Oct/89
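Records of the kind shown above are plain text in which each field starts with a backslash marker and each record starts with \lx. The Python sketch below shows one way such text could be read into records. It is only an illustration: it is not how SHOEBOX or MDF reads the file, the function name parse_records is hypothetical, and the treatment of a line without a marker as a continuation of the previous field is an assumption.

```python
SAMPLE = r"""\lx dapan
\ps n
\ge spear
\de three-pronged spear with
barbs, used for eels
\mr dapa-n
\dt 14/Apr/93

\lx flawan
\ps n
\sn 1
\ge gold
"""


def parse_records(text, record_marker="lx"):
    """Split backslash-coded text into records of (marker, value) pairs.

    A new record begins at each `record_marker` field. Lines that do not
    begin with a backslash are joined to the preceding field (assumption).
    Anything before the first record marker is ignored.
    """
    records = []
    current = None
    for line in text.splitlines():
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            if marker == record_marker:
                current = []
                records.append(current)
            if current is not None:
                current.append([marker, value.strip()])
        elif current and line.strip():
            # Continuation line: append to the value of the last field.
            current[-1][1] = (current[-1][1] + " " + line.strip()).strip()
    return [[(m, v) for m, v in rec] for rec in records]


for record in parse_records(SAMPLE):
    print(record[0][1], "has", len(record), "fields")
```

Keeping the fields as an ordered list of pairs, rather than a dictionary, preserves field order and allows a marker to repeat within one record.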
A sample of MDF output for a formatted dictionary and a reversed finderlist are found on
the following two pages:
3 Note that in the \de field normal punctuation is used except at the end, where no punctuation is used;
MDF will supply it later. The fv: is a code (font-vernacular) that provides direct formatting for printing
the tagged word in the vernacular style when using MDF. Other direct formatting character codes are
explained in §2.5.
2.1 MDF fields used within an entry with the relative order in which they print
Fields already factored into MDF are listed below. Sticking with these field markers will
permit automated reverse indexing and printing. The relative order of the field markers is
the one we recommend.4 The following fields are critically ordered in relation to each
other: \lx \hm \lc \se \ps \pn \sn. The order of the other fields is fixed in printing, but
there is some flexibility for user preference in how the information can be organized on
screen in SHOEBOX. For example, some users prefer \sd (semantic domain) near the
front while others prefer it at the end.
CAUTION: There is a potential cost in deviating from the canned package. MDF is not
highly interactive, so do not expect to customize the output except in limited ways.
Nevertheless, be assured that MDF provides a wide range of options that have proven
capable of organizing diverse lexical information for a variety of purposes and from a
variety of languages spoken in Asia, Africa, the Americas, and the Pacific.
The explanation of the field codes that follows is supplemented in §2.2 by examples from
the Buru, Selaru, and Tetun languages of how these codes are used.5 Subsequent chapters
expand the discussion of many of these codes. A summary of the information below is
available in a helps file supplied with MDF (LXFIELDS.DB) that can be on-line in
SHOEBOX when needed.
\lx
Lexeme: also known as lemma or headword [\lx tuat]. This is the key field or
record marker that SHOEBOX uses to keep one entry separate from another.
Bound morphemes are listed with a preceding or following hyphen [\lx -oli, \lx
nara-]. For some languages it may be acceptable to give an inflectable citation
form, such as the H-form given in Tetun for inflectable verb roots [\lx holi,
representing the paradigm koli, moli, noli, holi, roli, where the linguist would
tend to identify the root -oli but the community thinks in terms of holi].
Multiple word or phrasal lexemes are common. Once SHOEBOX is set up in
v1.2 or earlier, the user no longer sees \lx, but rather "Key:" at the top of the
SHOEBOX screen [Key: tuat]. Version 2.0 uses the actual record marker field
[\lx tuat]. See §6.1 for an expanded discussion on choosing headwords. This
field is obligatory for each entry.
4The recommended order of fields is listed more succinctly in Appendix B. Different purposes and
different audiences may require a different setup, but MDF is not designed to assist with customized
output beyond the built-in options.
5See the SHOEBOX manual for alternate ideas on organizing lexical information. This current MDF
Guide is designed to expand and enhance the discussion in the SHOEBOX manual relating to lexical
databases and provides for a wider range of lexicographic needs.
\lc
Citation form (lexical citation): [\lx nara-, \lc naran]. This gives a complete
surface form of bound roots that will be printed as the headword in the final
printout. The \lc form always replaces the \lx form for the printed dictionary.
MDF prompts users to choose whether or not they want entries that use \lc to
sort under the \lc form for the printed dictionary. If the entry is not sorted by the
\lc form, it will sort under the \lx, but the printed headword will be the \lc form
(\lx -angu, \lc (na)-angu is printed between \lx ane and \lx aok; similarly
\lx -ao, \lc (beke)-ao is printed between \lx aok and \lx ape). See §5.4.4 for
detailed discussion. Use \lc only if the \lx form is inappropriate for the printed
dictionary. MDF places the contents of the \lx field as follows: \lx -hilu,
\lc na-hilu is printed as na-hilu (from: -hilu).
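The relationship between \lx and \lc can be sketched in Python as follows. The function names are hypothetical, and the sketch models only the printed headword and the choice of sort form; MDF's actual collation (for example, how hyphens and parentheses are treated when sorting) is not modeled.

```python
def printed_headword(lx, lc=None):
    """Return the headword as printed: the \\lc form replaces \\lx,
    and the \\lx form is then shown as '(from: ...)'."""
    if lc:
        return f"{lc} (from: {lx})"
    return lx


def sort_form(lx, lc=None, sort_by_lc=False):
    """Return the form an entry sorts under, depending on the user's choice."""
    if lc and sort_by_lc:
        return lc
    return lx


print(printed_headword("-hilu", "na-hilu"))
print(sort_form("-angu", "(na)-angu", sort_by_lc=False))
```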
\ph
\se
Subentry: This field is used if one is organizing the lexicon primarily around
the root morphemes rather than the surface forms. It is also used by some
compilers for languages in which phrasal lexemes are common (e.g. put out)
where the preference is not to list the phrasal lexemes as separate headwords.
Phrasal lexemes can be organized as \se sections under the words that make
them up. Polymorphemic forms or phrases are listed under \se, which is like the
\lx field except that it occurs within the record (entry), marking the word (or
phrase) as a form derived from or associated with the root. Following this field
would be all the fields that make up a typical lexical entry. There can be several
\se subentries within a record (entry). Subentries can also have multiple senses
within them. MDF begins each subentry at the beginning of a new line: [\lx
destroy, \se destroyer]. For bilingual dictionaries of minority languages,
many lexicographers prefer to not use \se, listing everything as main entries to
make it easier for the naive user to find information. Upon reversal, both the \se
form and the \lx form are referenced for a gloss listed under the \se form (e.g.
\lx sima, \ge hand, \se simake klarake, \ge palm reverses on the subentry as
palm simake klarake, see: sima).
\ps
Part of speech: [\ps vt, \ps n, \ps PREP, \ps PRO]. This is used to classify the
vernacular form, not the English or national language gloss. For example, the
quality fat might be an adjective in English, but a verb in the vernacular
language. \ps labels should be refined as one's understanding of the language
grows. In other words, don't believe your early labels. Consistency in labeling
is important. The RANGE SETS in SHOEBOX can help with this. There should
be no final punctuation. MDF prints the \ps contents in italics (case is printed
as entered in the original file) and adds a period [\ps vt prints as vt.]. See chapter 9 for
a variety of relevant issues and Appendix E for a starter list of abbreviations. If
more than one \ps is used in an entry (e.g. one sense as a noun and another as a
verb), then MDF starts each new \ps within an entry or subentry at the
beginning of a new line, dividing the entry into sections on the basis of the \ps.
See 2.4 for how this fits into the structural hierarchy of an entry.
\pn
Part of speech (national): [\pn kkt, \pn kb, \pn ks]. This is used to classify
vernacular parts of speech, labeling them with terms common to national
language dictionaries. Keep in mind that part of speech categories in the
national language may not match part of speech categories in the vernacular
(see chapter 9). Consistent labeling is important. Use SHOEBOX's RANGE SET
feature for this field.
MDF requires that the \pn field follow the \ps field:
\ps n     (noun)
\pn kb    (the national abbreviation for noun)
CAUTION: If the order of these two fields is reversed, MDF will not format
is no \pn field or it is empty, the \ps field will be output for the national
audience as well as for an English audience. This limits the need for redundancy
for those labels that are the same in both languages. (See also \ps above.)
\sn
Sense numbers can subdivide subentries (\se) and parts of speech (\ps). Each
\sn should contain its own set of basic field markers (\ge, \re, \de, etc.) as
relevant. It is important to aim toward each sense being validated by a well-chosen
example sentence (\xv). See 6.2 and 6.3 for additional considerations.
Where multiple senses occur, MDF automatically references the correct sense
number in the reversed finderlists.
In compiling the lexicon, some lexicographers find it is convenient to deal with each
separate language as a separate bundle (all English fields, then all national language
fields), whereas others may prefer to intersperse the language codes (all the gloss fields,
then all the reversal fields, then all the definition fields). See 2.3 for a discussion of the
relationship between gloss, reversal, and definition fields.
Vernacular language bundle of fields:
\gv
\dv
\ge
Gloss (English): [\ge 3s, \ge house ; hut ; building]. This field is used for
1) interlinearizing, 2) printing the dictionary (if there is no \de field or the \de
field is empty), and 3) reversal (if there is no \re field or the \re field is empty).
Where the user is distinguishing morpheme-level from word-level glosses, the
\ge field is used for morpheme-level glosses. Multiple word glosses should be
connected with an underline to maintain spacing integrity and force SHOEBOX
to treat the whole gloss as a unit when interlinearizing [\ge put_out, \ge
kin_group]. MDF will convert this to a plain space when printing.
There are two options for organizing multiple glosses:
\ge house
\ge hut
\ge building
OR
\ge house ; hut ; building    [space-semicolon-space]
\re
Reversal (English): [\re jaw ; chin; \re exchange ; get ; take ; give]. This
gives the English word(s) or phrase(s) desired for a reversed English-vernacular
finderlist. It is used for reversal only if the form in the \ge field is not suitable.
The contents of the \re field are not printed in the dictionary, but only in the
reversed finderlist. This is not a definition. Since this field is not used for
interlinearizing, the joining underline [\ge put_out] is not used. See 2.3 for
additional suggestions, such as not glossing verbs as infinitives with "to" (cut) or
nouns with an article "a" (rock), because the reversal will sort on the first word
in this field.
If an asterisk is placed in this field [\re *], then the relevant entry, subentry, or
sense will be discarded or ignored for reversal (i.e. it will not be included in the
reversed finderlist).
CAUTION: MDF can handle up to twenty multiple glosses in the \ge or \re
fields in a single sense or subentry for the reversal process. If more than
twenty glosses are required, consider whether the information should be
restructured into separate senses or subentries.
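The rules for choosing reversal forms can be summarized in a minimal Python sketch. The function name reversal_keys is hypothetical. Two points are assumptions of the sketch rather than documented MDF behavior: underscores in \ge glosses are turned into spaces for the finderlist, and exceeding the twenty-gloss limit raises an error.

```python
MAX_GLOSSES = 20  # limit stated in the CAUTION above


def reversal_keys(ge_fields, re_fields=()):
    """Return the forms a sense contributes to the reversed finderlist.

    ge_fields and re_fields are lists of field contents (one string per
    \\ge or \\re field). \\re is used when present and non-empty,
    otherwise \\ge. An asterisk in \\re excludes the sense.
    """
    re_values = [v.strip() for v in re_fields if v.strip()]
    if "*" in re_values:
        return []
    source = re_values or [v.strip() for v in ge_fields if v.strip()]
    keys = []
    for value in source:
        for part in value.split(" ; "):  # space-semicolon-space separator
            part = part.strip()
            if part:
                # Assumption: joining underlines become plain spaces.
                keys.append(part.replace("_", " "))
    if len(keys) > MAX_GLOSSES:
        # Assumption: this sketch refuses oversized lists; what MDF
        # itself does beyond twenty glosses is not described here.
        raise ValueError("more than twenty glosses in one sense or subentry")
    return keys


print(reversal_keys(["house"], ["house ; hut ; building ; dwelling"]))
print(reversal_keys(["house", "hut", "building"]))
print(reversal_keys(["jaw"], ["*"]))
```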
\we
\de
\gn
Gloss (national language): This is like the English \ge field, but is for
Indonesian, Spanish, French, Portuguese, etc. If interlinearizing is not to be
done in the national language, then all material for a reversed finderlist is also
put in this field and \rn is not used. See 4.2, 4.3 and 5.2.
\rn
Reversal (national language): This is like the \re field, but is designed for
forms that are appropriate for reversal in the national language. For example,
mempersilahkan may be an appropriate gloss for the \gn field, but
inappropriate for reversal; \rn silahkan is preferred. This field would also be
used if interlinearizing is done in the national language and the contents of the
\gn field are inappropriate for reversal.
\wn
\dn
Regional language bundle of fields: These are activated by MDF when National language
audience or triglot options are selected.
\gr
Gloss (regional language): This is like \ge field, but for the regional language
or lingua franca that might be different from the national language, such as
Ambonese Malay, Swahili, or regional creoles. These are often the languages in
which explanations are given, particularly early in the researcher's contact, and
they may provide more insight into the range of meaning of the headword than
the national language. See 2.3, 4.2, and 4.3.
\rr
\wr
Word-level gloss (regional language): Like the \we field. It is not likely to be
needed.
\dr
Definition (regional language): This is like the \de field. If triglot printing is
selected, MDF prints the regional language fields in italics within square
brackets [ ] preceded by Regnl: as in [Regnl: parlente].
\lt
Literally: This is used where the literal parts of an idiom or lexeme do not
obviously yield the gloss or definition given. MDF adds Lit: before the contents
of this field and puts the contents in single quotes, followed by a period.
\sc
Scientific name: [\sc Phalanger spp]. Used where the information is known.
Consult the best regional sources on flora, fauna, avifauna, and fish, or get
expert advice. Be careful about guessing as a lay person. Educate yourself about
principles of identification and taxonomy in botany and zoology. MDF prints
the contents of this field as underlined italic, e.g. Phalanger spp. Do not use
final punctuation as MDF will add this.
Example sentence bundle of fields: MDF can handle up to five different example sentence
bundles for each sense and subentry in a main entry. Within such a unit, multiple
examples are printed one after the other.
\rf
Reference: This refers to the source of the example sentences from data
notebooks, the name of the source text and sentence number, etc. [\rf C89
2:34, \rf Manukama 164.]. This housekeeping field does not have to be
printed, but the information is useful to record. MDF adds Ref: before the
contents of this field. The information is bundled with the following example
sentence fields. Punctuation should be used as needed.
\xv
\xe
\xn
\xr
\xg
Example (gloss for interlinearizing): This field is for those who wish to
include interlinear glossing of \xv in their lexicon.
CAUTION: MDF does not currently recognize this field and so will not
maintain the integrity of the spacing for printing if this field is used.6 It is
questionable whether interlinear examples are appropriate for most
dictionaries.
Fields clarifying the range of meaning and usage:
\ue
Usage (English): [\ue archaic, \ue ritual, \ue Used by same-sex siblings,
not opposite-sex siblings. \ue taboo, \ue vulgar, \ue Rana dialect, \ue
H(igh register)]. This is for comments on social usage, region, register, or
\ur
\uv
\ee
6 This reflects a limitation in the CTW program that MDF uses for converting to a WORD format.
TIP: Use the \ee and related fields (\en, \er, \ev) as all-purpose fields for
\er
\ev
\oe
Only (restrictions, English): [\oe human; \oe female; \oe not said for
siblings of opposite sex; \oe collocates with non-active verbs only]. This
is for semantic or grammatical restrictions pertinent to the use of the headword.
Capitalization should be used as needed. MDF places Restrict: before the
contents of this field.
\on
\or
\ov
Lexical function fields: This bundle of fields (\lf \le \ln \lr) should be kept together since
each example of a lexical function has its own distinct glosses. There can be as many of
these bundles as needed. MDF separates multiple bundles of lexical functions within an
entry, subentry or sense with a semicolon [;], and places a period [.] after the final lexical
function in the entry, subentry or sense.
\lf
Lexical functions: [\lf Part = sufen, \lf Whole = huma]. These are for
mapping lexical networks, in effect, cross-referencing the lexeme with entries
related to it, including various types of synonyms, antonyms, part-whole,
generic-specific, typical actors, undergoers, instruments, material used, etc. The
\lf system of cross-referencing links words in specific ways, in contrast to the
use of \cf, where the link is vague and undefined. See the discussion of lexical
functions in chapter 7 for a listing with examples of relations most commonly
used in the \lf field. When printing, MDF converts the space-equals sign [ =] to
a colon [:], printing the label of the semantic relationship in italics, and what
comes after the equals sign [=] in vernacular font. Thus, \lf Syn = peni prints
through MDF as Syn: peni. MDF ignores \lf fields that have nothing after the
equals sign, so empty \lf fields with preset labels can be kept in a template.
Thus, \lf Syn = (blank) will not print as Syn: unless something is
filled in after the equals sign.
\le
Lexical function (English gloss of \lf): [\le merchant; \le wave]. For most
lexical functions, the contents of \le are simply the gloss of the contents of the
\lf field. But for SynD(ialect), the dialect name is put in this field [\le Rana
dialect]. For SynR(egister), the speech register name is put in this field [\le
Low]. MDF places single quotes around the contents of this \le field. Thus, \lf
Nact [Actor noun] = gebkaleli, \le merchant prints through MDF as Nact:
gebkaleli 'merchant'. See 2.2 for examples of how these bundles are used.
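The printing rules for the \lf and \le fields can be sketched in Python as follows. The function name is hypothetical, only the \lf and \le members of a bundle are handled, and font changes (italics, vernacular font) are not modeled; the sketch covers only the punctuation described above.

```python
def format_lexical_functions(bundles):
    """Format a list of (lf, le) pairs for one entry, subentry, or sense.

    ' =' becomes ':', the \\le gloss is wrapped in single quotes,
    bundles are joined with semicolons, and a period closes the list.
    \\lf fields with nothing after the equals sign are skipped.
    """
    parts = []
    for lf, le in bundles:
        label, sep, target = lf.partition("=")
        label, target = label.strip(), target.strip()
        if not sep or not target:
            continue  # empty template field such as '\lf Syn ='
        text = f"{label}: {target}"
        if le.strip():
            text += f" '{le.strip()}'"
        parts.append(text)
    return "; ".join(parts) + "." if parts else ""


print(format_lexical_functions([("Nact = gebkaleli", "merchant")]))
print(format_lexical_functions([("Part = sufen", ""), ("Whole = huma", "")]))
print(repr(format_lexical_functions([("Syn =", "")])))
```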
\ln
Lexical function (national language gloss of \lf): Like the \le field.
\lr
Lexical function (regional language gloss of \lf): Like the \le field.
\sy
Synonyms: Available for those who do not want to use the \lf bundles. This
field does not provide the advantage of giving a gloss as with the \le field. MDF
adds Syn: before the contents of this field and prints the contents in vernacular
font, followed by a period.
\an
Antonyms: Available for those who do not want to use the \lf bundles. This
field does not provide the advantage of giving a gloss as with the \le field. MDF
adds Ant: before the contents of this field and prints the contents in vernacular
font, followed by a period.
\mr
Morphology: [\lx inaat, \mr ii-en-kaa-t]. This field is for indicating morpheme
representation, or the underlying forms where morphophonemic processes
occur. MDF adds Morph: before the contents of this field and prints the
contents in vernacular font, followed by a period. See 4.6 for further
discussion with examples.
\cf
converted to WORD format for printing, MDF will subscript the homonym
number (e.g. See: asw2). MDF allows multiple \cf bundles, separating each
with a semicolon [;] and placing a period after the final \cf bundle.
\ce
\cn
\cr
\mn
\va
Variant forms of headword: [\lx yako, \va ya, yak; \lx anat, \va an; \lx lidak,
\va lidek; \lx cannot, \va can't]. This can be the inverse of \mn. Cliticized
forms, alternate pronunciations or alternate spellings are listed here. These
variant forms generally refer to minor entries found elsewhere in the dictionary.
Some lexicographers handle incomplete inflections or reduplication here as
well, but those should be handled under the field(s) for paradigms (\pd) or
reduplication (\rd). Use the \ve, \vn, and \vr fields only if there are relevant
comments, such as distinguishing usage restrictions between the \lx form and
the \va form. MDF adds Variant: before the contents of this field and prints the
contents in vernacular font. Multiple \va field bundles are separated by a
semicolon and the final bundle is closed with a period.
The \va bundle can also be used to record dialect variants.7 See 6.5.
7 We are aware that a compiler may use the \va bundle for more than one function (i.e. for morphological
variants, and for dialectal variants), and that this sets up limitations for analysis or if one chooses to print
one type but not the other. We intend future enhancements of MDF to have fields dedicated to dialectal
information, but at present the programming limitations do not allow us any more field bundles. For the
present, use \va and \lf SynD =.
\ve
Variant (English comment): Comments regarding the contents of the \va field
such as usage restrictions of the contents of \va, or dialect names identifying the
source of the forms in \va. The contents of this field are enclosed in
parentheses: \lx hahy, \va fafy \ve older speakers, prints as Variant: fafy
(older speakers).
\vn
\vr
\bw
Borrowed word (loan): [\bw Sanskrit, \bw Swahili, \bw Spanish, \bw
Malay]. This identifies the ultimate source language, where known, with the
understanding that it may have been introduced through an intermediate
language. The form of the original language may also be given [\lx emrimo,
\bw Portuguese fi:meirinho]. For the final printing MDF adds From: and
places a period following the contents of the field, e.g. From: Sanskrit.
\et
\eg
Etymology gloss (English): [\eg bowels]. This field is for the gloss of the
reconstructed form so one can see semantic consistency or shift. Reconstructed
meanings for most language families are given in English. Give the original
published gloss; do not translate the published reconstructed gloss into the
national language. MDF prints the contents of this field in single quotes, e.g.
Etym: *biCuka 'bowels'.
\es
Etymology source: [\es Blust 1993:46; \es PANDYMPL]. This is for the
source of the reconstructed form in \et. It is a housekeeping field for data
management and is not intended for printing. Abbreviations for works on
Austronesian languages can be found in Wurm and Wilson (1975).
\ec
\pd
Paradigm: This is a general field identifying the noun class, verb class, gender,
or other paradigm set to which the headword belongs (as explained in the
introduction to the dictionary). It can be used to identify incomplete or irregular
paradigms. MDF places Prdm: before the contents of this field and adds a
period at the end. For those users or languages that require more specific
paradigm-related fields, MDF recognizes the following:
\sg   singular form                        [Sg: ]
\pl   plural form                          [Pl: ]
\rd   reduplication form(s)                [Redup: ]
\1s   1st singular form                    [1s: ]
\2s   2nd singular form                    [2s: ]
\3s   3rd singular form                    [3s: ]
\4s   non-human or non-animate singular    [3sn: ]
\1d   1st dual                             [1d: ]
\2d   2nd dual                             [2d: ]
\3d   3rd dual                             [3d: ]
\4d   non-human or non-animate dual        [3dn: ]
\1p   1st plural                           [1p: ]
\1i   1st plural inclusive                 [1pi: ]
\1e   1st plural exclusive                 [1px: ]
\2p   2nd plural                           [2p: ]
\3p   3rd plural                           [3p: ]
\4p   non-human or non-animate plural      [3pn: ]
\tb
Table (chart): This marks the text as unformatted. Line breaks and tabs entered
by the user are retained. It may be used for such things as folk taxonomies of
plants and animals, clarifying grammatical paradigms, or listing specific terms
under a generic term (the latter better done in the \lf field). Punctuation and
capitalization should be used as needed. The following example is from Selaru:
\tb Listing of all types of cutting verbs:
fv:akrina:  split in two lengthwise
fv:boras:   cut s.t. in small pieces with a knife
fv:dow:     chop s.t. into smaller pieces while standing it on end
fv:het:     chop or hack with a machete
fv:kety:    slice open and clean an animal
fv:lary:    slice (like chiles, etc.)
fv:lilit:   shave or carve
fv:mair:    to adze wood
fv:simat:   pop out or cut out coconut meat
\sd
Semantic domain: [\sd Nkin, \sd Nplant, \sd Vcut, \sd Vspeak]. The use and
placement of this field marker within the SHOEBOX database is up to the user.
Some who use it regularly tend to put it near the front of the entry. Some users
place \sd directly following \ps, using \ps to indicate strict subcategorization
(e.g. \ps vt), and using \sd to indicate selectional restrictions (e.g. \sd Vcarry).
Here one tries to catalog the semantic categories relevant to the language, being
careful not to let the English force or mask the vernacular categories. The use of
this field greatly assists specialized analysis or extracting topical subsets of the
whole lexicon (e.g. publishing a special fascicle on plant terms). Several
domains can be listed in the one field, if relevant, or one can use a separate \sd
field for each sense. The contents of this field are not ordinarily printed, as it is
primarily for analysis. But if one chooses to print the \sd fields, MDF places
them toward the end of the entry, preceding the contents of the field with SD:
and follows the contents with a period. See Appendix C for a suggested starter
list of semantic domains and optional renderings.
\is
Index of semantics: Some MDF users have requested this field for correlating
vernacular terms with Louw and Nida's (1988) Greek-English 93 semantic
domain categories (many with additional subdomains). While useful for some
purposes (like translation of Greek-based materials), the compiler is cautioned
to remember that these categories are an etic checklist that may have no relation
to emic categories in the vernacular. This field could also be used for the
Human Relations Area Files [HRAF] categories from the Outline of cultural
materials (Murdock et al. 1982). A third system that could be used is that of
Hashimoto (1977), which provides an etic list of semantic domains that is more
compact than HRAF and less language specific than Louw and Nida. Reversing
on this field would yield semantically related entries grouped under the various
Louw and Nida, HRAF, or Hashimoto semantic domains. MDF precedes the
contents of this field with Semantics: and places a period following the
contents of the field.
\th
Thesaurus (vernacular): [\th utan]. This field is for the vernacular generic
term under which the headword is emically categorized by the people
themselves. For example, in Selaru, masy 'fish' has a broader semantic range
than English 'fish' because it also includes sea mammals and crustaceans.
Similarly, the Buru generic term manut, whose Austronesian reconstructed form
is glossed as 'bird', in Buru includes bats and other flying creatures like
butterflies whose wings are large enough and slow enough to see in flight, but
does not include most other insects. (See 8.1 for a discussion on folk
taxonomies). This field is useful for later analysis or extraction (using
SHOEBOX FILTERS) for separate publications of fish-type terms, flying
creatures, etc. The contents of this field may or may not correlate with a western
taxonomy or with the \sd field. It overlaps with \lf Gen(eric) =. MDF precedes
the contents of this field with Thes: and places a period following the contents
of the field.
\bb
Bibliographical reference: [\bb BDG 1991:328, \bb Schut 1917]. This field
references literature expanding on this lexeme. It is generally for grammatical
particles or lexemes of ethnographic significance. MDF places Read: before the
contents of this field and places a period after.
\pc
The .G. marks this as a graphics link. Next follows the path and filename:
\pcx\eagle.pcx. Then the width of the picture desired for printing (here 1.5
inches), then the height (1), and finally the graphics format type (PCX). Each
bit of information is separated by a semicolon [;].
When the dictionary is formatted, the graphics information is moved to the
beginning of the entry, subentry or sense in which the \pc field is found. This
will cause the text to flow around the picture, which will be in a box. Sizes
much larger than 1.5 x 1.5 are not recommended. In double column format
the picture is placed flush right in the column; in single column format the
picture is flush right to the right margin.
If no .G. is found, then MDF assumes the contents of the field are a reference to
a book or notebook and simply prints the contents of the field enclosed in
parentheses.
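The two readings of a \pc field can be sketched in Python. The exact layout of the field is an assumption here: the sample string below (the .G. prefix followed by path, width, height, and format separated by semicolons, with inch marks on the sizes) is reconstructed from the description above, and both the function name parse_pc and the reference value "Notebook 3" are hypothetical.

```python
def parse_pc(value):
    """Interpret the contents of a \\pc field.

    With a leading '.G.' the field is read as a graphics link with four
    semicolon-separated parts; otherwise it is treated as a reference
    to a book or notebook and printed in parentheses.
    """
    value = value.strip()
    if not value.startswith(".G."):
        return {"kind": "reference", "printed": f"({value})"}
    parts = [p.strip() for p in value[3:].split(";")]
    if len(parts) != 4:
        raise ValueError("expected path;width;height;format after .G.")
    path, width, height, fmt = parts
    # Width and height are kept as strings because the unit notation
    # is an assumption of this sketch.
    return {"kind": "graphic", "path": path, "width": width,
            "height": height, "format": fmt}


print(parse_pc('.G.\\pcx\\eagle.pcx;1.5";1";PCX'))
print(parse_pc("Notebook 3"))
```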
Note fields:
\nt
Notes: This is a general note field that can accommodate comments related to
any field. It may be placed anywhere within an entry, subentry, or sense.
Punctuation and capitalization should be used as needed. If selected to print, the
contents of this field will be placed at the end of the entry or sense within
square brackets [Note: ...]. These fields are intended for the compiler's use and
are not intended for printing, except for drafts. If the lexicographer wants to
distinguish different classes of notes, MDF recognizes the following fields:
\np   [Phon: ... ]
\ng   [Gram: ... ]
\nd   [Disc: ... ]
\na   [Anthro: ...]
\ns   [Socio: ... ]
\nq   [Ques: ... ]
\st
Status for editing or printing: [\st no print, \st done, \st check]. This field
can be used to later exclude entries that the informants have specifically
requested not appear (e.g. in the national language dictionary they may fear
\dt
Date entry was last edited: This housekeeping matter can be automated with the
SHOEBOX DATESTAMP feature. It is not normally printed.
\??
Unknown fields: Fields entered by the user that are not recognized by MDF are
placed within square brackets at the end of the entry and preceded by a double
question mark [?? ...]. These can be toggled to print or not print through the
Change Settings menu option (where they are called the (huh) fields).
2.2 Examples of lexical entries (raw SHOEBOX form and MDF output)
Some compilers could organize their data quite well if they were simply given a few
visual examples of how somebody else structures similar information and how MDF
formats it. A variety of examples are given below with little commentary. These should
be sufficient for many to go a long way in compiling their lexical database and printing it
through MDF. Additional examples are sprinkled throughout this Guide along with
detailed discussion of relevant issues.
SHOEBOX lexical database
\lx stife
\ps vt
\ge pour
\dt 2/Nov/89

\lx srapa
\ps vt
\ge slap
\de slap with open hand
\dt 27/Aug/91

\lx -angu
\lc na-angu
\ps v
\ge interwoven
\dt 29/Apr/93

[citation form]
na-angu (from: -angu) v. interwoven.

\lx sau
\hm 1
\ps vt
\ge sew

[homonyms]
sau1 vt. sew.
\lx sau
\hm 2
\ps n
\ge anchor
\lf Whole = waga
\le boat
\dt 17/Jul/93

\lx sau
\hm 3
\ps n
\ge fruit
\de8 succulent fruit (various),
including breadfruit,
rose apple, guava and
cashew fruit

\lx ati
\ps vt
\ge twirl
\re twirl ; pick up with tongs
\de twirl, pick up s.t. with tongs
\lf Nact = anafina
\le woman
\lf Nug = bia
\le starch paste
\lf NugSpec = bia polon
\le sago paste
\lf NugSpec = mangkau polon
\le cassava paste
\lf NugSpec = bia mangkau
\le cassava paste
\lf Ninstr = atit
\le sago paste twirler, tongs
\et *atip
\eg pinch off

\lx atit
\ps n
\ge tongs
\ge twirler_(for_sago_paste)
\cf ati
\ce twirl
\sd Ninstr
sau3
8 SHOEBOX can be made to give hanging indents on the screen by setting the margins (for both v1.2 and
v2.0 under EDIT MARGINS) to Hanging Indent 5. Some find this gives a more orderly appearance.
Hanging indents in SHOEBOX do not affect the formatting in MDF.
\lx gebhaa
\ps n
\sn 1
\ge husband
\lt big person.
\lf SynD = namorit
\le Rana dialect
\sn 2
\ge clan_head
\lf SynD = tean elen
\le Rana dialect
\mr geba-haa
\cf haa
\ce big, important, loud

[multiple senses]
gebhaa n. 1) husband. Lit: big person. SynD: namorit Rana
dialect. 2) clan head. SynD: tean elen Rana dialect. Morph:
geba-haa. See: haa big, important, loud.

\lx emata
\ps vt
\sn 1
\ge kill
\re kill ; murder
\rf C:89-3:27
\xv Siro rohi pa emata gebar telo dii.
\xe The two of them stalked and killed those three men.
\lf Nug = geba
\le person
\lf Nug = fafu
\le pig
\lf Spec = seka
\le spear s.o./s.t.
\sn 2
\ge extinguish
\rf B:86-1:84
\xv Da emata bana mele pothaki.
\xe She extinguished the fire lest it start a forest fire.
\lf Nug = bana
\le fire
\mr ep-mata
\cf mata
\ce die
\lx ahut
\ps n
\ge wave ; rough_(sea)
\ue Rana dialect.
\ee Rana speakers use fv:ahut
to refer to rough seas when
they are down at the coast,
but it is taboo to use the
term up at the lake.
\lf SynT = emhein
\le wave, rough (sea)
\lf Sim = permitek
\le stormy seas
\mr ahu-t
\et *qaRus
\eg current
\es PANDYPMPL
\sd Nnature
\dt 4/Mar/92

[usage]
ahut n. wave, rough (sea). Usage: Rana
dialect. Rana speakers use ahut to

\lx fafu
\ps n
\ge pig
\re pig ; boar ; sow
\lf Spec = faf tinan
\le sow
\lf Spec = fafu bhasat
\le boar
\lf Spec = faf anan
\le piglet
\lf Spec = faf aba
\le wild (jungle) pig
\lf Spec = faf fena
\le domestic (village) pig
\lf Spec = fafu emlahat
\le domestic pig gone wild in the jungle
\lf Spec = fafu melaban
\le wild pig which has been domesticated
\lf Spec = faf Bali
\le short-legged domestic pig imported since WWII
\lf Spec = faf donit
\le fi:babirusa
\et *babuy
\eg pig
\es PAND
\sd Nanim
\dt 2/Nov/89

[generic noun]
fafu n. pig. Spec: faf tinan sow; Spec:
fafu bhasat boar; Spec: faf
anan piglet; Spec: faf aba wild
(jungle) pig; Spec: faf fena
\lx foto
\ps v
\ge take photograph
\ps n
\sn 1
\ge camera
\sn 2
\ge photograph
\bw English?

n. 1) camera. 2) photograph.
From: English?

\lx agat
\ps n
\ge grain
\de grain (generic)
\lf Nloc = hum kolon
\le grain bin
\lf Spec = feten
\le foxtail millet
\lf Spec = pala
\le rice
\lf Spec = biskutu
\le corn
\lf Spec = warahe
\le peanuts
\lf Spec = kopi [L]
\le coffee
\mr aga-t
\sd Nagri
\dt 2/Nov/89

\lx atet
\ps n
\ge thatch
\gn atap
\re roof ; thatch
\lf Sim = hum fafan
\le top of house, roof
\lf Mat = bia omon
\le sago leaves
\lf Mat = niwe omon
\le coconut palm leaves
\lf Mat = mehet
\le grass
\lf Prep = sau atet
\le sew thatch
\mr ate-t
\et *qatep
\eg thatch
\es PANDYPMPL
\sd Ncult ; Nhouse
\dt 23/Oct/89
\lx ama
\ps n
\pn kb
\ge F
\re father ; uncle
\de father, uncle; male of
first ascending generation
of ego's natal fv:noro or
anyone ego's mother can
call fv:naha brother
\gn bapak ; ayah
\gr papi
\lf Gen = geba emtuat
\le parent, elder
\ln orang tua
\lf Spec = ama ebanat
\le birth father
\lf Spec = ama haat
\le father's oldest brother
\lr bapa tua
\lf Spec = ama roin
\le father's youngest brother
\lr bapa kacil
\lf Spec = ama kete
\le father-in-law
\ln bapak mertua
\lr bapa mantu
\lf Spec = ama tiri
\le stepfather (due to remarriage)
\lf Sim = tama
\le forefather of a lineage
\lf Cpart = ina
\le mother
\ln ibu
\et *ama
\eg father
\es PANDYPMPL
\sd Nkin
\dt 2/Apr/92

\lx kadefun
\ps n
\ge seat
\lf Syn = elepteat
\le seat
\lf SynL = kadera
\le chair, seat
\mr ka-defo-n
\cf defo
\ce stay, sit
stay, sit.
\lx ego
\ps vt
\ge transfer
\re transfer ; carry ; bring
; bear ; take ; get ; seize
; obtain ; grasp ; fetch
; marry
\de transfer control, location
or affiliation; get, take,
carry
\lf Spec = gao
\le grasp in hand, carry (e.g. a spear)
\lf Spec = wada
\le carry (bulky thing) on shoulder
\lf Spec = leba
\le carry on shoulder with a pole
\lf Spec = renge
\le carry s.t. in a basket on one's back using a tumpline (headstrap)
\lf Spec = eplabuk
\le carry on back using shoulder straps
\lf Spec = tolfafak
\le carry s.t. on head
\lf Spec = pinu
\le carry with strap over shoulder (e.g. hunting pouch)
\lf Spec = baba
\le carry a child on one's side with a carrying cloth
\lf Spec = sgera
\le carry a child with its legs straddling one's hip
\lf Spec = slolo
\le carry a child in one's arms
\lf Spec = sgege
\le carry s.t. under one's arm
\lf Spec = edaba
\le carry gifts on shoulder in procession
\sd Vcarry ; Vput ; Vexchange
\dt 28/Aug/91
\lx ba elalek
\ps vt
\ge faithful
\re faithful ; believe (strong sense)
\de faithful, believe (strong sense)
\lf Sim = nanuk
\le think, believe (weak sense)
\mr ek-lale-k
\cf lalen
\ce inside

[cross-reference]
ba elalek vt. faithful, believe (strong
sense). Sim: nanuk think, believe
(weak sense). Morph: ek-lale-k.
See: lalen inside.
mitet black.
\lx huma
\ge house
huma house.
However, there are several conditions in which the form of the gloss desired for
interlinearizing is different from that desired for reversal. The first is when one form will
suffice for all instances in interlinearizing, but several forms are desired for reversal, as
illustrated below. Multiple options in the gloss fields slow down interlinearizing: a
single form is inserted automatically, but multiple forms cause SHOEBOX to pause in
order to let the user select the appropriate choice before resuming. Furthermore, in
glossing strategies, a single form used consistently to gloss a word interlinearly (where
legitimate) will more faithfully show the emic unity of a language than will a variety of
etic forms. Under these conditions the reversal fields are used to indicate the variety of
surface forms desired for the reversal.
One emic form sufficient for interlinearizing
\lx huma
\ge house
\re house ; hut ; building ; dwelling

\lx aan
\ge jaw
\re jaw ; chin

\lx baa
\ge only
\re only ; exclusively ; just

Dictionary      English finderlist
huma house.     building      huma
                dwelling      huma
                house         huma
                hut           huma
aan jaw.        chin          aan
                jaw           aan
baa only.       exclusively   baa
                just          baa
                only          baa
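The reversal illustrated in this table can be reproduced with a small Python sketch. The function name build_finderlist and the dictionary layout of the entries are hypothetical, and plain alphabetical sorting stands in for whatever sort order MDF actually applies.

```python
def build_finderlist(entries):
    """Invert entries into sorted (English form, headword) pairs.

    Each entry is a dict with 'lx', 'ge', and optionally 're'.
    The \\re contents are used when present, otherwise \\ge.
    """
    pairs = []
    for entry in entries:
        source = entry.get("re") or entry["ge"]
        for key in source.split(" ; "):
            key = key.strip()
            if key:
                pairs.append((key, entry["lx"]))
    return sorted(pairs)


entries = [
    {"lx": "huma", "ge": "house", "re": "house ; hut ; building ; dwelling"},
    {"lx": "aan", "ge": "jaw", "re": "jaw ; chin"},
    {"lx": "baa", "ge": "only", "re": "only ; exclusively ; just"},
]
for english, headword in build_finderlist(entries):
    print(f"{english:<12} {headword}")
```

The single \ge form stays available for interlinearizing, while every form listed in \re becomes its own line in the finderlist.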
A second condition is that in which an abbreviation is desired for interlinearizing, but not
for the reversal. This is often desired for grammatical particles, but occasionally simply
because the unabbreviated gloss would stretch out the interlinearization inordinately. In
this case one would use the contents of the \ge field for interlinearizing, the \re for the
English reversed finderlist, and the contents of the \de field for the printed dictionary.
Abbreviation preferred for interlinearizing:
\lx saro
\ps PRO
\ge REC
\re reciprocal
\de reciprocal

\lx utan
\ps n
\ge veg.
\re vegetable
\de vegetable

Dictionary printout
saro PRO. reciprocal.
utan n. vegetable.
A combination of both conditions of one form sufficient for interlinearizing and the
preference for abbreviations for interlinearizing is common, particularly in certain
semantic domains, or certain parts of speech:9
9Notice the preferred pattern for multiple glosses in either the gloss fields or the reversal fields of
space-semicolon-space [gloss ; gloss]. This allows MDF to later convert these sequences to comma-space
[gloss, gloss], without changing other sequences of semicolon-space [text; text] that are desired for other
purposes.
Kin terms:
\lx ina
\ps n
\ge M [mother]
\re mother ; aunt
\de mother, aunt; any female of
the first ascending
generation of ego's fv:noro
\lf Spec = infalin
\le mother's younger sister
\lf Cpart = ama
\le father
\sd Nkin
Pronouns:
\lx ringe
\ps PRO
\ge 3s [3rd pers. sing.]
\re he ; she ; it
\de he, she, it; third singular
subject pronoun
Deictics or directionals:
\lx dii
\ps DEIC
\ge DIST [distal]
\re that ; there ; then
\de that, there, then; distal
in space, time, or
reference
Word-level glosses: These fields are used if the compiler needs morpheme-level glosses
for some purposes and word-level glosses for other purposes. (See 4.6).
Definitions and the definition fields: Definitions represent a serious attempt to
characterize the meaning of a lexeme in a precise way. Loose definitions tend to be
expanded glosses or prose explanations of the lexeme. If present, the definition fields are
printed in the dictionary. If not present, the contents of the gloss fields (\ge, etc.) are
printed instead.
\lx ama
\ps n
\ge F
\re father ; uncle
\de father, uncle; male of
first ascending generation
in the fv:noro of ego's
primary affiliation, or
in the natal fv:noro of
ego's mother
\ee Includes biological and
classificatory fathers
\sd Nkin
unified whole. Like any school of thought this requires an investment of time and energy
to master and use well. The metalanguage may be awkward for the uninitiated, but
extremely powerful to those who become familiar with its use. Those who have invested
in mastering it, however, must take special care not to lose the broader audience.
An example of using a natural semantic metalanguage is in Wierzbicka's (1991:100-104)
summary of her discussion of the Javanese term tok-tok (defined in Horne 1974:178 as "to
pretend"):
I don't want to say what I think/know
I don't have to say this
I can say something else
The following principles are generally subscribed to in relation to definitions, several
being particularly relevant to monolingual dictionaries:
1) Only words accounted for elsewhere should be used in a definition (in monolingual
dictionaries). This does not necessarily mean that all words used in a definition
should be themselves defined, because of the problem in principle #4.
2) Definitions should not be circular. For example, sugar should not be defined in
terms of sweet, and then sweet also defined in terms of sugar; or pain should not be
defined in terms of hurt, and hurt also defined in terms of pain.
3)
4) Eventually some words are found to be indefinable. These are occasionally referred
to as semantic primitives, and occasionally lexical universals.10
5) The word being defined should not be used as part of the definition.
6) As much as possible, definitions should use familiar, high frequency words rather
than use obscure or archaic words or technical jargon.
7) The most fundamental or essential parts of the definition should be expressed first
(e.g. genus, species, and primary differentiae). Expansions can follow (using \ee).
8) The form of the definition should match the part of speech of the headword for
major word classes. Nouns should be described by noun phrases, and verbs by
10The strong view of lexical universals holds that all languages have a lexical explication of the declared
set of universals. The weak view holds only that the set of so-called universals has been demonstrated to
provide convenient building blocks for definitions, and have lexical explication in most languages.
to_sail
a_sail
to_comb
a_comb
This pattern has many disadvantages: 1) it lengthens the gloss for interlinearizing; 2) it
adds redundant information to what is already in the \ps field (the \ps field will tell
whether it is a noun or a verb; the gloss does not need to repeat this); 3) it will not
reverse under "sail" as desired, but under "to", resulting in possibly hundreds of verbs
clustering under "to", and hundreds of nouns under "a" in a reversed finderlist; and 4) it
usually misrepresents the vernacular form in the \lx field, which is seldom an infinitive. A
routine to strip out the "to" before reversal would have to be sophisticated enough to leave
any legitimate "to" in the gloss, or in other fields.
TIP: Always remember that the reversal process will sort on the first word of each
gloss unit in the \ge or \re fields.
Thus, the reversal fields provide for reversing on more forms or more specialized forms
than those found in the gloss fields. For example, English reversals might include, \re
basin, wash; \re aunt, maternal; bamboo sp(ecies). A morphologically complex
national language such as Indonesian could, for membersihkan, be entered as \rn bersih,
mem*kan so the reversal will be indexed by the root in the national language
finderlist.
In any lexical database there are probably certain records that should be excluded from
the reversed finderlists. For example, minor entries might be excluded from finderlists
(because they are variant forms and contain little information anyway, so there may be no
point in referencing such an entry). Or the entry might be a functor of some kind which
really can't be given a gloss that could serve as a decent reference form in a finderlist
(e.g. 3sPOS).
TIP: For each entry, subentry, or sense that you want excluded from the reversed
finderlists, place an asterisk (*) in the \re and \rn fields. Many bound morphemes or
minor entries (variants) do not need to be reversed, if the information contained in
them is redundant with fuller entries.
\lx -a
\ps GEN
\ge 3sPOS
\re *
\de his, hers, its
\gn -nya
\rn *
\mn -na
If more than one \re or \rn field is needed in a record section, it can be done in one of two
ways: either in separate fields, or separated by a semicolon with a space on each side:
\lx nelnyely
\ps n
...
\rn kebersihan
\rn bersih, ke-*-an

OR

\lx nelnyely
\ps n
...
\rn kebersihan ; bersih, ke-*-an
In either case the reversing print tables would create two entries in the national language
finderlist for nelnyely.
NOTE: The national language reversing process is completely separate from the English
reversing process. This means that \rn fields operate independently from \re fields. So
just because the compiler chooses to use two \rn fields (as in the example above) does not
mean there must be two \re fields, and vice versa. For English, a gloss like cleanliness
would be adequate for glossing text, defining the lexeme in a dictionary, and reversing the
English list. The record would look like this:
\lx nelnyely
\ps n
\ge cleanliness
\re
\de
\gn kebersihan
\rn kebersihan
\rn bersih, ke-*-an
\dn
...

OR

\lx nelnyely
\ps n
\ge cleanliness
\re
\de
\gn kebersihan
\rn kebersihan ; bersih, ke-*-an
\dn
...
One important comment about this record: the gloss kebersihan occurs twice in the
record, once in the \gn field and once in the \rn field. This is necessary because the user
wants to reverse on both kebersihan and bersih, ke-*-an. Once an \rn field is detected as
having data (i.e. \rn bersih, ke-*-an), the reversing program ignores all \gn fields in that
section of that record. The reversing program will not take information from both \gn and
\rn fields out of the same section of a record. So, once the user decides to reverse on
bersih, ke-*-an, then kebersihan must also be added to the \rn field (since in this case
both forms are felt to be needed in the finderlist).
This restriction on \rn, \gn fields also applies to \re, \ge fields.
2.3.2 Understanding the relationship between the \ge, \re and \de fields
Three points summarize earlier information:11
1) Only the contents of the \ge field are used for interlinearizing.
2) The \ge and \de fields are used for printing the main dictionary. \re is ignored for
this purpose. If there are contents to a \de field, then that will be printed in the
dictionary entry, and the contents of the \ge field will be ignored. Otherwise the
contents of the \ge field will be printed.
3) The \ge and \re fields are used for the reversed English finderlist. \de is ignored for
this purpose. If there are contents to an \re field, then that will generate entries in
the reversed finderlist, and the contents of the \ge field will be ignored. Otherwise
the contents of the \ge field will be used.
11The same relationship described in these points here holds for the national language bundle of fields.
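The precedence among \ge, \re and \de described in these three points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the dictionary-per-section representation are hypothetical, and MDF itself does this work in its own print tables rather than in Python. The sketch also treats an asterisk in \re as "exclude from the finderlist" and splits multiple glosses on the space-semicolon-space pattern.

```python
def split_units(field):
    """Split a gloss or reversal field on the space-semicolon-space pattern."""
    return [unit.strip() for unit in field.split(" ; ") if unit.strip()]


def interlinear_gloss(section):
    """Only the \\ge field is used for interlinearizing."""
    return section.get("ge", "").strip()


def dictionary_text(section):
    """\\de wins over \\ge for the printed dictionary; \\re is ignored."""
    definition = section.get("de", "").strip()
    if definition:
        return definition
    # Fall back to the gloss, printing "gloss ; gloss" as "gloss, gloss".
    return ", ".join(split_units(section.get("ge", "")))


def reversal_units(section):
    """\\re wins over \\ge for the finderlist; \\de is ignored."""
    reversal = section.get("re", "").strip()
    if reversal == "*":
        # An asterisk excludes this section from the finderlist.
        return []
    return split_units(reversal or section.get("ge", ""))
```

With the aan record from this chapter (\ge jaw, \re jaw ; chin), the sketch would print "jaw" in the dictionary but index the entry under both "jaw" and "chin" in the finderlist.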
If one form will work for all three field functions (interlinearizing, dictionary, reversal),
then only the \ge field should be used:
\lx aken
\ge gallbladder
aken gallbladder.
If the information in the \re field is desired in the main dictionary, then it should be
reproduced and reformatted in the \de field:12
\lx aan
\ge jaw
\re jaw ; chin
\de jaw, chin
If the information in the \re field is desired in a different form for naturalness, then the
changes should be in the \de field.
\lx alih
\ge charge
\re charge (take)
\de take charge

\lx bolo
\ge (bamboo)13
\re bamboo sp.
\de k.o. bamboo
If more information is desired than is appropriate for the \ge and \re fields, then that
should be in the \de field:
\lx ahut
\ge wave
\re wave ; rough
\de wave; rough (sea)

\lx a
\ps PRO
\ge 1s
\re I
\de I; first person singular
subject proclitic
12The reason for choosing to not put both jaw and chin in the \ge field in this example is so that the
SHOEBOX interlinearizing function can automatically fill in the gloss and move on. This is faster than
having the program stop to ask the user to choose between jaw and chin each time aan is encountered in
a text. If the stop-and-choose method is not seen as an inconvenience, then it is simpler to put both
glosses in the \ge field and dispense with the \de and \re fields.
13Some find it convenient for interlinearizing to enclose a generic term in parentheses to indicate "kind
of x", thus avoiding multiple word glosses. Similarly (name) can be used as the gloss for a person's name,
(place) for a place name, etc.
\lx alik
\ge peel
\re peel ; strip off (skin)
\de peel s.t. by hand with
intent to use resulting
core; strip skin or husk
off s.t. by hand
It should now be clear that what one puts in the \de field is not limited to definitions in
the strict denotative sense.
2.4 Understanding the hierarchical structure of an entry
Because of the nature of the computer tools that drive MDF, it has been necessary for
MDF to superimpose a hierarchical structure that is flexible enough to meet most needs.
The field codes that are relevant here are \lx, \ps (\pn), \sn, \se. Each of these sections or
subsections can take a full set of field markers.
Multiple parts of speech (\ps) in an entry are used to organize sections within an entry. In
many cases there is a clear relationship between a word functioning in different syntactic
slots within a sentence as a noun, a verb, or a preposition, as between shower (v) and
shower (n), and between rain (v) and rain (n). These are often clearly related to each
other in meaning and have functional complementary distribution, and thus should not be
handled as homonyms (see chapter 9 for a more detailed discussion of this and related
issues). MDF starts a new \ps within an entry on a new line, preceded by an em-dash. If
an entry is substructured in this manner, then sense numbers (\sn) are not needed unless
to further substructure the part of speech (as in the second example below).
\lx anchor
\ps n
\de instrument attached to a
rope or chain for
preventing or minimizing
the movement of a boat when
it is not tied at dock,
usually by friction along
the ocean or lake bottom
\ps vt
\de action of using such an
instrument
Sense numbers (\sn) are also used to organize sections within an entry. Multiple senses
should be grouped under the relevant parts of speech. Multiple senses in each separate
part of speech should start with 1.
\lx lexeme
\ps n
\sn 1
\ge gloss
\de definition
\sn 2
\ge gloss
\sn 3
\ge gloss
\de definition

lexeme n. 1) definition. 2) gloss. 3) definition.

\lx lexeme
\ps n
\sn 1
\ge gloss
\de definition
\sn 2
\ge gloss
\de definition
\sn 3
\ge gloss
\ps v
\sn 1
\ge gloss
\sn 2
\ge gloss
\de definition
\sn 3
\ge gloss
\de definition

lexeme n. 1) definition. 2) definition. 3) gloss.
v. 1) gloss. 2) definition. 3) definition.
Some lexicographers want to make fine distinctions between subsenses. The principles
for justifying subsenses are the same as those for justifying senses (see 6.3); the
difference is one of degree or scope. Subsenses are more related to each other than they
are to other senses. These can be handled in MDF in the \sn field with subcategorization
using a, b, c, etc.
\lx opon
\ps n
\sn 1a
\ge grand_kin
\de grandparent, grandchild;
reciprocal term of plus or
minus two generations
\sn 1b
\ge ancestor
\de ancestor, descendant
\sn 2
\ge master
\de master, lord, owner; the
one with the say over s.o.
or s.t
Subentries (\se) provide a further level of hierarchy. These are commonly built around
polymorphemic forms in a root-based dictionary (see 4.6 for extended discussion). Note
that while information might be organized as follows during the early years of contact
with a language, the information for brushcutter below should eventually be separated
out and placed elsewhere as it is not lexically related to this headword.
\lx brush
\ps n
\ge gloss
\de definition
\se hairbrush
\ps n
\ge gloss
\se paintbrush
\ps n
\ge gloss
\de definition
\se brushcutter
\ps v
\ge gloss
\de definition
\ps n
\ge gloss
\de definition

brush n. definition.
hairbrush n. gloss.
paintbrush n. definition.
brushcutter v. definition.
n. definition.
\lx bersih
\ps adj
\sn 1
\ge clean
\de be clean, not dirty or
messy
\sn 2
\ge innocent
\de be innocent, without fault
\se kebersihan
\ps n
\ge cleanliness
\se membersihkan
\ps vt
\sn 1
\ge clean_up
\de clean s.t. up
\sn 2
\ge purify
\de purify, repent or renounce
immoral actions
\se pembersih
\ps n
\sn 1
\ge cleanser
\sn 2
\ge janitor
\dt 17/Jun/92

bersih adj. 1) be clean, not dirty or messy. 2) be innocent, without fault.
kebersihan n. cleanliness.
membersihkan vt. 1) clean s.t. up. 2) purify, repent or renounce immoral actions.
pembersih n. 1) cleanser. 2) janitor.

\lx bren
\ps vi
\ge play
\ee Implies lack of focus or
purpose.
\se brenak
\ps vt
\ge play_s.t.
\de play a game, or play with
s.t
\se inabren
\ps n
\ge recreation ; entertainment
\se rabrenak
\ps n
\ge toy
\dt 17/Jun/92

bren vi. play. Implies lack of focus or purpose.
brenak vt. play a game, or play with s.t.
inabren n. recreation, entertainment.
rabrenak n. toy.
Summary: The \se and \ps fields begin the new subsection of an entry at a new line. The
\sn field continues on the same line. The relative hierarchy is as follows:
\lx lexeme
    \ps part of speech
        \sn sense number, \sn sense number
    \ps part of speech
        \sn sense number, \sn sense number, \sn sense number
    \ps part of speech
    \se subentry
        \ps part of speech
            \sn sense number, \sn sense number, \sn sense number
        \ps part of speech
            \sn sense number, \sn sense number
    \se subentry
        \ps part of speech
            \sn sense number, \sn sense number
        \ps part of speech
    \se subentry
        \ps part of speech
        \ps part of speech
            \sn sense number, \sn sense number
The \lx and \ps fields are the only ones that are minimally required for structuring entries
(along with \ge, etc. to give useful information within the structural hierarchy of an
entry). \se and \sn should only be used as they are appropriate for substructuring an entry.
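The relative hierarchy summarized above can be made concrete with a small Python sketch that groups the fields of one record under the \lx, \se, \ps and \sn levels. It is not how MDF itself is implemented; the function name, the nested-dictionary layout, and the handling of wrapped lines (appended to the preceding field) are all assumptions made for illustration.

```python
# Structural markers and their depth: a marker closes every open
# section at the same or a deeper level before opening its own.
LEVELS = {"lx": 0, "se": 1, "ps": 2, "sn": 3}


def parse_record(lines):
    """Group the fields of one record under lexeme, subentry, part of speech, sense."""
    entry = {"marker": "lx", "value": "", "fields": [], "children": []}
    stack = [entry]   # currently open sections, outermost first
    last = None       # most recent ordinary field, for wrapped lines
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if not line.startswith("\\"):
            # A wrapped line continues the previous field's contents.
            if last is not None:
                last[1] = (last[1] + " " + line).strip()
            continue
        marker, _, value = line[1:].partition(" ")
        value = value.strip()
        if marker == "lx":
            entry["value"] = value
            last = None
        elif marker in LEVELS:
            while len(stack) > 1 and LEVELS[stack[-1]["marker"]] >= LEVELS[marker]:
                stack.pop()
            node = {"marker": marker, "value": value, "fields": [], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
            last = None
        else:
            last = [marker, value]
            stack[-1]["fields"].append(last)
    return entry
```

Applied to the bersih record, the result has two \sn sections under the first \ps, followed by three \se subentries, each with its own \ps (and, where present, \sn) sections.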
2.5 Direct character formatting within a field
All fields are given a basic character style when printed. For example, the \ge field is
marked as English and the \gn field as national language. Fields
marked as vernacular include all of the cross-reference type fields \cf, \sy, \an, etc., as
well as the obvious ones: \lx, \se, \xv, etc. Because the data within each of these fields are
in a single language there is little problem in assigning character styles to them
automatically. The contents of the entire field are given the same typeface. But things are
not so simple for free-form discussion type fields, and so MDF provides for direct
character formatting in any field.
Although free-form fields are also given a basic character style (e.g., the \ue field is
marked as English), they often contain words or phrases in the vernacular because they
are designed for discussion of the vernacular language. This vernacular text is set off
from surrounding information in a discussion field by preceding the vernacular word with
the code fv: (for font-vernacular). The print tables use this code to apply the vernacular
character style to the word that follows it.
How it prints:
Usage: The kin term wai is used for ...
TIP: For this type of coding to work, there must not be any space between the colon (:)
and the following text (this distinguishes the language code from normal punctuation),
and the code must be in lower case (i.e. fv:, not FV: or Fv:).
Be sure to place the code with the word inside punctuation (parentheses, quotes, etc.).
Otherwise the punctuation will receive the character style along with the word. For
example, if you want to print: ...during a hunt, the dogs (asure) go out ahead..., the
vernacular occurs in parentheses; encode this as ...dogs (fv:asure) go... and not ...dogs
fv:(asure) go...
If the vernacular text is a phrase, the phrase should be linked together with an underline
character: using fv:mbwai_ka in most cases ... The print tables would then apply the
character style to the whole phrase, changing the underline character to a space in the
process.14
The character styles do not flow across punctuation. Thus, character formatting codes
must be placed on both sides of the punctuation. For example, fv:peni/fv:beka prints both
peni and beka in the vernacular character style, whereas fv:peni/beka prints only peni in
that style and leaves beka in the basic style of the field.
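The rules for the fv: code (lower case only, no space after the colon, underline characters joining a phrase, and no flow across punctuation) can be illustrated with a short Python sketch. The angle-bracket tags in the output are purely hypothetical stand-ins for a character style; MDF's real output is WORD formatting, and its print tables are not written in Python. The sketch also assumes that a "word" is a run of letters, digits and underline characters.

```python
import re

# A language code, a colon, then immediately a word. The match is
# case-sensitive, so FV: and Fv: are left alone, and it stops at the
# first punctuation mark or space.
STYLE_CODE = re.compile(r"\b(fv|fn|fe|fr):(\w+)")


def apply_style_codes(text):
    """Wrap each coded word or phrase in a tag named after its code."""
    def mark(match):
        code = match.group(1)
        # Underline characters link a phrase; they print as spaces.
        phrase = match.group(2).replace("_", " ")
        return "<{0}>{1}</{0}>".format(code, phrase)
    return STYLE_CODE.sub(mark, text)
```

For instance, apply_style_codes("dogs (fv:asure) go") leaves the parentheses outside the tagged word, which is the reason for placing the code inside the punctuation.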
Character styles for other languages are set off as follows:
fn: for the national language (i.e. font-national)
fe: for English (i.e. font-English, if ever needed)
fr: for the local regional language (i.e. font-regional)
The uc: code is able to detect which type of field it is used in. If the field is a vernacular
field, uc: will underline with bold characters (following the vernacular character style); if
the field is for the national language, uc: will underline italic characters; and if the field is
for English, uc: will underline normal characters. If specific control is required, use ui:
and ub:.
14Alternatively one could add an fv: before each word in the phrase, but this increases the typing load.
All of these codes are to be used in the same way as described for the fv: code.
To reiterate what was said above, character style codes are unnecessary in most fields
because the field contains only one type of data (e.g. the national language gloss in the
\gn field does not need to be marked as national language). Such fields are converted to
the appropriate character style automatically. Direct character formatting codes are used
only in general information fields or discussion free-form fields where language data and
discussion are mixed.
TIP: Use these codes to keep language styles consistent throughout your dictionary.
Where possible, using the codes based on function (e.g. fv: fn: sc:) is preferable in the
long-term over using the codes based on form (e.g. ub: ui: uc:). This function-based
strategy facilitates uniform editorial changes and systematic upgrades to future
generation computer software.
The use of the uc: underline code is very helpful in example sentences that focus on
particles, functors, affixes, etc. In an Indonesian dictionary the entry \lx di might contain
the example sentence Bukunya tidak ditaruh di atas meja ini. This is encoded:
\xv Bukunya tidak ditaruh uc:di atas meja ini.
So, even though the sentence has two di morphemes in it (di-taruh [verbal prefix] and
di [preposition]), the underlining is used to mark the lexeme in question.
Underlining affixes often poses a problem. For example, if the third person singular
pronoun possessive suffix is -a, it needs to be underlined in a sentence such as Aulopoa
aua lae weidu, because there is another word that ends in a. But, because a is only
part of a word, underlining it with uc: will not work. To underline the a we must resort
to the rather inelegant bar code and curly braces:
\xv Aulopo|u{a} aua lae weidu.
The |u marks the bracketed character as underlined and bold (a type of vernacular style).
Note that these braces can be used to enclose any number of letters; this code is not
restricted to use with just single letters.15 When using this code be sure to include the
closing brace!! If you forget it, the rest of your dictionary will be underlined! For this
very reason the colon type of character style codes were developed. The bar code |u{} like
uc: can determine what type of field it is in and adjust the underlining to match the
surrounding character style.
15In fact, this is the general underlying form the code un: produces on the word and phrase level when
the lexical file is being formatted for conversion over to a WORD document.
2.6 Punctuation
Leave off all punctuation at the end of straight data fields (\ps, \ge, \cf, etc.). The only
place where punctuation should be included is within and at the end of free-form (discussion
type) fields (\ue, \ee, \nt, etc.). All other field-final punctuation is added by the
conversion process automatically.
For some national languages, such as French, there are orthographic conventions that
encourage the use of special characters for punctuation. Some compilers use the chevrons
« » in their SHOEBOX database to indicate double quotes for French and for the
vernacular in French-speaking countries. However, MS-WORD reserves these characters
for the macro language and the computer reacts to them differently than to other
characters, giving messages and inserting asterisks in the text when importing the
formatted file into WORD from MDF. We recommend using the Anglo-centric option of
double-quote marks " ", which is an alternative punctuation convention for French. Once
the file is imported into WORD, then the double-quotes can be replaced by chevrons if
desired.
database. Your database is simply read and the needed information extracted to
another file where further processing is done.
CAUTION: If your lexical database does not use the standard field codes recognized
by MDF, do not use this program yet. First convert your lexical field codes to this
standard (as explained in chapter 2). This conversion only has to be done once and
enables the user to tie into all of the formatting power and flexibility that MDF
provides. Converting your codes can be done with a CC table or by using the EDIT
REPLACE feature of WORD.
3.1 Familiarizing yourself with the program
First, test the way MDF is set up on your computer and how it interacts with your
particular word processor by using MDF with the sample file provided on the release
disk, called MDFSAMPL.DB. You can look at this file in SHOEBOX (or in a word
processor if you do not make any changes and save it again as text only), and then
process it in MDF by using the following command:
C:\MDF>mdf mdfsampl.db<ENTER>
Try the Format dictionary and then English finderlist options to become familiar with
the various menu options MDF provides. Answer the questions prompted by MDF on the
screen. The vernacular language in MDFSAMPL.DB is Selaru and the national language
is Indonesian, but for becoming familiar with the program you can fill in whatever you
like, including the vernacular language and national language appropriate to your
situation. This database has also been formatted through MDF into a triglot dictionary
with examples and notes (file MDFSAMPL.DOC on disk) for you to view directly
through WORD. A formatted English reversed listing is also included (file
MDFSAMPL.ENG). Together these will give you some idea of how MDF interacts with
the database file to produce the formatted document.
Before you try out MDF on your full-sized lexical database, we recommend you make a
sample database of about 40 to 50 records copied from your main database. (If you use
WORD to do this, save the sample database as text only).1 Run this sample database
through MDF, selecting the different configurations available and saving the results to
different filenames; and then print the different output files to see which format you like
best. This suggestion applies to the formatted dictionary as well as to the national
language and English finderlists.
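As an alternative to copying records by hand, a sample database could be cut from the main one with a short script. The following Python sketch is an assumption-laden illustration, not part of MDF: it assumes every record begins with an \lx line, keeps any lines that precede the first record, and reads the file as Latin-1 only so that every byte survives the round trip unchanged. The function names are hypothetical.

```python
def is_record_start(line):
    """A record is assumed to begin with the \\lx marker."""
    parts = line.split(None, 1)
    return bool(parts) and parts[0] == "\\lx"


def take_records(lines, limit=50):
    """Return the lines up to, but not including, record number limit + 1."""
    kept = []
    seen = 0
    for line in lines:
        if is_record_start(line):
            seen += 1
            if seen > limit:
                break
        kept.append(line)
    return kept


def write_sample(source, target, limit=50):
    """Copy the first `limit` records of `source` into a new file `target`."""
    # newline="" keeps the original line endings untouched.
    with open(source, encoding="latin-1", newline="") as handle:
        lines = handle.readlines()
    with open(target, "w", encoding="latin-1", newline="") as handle:
        handle.writelines(take_records(lines, limit))
```

A call such as write_sample("lexicon.db", "sample.db", 50) would leave the main database untouched and produce a text-only sample to run through MDF; both filenames here are placeholders.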
3.2 Requirements and limitations
The current version (1.0) of MDF is set up for WORD-for-DOS v5.0, v5.5, or v6.0 and
WORD-for-WINDOWS (WINWORD v2.0 and v6.0).2 You will be asked to specify your
word processor. In order to run, MDF needs to know the full filename of your lexical
database. If the database is not in the MDF directory, include the path. For example, if
LEXICON.DB is in the C:\SAWAI subdirectory, type:3
C:\MDF>mdf \sawai\lexicon.db
When MDF starts, it will ask you to specify the version of WORD you are using. (Use the
arrow keys and <ENTER> to select it.) If you prefer to specify this from the command line,
the following exemplifies how to do it:
C:\MDF>mdf lexicon.db v5
C:\MDF>mdf lexicon.db v55
C:\MDF>mdf lexicon.db v6
C:\MDF>mdf lexicon.db win2
C:\MDF>mdf lexicon.db win6
The MDF program can have trouble merging documents in WORD v5.5 and WORD v6.0
simply because the glossary files used by those programs assume a default keyboard setup
for each version of WORD. If the user has configured the keyboard in WORD to be
different from the default configuration, MDF may malfunction at the point where
WORD is called. So this is one reason we recommend testing MDF on a small section of
1Be sure to turn off automatic pagination and autosave before you load your lexicon. If you happen to
alter the lexical file in any way, autosave will save a temporary copy of the file in WORD format (even
though the file is text only) and this takes years for large lexicon files! Auto-pagination inevitably
slows the program down.
2If the user specifies WINWORD as the word processor, MDF will format, split, and convert the
database files to WORD documents, but makes no attempt to merge them (because MDF cannot access
WINWORD). The user will need to exit MDF and load each document file into WINWORD manually
for merging and printing. For WINWORD, formatted dictionaries are named DICTN*.DOC, English
reversed lists are ENGLS*.DOC, and national reversed lists are NATNL*.DOC.
3We are aware that there is some overlap between the material in this section and that in chapter 1. The
overlap is intentional.
your lexicon to see that all is working well before trying to process your whole lexicon. If
MDF does not work properly, exit MDF, reconfigure WORD to its default settings, and
try MDF again.
Although most users will be quite pleased with the results, MDF is not a sophisticated
program (from a computing point of view). It requires some user care. Be sure there is
enough free space on the default drive to process your dictionary and finderlists. A safe
size is at least four times the size of the original lexical database. This should give enough
space for the working files as well as the final document files for the formatted dictionary
and finderlists. Using MDF on a floppy drive would be unwise; it will probably not
know when it has run out of room.
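The "four times the size of the original lexical database" rule of thumb is easy to check before a run. The Python sketch below is only an illustration of that arithmetic; the function names are hypothetical, and the check uses the standard library's disk-usage report rather than anything MDF provides.

```python
import os
import shutil


def required_space(database_bytes, factor=4):
    """Free space suggested for working files plus final documents."""
    return database_bytes * factor


def has_room(database_path, work_dir=".", factor=4):
    """True if the drive holding work_dir has the suggested free space."""
    needed = required_space(os.path.getsize(database_path), factor)
    return shutil.disk_usage(work_dir).free >= needed
```

For a 250,000-byte lexicon this suggests about 1,000,000 bytes free on the drive where MDF does its work.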
The MDF program reserves the filenames DICT*.*, ENGL*.*, and NATN*.* for its own
use (to create the formatted dictionary, the English reversed list, and the national
language reversed list, respectively) as well as SPLIT*.* for some working files. Please
do not use these filenames for your own work (especially within the default directory
where MDF resides). Files with these names will be deleted by MDF!
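Because files matching the reserved names are deleted, it may be worth checking a directory for clashes before running MDF. The Python sketch below is a hypothetical helper, not part of MDF. It assumes that a DOS pattern such as DICT*.* covers every file whose base name begins with DICT, with or without an extension, and that case does not matter.

```python
import ntpath

# Name prefixes behind DICT*.*, ENGL*.*, NATN*.* and SPLIT*.*
RESERVED_PREFIXES = ("DICT", "ENGL", "NATN", "SPLIT")


def is_reserved(filename):
    """True if MDF would treat this filename as one of its own."""
    # ntpath understands DOS-style paths on any platform.
    base = ntpath.basename(filename).upper()
    return base.startswith(RESERVED_PREFIXES)


def clashes(filenames):
    """Return the filenames in the list that MDF might delete."""
    return [name for name in filenames if is_reserved(name)]
```

Note that under this reading an innocent-looking name such as ENGLISH.DB also falls within a reserved pattern.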
MDF must be able to find the MS-DOS program SORT.EXE. If it is unable to find
SORT, it will not be able to run properly. To test if MDF will be able to find SORT, type
DIR | SORT at the DOS prompt:
C:\DICT>dir | sort
If this gives an alphabetized listing of the files on the default directory (the bytes free
line is also sorted to the top), then all is okay, but if the files are not sorted alphabetically,
then the SORT program is not available. You will need to either specify a path that makes
SORT accessible, or you will need to copy SORT to a place where it can be found (such
as the directory where MDF and its associated files are).
MDF must also be able to find your word processor. MDF assumes that your word
processor subdirectory is specified in the PATH command of your AUTOEXEC.BAT file
and that your word processor is named WORD.EXE. If you have more than one version
of WORD installed and have renamed the files (e.g. WORD5.EXE and WORD6.EXE),
make sure the version you want to use with MDF is named (or renamed) to WORD.EXE.
Make sure that particular subdirectory is added to the PATH command in
AUTOEXEC.BAT. To check this, from the MDF subdirectory type:
C:\MDF>word<ENTER>
C:\MDF>win winword<ENTER>
If your word processor comes up, then the setup is as it should be.
Of the six choices here the first four are relatively transparent. The last two options
require some explanation and are addressed first.
3.3.1 Change Settings
The MDF program is set up from the factory to exclude certain lexical fields from the
formatted vernacular dictionary. [NOTE: creating finderlists makes no use of these
settings]. The excluded fields are:
\we
\wn
\wr
\re
\rn
\rr
\xg
\sd
\is (index of semantics)
\th (thesaurus)
\es (etymology, source)
\ec (etymology, comment)
\so (source)
\st (status)
\dt (datestamp)
MDF also excludes all unknown fields (i.e. fields not found in the standard set given in
the accompanying guidelines). These are coded in the settings file with (huh). MDF by
default also excludes all SHOEBOX created fields (\_no, etc.). All other fields are printed
if present.
The default settings can be modified either by excluding fields that would normally print
or by including any of the above fields that normally would not print.
TIP: Before using the Change Settings option users should familiarize themselves
with the built-in formatting options that MDF provides through answering a number of
MDF-prompted options after selecting Format Dictionary as explained below.
Selecting Change Settings will call a simple text editor (TED.COM) and load a CC
table file which you modify. How it is to be modified is explained in the file, but basically
you add a "c" to the beginning of the line of any field you don't want to print, and remove
the "c" from the beginning of the line of any field you do want to print. (The "c" means
comment or ignore). Keeping things lined up is not important.
Save the file by exiting (F7-Exit) and <ENTER>. Your changes will be used to create a
new settings file. Later when you want to format your vernacular dictionary, select
Format Dictionary from the menu. Your new settings will be used to create the
formatted dictionary.
Before the dictionary formatting process begins, you have the following options:
1) Excluding example sentences (this would exclude the \rf, \xv, \xe, \xn, \xr, \xg
fields)
2) Excluding your notes (this would exclude the \nt, \np, \ng, \nd, \na, \ns, and \nq
fields).
These formatting choices supersede the settings file for discarding fields. But if the
settings file is set to discard, say, the \rf field, choosing to include example sentence fields
does not override the settings file and cause the \rf field to print. Only the \xv, \xe, \xn,
and \xr fields would be output in this case. These options allow you to quickly alter an
output format for a particular audience (e.g. the dictionary for a national audience would
normally not contain your notes, whereas your own printed copy would), without having
to go through the Change Settings menu option and mark each of the example sentence or
note field codes to be ignored.
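The way the two formatting options combine with the settings file can be pictured as a union of exclusion sets: an option can remove fields, but it never restores a field the settings file discards. The Python sketch below only illustrates that rule; it is not MDF's own CC-table logic, and the names `fields_to_print`, `EXAMPLE_FIELDS` and `NOTE_FIELDS` are hypothetical.

```python
# Field markers dropped by each formatting option (taken from the two lists above).
EXAMPLE_FIELDS = {"rf", "xv", "xe", "xn", "xr", "xg"}
NOTE_FIELDS = {"nt", "np", "ng", "nd", "na", "ns", "nq"}


def fields_to_print(present, settings_excluded, drop_examples=False, drop_notes=False):
    """Return the markers that would print, in their original order.

    A field prints only if neither the settings file nor a formatting
    option excludes it; an option can remove fields but never restore them.
    """
    excluded = set(settings_excluded)
    if drop_examples:
        excluded |= EXAMPLE_FIELDS
    if drop_notes:
        excluded |= NOTE_FIELDS
    return [marker for marker in present if marker not in excluded]


record = ["lx", "ps", "ge", "rf", "xv", "xe", "xn", "nt", "dt"]

# Settings file discards \rf and \dt; example sentences kept, notes dropped.
print(fields_to_print(record, {"rf", "dt"}, drop_notes=True))
```

With \rf discarded in the settings file, keeping example sentences still leaves \rf out, which matches the case described above.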
3.3.2 Reset
This menu choice simply restores the settings file to its original "from the factory"
form. It resets the list of fields excluded from the dictionary to the one given
above in 3.3.1.
3.3.3 Format Dictionary
NOTE: For users of SHOEBOX v1.2x (and earlier), your database does not need to be
compacted before using MDF. The file is resorted anyway to order homonym
numbers correctly.
While formatting a dictionary in MDF is a fairly fast and automatic process, it is by no
means simple. The following describes in more detail what actually goes on behind the
scenes. Each of these steps is performed by MDF automatically and relatively quickly.
When MDF is processing your dictionary, it produces several intermediate files, but
without altering your original lexical database. The first step is to throw out every field
that you have specified in the settings file that you do not want (see 3.3.1). MDF puts a
dot on the screen for every record it processes.
The output file is then sorted, taking into consideration homonym numbers.4 This second
step is necessary because SHOEBOX sorts only on the KEY field contents. With
homonyms, key fields are identical (see 6.3), and SHOEBOX assumes therefore that
there is no particular order for such records. In fact, SHOEBOX reverses the order of
homonyms each time it compacts the file. So there is no point in worrying about keeping
the homonym records in numerical order; you just can't.
Now since homonyms are marked as 1, 2, 3, etc. and it would look rather odd to have sets
of homonym entries printed in random orders, MDF sorts them on both the \lx field and
the \hm field (see also 5.4.1).
This sorting process uses the Text Analysis [TA] program SRT.EXE supplied with the
MDF release. The default sort order is in the file MDFDICT.ANS. The sort order may be
modified (outside of MDF) using the TA program ANSQ.EXE (this will be important for
users with digraphs or other complex orthographic issues). Changing the sort order is
explained in the documentation that comes with the ANSQ.EXE program. An alternative
means of changing the sort order in MDFDICT.ANS is explained in 5.4.2. But, for MDF
to function properly, the @ symbol must be sorted first. This symbol is used to sort a
dummy record to the beginning of the sorted file. This first record contains setting
information used by MDF later in the formatting process. This extra record also causes
SRT to give a record total that is one greater than the actual number in your lexical
database. For MDF to function properly, the MDFDICT.ANS file must contain the line
that tells SRT to use both the \lx and the \hm fields when sorting:
\rkey lx hm
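The effect of sorting on both \lx and \hm, with the @ symbol first, can be sketched in a few lines of Python. This is an illustration only, not a description of how SRT.EXE or MDFDICT.ANS work internally; the record layout (a dict per record), the function name `sort_key`, and the treatment of a missing \hm as 0 are assumptions.

```python
def sort_key(record):
    """Sort on the lx form first, then on the hm number (a missing hm counts as 0)."""
    # Assumption: "@" gets its own rank so that it collates before every other character.
    chars = tuple((0, "") if ch == "@" else (1, ch.lower()) for ch in record["lx"])
    return (chars, int(record.get("hm", 0)))


records = [
    {"lx": "ktem", "hm": "2"},
    {"lx": "abat"},
    {"lx": "ktem", "hm": "1"},
    {"lx": "@"},  # dummy record carrying settings information
]

sorted_records = sorted(records, key=sort_key)
print([(r["lx"], r.get("hm")) for r in sorted_records])
```

The dummy @ record ends up first, and the two homonyms of ktem come out in numerical order regardless of their order in the input.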
Once sorted, the database file is then processed by a large CC table to convert it to a file
with all of the necessary paragraph and character style codes assigned to the appropriate
bits of text, with new letter sections added, with odd-even running footers, and all of the
other things necessary to get it ready for moving over to WORD.
The output of this CC table is then split into smaller, more manageable files, called
SPLIT01.TMP, SPLIT02.TMP, etc. These are then input into the Convert-to-Word
[CTW] program one by one.5 The CTW program then does some serious crunching on
the files to produce a series of printer-ready WORD documents (still in pieces). These
document files are called SPLIT01.DOC, SPLIT02.DOC, etc. and they must then be
merged back together in WORD.
4The homonym number applies to the entry citation form. If there is no \lc field, then the \hm
number applies to the \lx form. If there is a \lc field present, then the \hm number references
the \lc field, not the \lx field. The user must keep this distinction in mind.
5CTW is a good program, but because it is limited in the size of the input and output file, the database
The final step loads a sorted list of the split document files into WORD. This list is used
to remerge the files. The merged document is then loaded into WORD for your perusal.
This file is given the temporary filename MDFXXX.TMP. After the file has been viewed,
simply quit WORD, and MDF will change the temporary name to the name DICT.DOC.
(You will be notified of the new name by MDF). If you wish, you can rename the
MDFXXX.TMP file to something else from within WORD (v5.0 use TRANSFER
RENAME; v5.5 or v6.0 use SAVE AS). Renaming the file will not affect the MDF program.
It assumes that if MDFXXX.TMP no longer exists, you must have already given it
another name.
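The split-and-remerge steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the actual size limit of CTW is not given here, so `max_chars` is a made-up parameter, and `split_records` and `chunk_names` are hypothetical helpers, not part of MDF.

```python
def split_records(records, max_chars):
    """Group records into chunks whose combined length stays within max_chars.

    A record is never cut in half; an oversized record gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for rec in records:
        if current and size + len(rec) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(rec)
        size += len(rec)
    if current:
        chunks.append(current)
    return chunks


def chunk_names(count, ext="TMP"):
    """Build names in the SPLIT01, SPLIT02, ... pattern."""
    return [f"SPLIT{n:02d}.{ext}" for n in range(1, count + 1)]


records = [
    "\\lx abat\n\\ge grove\n",
    "\\lx -abili\n\\ge wail\n",
    "\\lx ama\n\\ge father\n",
]

chunks = split_records(records, max_chars=40)  # 40 is an arbitrary demo limit
names = chunk_names(len(chunks))

# Remerging in name order must give back the original sequence of records.
merged = "".join(rec for chunk in chunks for rec in chunk)
print(names, merged == "".join(records))
```

Splitting on record boundaries, rather than at a fixed byte offset, is what lets the pieces be merged back in order without repairing any entry.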
Once merged, the dictionary is basically ready for printing (though you may desire to
make cosmetic changes). This process from a standard format lexical database to a
printer-ready document is relatively automatic. It takes MDF about 13 minutes to format a
vernacular-English diglot dictionary from a 791K lexical database with 2,044 records
(many of them complex) on a Toshiba T1900 laptop (a 486SX 20MHz machine). It takes
over 45 minutes on a PC XT. The following example illustrates a triglot printout.
Sample SHOEBOX Records
\lx abat
\ps n
\ge grove
\gn dusun
\rf d2.077.03
\xv Kbwai abatke ti ksweruk nurare.
\xe I went to the coconut groves to clear the grass.
\xn Saya pergi menyiangi dusun kelapa.
\rf d4.079.16
\xv Kbwa ti ktwan nurke o abatke.
\xe I'm going to plant coconut trees in the grove.
\xn Saya pergi tanam kelapa di dusun.
\ee This is uc:not limited to coconut groves but is used for mangoes, etc.
\sg abatke
\pl
\nt
\dt 26/Feb/90

\lx -abili
\ps v
\ge wail
\gn meratap
\rn ratap, me-*
\rf n2.113.30
\xv Kswer ma kabili yaw ti lasmyerke.
\xe I wailed prostrate on the ground.
\xn Saya meratap di tanah.
\cf -ser
\ce cry
\cn menangis
\pd 1
\1s kabili
\nt
\dt 1/Feb/90
entries into single entries. This collapsed database file is now ready for processing
through another CC table to become a formatted file ready for conversion to a WORD
document. The program CTW (which does the converting) is unable to handle large files.
So the formatted file is split into smaller files, as is also the case when formatting the
dictionary. These are then run through CTW one at a time. Finally a list of these split files
is loaded into WORD, and WORD uses the list to merge the split document files back
into a single document. This produces a printer-ready document in WORD.
The document files are merged into a temporary file called MDFXXX.TMP. The user is
given a chance to look at the finderlist while it is still called this. It may be renamed if
needed (in WORD v5.0 use TRANSFER RENAME; in WORD v5.5 or v6.0 use SAVE AS).
If you choose not to give it a new name, exit WORD, and the new finderlist is
automatically given the name ENGL.DOC or NATN.DOC depending on which language
it is for.
This whole process must then be repeated to produce a finderlist for the other language.
On a 486SX 20MHz laptop, MDF takes just over five minutes to produce an English
finderlist from a 791K lexical database, with 2,044 records.
The essence of making a reversed finderlist involves storing the lexical entry form, the
lexical citation form (if present), and the subentry form (if there is one), as well as the
homonym number and the current sense number (if relevant), and then outputting a
reversed record for each gloss occurring for the language being extracted.
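The reversal step described above can be sketched in Python. This is a rough illustration of the idea, not the CC table MDF actually uses: the representation of an entry as (marker, value) pairs, the function name `reverse_entry`, and the splitting of multiple glosses on ";" are assumptions made for the example.

```python
def reverse_entry(entry, gloss_field="ge"):
    """Yield one reversed record per gloss found in the chosen gloss field.

    `entry` is a list of (marker, value) pairs in database order.
    """
    state = {"lx": None, "lc": None, "se": None, "hm": None, "sn": None, "ps": None}
    for marker, value in entry:
        if marker == "lx":
            state = dict.fromkeys(state)  # new entry: forget everything stored so far
            state["lx"] = value
        elif marker == "se":
            state["se"], state["sn"] = value, None  # a subentry restarts sense numbering
        elif marker in state:
            state[marker] = value
        elif marker == gloss_field:
            # Refer to the subentry if there is one, else the citation form, else the lexeme.
            form = state["se"] or state["lc"] or state["lx"]
            for gloss in (g.strip() for g in value.split(";")):
                if gloss:
                    yield {"gloss": gloss, "ps": state["ps"], "form": form,
                           "hm": state["hm"], "sn": state["sn"]}


# Fields taken from the Buru entry for "sira"; \re holds the reversal glosses.
entry = [("lx", "sira"), ("ps", "PRO"), ("ge", "3p"), ("re", "they ; them")]
reversed_records = list(reverse_entry(entry, gloss_field="re"))
for rec in reversed_records:
    print(rec["gloss"], rec["ps"], rec["form"])
```

Each gloss in the field produces its own reversed record, all pointing back to the same stored vernacular form.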
The finderlists produced can be in either single or double column format and can either
include or exclude the part of speech of the vernacular term being referenced. The
following examples are single column:
With the part of speech:
enrage          adj. masbu.
enter           vi. -sukar.
entertain       vt. -aluka.
entertainment   n. inabrenke, see: -bren;
entire          vi. ktem1.
envious         ph. lema kdwakin irire wait eraske, see: -dakin.
erase           vt. sos.
Without the part of speech:
enrage          masbu.
enter           -sukar.
entertain       -aluka.
entertainment   inabrenke, see: -bren;
entire          ktem1.
envious         lema kdwakin irire wait eraske, see: -dakin.
erase           sos.
The MDF program combines the vernacular glosses in identical reversed glosses (shown
below with the part of speech). (Note with long headwords MDF pushes the part of
speech and gloss further to the right on reversal so that only the shorter units are fully
aligned.)
face                n:bp. mata;
                    n:bp. welnohaha.
face, to wash one's vi. -larif.
faded               adj. mamwaw.
faithful            vi. -tohtohaktel.
fake                adv. koikay.
falcon              n:an. lak.
fall                v. kibrok;
                    v. -tunik;
                    vi. -di;
                    vi. kdi;
                    vi. kyoras;
                    kdian.
fall forward        v. -surak.
The same list is shown below without the part of speech (Note that multiple references,
such as fall, are concatenated sequentially rather than displayed on separate lines as
above):
face                mata; welnohaha.
face, to wash one's -larif.
faded               mamwaw.
faithful            -tohtohaktel.
fake                koikay.
falcon              lak.
fall                kibrok; -tunik; -di; kdi; kyoras; kdian.
fall forward        -surak.
The total number of entries in each finderlist is given as a statement at the end of the
document.
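The combining of identical reversed glosses shown in the lists above can be sketched as a simple grouping step. The function name `combine_reversed` and the tuple layout are hypothetical, and the sketch leaves line layout (separate lines versus one run-on line) to the stylesheet; it only shows which references end up under which gloss.

```python
def combine_reversed(records, with_pos=True):
    """Merge reversed records that share a gloss into one finderlist entry.

    `records` holds (gloss, part_of_speech, vernacular_form) tuples.
    Returns (gloss, references) pairs sorted by gloss.
    """
    grouped = {}
    for gloss, pos, form in records:
        ref = f"{pos}. {form}" if with_pos else form
        grouped.setdefault(gloss, []).append(ref)
    return [(gloss, "; ".join(refs) + ".") for gloss, refs in sorted(grouped.items())]


# A few of the reversed records from the sample lists above, in arbitrary order.
records = [
    ("fall", "v", "kibrok"),
    ("face", "n:bp", "mata"),
    ("fall", "v", "-tunik"),
    ("face", "n:bp", "welnohaha"),
    ("falcon", "n:an", "lak"),
]

for gloss, refs in combine_reversed(records, with_pos=False):
    print(gloss, refs)
```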
3.3.5 Quit
To leave MDF hit the <ESC> key at the main menu. A message giving the version and
date of the MDF program will be displayed as it returns you to DOS.
You are now free to reload each of your document files (DICT.DOC, ENGL.DOC,
NATN.DOC) into WORD to tweak as needed (margins, headers, footers, etc.). If you find
errors in the actual text due to MDF, please report them using Appendix I. If you find
errors due to your own mistakes in the lexical database, you can go ahead and correct
them in the printer-ready dictionary. Just be sure to also correct the errors in the original
lexical database; otherwise you will have to correct those errors every time you format
your dictionary.
3.4 Printing
The MDF program was designed to get everything ready for printing, but not to actually
handle the printing.
Once your dictionary and finderlists have been formatted, exit MDF and then use WORD
directly to load and print them. Or you could print them from within MDF right after each
document is merged into WORD.
Before printing a large print job, first print a couple of pages to check that the interaction
of the stylesheet with your printer is satisfactory. Several stylesheets are provided on the
release disk as explained below. Select (and if necessary adapt) the stylesheet that is most
appropriate for your printer.
If you are printing your dictionary on a dot-matrix printer (or perhaps on a light duty
inkjet printer), have WORD print only 20 or so pages at a time. Let the printer rest a bit
and then continue. This helps keep the print head from overheating. Another solution is to
open the lid and direct a fan at the print head. This may allow you to print the whole file
at one pass.
The stylesheet MDFDICT.STY is automatically attached to each of the final documents
by MDF. It is set up for the HP Laserjet series printers (III and above; the file
MDF-HP4L.STY is identical to MDFDICT.STY). It also does a fairly nice job for the
Epson LQ series printers (though the MDF-EPLQ.STY stylesheet is designed for these
printers). MDFDICT.STY bombs on the Toshiba 321SL, so if this is your printer, you
will need to copy the stylesheet MDF-T321.STY over to MDFDICT.STY so that MDF
will attach a Toshiba 321SL version of MDFDICT.STY to each document.
If you want to modify the look of your dictionary and finderlists, modify
MDFDICT.STY, but be sure to also save the modified version to another filename, such
as MY.STY. This allows you to switch to other printers (by copying another printer style
over MDFDICT.STY) and not lose all the modifications you made for your own printer
(just copy your stylesheet back to MDFDICT.STY when you want to use it again).
There is also a stylesheet called MDF-FLIP.STY which flips your document from a
single-column format to a double-column one, or vice versa. So even if you choose
double-column format when MDF asks you, you are not stuck with the decision, just
attach MDF-FLIP.STY and the document is automatically changed. Reattaching
MDFDICT.STY returns the document to the original format.
MDF-FLIP.STY is a modification of MDFDICT.STY (it is identical to
MDF-HP4F.STY), so if you are using a modified stylesheet like MDF-T321.STY or one
you've made yourself, you will need to modify MDF-FLIP.STY too, if you want to use it.
Again be sure to also give the modified stylesheet a new name, such as MYFLIP.STY.
3.5 Modifying the printout
3.5.1 WORD Stylesheets
The easiest way to modify the look of your formatted dictionary and finderlists is to
modify the WORD stylesheet MDFDICT.STY, giving it a new name after you've
modified it. This stylesheet is used by both the dictionary and the finderlists, so beware:
what you do for the dictionary may affect the finderlists as well. If it does and you don't
like it, then make two stylesheets, one for your dictionary and one for the finderlists. You
will need to attach your modified stylesheets each time you want to print. MDF does not
know about them and stubbornly attaches MDFDICT.STY to the documents.
Most of the styles in the stylesheet are character styles. It is pretty clear for most styles
what they affect (e.g. SN style formats the sense number, etc.). But the FV, FE, FN, and
FR styles affect more than just one lexical field. These codes (for vernacular, English,
national, and regional fonts respectively) determine the look of most of the fields. These
styles are used for all language specific text (\dv, \de, \dn, etc.). So, for example, if you
print out a diglot dictionary for a national language audience, you will probably want to
tweak the FN style, because this style is set to italic (to differentiate the national language
from English in a triglot dictionary). Simply edit the stylesheet and change FN back to
normal text for your national diglot dictionary.
The standard font style [FS] is used for formatting most information fields (\rf, \lt, \pd,
\lf, \is, \th, \sd, \bw, \et, and \cf), as well as for punctuation.
The labels that mark different fields (e.g. "See:" for the cross-reference field) are all
encoded with the FL style (mnemonic for font-label).
3.5.2 Character Style codes
MDF supports embedded coding in your discussion fields so that you can apply or specify
a character style to any bit of text in your dictionary. These embedded codes are to be
used in your lexical database before the dictionary is formatted, not afterwards. The
following are the character style codes supported by MDF (see also 2.5):
fv:  (font-vernacular)
fe:  (font-English)
fn:  (font-national language)
fr:  (font-regional language)
fl:  (font-labels)
fs:  (font-standard)
fb:  (font-bold)
fi:  (font-italic)
uc:  (underline character)
ub:  (underline-bold)
ui:  (underline-italic)
sc:  (underline a scientific name; not required in the \sc field)
These codes can be specified within any field (but generally are used only in free-form or
discussion fields). When specified, they apply to the following word (a space or
punctuation terminates the style). The style codes must be in lower case, and must not
have any space between the colon and the following word:
\ee They make fv:sabun using pulverized coral...
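The rules for embedded style codes (lower case, no space after the colon, applying to the following word only) can be checked with a short Python sketch. It is an illustration only and says nothing about how MDF's own CC table recognizes the codes; the helper names are hypothetical, and the exact set of punctuation characters that ends a styled word is an assumption.

```python
import re

STYLE_CODES = ("fv", "fe", "fn", "fr", "fl", "fs", "fb", "fi", "uc", "ub", "ui", "sc")

# A code must start a word, be lower case, and touch the word it styles.
# Assumption: the styled word ends at whitespace or at one of . , ; : ! ? ( ) "
CODE_RE = re.compile(
    r"(?<![A-Za-z])(" + "|".join(STYLE_CODES) + r'):([^\s.,;:!?()"]+)'
)


def find_styled_words(text):
    """Return (code, word) pairs for every embedded style code in the text."""
    return [(m.group(1), m.group(2)) for m in CODE_RE.finditer(text)]


def strip_codes(text):
    """Return the text with the style codes removed and the words kept."""
    return CODE_RE.sub(lambda m: m.group(2), text)


line = "They make fv:sabun using pulverized coral..."
print(find_styled_words(line))
print(strip_codes(line))
```

A check like this can be run over the discussion fields before formatting to catch codes typed in upper case or followed by a space, neither of which would be recognized under the rules above.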
3.6 Summary
We hope the MDF program makes the whole process of printing out your dictionary and
finderlists easy enough so that it can be done as frequently as needed. In printed form, a
dictionary can be a valuable language learning tool for you, helpful to others in related
languages, and also a good demonstration of progress to the language community and the
government authorities. A dictionary only on the computer is of little use to anybody but
yourself (and then only when you are sitting at the computer).
1Many commercial dictionaries count each separate part of speech, subentry, inflected forms, run-on
derivatives and other classes of subsidiary information as separate entries for the purposes of inflating
the total entry count. Thus, a single headword can be counted as five or more entries, because for
commercial purposes the more entries one can claim, the more impressive (see Landau 1989:84-87). For
the discussion in this Guide entries are counted by headwords.
2Many educated westerners also have difficulty using thesauruses in major languages. How to find the
information one is after often takes a larger investment of understanding than does using a dictionary.
tends to be fairly useless to other audiences, and is often used with great difficulty by
other academics, if at all.
National government: A dictionary geared to please the national government often
appears incomplete and full of shortcuts. It is often produced to justify a visa, on-going
presence in the area, or show that contractual obligations are being met. It is rarely a
service to anybody. A better option is to produce a serious volume for an academic
audience that would contribute to both the local and scientific communities, and would
deal with the visa or contract problems as well.
Local government: Local government officials with a variety of motives are frequently
interested in a dictionary to help them grapple with the local vernacular. What they
usually mean is a simple glossary. However, the local community may not want the
transitory civil service, police, or military to know certain areas of vocabulary, such as
female body part terms and sexual terminology, and may request that certain areas of
vocabulary be left out of something made for local officials. The information that will
satisfy the needs of local officials is less than that required for a serious dictionary, and so
is not recommended as a primary audience.
Local audience: The local audience often has a variety of purposes or desires in having a
dictionary of their language. Prestige and ethnolinguistic pride may enter in: upon
getting a dictionary it is not uncommon to hear, "Now we have a real language!"
Community leaders may feel the younger generations are rapidly shifting to a regional or
national language and want a reliable inventory of their language and culture in the form
of a dictionary. Or they may feel that knowledge of certain parts of their language and
culture (such as ritual language or traditional medicine) are not being transmitted to a new
generation of specialists and need to be archived while the knowledge is still available.
The information catalogued in a serious attempt to make a dictionary that will serve the
broad needs of the local community will normally serve other audiences and purposes as
well.
General audience: This is commonly cited as the primary audience of compilers of
dictionaries. However, a general audience is simply not specific enough to assist in
decision-making about how information should be packaged or what information should
or should not be included. A product aimed for a general audience is often amorphous
and unprincipled.
Mixed audience: This may be either something to be studiously avoided, or a viable
solution to several problems. Trying to serve mixed audiences with mixed purposes can
make a dictionary very unsatisfying or very unwieldy. For example, a dictionary geared
primarily for an academic audience will probably not be usable by the local community.
One solution is to make separate dictionaries for separate audiences. However, few
scholars have the time or the financial resources to make more than one serious
dictionary.
OUR RECOMMENDATION: Given the reality that a compiler will probably be limited to
producing one or at most two dictionaries, we recommend that the major dictionary be
aimed at the local audience and supplemented with information that is of use to
secondary audiences, such as a scholarly audience. For example, the addition of scientific
names, etymological information, and morphological parsing (e.g. memukuli = meN-pukul-i)
can nicely broaden a dictionary otherwise geared for a local audience. A viable
solution is thus to aim the primary organization of the lexical information for a local
audience, but to also embellish the entries with information that is useful and interesting
to an academic audience. A well-organized computerized lexical database can
accommodate information packaged for different audiences. The following example is
from Buru:
\lx sira                      [lexeme / headword]
\ps PRO                       [part of speech]
\ge 3p                        [gloss for interlinearizing texts]
\gn mereka                    [gloss for national language]
\gr dorang ; dong             [glosses for regional language]
\re they ; them               [glosses for reversed English finderlist]
\dv gebaro dikat fi di kita   [definition-vernacular]
\de they; third person plural [definition/description-English]
\dn orang ketiga jamak        [national language definition]
\et *siDa                     [historical etymology]
\eg they                      [gloss of etymology]
contexts. Judicious use of examples assists with both justifying and exemplifying usage.3
MDF provides for both vernacular-English and vernacular-national language diglot
options. Pawley (1993:18/3/93 lecture notes) explains his view:
In a bilingual dictionary, the situation is different [from a monolingual dictionary].
The bilingual dictionary, going from L1 to L2, is chiefly a translation aid and
ideally it should be backed by monolingual dictionaries of the two languages. The
user is looking for equivalents rather than analysis. Start with the ideal simplest
case, where the two languages, L1 and L2, always have fully intertranslatable
terms. By this I mean that for every term in L1 there is at least one term of
equivalent meaning in L2. In such circumstances, the counterpart of the definition
is the translation equivalent. And the lexicographer's job would be to specify the
proper translation equivalent(s). There would be no need to define the meanings of
terms in L2 analytically in the bilingual dictionary because the speaker of L1
would either know the equivalent term in his own language, or having been told it,
would be able to look it up in a monolingual dictionary.
However, bilingual dictionaries do not always work this way. The main reason is
that the lexicons of different languages are never completely isomorphic: their
semantic categories do not match one-to-one. Languages stemming from a
common ancestor and spoken by communities with very similar cultures may show
a fairly close match. So, sometimes, do unrelated languages whose speakers have
been bilingual and in close contact for many centuries. But languages associated
with radically different cultures may not be readily intertranslatable. Far from it. In
such cases, the lexicographer is obliged to give analytic definitions, in other
words, to do much the same thing as the compiler of a monolingual dictionary.
Those of us who work on exotic languages (from the European standpoint), such
as Australian, Austronesian or Papuan languages, constantly find ourselves in this
last situation. [emphasis added].
3Most handheld electronic bilingual dictionaries do not qualify as dictionaries in the sense used here.
They are electronic glossaries (with varying degrees of sophistication). Multilingual dictionaries (e.g.
eight European languages in a single volume) also tend to be glossaries without enough information to
distinguish appropriate usage.
There are a variety of strategies for finding words to go into a lexical database.
1)
2)
3) Given the phonemes and phonotactic patterns of the language, what are the
logically possible combinations of letters and morphemes, and which ones do the
native speakers recognize as words?
4) Are there native speakers I can commission to fill in wordlists or think of terms for
me? This approach is full of inherent pitfalls. These include: in many societies
native speakers often have an inadequate mastery of the national language in which
they try to describe or define the terms; there is most likely a mismatch of terms
used in L1 and L2 even though the description is written by the native speaker on
the assumption of a complete match; the compilers add further changes when they
reinterpret into English what they are given, etc.
Strategies 1-4 are not recommended as primary (or serious) approaches. Some that have a
little more merit include:
5) Are there good (tested) extended wordlists in the national language or a lingua
franca I can use to get started? These are best if they are designed specifically for
the language family. Because second (or third) languages tend to be used only in
certain contexts or domains, be aware that there may be large areas of vocabulary
that native speakers never use and may not know in the language of elicitation
(such as the national language), but only in the vernacular (e.g. centipede or leech).
6) Is there a (good) dictionary of a related language that I can use to elicit forms and
compare range of meaning? Here the compiler must take great care to avoid
assumptions of isomorphism (one-to-one relationships of form and meaning across
languages).
7) Are there good picture books (drawings or photographs) that can be used to elicit
terms? They may be useful for flora, fauna, and material culture such as artifacts.
However, there is the temptation to assume that the scientific name in the picture
book is a perfect match for the native term, whereas the local varieties may, in fact,
be different. Furthermore, scientific nomenclature often changes over time as
botanists and zoologists refine the principles by which things are classified. Thus,
the scientific name given by a qualified naturalist in 1850 or 1930 may not be what
is used today, and what was covered by the term 100 years ago may be split into
two or more terms now, or may have been merged with another term.
9)
4A program called IT (Simons and Versaw 1987) is available to Apple Macintosh users, but it does not
have the extensive interactive capabilities available in SHOEBOX. IT works at the level of a glossary,
rather than a full-blown lexical database. IT can be ordered from Academic Computing, 7500 W. Camp
Wisdom Rd. Dallas, TX 75236, USA.
\lx ama
\ge father
ama father.
\lx ina
\ge mother
ina mother.
Some compilers include housekeeping information in a minimal entry such as the date the
entry was last worked on:5
\lx ama
\ge father
\dt 9/Sep/90
ama father.
\lx ina
\ge mother
\dt 8/Aug/89
ina mother.
Some who use the lexical database for linguistic analysis in interlinearizing texts want the
part of speech included in a minimal entry:6
\lx ama
\ps n
\ge father
\dt 9/Sep/90
ama n. father.
\lx ina
\ps n
\ge mother
\dt 8/Aug/89
ina n. mother.
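How a minimal entry turns into its printed line can be sketched in Python. This is a simplified illustration, not MDF's formatting table: it handles only \lx, \ps and \ge, and the helper names `parse_record` and `format_minimal` are hypothetical.

```python
def parse_record(text):
    """Turn the lines of one standard-format record into (marker, value) pairs."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            fields.append((marker, value.strip()))
        elif line and fields:
            # A line without a marker continues the previous field.
            marker, value = fields[-1]
            fields[-1] = (marker, (value + " " + line).strip())
    return fields


def format_minimal(fields):
    """Print headword, optional part of speech, and gloss; skip everything else."""
    data = {marker: value for marker, value in fields if value}
    parts = [data["lx"]]
    if "ps" in data:
        parts.append(data["ps"] + ".")
    parts.append(data["ge"] + ".")
    return " ".join(parts)


record = "\\lx ama\n\\ps n\n\\ge father\n\\dt 9/Sep/90"
fields = parse_record(record)
print(format_minimal(fields))
```

The housekeeping field \dt is parsed but not printed, which mirrors the sample output lines shown with the records above.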
TIP: Fields you want to appear in every entry can be entered in the DATABASE
\ps
\ge
\dt
problem is that analysts tend to believe the labels that they assigned early in their exposure to a language
before they understood how the language works as a system. We recommend that compilers flag tentative
parts of speech assigned early in the language project in some way, perhaps with a preceding asterisk to
indicate the tentative or hypothetical nature (\ps *vi). This will facilitate checks and later modifications
once the system is better understood.
7To make sure there is a space at the end of each line, press <END> and check where the cursor sits.
Novice compilers will find it helpful to include many fields in their template, even more
than they feel they need at the beginning. Empty fields are not a problem with MDF: if
there is no content in a field, MDF ignores it when formatting the dictionary or reversed
finderlist. By including many fields in the template, users will find the fields are there
when they need them. Power users can add bundles of fields at any time using MACROS or
direct keyboarding, but this is daunting to the beginner who is facing information
overload. Fields will be consistent if entered by a template rather than by hand. We have a
tendency to be lazy: if the field is not there we may fail to add the information even
when we know it, but the presence of a field serves as a prompt. The following is
suggested as a basic set of fields to include in every record in the lexicon. It is most easily
entered in SHOEBOX as a DATABASE TEMPLATE.
\lx   [lexeme / headword]
\ps   [part of speech]
\pn   [\ps for national language]
\ge   [gloss-English]
\re   [reversal-English]
\de   [definition-English]
\gn   [gloss-national language]
\dn   [definition-national language]
\rf   [reference]
\xv   [example-vernacular]
\xe   [example-English translation]
\xn   [example-national lg. translation]
\ee   [encyclopedic information-English]
\en   [encyclopedic info.-national lg.]
\lf   [lexical function (lexical network)]
\le   [\lf gloss-English]
\ln   [\lf gloss-national language]
\mr   [morphology]
\bw   [borrowed word]
\cf   [confer/cross-reference]
\ce   [\cf gloss-English]
\cn   [\cf gloss-national language]
\sd   [semantic domain]
\st   [status of entry]
\so   [source]
\dt   [date entry last worked on]8
Additional field markers for expanded entries can be added as needed. See 2.1.
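The point that a generous template costs nothing at print time, because empty fields are ignored, can be illustrated with a small Python sketch. The marker list is the basic set suggested above; the helper names `new_record` and `printable_fields` are hypothetical and are not SHOEBOX or MDF features.

```python
# The basic set of field markers suggested above, in template order.
TEMPLATE = ["lx", "ps", "pn", "ge", "re", "de", "gn", "dn", "rf", "xv", "xe", "xn",
            "ee", "en", "lf", "le", "ln", "mr", "bw", "cf", "ce", "cn", "sd", "st",
            "so", "dt"]


def new_record(**known):
    """Start a record with every template field present, filled in where known."""
    return [(marker, known.get(marker, "")) for marker in TEMPLATE]


def printable_fields(record):
    """Drop fields with no content, which is how empty fields are treated when formatting."""
    return [(marker, value) for marker, value in record if value.strip()]


rec = new_record(lx="ama", ps="n", ge="father")
print(len(rec), printable_fields(rec))
```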
8This can be set up within SHOEBOX to activate the DATESTAMP feature for automatically updating
\lx ama                          [lexeme / headword]
\ps n
\ge F
\re father ; uncle (paternal)
\de male of first ascending...
\gn ayah ; bapak
\lf Cpart = ina
\le mother
\ln ibu
\lf Spec = ama ebanat
\le biological father
\lf Spec = ama haat
\le father's eldest brother
\sd Nkin
\dt 28/Feb/84
Field markers entered by database template.
The primary consideration here goes back to audience and purpose. Despite Landau's
claim, it is not the case that everybody knows the alphabet.9 Linguists tend to want to
organize dictionaries around root morphemes for their own convenience. However, local
audiences (and often others, including other scholars) generally find it difficult to find
information organized around the root morphemes. They usually look for the surface form
first and then give up.10 For example, in Buru they would want to look up enyikut under
en... rather than under the root iko, and ekhida under ek... rather than under the root
hida. Unfortunately, most major dictionaries of Austronesian languages have been
heavily root-oriented.11 Both strategies have their advantages and disadvantages, some of
which are discussed below (see 4.6.1 for a summary). It is best to choose one strategy as
primary over the other (root-oriented vs. lexeme-oriented, although a marriage of the two
is possible) keeping in mind the associated advantages and disadvantages. To accomplish
both requires some sophisticated tweaking of the database that is beyond the skill of the
novice or even the average compiler.
9Many literacy programs for preliterate societies, non-formal education, or adult vernacular literacy,
while teaching the letters of the alphabet, often fail to teach the alphabet as a conventionalized ordering
of letters for mnemonic and organizational purposes. This then fails to equip the new readers with a basic
skill needed to access tools, such as dictionaries, that build bridges for survival in the larger world.
OUR RECOMMENDATION: We recommend essentially a lexeme-based dictionary
that also contains basic entries for root morphemes and affixes to show the
morphological parts of the language and also to handle interlinearization.
Not every surface form should be in the lexicon. Some languages have classes of words,
such as verbs, inflected for person and number with no other change in the meaning (e.g.
amo, amas, ama, amamos, aman, or ala, mala, nala, tala). For these types of words, only
the citation form (discussed under \lc in 2.1, and in more detail in 5.4.4) should be an
entry in the dictionary. The other forms should be derivable from information in the
grammatical introduction to the dictionary. If there is an irregularity in the paradigm, that
would be laid out overtly using the appropriate person-number form of the paradigm
fields.
The two database formats (root-based vs. lexeme-based) might look something like the
following:
Root-based DB (structure): JUST ONE COMPLEX RECORD (root portion)
\lx   root lexeme
\ps   part of speech
\ge   gloss (English)
\gn   gloss (national)
\dv   definition (vernacular)
\de   definition (English)
\dn   definition (national)
\rf   ref. text, notebooks
\xv   example sentence (vern)
\xe   translation \xv (Eng)
\xn   translation \xv (nat)

Lexeme-based DB (meaning): ONE RECORD
\lx   root lexeme
\va   list of variants
\ps   part of speech
\ge   gloss (English)
\gn   gloss (national)
\dv   definition (vern)
\de   definition (English)
\dn   definition (national)
\rf   ref. text, notebooks
\xv   example sent. (vern)
\xe   translation \xv (Eng)
10It can take a major effort to educate a whole society to parse words to find the root morphemes, and the
organizational infrastructure required to do so may not exist. By contrast, many people who know how to
read national languages learned the order of the alphabet in the process of learning to read, whether or
not they attended a formal school.
11Zorc (1992) gives a negative critique of the heavily root-oriented Austronesian dictionaries pointing
Root-based DB (structure), continued: the subentry is part of the same COMPLEX RECORD
\se   subentry (polymorph)
\ps   part of speech
\ge   gloss (English)
\gn   gloss (national)
\dv   definition (vernacular)
\de   definition (English)
\dn   definition (national)
\rf   ref. text, notebooks
\xv   example sentence (vern)
\xe   translation \xv (Eng)
\xn   translation \xv (nat)
\cf   cross-ref. other entry
\ce   gloss (Eng) of \cf
\nt   notes, questions, etc.

Lexeme-based DB (meaning), continued: remaining fields of the first record
\xn   translation \xv (nat)
\cf   cross-ref. other entry
\ce   gloss (Eng) of \cf
\nt   notes, questions, etc.
\dt

Lexeme-based DB (meaning): ANOTHER RECORD
\lx   polymorphemic lexeme
\ps   part of speech
\ge   gloss (English)
\gn   gloss (national)
\dv   definition (vern)
\de   definition (English)
\dn   definition (national)
\rf   ref. text, notebooks
\xv   example sent. (vern)
\xe   translation \xv (Eng)
\xn   translation \xv (nat)
\mr   morphology
\cf   cross-ref. other entry
\ce   gloss (Eng) of \cf
In the root-based database, polymorphemic lexemes related to the root are seconded under
the root form and become a part of the entry for the root form; this approach is
structure-oriented. In the lexeme-oriented database, each lexeme has its own entry, and the
relationships that exist between root lexemes and polymorphemic lexemes based on that
root are handled by cross-referencing using the \lf, \cf, \va, and \mn bundles of fields, just
as headwords that may not be based on that root are handled; this approach is
meaning-oriented.
The lexicographer biased in favor of a root-based (form) approach might organize
hairbrush, toothbrush, and paintbrush under the headword brush. The lexicographer
biased in favor of a lexeme-based approach would argue that languages are full of
lexemes such as remove which is clearly not synchronically the sum of move plus re- and
must be handled in terms of meaning, not form.
Root-based approach
\lx brush
\ps n
\ge bristly_instrument
\de bristly instrument used
    for cleaning, arranging, or
    applying a liquid to s.t
\se hairbrush
\ps n
\de k.o. brush typically with
    stiff one inch long
    bristles loosely spaced
    arranged perpendicularly to
    the handle for rearranging
    hair
\se toothbrush
\ps n
\de k.o. brush with stiff
    one-quarter inch bristles
    tightly spaced arranged
    perpendicularly to the
    handle for cleaning teeth
\se paintbrush
\ps n
\de k.o. brush of varying sizes
    and varying lengths and
    textures of bristles
    arranged as an extension of
    the handle used to apply
    paint and similar materials
Lexeme-based approach
\lx brush
\ps n
\ge bristly_instrument
\de bristly instrument used for
    cleaning, arranging, or
    applying a liquid to s.t
\lf Part = handle
\le ...
\lf Part = bristles
\le ...
\lf Spec = hairbrush
\le ...
\lf Spec = toothbrush
\le ...
\lf Spec = paintbrush
\le ...
\lf Spec = mustache brush
\le ...
\ps v
\de to use a brush (n)

\lx hairbrush
\ps n
\de k.o. brush typically with
    stiff one inch long
    bristles loosely spaced
    arranged perpendicularly to
    the handle for rearranging
    hair
\lf Gen = brush
\le ...

\lx hairbrush
\ps n
\de k.o. brush typically with
    stiff one inch long
    bristles loosely spaced
    arranged perpendicularly to
    the handle for rearranging
    hair
\cf brush
\ce ...
A root-based database is keyed to root morphemes (and also includes bound morphemes
like -ku). A root-based approach is often favored for morpheme-level analysis for
interlinearizing texts. Generally there are no polymorphemic words found in any key
field. Rather, these polymorphemic forms and their related information would be found
under the related root form.
Root-based approach (structure oriented)
\lx bersih
\ps adj
\ge clean
\se kebersihan
\ps n
\ge cleanliness
\se membersihkan
\ps v
\ge to clean
Beginning of record
1st subentry
2nd subentry
3rd subentry
In a lexeme-based database, on the other hand, the above subentries would be organized
separately as full lexical entries that are cross-referenced back to bersih. Such a
lexeme-based approach is preferred by many lexicographers because it focuses on the meaning
chunks irrespective of the root. These separate lexical entries are then cross-referenced
back to their root form through the \lf, \cf, \mn, or \mr field bundles. The separate but
related lexical entries can be created and filled in from within the root entry through the
use of SHOEBOX's JUMP feature (ALT + F6).
Lexeme-based approach (meaning oriented)
\lx bersih
\ps adj
\ge clean
\cf kebersihan
\ce cleanliness
\cf membersihkan
\ce clean s.t.
\lx kebersihan
\ps n
\ge cleanliness
\mr ke-bersih-an
\cf bersih
\ce clean

\lx membersihkan
\ps vt
\ge clean
\de clean s.t
\mr meN-bersih-kan
\cf bersih
\ce clean (adj.)
clean s.t..
clean (adj.).
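The move from one root-based record to several cross-referenced lexeme-based records can be sketched in a few lines of Python. This is only an illustration of the restructuring described above, not part of SHOEBOX or MDF: the function names `parse_records` and `split_subentries` are hypothetical, the parser ignores continuation lines, and the \mr field is left out because morpheme breaks cannot be derived mechanically.

```python
ROOT_BASED = r"""
\lx bersih
\ps adj
\ge clean
\se kebersihan
\ps n
\ge cleanliness
\se membersihkan
\ps v
\ge to clean
"""


def parse_records(text):
    """Split standard-format text into records of (marker, value) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue                      # this sketch ignores continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":
            records.append([])            # every \lx opens a new record
        if records:
            records[-1].append((marker, value.strip()))
    return records


def split_subentries(record):
    """Turn one root-based record into separate, cross-referenced records."""
    root = None
    root_fields, subentries = [], []
    current = root_fields
    for marker, value in record:
        if marker == "lx":
            root = value
        if marker == "se":
            current = [("lx", value)]     # the subentry becomes its own headword
            subentries.append(current)
        else:
            current.append((marker, value))
    root_gloss = next((v for m, v in root_fields if m == "ge"), "")
    for sub in subentries:
        gloss = next((v for m, v in sub if m == "ge"), "")
        root_fields += [("cf", sub[0][1]), ("ce", gloss)]   # cross-list in the root
        sub += [("cf", root), ("ce", root_gloss)]           # point back to the root
    return [root_fields] + subentries


entries = split_subentries(parse_records(ROOT_BASED)[0])
for entry in entries:
    for marker, value in entry:
        print(f"\\{marker} {value}")
    print()
```

The output is three records (bersih, kebersihan, membersihkan), each with its own \lx, with \cf and \ce pairs linking the root and the derived forms in both directions.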
For sanity's sake it is important to also cross-list these polymorphemic entries in the root
entry (e.g. using \cf or \lf). Otherwise the compiler would soon forget which related forms
had already been addressed in the lexicon (because, being separate entries, they would be
sorted alphabetically into their appropriate places).
Alternatively, the entry for the root bersih above could be more specific in the
relationship between the forms by using the \lf fields rather than the \cf fields (see
chapter 7).
\lx bersih
\ps adj
\ge clean
\lf Nres = kebersihan
\le cleanliness
\lf Cause = membersihkan
\le clean s.t.
cleanliness;
Cause:
membersihkan clean s.t..
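Whether the root entry really cross-lists every derived entry can be checked mechanically. The sketch below is an assumption-laden illustration rather than a SHOEBOX or MDF feature: records are given as lists of (marker, value) pairs, the helper name `missing_back_references` is hypothetical, \lf values are assumed to have the shape "Label = form", and homonym numbers on cross-references are not handled.

```python
def missing_back_references(records):
    """Return (entry, target) pairs where entry cites target but not vice versa."""

    def targets(record):
        found = set()
        for marker, value in record:
            if marker == "cf":
                found.add(value)
            elif marker == "lf":
                # assumed shape of a lexical function value: "Nres = kebersihan"
                found.add(value.split("=", 1)[-1].strip())
        return found

    links = {}
    for record in records:
        headword = next(v for m, v in record if m == "lx")
        links[headword] = targets(record)

    missing = []
    for headword, cited in links.items():
        for target in sorted(cited):
            if target in links and headword not in links[target]:
                missing.append((headword, target))
    return missing


lexicon = [
    # The root entry cross-lists only one of its two derived forms.
    [("lx", "bersih"), ("ps", "adj"), ("ge", "clean"),
     ("lf", "Nres = kebersihan"), ("le", "cleanliness")],
    [("lx", "kebersihan"), ("ps", "n"), ("ge", "cleanliness"),
     ("mr", "ke-bersih-an"), ("cf", "bersih"), ("ce", "clean")],
    [("lx", "membersihkan"), ("ps", "vt"), ("ge", "clean"),
     ("mr", "meN-bersih-kan"), ("cf", "bersih"), ("ce", "clean (adj.)")],
]

problems = missing_back_references(lexicon)
print(problems)
```

Here membersihkan points back to bersih, but bersih does not yet list membersihkan, so that pair is reported as needing attention.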
Lexeme-Based Format
resorted. If one were to try to use this resorted database, it would be with the understanding that some
fields relevant to the original root morpheme (the first \lx) are probably now repackaged as part of the
last subentry (\se). Version 2.0 of SHOEBOX can compare the resulting \lx contents against a text
corpus using the SPELL CHECKER feature.
ai
n
Nplant
1
tree
2
wood
Nres = ai balun
casket
Nres = ai kabelak
board
*kaSiw
wood
\lx ai balun
\ps n
\sd Ncult
\ge casket ; coffin
\mr ai balu-n
\cf ai
\ce wood
\cf balu
\ce side, part
side, part.
[polymorphemic lexeme; identifying ai, balu, and -n]
\lx ai kabelak
\ps n
\sd Ncult
\ge board ; plank
\mr ai ka-bela-k
\cf ai
\ce wood
\cf bela
\ce flat

\lx balu
\ps n
\ge part ; side ; half
\lf Spec = mota balu
\le (other) side of river
\lf Spec = balu-balu..., balu-balu...
\le half of (group)..., the other half...
\cf balun
\ce side

\lx balun
\ps n
\ge side ; remainder ; some
\lf Idiom = ai balun
\le casket (lit. its wooden sides)
\mr balu-n
\cf balu
\ce part

\lx bela
\ps vn
\ge flat ; level
\cf kabelak
\ce flat (adj)
\cf belak
\ce flat round chest disk
\cf kabelan
\ce side, face
\cf belar
\ce spread out, multiply
flat.
[polymorphemic lexeme; identifying ai, bela, ka-, and -k]
part.
[polymorphemic lexeme; identifying balu and -n]
bela vn. flat, level. See: kabelak flat
(adj); belak flat round chest
disk; kabelan side, face; belar
\lx ka-
\hm 2
\ps Vpref
\ge STAT
\re stative ; be
\de be; stative prefix deriving adjectivals from non-active verbs
\va k-

\lx -k
\hm 2
\ps Nsuf
\ge NOM
\re *
\de nominal suffix indicating an independent unit (in contrast with the
    part-whole relationship expressed by the genitive fv:-n)

\lx -n
\ps Nsuf
\ge GEN
\re *
\de genitive suffix normally indicating a part-whole relationship
-n
Under this combined strategy bound roots do not necessarily require a citation form. Two
alternatives for handling bound roots are presented below. The decision between the two
approaches is left to the compiler's preference.
\lx bani-
\ps Rt
\ge F-in-law
\re *
\de father-in-law
\mn banin
[Approach 1]
bani- Rt. father-in-law. See main entry:
banin.
[two entries; no reversal on root; use
\mn; these \ps Rt entries can be 1) in
the main lexicon, 2) in a separate
database, or 3) can be removed from the
main lexicon before processing in MDF
as desired]
\lx banin
\ps n
\sd Nkin
\ge F_in_law
\re father-in-law
\de father-in-law
\lf Cpart = kii
\le mother-in-law, father's sister
\lf Idiom = ai fehuk banin
\le rotten cassava
\mr bani-n

\lx bani-
\lc banin
\ps n
\sd Nkin
\ge F_in_law
\re father-in-law
\de father-in-law
\lf Cpart = kii
\le mother-in-law, father's sister
\lf Idiom = ai fehuk banin
\le rotten cassava
\mr bani-n
[Approach 2]
banin n. father-in-law. Cpart: kii
mother-in-law, father's sister;
Idiom: ai fehuk banin rotten
cassava. Morph: bani-n.
[single entry using \lc; SHOEBOX
INTERLINEAR function will see only
bani-, so a separate parse database or
use of semiautomated parsing is
required for handling polymorphemic
forms like banin; \mr field here is for
printing purposes only, not for
interlinearizing]
The first approach above incorporates information about morpheme breaks into the main
lexicon for interlinearizing, whereas the second approach uses a separate PARSE.DB as a
place for SHOEBOX to look for directions about parsing polymorphemic words into their
underlying morphemes. The first approach, if the root morpheme entries are included in
the main lexicon, will have a certain amount of redundancy. However, not all languages
have simple, single, or predictable forms that are built from the root, so the first approach
would be entirely appropriate. The second approach requires the compiler to anticipate
the final printing view to keep everything ordered properly.
Word processors as a tool have many disadvantages for compiling a dictionary, only
some of which can be compensated for using a stylesheet or document template. For
example, sorting (alphabetizing) is often done manually, particularly with non-default
sequences (e.g. sorting digraph ng separately after n; ch separately after c, etc.).
Searching or jumping to nonadjacent entries is slow and cumbersome on large lexicons,
even with fast computers and hard disks. Reversing the dictionary (e.g. vegetable n
utan; mushroom n utan) must be done manually with great tedium and a tremendous
waste of time. Editorial changes (e.g. the publisher insists on headwords being all caps or
underlined, or on part of speech being non-italic caps) or font changes required by
switching to a different printer must often be done manually, entry-by-entry. Styles can be
forgotten or flags misspelled when they are applied manually (e.g. See: occasionally not
italicized or no colon or misspelled). Additional language information (such as the
national language) would either clutter the entry visually or have to be handled separately
with a reduplication of effort. Extracting subsets of information for analysis or separate
publication (e.g. selecting out entries related to kinship and social relations, or plant
terms) is extremely difficult. Housekeeping information (e.g. date last worked on, source
of information, reference to notebook or text, etc.) is left out altogether, hidden, or
deleted manually prior to publication. The disadvantages go on and on.
A lexicon well structured as a database overcomes these problems, particularly when
using a computer program like SHOEBOX and piping the output through a utility like
MDF to make the print format, labels and styles automatic and consistent. The focus in
compiling the dictionary is then on structuring the lexical information rather than on
formatting. The disadvantage is that one cannot see the final formatting until the database
is run through a program like MDF. An entry like the one above might be entered as:
\lx utan
\ps n
\sd Nplant
\ge veg
\gn sayur ; jamu
\re vegetable ; mushroom
\de non-bulbous edible leafy and stalky plant and fungi
With a database structure, information not relevant to a particular audience or purpose (in
this case national language information) is ignored; formatting is automated (\lx converts
to style and point size defined for headword, \ps for part of speech, \cf can be replaced
consistently by italics See:, etc.). Fields such as \sd can be used for extraction and
retrieval of plant terms (using SHOEBOX filters), \ge can be selected by the computer for
a cursory interlinear gloss, while the words in \re can be used for the English finderlist
automatically creating entries under both vegetable and mushroom.
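How a finderlist falls out of the \re field can be shown with a short Python sketch. It is a simplified illustration, not MDF's own reversal routine: the function name `build_finderlist` is hypothetical, each marker is assumed to occur once per record, reversal items are assumed to be separated by semicolons as in the record above, and falling back to \ge when \re is absent is an assumption made for the sketch.

```python
from collections import defaultdict


def build_finderlist(records):
    """Map each English reversal item to the (headword, part of speech) it indexes."""
    finder = defaultdict(list)
    for record in records:
        fields = dict(record)             # assumes one occurrence per marker
        keys = fields.get("re") or fields.get("ge", "")
        for key in keys.split(";"):
            key = key.strip()
            if key:
                finder[key].append((fields["lx"], fields.get("ps", "")))
    return dict(sorted(finder.items()))


utan = [
    ("lx", "utan"),
    ("ps", "n"),
    ("sd", "Nplant"),
    ("ge", "veg"),
    ("gn", "sayur ; jamu"),
    ("re", "vegetable ; mushroom"),
    ("de", "non-bulbous edible leafy and stalky plant and fungi"),
]

finderlist = build_finderlist([utan])
for english, hits in finderlist.items():
    for headword, ps in hits:
        print(f"{english} {ps} {headword}")
```

The single utan record yields two finderlist lines, one under mushroom and one under vegetable, without any retyping of the entry.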
TIP: The compiler should use the codes and format recommended in this Guide,
gloss vernacular
gloss English
gloss national language (Indonesian, Filipino, Thai,
Spanish, French, Portuguese, Tok Pisin)
gloss regional language (Ambonese Malay, Kupang
Malay, Ternate Malay, Manado Malay, Makasar
Malay, Jakarta Malay, Cebuano, Swahili, etc.)
Reversal codes are used where what is required for interlinearizing is less than or
different from the gloss fields.1 See 2.3.
\re   reverse English
\rn   reverse national language
\rr   reverse regional language
1Many people interlinearize only in English, with a few also using the national language. Unless one
foresees interlinearizing in more than one language it is not economical to use two full sets of gloss and
reversal fields.
definition            \dv  \de  \dn  \dr
example sentence      \rf [reference]  \xv [see 6.2]  \xe  \xn  \xr
word-level gloss      \we  \wn  \wr
cross-reference       \cf  \ce  \cn  \cr
usage                 \uv  \ue  \un  \ur
lexical functions     \lf [see 7]  \le  \ln  \lr
restrictions (only)   \ov  \oe  \on  \or
encyclopedic          \ev  \ee  \en  \er
variants              \va  \ve  \vn  \vr
ama
n
kb
F
father ; uncle (paternal)
father, uncle (paternal)
ayah ; bapak ; paman
bapak, paman...
be used for a second national language. In the current configuration of MDF the regional
language codes are tied to print when the national language options are selected. They do
not function independently, so they should not be used for other categories of language
such as the researcher's own national language (e.g. Finnish, Italian, Korean, or French).
5.3 Categories of information in a lexical entry
Ignoring formatting purposes for the moment, there are basically three general categories
of information in a lexical entry: 1) information about the headword, 2) information about
words related to the headword, and 3) housekeeping information.
5.3.1 Information about the headword
Most field markers in a record relate directly to the headword. These include: [NOTE
\xx+ indicates a bundle of related fields.]
\lx
\ph
\sn
\ps
\ge+
\re+
\de+
\xv+
\ue+
\oe+
\ee+
\mr
homonym number
lexical functions
synonym
antonym
notes
paradigm [structural pattern or completeness]
etymology, historical
borrowed word; loan source
cross-reference
\sd
\va+
\mn
semantic domain
variant forms
main entry form
TIP: The JUMP feature in SHOEBOX <ALT+F6> allows the user to check the converse
to each other each time one of them is edited in SHOEBOX. Note that different
homonyms are structured as separate entries.
\lx baa
\hm 1
\ps AUX
\ge only

\lx baa
\hm 2
\ps n
\ge stem
baa2 n. stem.
There is a wide range of options in published dictionaries for indicating homonyms.
MDF uses the subscript (e.g. baa₁, baa₂) as one that is common, visually pleasing, and
easy to implement consistently on the computer. MDF provides for numbers in vernacular
fields to automatically subscript, assuming that they cross-reference a particular
homonym.4
\lx rahek
\ps AUX
\ge only
\lf Syn = baa1
\le only
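The effect of subscripting a homonym number in a cross-reference such as Syn = baa1 can be imitated in plain text. The sketch below is only an illustration of the idea and makes no claim about how MDF itself formats its output: the helper name `subscript_homonym` is hypothetical, and Unicode subscript digits stand in for the subscript character formatting of the printed dictionary.

```python
import re

# Translation table from ordinary digits to Unicode subscript digits.
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def subscript_homonym(form):
    """Render a trailing homonym number (as in 'baa1') as a subscript."""
    match = re.fullmatch(r"(.*?\D)(\d+)", form)
    if not match:
        return form                       # no trailing number: leave unchanged
    return match.group(1) + match.group(2).translate(SUBSCRIPTS)


for form in ["baa1", "baa2", "rahek"]:
    print(subscript_homonym(form))
```

A form without a trailing number, such as rahek, passes through unchanged. As footnote 4 points out, a language that writes tone numbers inside vernacular fields would need a different treatment.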
4For those who need superscripted tone numbers within vernacular fields, we suggest marking the tones
with otherwise unused symbols in SHOEBOX and then post-edit the MDF output in WORD, replacing
those symbols with the appropriate superscripted numbers.
5To get primary (\lx) and secondary (\hm) fields both involved in the sorting, MDF uses the SIL program
SRT, which uses a different command structure for defining the sort order than that used by SHOEBOX.
Thus it was not possible to have MDF find and read the SHOEBOX sort command sequence and
incorporate it for SRT.
6Compilers working on dictionaries in Spanish-speaking countries should be aware that the 10th Annual
Congress of the Association of Spanish Language Academies voted in April 1994 to eliminate ch and
ll from the Spanish alphabet. Words beginning with these letters will now be listed under c and l
respectively (reported in the Charlotte Observer, 30 April 1994). We are intrigued, since this move is
probably driven by the inconvenience or inability of many commercial computer programs to perform
non-ASCII or digraph sorts, sorts which are handled easily by SHOEBOX and MDF. Dictionary
compilers should check that the country in which they work subscribes to these proposed changes before
incorporating them by restructuring their lexicon (through a new sort order).
2) Edit MDFDICT.ANS with a text editor that can save the file as Text only or
   ASCII. Do not make any other changes than those noted here!
3) In MDFDICT.ANS insert the changes in the \m field. If, for example, one wishes
   to sort the digraphs nd, ng, the trigraph ngg and the monograph separate from
   and following the ns, then the following change would be made:
\m @ a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 { | } ~ ! # $ % & ( ) * + , . /
: ; < = > ? [ \ ] ^ _ `
\m @ a b c d e f g h i j k l m n nd ng ngg o p q r s t u v
w x y z 0 1 2 3 4 5 6 7 8 9 { | } ~ ! # $ % & ( ) * +
, . / : ; < = > ? [ \ ] ^ _ `
4) Save the file as Text only or ASCII as MDFDICT.ANS and then test MDF on a
   sample database that includes data that should be affected by the changes (such as
   headwords that begin with and ng).
5)
6) Once that is done, then post-edit the file in WORD, copying another section header
   (the letters and line that appear before each new letter in the alphabet) to the
   correct place and adjusting it to reflect the changes. This is most easily done if
   nonprinting characters are visible on the screen (set through the OPTIONS menu in
   WORD). Be sure to copy the appropriate division breaks and paragraph marks as
   well.
7) If MDF on your computer will be used by other users or for other languages,
   remember to copy MDFANS.SAV back to MDFDICT.ANS when you are done.
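What a modified sort sequence does to the order of headwords can be tried out in Python before MDFDICT.ANS is edited. The sketch below does not use or imitate the SRT program; it is a minimal longest-match sort key, assuming only the letters a to z plus the multigraphs nd, ng, and ngg from the example above. The name `make_sort_key` and the sample headwords are hypothetical.

```python
# Sort sequence with nd, ng, ngg treated as separate letters following n.
ORDER = list("abcdefghijklmn") + ["nd", "ng", "ngg"] + list("opqrstuvwxyz")


def make_sort_key(order):
    """Build a key function that ranks words unit by unit, longest unit first."""
    rank = {unit: position for position, unit in enumerate(order)}
    units = sorted(order, key=len, reverse=True)   # try "ngg" before "ng" before "n"

    def key(word):
        word = word.lower()
        ranks, i = [], 0
        while i < len(word):
            for unit in units:
                if word.startswith(unit, i):
                    ranks.append(rank[unit])
                    i += len(unit)
                    break
            else:
                ranks.append(len(order))           # unknown characters sort last
                i += 1
        return ranks

    return key


# Hypothetical headwords, chosen only to exercise the multigraphs.
headwords = ["ngali", "nusa", "nda", "nama", "ngga", "oma", "ndu"]
print(sorted(headwords))
print(sorted(headwords, key=make_sort_key(ORDER)))
```

With the default character sort, nusa lands after ngga. With the modified sequence, all words in plain n come first, followed by the nd, ng, and ngg sections in that order, and then o.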
7Initial impressions must be corroborated by other evidence, since a wordlist-taking situation is often one
\lx -ao
\lc bekeao
\ps v
\ge screech ; howl
\ue Formal/ritual
\lf SynR = bengeao
\le common speech
Usage:
Formal/ritual.
SynR:
bengeao common speech form.
If the choice is to sort the above entry under the As, then the compiler may want to
organize the data in the following way to make it visually obvious why the entry appears
out of place:8
\lx -ao
\lc (beke)-ao
\ps v
\ge screech ; howl
\ue Formal/ritual
\lf SynR = bengeao
\le common speech
8We recommend testing a wide cross sample of users to see which approach is preferred for a given
fatu n. rock.
\lx wae
\ps n
\ge water
wae n. water.
\lx iko
\ps vi
\ge go
Compounds that have phonological evidence of functioning as a unit are also fairly
straightforward candidates for headwords.
\lx fathese
\ps n
\ge cliff
\lt rock-wall
\mr fatu-hese

\lx hektatak
\ps vt
\ge abandon_s.t
\lt flee-drop
\mr heka-tata-k

fathese n. cliff. Lit: rock-wall. Morph: fatu-hese.
There are also combinations of words that do not show phonological changes, but clearly
function in a language as distinct cultural concepts or units. They may be phrasal or even
clausal. Often the combination of such units is different than the sum of its parts,
indicating non-restrictive, conventionalized, or semantically bleached senses. They often
indicate types of a kind. English words of this sort include blackboard (often green),
\lx geba nega
\ps n
\ge adult
\lt person easy

\lx ba sohik
\ps vt
\ge hope_for_s.t
\cf sohik
\ce wait
wait.
There is disagreement among some lexicographers as to whether these latter two types
should be handled as subentries or as separate entries. If the primary audience is the local
populace, the separate entry strategy is probably best, supplemented by cross-references.
[See examples and discussion in 4.2 and 4.6].
These types of emic units, whether they are simple morphemes, compounds, phrasal, or
clausal, are all good candidates for a headword. Such structural variety is what drives
lexicographers to use the term lexeme, rather than word to describe these units.
Pawley (1993:30/3/93) describes two views of language that are in tension for compilers
of dictionaries.
Many of the ideas which people formulate in their language are highly subjective
constructions, having only the most tenuous connections with objectively
measurable things and events. Some of these subjective formulations may enter the
linguistic tradition, becoming standardized ways of saying things. Thus, each
language community develops a unique body of resources representing a particular
worldview, a particular shared tradition which is part of its culture.
In describing language as a device for encoding a particular culture, the object is
not to achieve the most parsimonious specification of grammatical form-meaning
pairings. The object is to describe what it takes to use a language properly as a
member of society. Part of this is knowing what things to say, when to say them
and how to say them in conventional ways. The culture encoding approach leads us
to take a very different definition of the lexicon from the grammarian's. Instead of
striving to keep the lexicon small we need to enrich it. In fact we apply the terms
lexicon, lexeme (or lexical item) and lexicalized in ways quite different
from the grammarian. Now these terms are defined with respect to cultural facts as
well as with respect to purely structural criteria. Complex words and compounds,
and perhaps phrases, are considered part of the speaker's cultural lexicon if we can
show that they have entered the social tradition, that they have attained the status
of social institutions, being recognized as conventional names of things, as
terms in a set or terminology, as set phrases, and perhaps as appropriate things
to say. All grammatical strings are not socially equal. We award special status to
those strings that are culturally significant, even though they may also be perfectly
grammatical. The upshot is an enormous increase in the number of lexemes
compared to the ideal grammarian's dictionary. [emphasis added]
Pawley (1986) identifies a number of tests for English that may help determine whether
or not something can be considered to be a lexicalized form, and thus a candidate for a
lexeme (headword) in the above sense. Many of the tests are adaptable to other language
situations as well. Some of the tests depend upon a written tradition. The following
material is adapted from Pawley (1986) and most of the examples are also from that
source:
1) The naming test: Can the candidate for a lexeme be referred to in questions or
   statements such as the following: What is it called? It is called X. We call it X,
   but they call it Y.
2)
3) Customary status: Does the use of the phrase imply certain behavior patterns,
   values, or sequences of activities that are known by society at large? They
   represent conventionalized knowledge. For example, expected behavior at the front
   door is different from at the back door (besides their participation in idioms),
   indicating that these function as cultural units (lexemes) that are more significant
   than the sum of the parts. Consider go to the mosque, get off work, take a vacation.
4) Legal status: Some phrases have such status that they are codified in legal usage:
   driving under the influence, breaking and entering, assault and battery, justifiable
   homicide. Even so-called primitive societies with unwritten languages have
   categories of this sort for dealing with things like marriage negotiations and
   litigations over land, property, and adultery.
5) Speech act formulas: Every language has some formulas which carry out
   conversational moves (Pawley 1986:106). For example, excuse me, how are you,
   y'all have a nice day, etc.
6) Use of acronyms: This is often proof that a multi-word phrase represents concepts
   that have attained conventionalized or institutionalized status. Consider: VIP,
   DWI/DUI, IQ, RBI, SAT, ASAP, PTO, PTL, AWOL, BS, RSVP, R and R; in
   Indonesia: KB, DKI, KK, ABRI, DPRD, GBHN, etc.
7)
8) Belonging to a terminological set: This is similar to (2), but focuses more on a pair
   of antonyms. Consider: tell the truth vs. tell a lie, take care of vs. neglect.
9)
17) Stress and intonation patterns: Different languages give different phonological
clues for what is seen to function as a unit. English often uses stress and intonation.
Government jargon is often coined through these means. Consider political matters
memorandum (see Pawley 1986:108).
18) Invariable constituents or grammatical frame: The demanding and rhetorical Who
do you think you are? does not have the same impact in the future. Kick the bucket
does not mean the same when put in the passive. The thought had crossed my mind,
and he took the law into his own hands are unnatural in the passive. Compare also
stripped down formulaic sentences easier said than done, spoken like a man!
There are also syntactically irregular or archaic idioms like easy does it, no go, no
way, be that as it may, (she) wants in, once upon a time.
19) Use of definite article on first mention: In English this can indicate the
conventionalized nature of the object, showing the speaker assumes the identity is
understood by the addressee: the fire department, the foreign legion, the eight ball.
20) Writing conventions: Where there is a written tradition these may provide clues to
perceived status as a unit. Capitals may indicate lexemes that are not typical proper
nouns: Third World, Big Bang, Inner City. Beware that where a society has the
luxury of supporting a literary community, some writers manipulate the use of
capitals for unconventional purposes. Quotation marks may also indicate unitary
status: he was considered a "bad boy". Orally, some speakers use "so-called" or a
preceding pause to mark an equivalent to quote marks.
21) Unpredictability of form-meaning relation in semantic idioms: kick the bucket,
chew the fat, shoot the breeze.
22) Arbitrary selection of one meaning: Notice that button hole is a hole FOR putting
buttons THROUGH, whereas bullet hole is a hole MADE BY bullets, posthole is a
hole FOR setting posts IN, etc.
23) Use in ritual language of parallelism: This is a special case of (2) and (8). Ritual
language in parallelisms is widespread. It is found, for example, in Biblical Hebrew
and many Austronesian languages, particularly in eastern Indonesia (Fox 1988).
Existence as a paired entity in this context is sufficient for justifying its status as a
conventionalized unit, and hence a lexeme.
Refer to Pawley (1986) for additional examples and more detailed discussion.
6.1.1 Affixes
Affixes should be entered into the lexical database, both for the resulting dictionary and
for interlinearizing. When entries for affixes are generated through the process of
interlinearizing, it is helpful to keep track of them on a piece of scrap paper and add the
hyphen to the key field later, as appropriate. Entries for affixes tend to map grammatical
functions and be less straightforward than entries for lexical roots.
\lx ep-        [prefix]
\ps Vpref
\ge CAUS
\re causative
\de causative prefix, usually indicating direct causation

\lx -n         [suffix]
\ps Nsuf
\ge 3sG
\re his ; hers ; its
\de his, hers, its; third singular genitive suffix, normally indicating a
    physical or conceptual part-whole relationship

\lx <um>       [infix]
\ps Vinf
\ge UF
\re undergoer focus
\de undergoer focus marker
genitive
suffix,
normally
indicating a physical or conceptual
part-whole relationship.
\lx -bate
\lc (ma)-bate
\ps n
\ge abundance

\lx -bafa
\lc (na)-bafa
\ps v
\ge ambush
\de wait in ambush
(ma)-bate n. abundance.
MDF substitutes \lc for the headword when printing. This presents a dilemma: until
the local audience learns how to parse words (which takes an educational infrastructure
and time), they may not know where to look up a word. MDF menu options allow the user
to choose whether these entries should be sorted by the \lx field or the \lc field. Because
of the nature of citation forms, sorting on \lc will, in many languages, probably make
certain sections of the printed dictionary disproportionately large.
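The consequence of choosing \lx or \lc as the sort field can be previewed by counting how many entries would fall under each initial letter. The sketch below is an illustration only, not an MDF menu option: the function name `section_sizes` is hypothetical, entries without an \lc field are assumed to sort on \lx, and the first alphabetic character is taken as the section letter.

```python
from collections import Counter


def section_sizes(records, marker):
    """Count entries per initial letter when sorting on the given field."""
    sizes = Counter()
    for record in records:
        fields = dict(record)
        form = fields.get(marker) or fields["lx"]   # fall back to \lx
        first = next((ch for ch in form.lower() if ch.isalpha()), "?")
        sizes[first] += 1
    return dict(sizes)


lexicon = [
    [("lx", "-bate"), ("lc", "(ma)-bate"), ("ps", "n"), ("ge", "abundance")],
    [("lx", "-bafa"), ("lc", "(na)-bafa"), ("ps", "v"), ("ge", "ambush")],
    [("lx", "wae"), ("ps", "n"), ("ge", "water")],
]

print(section_sizes(lexicon, "lx"))
print(section_sizes(lexicon, "lc"))
```

Sorted on \lx, both bound roots sit together under b. Sorted on \lc, they move to m and n, the letters of their citation-form prefixes. Run over a full lexicon, the same count would show which sections swell when one prefix dominates the citation forms.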
6.2 Choosing example sentences
Why are sentences like See Spot run or Run, Spot, run! not good example sentences for a
lexicon?
An excellent discussion of example sentences is found in chapter 9 of Bartholomew and
Schoenhals (1983). A few of their points are summarized here:
Illustrative sentences serve both the compiler of the bilingual dictionary and its
user. During the process of eliciting illustrative sentences, the compiler becomes
aware of sense discrimination co-occurrence restrictions on classes of lexical
items, or grammatical restrictions which he had overlooked. (1983:59)
2)
3)
4)
5)
In other words, a well-chosen example sentence can be made to work for you,
highlighting some of the characteristics that may still be unclear from the definition.2
Good example sentences, of course, should be complete, grammatical, and preferably
natural. In addition:
A good illustrative sentence supplies a specific context which helps to define the
word being illustrated. Such a sentence should include at least one of the salient
characteristics of the word under consideration. In many instances it should be
possible to deduce the meaning of the word even if one were unfamiliar with the
gloss. Characteristic subjects or objects may be used with verbs to provide mental
clues as to the specific action indicated. Other useful contextual ideas include
instrument, location, or cause and effect relationship. (Bartholomew and
Schoenhals 1983:60)
TIP: Many of the characteristic associations or typical co-occurrences should be
mapped out in the lexical functions fields (\lf bundle; see chapter 7). Procedurally, we
recommend not eliciting or selecting illustrative sentences for a lexeme until most of
its lexical relations have been fully explored. Not only does this give the compiler a
more rounded picture of what s/he is dealing with, but it also gives the language
assistant(s) a broad and freshly explored context for thinking about example sentences.
One can then concentrate on choosing example sentences that are dynamic,
memorable, or even dramatic, as well as illustrative.
Bartholomew and Schoenhals (1983:61ff.) list with examples in Spanish and English the
following associational categories which can be included in an illustrative sentence as
context for the lexeme.3
1) Characteristic attribute: He wore his red berang cloth across his chest to do the
   war dance. She used the sharp katanan to peel the cassava.
2) Characteristic behavior or action: Motin causes recurring fever, chills and shakes.
   Geba emsihi often stagger home after drinking too much palmwine with their
   friends.
3) Characteristic use: My father has a kupan elen in which he keeps his valuables
   out in the garden house. We use kelambu around our bed to keep out mosquitos
   and other bugs.
4) Characteristic position or location: My father left his waga at the shore after
   paddling it across the lake. The warrior's todo is kept in its scabbard.
5) Characteristic material: We used split bamboo to make our hese [wall] on our
   new house. Hunters make suran [spike traps] from uka bolo [bamboo sp.].
6)
7)
emteno [heavy (of people)], but the gunny sack of copra is beha [heavy (of
things)].
8)
9)
Abstractions or general classificatory terms: When all the grain in the bin had
been either eaten or planted, the grainbin was fuun [empty]. When he saw the
isaleu [python] in the jungle, he felt emgihi [horrified and grossed out], and moved
away quickly.
10) Part-whole relationships: The pig's ngisnap [tusk] was four fingers long. The
sufen [doorway] is where people go into the house.
11) Synonym or class name: Lian [caves] are holes in cliffs big enough for people to
sleep in. A yoho [civet cat] is a small animal like a wild dog or cat that lives in the
jungle...
12) Comparison: Gehut rali doesn't have purple speckles like the traditional taro has.
Geb masi [coastal people] do not know how to survive in the jungle as well as
geb fuka [mountain people].
Bartholomew and Schoenhals (1983:64-69) also have a good discussion of dos and
don'ts for obtaining good example sentences.
CAUTION: Avoid sentences created by non-native speakers or by the foreign
compiler. And avoid using translated materials as a source for illustrative sentences. If
one uses sentences extracted from natural text, remember that running text provides
context. Extracting a sentence from that context often leaves it depending on implicit
and presupposed information, or with anaphoric pointers that have nothing to point to.
Thus, while the sentence makes perfectly good sense in context, it may seem
incoherent or even ungrammatical to a native speaker when removed from context. It
is thus important to edit and check such sentences with the assistance of a skilled
native speaker before using them in isolation in the lexicon.
6.3 Different words or different senses? (homonymy vs. polysemy)
When a single form can function in more than one category without any explicit
derivation, the lexicographer must decide whether to handle them as homonymy (same
form but unrelated meaning, therefore separate lexemes), or as polysemy (same form with
range of related meanings, therefore subentries or multiple senses of the same lexeme).4
The following figure illustrates various relationships between categories as they relate to
homonymy and polysemy.
In one sense it is a moot point whether we should view the problem of lexemes like sail
(n) and sail (v) as a zero derivation or as part of the lexicon whose form class
membership is syntactically defined, if both views result in them being handled the same
way in the dictionary: as subentries of a single entry.
However, if there is a distinction in the lexicon between, for example, the following
categories, then we must indicate each portion of the lexicon as a different category:
Category A: that part of the lexicon that is inherently nominal and must take verbal
derivations to function verbally.
Category B: that part of the lexicon that is inherently verbal and must take nominal
derivations to function nominally.
Category C: that part of the lexicon that can function in either capacity with either no
derivation or with either derivation.5
4Zgusta (1971:80-89) recognizes a vague intermediate status which he calls partial homonymy and
One possibility, as suggested in chapter 9, is to use a broader term, such as relater, where the
membership is more flexible than strictly preposition or conjunction. Another possibility is to distinguish
something like Headword n (= inherently nominal), from Headword As n (= flexible membership
syntactically defined).
The critical evidence for deciding between different senses of the same word (polysemy)
and different words (homonymy) is a corpus of natural text examples. Serious
lexicography assumes the presence of a large body of natural text, and an ability to cull
through those texts to see the range of meaning encompassed by a lexeme and if and how
they contrast.6 Mental searching by itself is inadequate.
How does one decide, for example, that just1 'only' and just2 'fair, morally right' are
separate lexemes, whereas just1 has several related senses: 1) 'only' (just sugar), 2) 'simply,
merely' (they'll just have to go home), 3) 'exactly' (as in British English she sat just there)?
In working through the following principles, it is wise to get a variety of native speaker
judgments rather than simply (just1, sense 2) relying on the intuitions of the compiler.
The process is dynamic. The lexicographer should plan to revise and refine entries that
are suspected to involve homonymy and polysemy.
Principles
1)
2)
If the difference is mainly one of different part of speech (\ps) and the language
has a large segment of vocabulary where part of speech is a function of the syntax
(slot in a sentence) rather than of the lexicon (something inherent in the word
itself), then consider handling them as different senses of the same lexeme.7
Consider:
shower v. washing the body standing under running water;
n. 1) the place used to shower (v). 2) the fixtures used to shower (v).
jalan [Indonesian] vi. go, walk, move;
n. path, trail.
3)
6The computer program FIESTA provides fast, interactive concordance capabilities for a text corpus of
the size normally processed by the average linguist or anthropologist. It can be ordered from International
Computer Services, Box 248, Waxhaw, North Carolina 28173 USA. This is the same address used for
ordering SHOEBOX.
7Some commercial English dictionaries have made an editorial policy where different parts of speech are
always handled as separate lexemes. But this grows out of a view of language that is often inaccurate
when put up against the data, assuming part of speech is something that is inherent in the lexicon. It is
also a natural consequence of lexicographers artificially removing words from communicative contexts
and isolating them as atomic units to organize in an alphabetical listing. Any four-year-old can see a
relationship between, for example, cook (v) and cook (n).
assistants to hypothesize about shared meanings, but one should have a healthy
disrespect or skepticism about accepting folk etymologies.
4)
\lx fuka
\hm 1
\ps vt
\ge open

\lx fuka
\hm 2
\ps n
\sn 1
\ge mountain
\sn 2
\ge island
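The distinction drawn in these two records (separate \lx records numbered with \hm for homonyms, \sn within one record for senses) can be checked mechanically. The following is a minimal sketch in Python, not part of MDF or SHOEBOX; the function names are hypothetical, and it assumes one field per line with \lx opening each record.

```python
from collections import defaultdict

SAMPLE = r"""
\lx fuka
\hm 1
\ps vt
\ge open

\lx fuka
\hm 2
\ps n
\sn 1
\ge mountain
\sn 2
\ge island
"""

def parse_records(text):
    """Split backslash-coded text into records, each a list of (marker, value)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # this sketch ignores blank and continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":
            records.append([])  # the lx field opens a new record
        if records:
            records[-1].append((marker, value.strip()))
    return records

def homonyms_and_senses(records):
    """Map headword -> {homonym number: [gloss of each sense]}."""
    summary = defaultdict(dict)
    for record in records:
        headword = record[0][1]
        homonym = next((v for m, v in record if m == "hm"), "")
        summary[headword][homonym] = [v for m, v in record if m == "ge"]
    return dict(summary)

print(homonyms_and_senses(parse_records(SAMPLE)))
```

A headword with more than one homonym number is being treated as several lexemes; a homonym with more than one gloss is being treated as polysemous.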
Assuming shared meaning, different senses tend to have different lexical networks
as mapped out in the lexical functions (\lf). Most lexicographers tend to limit
themselves to examining near synonyms with a paraphrase test, which is good, but
it need not be limited to synonyms. [CAUTION: Having different lexical networks
is also true for homonyms, so one must first establish the related meaning.] For
example, with fuka1 above:
\lx fuka
\hm 1
\ps vt
\sn 1
\ge open
\de open, reveal, undo, unfasten
\lf Syn = holik
\le open, undo
\sn 2
\ge explain
\lf Gen = prepa
\le speak, say
In the example above, both senses share the idea of 'revealing': in 1) things; in
2) knowledge. But if holik were substituted, one would not normally interpret it as
'explain', and if prepa were substituted one could not interpret it as 'open,
unfasten'.
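To compare the lexical networks of the senses of one lexeme, the \lf bundles can be grouped by the \sn they follow. This is a minimal sketch in Python under the assumption that each \lf and its \le follow the \sn they belong to; the function name is hypothetical and nothing here is an MDF feature.

```python
RECORD = r"""
\lx fuka
\hm 1
\ps vt
\sn 1
\ge open
\lf Syn = holik
\le open, undo
\sn 2
\ge explain
\lf Gen = prepa
\le speak, say
"""

def lexical_network(text):
    """Return {sense number: [(function, vernacular form, gloss), ...]}."""
    network, sense = {}, "1"  # a record without sense numbers counts as one sense
    for line in text.splitlines():
        marker, _, value = line.strip().partition(" ")
        value = value.strip()
        if marker == r"\sn":
            sense = value
        elif marker == r"\lf":
            function, _, form = value.partition("=")
            network.setdefault(sense, []).append((function.strip(), form.strip(), ""))
        elif marker == r"\le" and network.get(sense):
            function, form, _ = network[sense][-1]
            network[sense][-1] = (function, form, value)  # attach gloss to last lf
    return network

network = lexical_network(RECORD)
# Forms that occur in the networks of both senses (none in this record).
shared = {form for _, form, _ in network["1"]} & {form for _, form, _ in network["2"]}
print(network, shared)
```

An empty intersection is only suggestive. As the caution above notes, homonyms also have different networks, so related meaning must be established first.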
\lx epmata
\ps vt
\sn 1
\ge kill
\lf Nug = geba
\le people
\lf Spec = fage
\le spear s.t.
\lf Spec = rasi
\le poison s.o.
\sn 2
\ge extinguish
\lf Nug = bana
\le fire
\lf Spec = skahik bana
\le pull apart logs to let fire die

\lx caan
\ps v
\sn 1
\ge sense
\de hear, listen, sense
\nt Passive hear, sense or active listen
\lf SynD = prenge
\le hear, listen [Lisela]
\sn 2
\ge obey
\lf Syn = hai
\le follow, obey
5)
6)
There is more likely to be ambiguity between different senses of the same word
than between different lexemes.
For example, big rodeo is ambiguous between the sense of 'large' and the sense of
'important'.
7)
\lx man
\ps n
\sn 1
\ge adult male human [specific]
\sn 2
\ge human [generic]

man n. 1) adult male human. 2) human.

\lx beton
\ps Time
\sn 1
\ge night [part]
\de nighttime, period of darkness in the normal daily cycle of dark and light
\sn 2
\ge day [whole]
\de entire 24-hour cycle. A period of time telling number of days travel, number of days since s.t. happened, etc

\lx bia
\ps n
\sn 1
\ge palm [generic]
\sn 2
\ge sago [specific]
\sn 3
\ge paste [part]
Cautions
1)
Lexemes can have meanings that are historically related, but which are currently
considered different words by native speakers.
For example, Spanish caballero in its technical parse, and historically, meant
horseman. Because historically only aristocracy were allowed to ride horses,
3)
\lx basa
\hm 1
\ps vn
\ge spicy

\lx basa
\hm 2
\ps n
\ge language
\bw Sanskrit via Malay fi:bahasa

\lx beta
\hm 1
\ps vt
\ge connect

\lx beta
\hm 2
\ps PRO
\ge 1s
\re I ; me
\de I, me
\bw Malay
Where the vernacular language is genetically related to the national language the
differences between loans and inherited vocabulary may be more difficult to
unravel. For example, the vernacular language (Buru), the national language
(Indonesian) and the regional lingua franca (Ambonese Malay) all belong to the
Austronesian language family. Both Indonesian and Ambonese Malay are derived
historically from different strains of Malay (B.D. Grimes 1991). Both are sources
for loans in Buru. Sometimes the forms can be identified by principles of historical
and comparative linguistics, but caution is needed, because semantic shifts
can also take place. Both words in each of the following pairs of words have the
same ultimate historical source, but one member of each pair has been directly
inherited from the parent language, whereas the other member has taken an indirect
route.
\lx fofo
\ps n
\ge fish_trap
\et *bubu
\eg fish trap

[inherited vocabulary]

\lx bubun
\ps n
\ge fish_trap
\bw Malay

bubun n. fish trap. From: Malay. [borrowed word]

\lx fina
\ps n
\ge female
\et *binay
\eg female

[inherited vocabulary]

\lx bini
\ps n
\ge wife
\bw Malay

bini n. wife. From: Malay.
\lx tal
\ps n
\sn 1
\ge shank
\sn 2
\ge thigh ; upper_leg
\sn 3
\ge rod
\de supporting rods (of chair, etc.)

\lx tal
\ps n
\ge shank
\de shank, thigh, upper leg, supporting timber
\lx hete
\ps vt
\sd Vcut
\ge cut
\de cut into sections for use
\lf Gen = lata
\le cut
\pd -k
This information tells us (following C. Grimes 1991) that this entry shares a basic
structure with other cutting verbs:
Subject:Actor:agent DO:cut (Object:Undergoer:patient)
(uses preposition tu + instrument)
What distinguishes one cutting verb from another tends to be differences in manner,
typical instrument, typical object, and occasionally typical agent or purpose. A carrying
verb looks something like the following:
\lx leba
\ps vt
\sd Vcarry
\ge carry_w/pole
\de carry on the shoulder with a pole. Includes object at one end, objects at both ends, or object in the middle carried by two people
\lf Gen = ego
\le get, take, transfer control
\pd -h
8Figure is the object whose location is in question. Foley and van Valin (1984) use the term theme.
With carrying verbs only one oblique argument is normally expressed: the one most salient to the
discourse.
\lx ama
\ps n
\sd Nkin
\ge F
\re father ; uncle
\de father

ama n. father.

\lx flehet
\ps n
\sd Ncult ; Ninstr
\ge sago_pounder
2)
3)
4)
Forms with different distributional networks: similar lexemes used with different
collocational, contextual, syntactic, or morphological constraints in different
dialects.
For example, American English advertisement [advrtaizmnt] carries different stress and
vowel quality in Australian and British English [advrtIzmnt] (#1 above). American
English forest includes areas filled with unplanted trees, whereas Australian English
forest implies that the trees were planted (#3 above). American English supper implies
the meal at the end of the day, whereas Australian English supper implies a late evening
9This phrasing is adapted from the title of Simons (1979).
dessert, rather than the meal (#3 above). American English flashlight has a dialectal
equivalent in British and Australian English torch (#2 above). But American English also
has a word torch which implies using flame for light (this suggests #3 above). However,
British and Australian English also use the word torch with the sense which implies using
flame for light as does American English. Thus, torch in the two dialects can be said to
have the same meaning, different meanings and different distributional networks (#4
above).
To mix all dialect variations into a single amorphous cauldron without identifying a
primary dialect and without identifying which dialect the variants belong to is confusing
to language learners, misleading to comparative linguists, and disappointing to local users
who often want the dictionary to give them a strong sense of 'this is us; this is our
language!' The mixed dialect approach belongs to nobody and represents nobody.
[CAUTION: Dialectal variants other than the dialect that is targeted as primary must be
explicitly identified.]
A complication arises in the multipurpose nature of the lexical database: it is not just a
dictionary, but it is a receptacle for cataloging other information as well. Some field
workers want to use the lexical database as a place to catalog all known variations among
dialects. And of course, the lexical database is the appropriate place to do this, even
though it may not be appropriate to print all that information in a published dictionary for
certain audiences. Some linguists must catalog dialect variants to appropriately use the
Computer Assisted Related Language Adaptation [CARLA] programs for adapting texts
from one speech variety into a related speech variety.
MDF is structured on the assumption that one dialect is identified as primary in the
introduction to the dictionary. Thus, if no other information is given to the contrary, an
entry is assumed to represent the primary dialect. All major dialects should be identified
in the general introduction to the dictionary, and a dialect map should be included. If an
entry represents a different dialect, that dialect should be explicitly identified in the \ue
(usage) field bundle. Below are two related entries, the first representing the primary
dialect (Masarete, and so is unmarked), and the second representing other dialects
(Lisela, Rana, marked in the \ue field).
\lx apu
\ps n
\ge lime
\re lime ; chalk
\de lime slaked from burning seashells and used as an ingredient in chewing betelnut
\et *apuR
\eg lime, chalk

\lx ahul
\ps n
\ge lime
\re lime ; chalk
\de lime slaked from burning seashells and used as an ingredient in chewing betelnut
\ue Lisela, Rana
\bw Kayeli
By itself, however, this pattern of using the \ue field does not cross-reference
semantically related forms. In the lexical functions fields (\lf) described in detail in
chapter 7, \lf SynD is provided for cataloging dialectal synonyms. In using the \lf field
bundle for this purpose, the contents of the \le field identify the dialect, rather than give
the gloss. The minor dialect entry should cross-reference the primary dialect form using
the \cf or \mn field bundles. The examples above are modified below to illustrate these
uses.
\lx apu
\ps n
\ge lime
\re lime ; chalk
\de lime slaked from burning seashells and used as an ingredient in chewing betelnut
\lf SynD = ahul
\le Lisela, Rana dialects
\et *apuR
\eg lime, chalk

\lx ahul
\ps n
\ge lime
\re lime ; chalk
\de lime slaked from burning seashells and used as an ingredient in chewing betelnut
\ue Lisela, Rana
\bw Kayeli
\mn apu
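The cross-referencing pattern just described (\lf SynD in the primary dialect entry, \mn or \cf in the minor dialect entry) lends itself to a consistency check. The following is a minimal sketch in Python, not something MDF provides. The function names are hypothetical, and it assumes, as in the examples above, that \ue holds only dialect names, which will not be true of every database.

```python
ENTRIES = r"""
\lx apu
\ps n
\ge lime
\lf SynD = ahul
\le Lisela, Rana dialects

\lx ahul
\ps n
\ge lime
\ue Lisela, Rana
\bw Kayeli
\mn apu
"""

def parse_records(text):
    """Split backslash-coded text into records, each a list of (marker, value)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # blank and continuation lines are ignored in this sketch
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":
            records.append([])
        if records:
            records[-1].append((marker, value.strip()))
    return records

def dialect_link_problems(records):
    """List dialect cross-references that are missing or point nowhere."""
    headwords = {record[0][1] for record in records}
    problems = []
    for record in records:
        headword = record[0][1]
        markers = {marker for marker, _ in record}
        for marker, value in record:
            label, _, target = value.partition("=")
            if marker == "lf" and label.strip() == "SynD" and target.strip() not in headwords:
                problems.append(f"{headword}: SynD target {target.strip()} has no entry")
            if marker in ("mn", "cf") and value not in headwords:
                problems.append(f"{headword}: cross-reference {value} has no entry")
        if "ue" in markers and not markers & {"mn", "cf"}:
            problems.append(f"{headword}: dialect entry has no cross-reference")
    return problems

print(dialect_link_problems(parse_records(ENTRIES)))
```

For the two entries above the list of problems is empty; deleting the \mn line from the ahul entry would produce one report.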
Some MDF users are annoyed by this strategy that prints the dialect name in single quotes
following the general strategy MDF uses with the \le field. Where dialect differences
represent different lexemes altogether, using \lf SynD = is certainly appropriate. But MDF
also provides the \va bundle of fields for handling dialectal variants (i.e. \va, \ve, \vn, \vr),
where the dialectal variant is given in \va; \ve gives the English version of the dialect
name and/or any pertinent comment, which MDF will print enclosed in (parentheses); \vn gives
the national language version of the dialect name or comments; and \vr gives the regional
language version of the dialect name or comments. The \va (variants) field is dual
purpose. It is intended for identifying structural variants or spelling variants in the
primary dialect (e.g. \lx examination, \va exam; \lx cannot, \va can't; \lx aren't, \va ain't).
It can also indicate the forms of other dialects. The following example is from
Indonesian:
\lx tidak
\ps NEG
\ge no
\re no ; not
\de no, not; standard negation targeting the predicate
\lf Sim = bukan
\le negator of nominal arguments
\va tak
\ve formal, written
\va seng
\ve Ambonese Malay
\va sonde, son
\ve Kupang Malay
\va tara
\ve North Moluccan Malay
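The pairing of each \va with the \ve that follows it can be illustrated with a short sketch. It is written in Python as an illustration only: the function name is hypothetical, and the output imitates just the one convention stated above (the \ve comment enclosed in parentheses), not MDF's full print format.

```python
RECORD = r"""
\lx tidak
\ps NEG
\ge no
\va tak
\ve formal, written
\va seng
\ve Ambonese Malay
\va sonde, son
\ve Kupang Malay
\va tara
\ve North Moluccan Malay
"""

def variants(text):
    """Pair each variant form with the English comment that follows it."""
    pairs = []
    for line in text.splitlines():
        marker, _, value = line.strip().partition(" ")
        if marker == r"\va":
            pairs.append([value.strip(), ""])   # a new variant form
        elif marker == r"\ve" and pairs:
            pairs[-1][1] = value.strip()        # comment for the latest variant
    return [f"{form} ({comment})" if comment else form for form, comment in pairs]

for item in variants(RECORD):
    print(item)
```

Because a comment attaches to the most recent \va, the order of fields within the bundle matters: a \ve placed before its \va would be attached to the wrong variant.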
There are additional fields that are appropriate to use for clarifying dialectal information.
Complex information on semantic differences, social usage, forms, or distribution can be
spelled out at length using the \ns (notes on sociolinguistics) field. In addition to the \ue
(usage) field described above, the often underutilized \oe (restrictions) field could be
used to explain forms that are restricted to certain dialects.
1While known in most of the literature as lexical functions, some also use the term lexical relations to
avoid the potential for confusion with LFG [Lexical Functional Grammar] with which it has no relation.
2Additional actor nouns are also associated with these verbs, but with more specialized senses. E.g.
mountains from his village to mine, to tell me follow-up information about some lexical networks we had
been exploring together more than a year before when I had lived in his village. He thought it was
interesting information that I should know.
7: Lexical functions
J. Grimes (1992:125) similarly reports about his work among the Huichol (a Uto-Aztecan
language of west-central Mexico):
The intriguing thing about following the paths defined by lexical functions is that
the informants themselves, even when totally unsophisticated by academic
standards, have an intuitive grasp of what is going on and become more and more
interested. It was not uncommon for me to have Huichol friends who stopped by
casually to see what was going on come back a day or two later after having
thought of another lexical correlate, or having remembered a form the rest of us
had on the tip of our tongue but couldn't quite remember. I have never seen that
level of involvement when working on syntax.
Delayed reaction was normal. After we thought we had exhausted the lexical
neighborhood of one word and gone on to another, values of other lexical functions
of the first word would pop into people's heads. They would interrupt, and we
would go back and fill in. We made it a regular procedure to stop every so often
and ask each other, 'What else?' It was impossible to simply work our way down
a list; we were traveling around and back and forth within semantic neighborhoods
most of the time.
The bundle of field markers used for lexical functions (or a subset of them) is found
below. They can be inserted as needed in SHOEBOX through the DATABASE TEMPLATE,
manually, or through the use of a MACRO.
\lf [lexical function]
\le [English gloss of lexeme in \lf field]
\ln [national language gloss of \lf field]
\lr [regional language gloss of \lf field]
\lf bundles can be used recursively within a record as needed. Using a limited number of
field markers simplifies the formatting for later printing a dictionary: all lexical
functions are handled in the same way for printing. Using the FILTERS in SHOEBOX
provides for powerful search and retrieval possibilities.4 The format for using the \lf field
bundles is as follows:
4For example, a filter set up as [lf|Ant] allows one to look at all antonym relations in the lexicon.
\lx huma
\sd Ncult; Nhouse
\ps n
\pn kb
\ge house
\re house ; hut ; building ; dwelling
\de any building or houselike structure for shelter or shade
\gn rumah
\lf Group = fenlale
\le village
\ln kampung
\lf Part = heset
\le wall
\ln dinding
\lf Part = atet
\le roof, thatch
\ln atap
\lf Part = subu
\le door
\ln pintu
\lf Mat = kau okon
\le tree bark
\ln kulit kayu
\lf Mat = srahen
\le split bamboo
\ln bambu
\lf Spec = humkolon
\le garden house, grain bin
\ln rumah kebun
\lf Spec = huma endefut
\le residential house
\ln rumah tinggal
\lf Spec = huma braun
\le meeting house
\ln baileo, balai desa
\dt 9/9/90
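Because every lexical function uses the same \lf marker, with the function named before the equals sign, retrieval by function is simple outside SHOEBOX as well. The following minimal sketch in Python groups the \lf bundles of a record by label, which is roughly what a filter such as [lf|Ant] selects on. The function name is hypothetical and the record is an abridged copy of the huma example.

```python
RECORD = r"""
\lx huma
\ge house
\lf Group = fenlale
\le village
\ln kampung
\lf Part = heset
\le wall
\ln dinding
\lf Part = atet
\le roof, thatch
\ln atap
\lf Mat = srahen
\le split bamboo
\ln bambu
"""

def lexical_functions(text):
    """Group lf bundles by function label: {label: [(form, English gloss), ...]}."""
    groups, current = {}, None
    for line in text.splitlines():
        marker, _, value = line.strip().partition(" ")
        if marker == r"\lf":
            label, _, form = value.partition("=")
            current = groups.setdefault(label.strip(), [])
            current.append((form.strip(), ""))
        elif marker == r"\le" and current:
            current[-1] = (current[-1][0], value.strip())  # gloss of the latest lf
    return groups

functions = lexical_functions(RECORD)
print(functions["Part"])
```

The national language glosses in \ln are skipped here; they could be collected in the same way as \le.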
Below is a brief listing with description of lexical functions used in the Encyclopedic
Dictionary of the Buru Language (ms). Additional lexical functions which have been
shown to be relevant for languages like Russian or English, but which we have not yet
found to be applicable to Buru may be found listed and described in the works of Igor
Mel'chuk and Apresyan (various, cited in bibliography), or in J. Grimes (1990, 1992, and
ms). Applying them to a specific dictionary project and interaction with language
assistants using lexical functions is described in C. Grimes (1987). In several cases a
number of Mel'chuk's functions have been generalized under a single lexical function for
ease of learning and use. The abbreviations in Mel'chuk's or J. Grimes' schema are given
in square brackets following the description to link the MDF lexical functions with
comparable or closely related lexical functions in their systems. The symbol [~] indicates
'similar to' or 'encompasses'.
*********************************
Syn
\lx beka
\ps AUX
\ge first
\de first (before doing s.t. else)
\lf Syn = peni
\le first (before doing s.t. else)

SynD
\lx inhadat
\ps n
\ge mosquito
\lf SynD = senget
\le Rana, Lisela
SynL
\lx ka
\ps TAM
\ge HAB
\de habitual aspect
\lf SynL = jaga
\le Ambonese Malay
\oe fv:ka tends to be used in nominal constructions, whereas fv:jaga tends to be used in verbal constructions.

SynR
\lx irung [Javanese]
\ps n
\ge nose
\lf SynR = grana
\le H [Krama Inggil]

SynT
Taboo synonym: Usually equivalent, but can also have non-taboo range of
meaning that is different. Often lexicalized circumlocutions. More localized
than SynD.

\lx minjangan
\ps n
\ge deer
\lf SynT = wadun
\le deer, (back of neck)

\lx uran
\ps n
\ge shrimp
\lf SynT = sehe
\le shrimp, (reverse)
\et *uDang
\eg shrimp, lobster

Gen
\lx feten
\ps n
\ge millet
\de foxtail millet
\lf Gen = agat
\le grain
\lx sgege
\ps vt
\ge carry
\de carry under-arm
\lf Gen = ego
\le get, take, carry

Spec
\lx lata
\ps vt
\ge cut
\lf Spec = bisi
\le carve
\lf Spec = hete
\le cut into sections for use

\lx enhero
\ps n
\ge spear
\lf Spec = pangneet
\le six-barbed spear
\lf Spec = pangat goit
\le special spear for killing humans
Sim
Similar: Near synonyms or other terms at the same level of native taxonomy
that are subsumed under the same generic term and are relevant for clarifying
the headword. These terms are often given in describing the headword, saying
'x is like y, but different'. Normally, the more thorough list of the generic-specific
taxonomy should be found under the generic term, rather than listing
many Sim under each specific. For Buru, reproducing all 17 cutting verbs under
each specific entry is not economical. [~ Syn^, Syn<, Syn>].

\lx pangneet
\ps n
\ge spear
\de six-barbed spear
\lf Sim = pangpaat
\le four-barbed spear

\lx bisi
\ps vt
\ge carve
\lf Sim = dasa
\le cut to a sharp point

Nact
\lx ekfilik
\ps vt
\ge sell
\lf Nact = gebkaleli
\le merchant

Nug
\lx hete
\ps vt
\ge cut
\de cut into sections for use
\lf Nug = kau bana
\le firewood

Nloc
agat n. grain (dried). Nloc: humkolon 'grain storage house'.
5Using Nact, Nug, Ninst, etc. is a different strategy from the N0, N1, N2, N3 used by Mel'chuk and
company. We find our current system far more practical both for remembering and for training others to
use lexical functions.
Ninst
Instrument noun: Instrument associated with the action of the headword; the
instrument implied if unspecified. [~ S3, N3].

\lx bisi
\ps vt
\ge carve
\lf Ninst = katuen
\le machete

\lx dihi
\ps vt
\ge comb
\lf Ninst = dihit
\le comb (n)

Nben
Benefactee: The one who benefits from the activity. The one implied if none
specified.

\lx soso
\ps vt
\ge nurse
\lf Nben = anmihan
\le infant

Ngoal
\lx oli
\ps vi
\ge return
\lf Ngoal = huma
\le house, home

Ndev
\lx iko
\ps vi
\ge go
\lf Ndev = enyikut
\le (his/her) going

Res
\lx mata
\ps vn
\ge die
\lf Res = enmata
\le death
Whole
\lx bubu enitu
\ps n
\ge ridgepole
\lf Whole = huma
\le house, building

\lx maen
\ps n
\ge handle ; shaft
\lf Whole = enhero
\le spear

Part
Part of the whole: The part, of which the headword is the whole.

\lx huma
\ps n
\ge house
\lf Part = kasa
\le rafter
\lf Part = subu
\le door

Mat
\lx atet
\ps n
\ge thatch
\lf Mat = bia omon
\le sago palm leaves

Vwhole
\lx enyikut
\ps n
\ge going
\lf Vwhole = iko
\le go

Serial
\lx heka
\ps vi
\ge move
\de move away quickly
\lf Serial = heka tuha
\le run off with s.o. or s.t.
Compound
\lx heka
\ps vi
\ge move
\de move away quickly
\lf Compound = hektatak
\le abandon s.t.

Sit
\lx epkiki
\ps vi
\ge dance
\lf Sit = pesta kaweng
\le wedding celebration

Prep
Preparatory activity:

\lx atet
\ps n
\ge thatch
\lf Prep = sau atet
\le sew thatch

Phase
\lx fultimo
\ps Time
\ge east_monsoon
\lf Phase = Samsama
\le lunar month around August

Max
\lx bana
\ps n
\ge fire
\lf Max = pothaki
\le forest fire

\lx reden
\ps n
\ge dark
\lf Max = reden tuni walet mite
\le pitch black

Min
\lx bage
\ps vn
\ge sleep
\lf Min = bagleak
\le nap, siesta
Degrad
\lx tonal
\ps n
\ge cuscus
\de cuscus marsupial
\sc Phalanger spp
\lf Degrad = mefu
\le rotten

\lx kau
\ps n
\ge wood
\lf Degrad = bono
\le decayed

Caus
\lx emgea
\ps vn
\ge embarrassed
\lf Caus = pemgea
\le embarrass s.o.

Start
\lx bana
\ps n
\ge fire
\lf Start = enhewek bana
\le light a fire
Stop
Cessative: Final phase. [~ Fin (the situation ends), Cess, Liqu (s.o. causes the
situation to end), State].

\lx dekat
\ps n
\ge rain
\lf Stop = dekat dere
\le rain lets up

\lx enein
\ps v
\ge work
\lf Stop = deak
\le stop, rest from activity

Feel
\lx bana
\ps n
\ge fire
\lf Feel = poto
\le hot

Sound
\lx dole
\ps n
\ge frog
\lf Sound = troo-troo
\le ribet
Cpart
\lx kete
\ps n
\ge parent_in_law
\lf Cpart = emsawan
\le son-in-law, daughter-in-law

Ant
\lx emhama
\ps vn
\ge light(weight)
\lf Ant = beha
\le heavy (thing)
\lf Ant = emteno
\le heavy (person)

Head
\lx noro
\ps n
\ge kin_group
\lf Head = gebhaa
\le local kin group head

Group
\lx fafu
\ps n
\ge pig
\lf Group = fafu reren
\le pig herd

\lx geba
\ps n
\ge person
\lf Group = geba rano
\le crowd of people

\lx uka
\ps n
\ge bamboo
\de bamboo (generic)
\lf Group = uka lale
\le stand of bamboo

Unit
\lx uka
\ps n
\ge bamboo
\lf Unit = uka walan
\le bamboo pole
\lf UnitPart = uka kasen
\le section of bamboo
ParS
\lx saka
\ps DEIC
\ge up
\lf ParS = lepak
\le go up, ascend

ParD
\lx saka
\ps DEIC
\ge up
\lf ParD = pao
\le down

\lx supan
\ps Time
\ge morning
\lf ParD = emhawen
\le evening

Idiom
\lx agat
\ps n
\ge grain
\lf Idiom = aga lahin
\le inheritance
For an alphabetized starter list of the lexical functions described in this chapter, see
Appendix D.
Users can adapt this \lf bundle to needs not explicitly mentioned in this Guide. In
other words, users can use the \lf bundles to create or customize their own categories and
labels, keeping in mind that what comes before the equals sign [=] is italicized as a label,
what comes after the equals sign is assumed to be vernacular and is formatted as such,
and what comes in the \le, \ln, and \lr fields is enclosed in single quotes. For example, one
user of an earlier version of MDF working in Africa wanted to use his lexical database to
keep track of other words that are phonotactically similar to the headword, easily
confused, and mean something else. We suggested using the \lf bundles and creating the
label Not =, with or without the \le field as follows:
\lx amana
\ps v
\ge gloss
\lf Not = almana, amanna

\lx amana
\ps v
\ge gloss
\lf Not = almana
\le gloss of almana
\lf Not = amanna
\le gloss of amanna

amana v. gloss. Not: almana, amanna.
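The formatting rule that makes such custom labels possible can be sketched as follows. This is an illustrative Python function with a hypothetical name, not MDF's actual implementation. It imitates only the three conventions stated above: what precedes the equals sign becomes a label, what follows is treated as vernacular, and the \le gloss is enclosed in single quotes. Italics and other typography are left out.

```python
def format_lf(lf_value, gloss=""):
    """Render one lf field, with an optional le gloss, as plain text."""
    label, _, vernacular = lf_value.partition("=")
    text = f"{label.strip()}: {vernacular.strip()}"
    if gloss:
        text += f" '{gloss}'"  # glosses are printed in single quotes
    return text

# A user-defined label is handled exactly like a standard lexical function.
print(format_lf("Not = almana, amanna"))
print(format_lf("Sim = pangpaat", "four-barbed spear"))
```

Nothing in this rule inspects the label itself, which is why a label such as Not can be introduced without any change to the formatting.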
Notice that Not = has nothing to do with the concept of lexical functions itself; it is
the formatting sequence of the \lf field bundle that is being borrowed for other
purposes. These \lf bundles can be adapted to the needs of the language and the needs of
the compiler. Similarly, the \ee (encyclopedic) bundle of fields contains no labels or
formatting and may be used as a general all-purpose field, not restricted to just
encyclopedic information.
1Many of the ideas in this chapter are adapted with permission from notes and discussions with Prof.
Andrew Pawley of the Australian National University, who has been grappling with many of these issues
over many years in the course of compiling dictionaries of Kalam, a Papuan language of Papua New
Guinea, and Wayan, an Austronesian language of northeastern Fiji.
\lx bahut
\ge mahogany
\de mahogany; k.o. hardwood
tree that grows to...
\lx pelat
\ge nettle
\de stinging nettle; k.o. shrub
with leaves spanning...
But many languages have more complex systems with three or more levels, providing
intermediate levels of classification. The terms at the highest (broadest) level of
the taxonomy are called life forms. The terms at the lowest (most specific) level
of the taxonomy are called terminal taxa (often popularly referred to as species, with a
finer level referred to as subspecies or varieties). Between these extremes different
languages may have one or more levels of intermediate taxa. These intermediate taxa are
often tricky to sort out.
Finding the names of the terminal taxa (what they call x), while full of hidden pitfalls, is
relatively easy compared to finding the intermediate taxa and life forms. Exploring folk
taxonomies carefully requires finding the cultural-specific framework within which to ask
the appropriate questions. Often questions designed to explore similar things at the same
manut
flying creatures whose wings are big enough or move slowly enough to see
while flying, including birds, bats, and butterflies
In other words, the terminal taxa here involve the classifier indicating the generic term
under which these are grouped. In MDF the \th field is intended for listing the vernacular
generic term under which the \lx lexeme is the terminal taxon (see 2.1). Additionally, \lf
Gen = and \lf Spec = are provided for recording the next higher level and next lower
level of the folk taxonomy. When a generic term is the headword (\lx), all known specifics
(\lf Spec =) should be listed in that entry. For the entry of each of those specifics,
cross-reference back to the appropriate generic with \lf Gen =. It is not economical to
cross-reference all other terminal taxa that group under the same generic term for each
terminal taxon, but \lf Sim = is provided to list those that are directly relevant to the
headword (see 2.2 and chapter 7).
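The reciprocal convention described here (\lf Spec = in the generic entry, \lf Gen = back from each specific) can be verified with a short script. This is a minimal sketch in Python with hypothetical function names. The sample uses three Buru cutting verbs from chapter 7, with the \lf Gen = line deliberately left out of the bisi entry so that the check has something to report.

```python
ENTRIES = r"""
\lx lata
\ge cut
\lf Spec = bisi
\lf Spec = hete

\lx hete
\ge cut
\lf Gen = lata

\lx bisi
\ge carve
"""

def relations(text, label):
    """Return the set of (headword, target) pairs for one lexical function."""
    pairs, headword = set(), None
    for line in text.splitlines():
        marker, _, value = line.strip().partition(" ")
        if marker == r"\lx":
            headword = value.strip()
        elif marker == r"\lf" and headword:
            function, _, target = value.partition("=")
            if function.strip() == label:
                pairs.add((headword, target.strip()))
    return pairs

def missing_gen(text):
    """List (specific, generic) pairs where the specific lacks its Gen link."""
    gen = relations(text, "Gen")
    return sorted(
        (specific, generic)
        for generic, specific in relations(text, "Spec")
        if (specific, generic) not in gen
    )

print(missing_gen(ENTRIES))
```

The same comparison run in the other direction would find specifics that name a generic which does not list them.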
Cautions:
1)
The semantic range of life forms between any two languages is rarely isomorphic,
particularly between unrelated languages. For example, the kinds of things covered
by the Selaru term masy are not a direct equivalent of its English gloss 'fish': the
Selaru term includes the English fish, dolphins and whales (which technically are
not fish, but popularly are) and for some Selaru speakers can include certain
mobile shellfish such as lobsters, and perhaps sea slugs.
2)
In many languages, intermediate taxa and life forms may be expressed as verbal
propositions (e.g. those things that retract their claws, those things that have
roots), rather than as simple generic nominals (e.g. felines and plants).
3)
The system of folk taxonomy may have some, or little correlation with the
scientific taxonomy, so one should not expect a good match. This is because the
scientific taxonomy is built primarily around similarities and differences in
physical structures, whereas the folk taxonomies may put behavioral patterns, or a
different physical feature into greater salience for structuring their taxonomies,
particularly at the intermediate and life form levels of the taxonomy. That does not
make one system better, or the other worse; they are simply different. But to an
English or academic audience, the point of reference to identify native flora and
fauna through the native nomenclature is the scientific nomenclature. In other
words, the lexicographer must identify the native emic system and terminology
with reference to the scientific etic system and terminology.
4)
Just because what is covered by one native term is handled by two or more
scientific terms, or vice versa, does not mean the native community is unaware of
the physical similarities and differences in the species. Thus, there may not be
great discrepancies in conceptual correspondence in many of the terminal taxa (the
plants and animals we see: the 'species') between the scientific system and the
folk system, but there may be large discrepancies in the terminological
A number of scholars have observed that flora and fauna of high cultural
significance tend to be over-differentiated in their terminology. Thus, plants which
are intensively cultivated locally (yams, sweet potato, taro, rice, corn, cassava,
millet, barley), and animals which are hunted or domesticated and play an intense
role in economics, bridewealth, death, clan totems, or religion (pigs, cows,
chickens, cuscus, buffalo, water buffalo, etc.), tend to have enriched lexical
networks.
6)
7)
8)
Further reading: See Berlin, Breedlove and Raven (1966, 1973, 1974), Bulmer (1967,
1970), Casagrande and Hale (1967), Conklin (1962), Frake (1962), J. Grimes (1980a,b),
Lakoff (1987).
8.1.1 Plants
There are some features of use that give a particular plant relevance or prominence in a
culture and these should be noted, where found. However, use by itself is not sufficient
information for an outside user of the dictionary to identify the particular plant. The
dictionary maker must eventually choose which information is most relevant for the
published dictionary, but in SHOEBOX using the MDF codes, all available information
can be recorded and organized for later selection. The following are issues to be
considered:
Physical characteristics of the plant's appearance³
1)
2)
What is the average size (of the trunk, leaves, flowers, fruit)?
3)
Is there a distinctive shape or texture (of the trunk, bark, leaves, flower, fruit)?
4)
What kind of flowers and fruit does the plant bear, if any? Do these have
distinctive color, smell, or taste? [Also list as \lf Part =].
5)
Can someone be trained to make an accurate sketch of the plant including detail of
the leaves, flower and fruit? [Use \pc ].
2)
Where does this variety grow? In the distant gardens, or on the edge of the village?
Near the ocean, or inland? In the lowlands, mountains, or coastal plains? In the
³ Use meters and cm, rather than vague and relative terms such as 'tall', 'large', 'small'.
If it is planted or cultivated, does it need special tending such as stakes for support,
weeding, or pruning?
4)
Is part of the plant used to make something? For example, is the wood used for
fence posts, house posts, rafters, bows, spears, firewood, or tools? Is the inner bark
used to tie things? If it is a vine, is it used as rope? Are the leaves used as plates,
for wrapping, for thatch roofing, or for weaving baskets or mats? Is the bark used
to make cloth or string? Are parts of the plant useful for making gourds or buckets?
2)
Is the plant (or part of it) eaten? If so, which parts are eaten? Is it eaten raw or
cooked? If it is cooked, are there special instruments, materials or preparation
needed? Is it cooked with certain other foods? Is it eaten with certain other foods?
3)
Does the plant (or part of it) have other uses besides as food and utensils, such as
for medicine, oil, poison, glue, dye, perfume? Is it the leaves, the inner bark, the
sap, the roots, the fruit or the flowers that are used? How are these prepared?
4)
Is there a special social value associated with the plant? For example, is it fit for
presentation to nobility, or is it eaten only during famine when other foods are not
available?
2)
Is there special symbolism associated with the plant that requires its presence at
certain ceremonies? For example does it symbolize cool things, peace, prosperity,
longevity, promises?
3)
Does the plant function as a totem that is emblematic of a certain social group?
4)
Are there prayers or incantations associated with proper preparation of the plant?
5)
If it is planted, do both men and women plant it, only one sex, or are the different
sexes involved in different phases of the planting?
6)
Varieties
1)
Are there several kinds of this plant? Do they each have distinct names? Under the
most appropriate generic term list the varieties with at least one distinctive feature.
[Use \lf Spec = ]. Use the JUMP feature of SHOEBOX to create separate entries for
each of the varieties, also cross-referencing the generic term. [Use \lf Gen = ].
2)
Are there other names for the same plant? [Use \lf Syn = ; SynD = ; SynR = ;
SynT = ].
3)
Are there special lexemes associated with phases or stages of this plant's growth?
[Use \lf Phase =].
8.1.2 Animals
Distinctive physical characteristics of the animal's appearance
1)
2)
3)
What are the differences in size, shape, color, or other aspects of appearance
between males and females? Between infants, adolescents, and adults?
4)
5)
2)
3)
4)
5)
6)
7)
8)
Is the animal present year round, is it seasonal, or does it appear only occasionally
during times of drought or major storms in other areas?
9)
10) Does it have a characteristic call, or cry in a distinctive way? [Use \lf Sound =
(ribet)].
11) Does it have a characteristic smell?
12) Is it poisonous or aggressive, or otherwise dangerous to people?
Uses
1)
Is it eaten by people?
2)
3)
4)
Are parts of it used for other purposes? E.g. are its skin, bones, sinews, milk,
blood, eggs, horns, fur, or feathers useful?
5)
Is the animal used for other purposes? E.g. is it used for hunting, herding, carrying,
or pulling heavy loads? Is it kept as a pet?
6)
If it is domesticated, how is it raised? If it can be tamed from the wild how is that
done?
7)
If it is hunted, how is it caught? Note that for culturally important animals there
may be many ways. Are special implements used?
Are there special beliefs about this animal? E.g. when some animals behave in
certain ways they are thought to be omens; some societies believe that certain
animals can turn into humans and vice versa; in some societies snakes are
associated with evil or with spirits, whereas other societies consider them to
represent wisdom, or shrewdness.
2)
Are there taboos associated with this animal, or restrictions associated with killing
or eating it? Are there avoidance patterns associated with saying its name? Do
these types of taboos apply to society at large, to only certain segments of society,
to certain individuals, or to certain locales?
3)
Does the animal have special value? For example as a totem, in ceremonies, for
serving honored guests. For example, in Buru the head of a wild pig or cuscus is
given to an honored guest or belongs to the successful hunter. For domestic pigs, a
plate full of large cubes of pure pig fat is given to honored guests, whereas plain
meat is for the common man.
4)
Is the animal considered a pest? Are there special activities for dealing with this?
5)
Are there commonly known fables associated with this animal? Does the fable
explain prominent physical characteristics or a characteristic call (e.g. 'that is why
x has a short tail')?
Varieties
1)
Do males and females have different names? [Use \lf Male = (stallion, boar, bull);
\lf Female = (mare, sow, cow)].
2)
Do infants, adolescents and adults have different names for different stages of
maturity? [Use \lf Phase = (lamb, calf, piglet, puppy)].
3)
Are there different kinds (varieties) of this animal encompassed by a single term?
4)
Are there other names for the same animal? [Use \lf Syn = ; SynD = ; SynR = ;
SynT = ].
8.1.3 Birds
The guidelines for birds are generally the same as for animals above, but particular
attention should be paid to:
1)
2)
3)
4)
5)
6)
Special calls, particularly those that are characterized by their own lexeme. Some
birds have a variety of calls.
7)
Special cultural significance. For example, the call of certain jungle birds may be
associated with time to get up before dawn; others with the spirits of the dead (as
the hoot of an owl in Europe).
8)
8.1.4 Fish
The basic guidelines for fish are the same as for animals in general above, but paying
particular attention to:
1)
Habitat: freshwater, saltwater; river source, deep pools, river mouths; clear water,
murky water; tidal pools, surf, reefs, rocks, sandy bottom, deep ocean.
2)
Fin structure.
3)
Feeding habits.
4)
5)
6)
7)
8)
8.1.5 Insects
The basic guidelines for insects are similar to those for animals in general above, but
paying particular attention to:
1)
2)
Different phases of growth and if the local culture associates, for example, the soft
grub growing in the rotten log with the hard-shelled beetle that eventually emerges,
or associates the caterpillar with the cocoon, with the butterfly.
3)
Which insects are normal food, which are famine food, and which are never eaten.
How are they collected, processed and cooked?
4)
What is it part of? For English the point of reference is the next larger body part,
rather than the whole. For example, finger is made with reference to hand rather
than to body. [Use \lf Whole = ]
2)
Where is it? What does it attach to? Where does it start and end?
3)
What other important parts are contained within this part? For example, the head
includes: eye, ear, nose, mouth, hair, forehead, cheek, chin, temple, brain, etc. The
mouth includes teeth, tongue, gums, and in some languages lips. [Use \lf Part = ].
4)
What is it used for? For example, teeth are for biting and chewing.
5)
Does this body part term apply to both humans and animals? Does it extend to fish
and insects?
6)
Are there social values or avoidances associated with this body part?
7)
8)
Are there idioms associated with it? Do these idioms reflect slang or normal
speech? [Use \lf Idiom = (he's got a hole in his head; he's foot-loose and fancy-free)].
9)
Is there a picture that can be included (where socially appropriate)? [Use \pc ].
Is the term (\lx) a term of reference (talking about s.o.) or a term of address
(talking to s.o.), or may be used in both ways?
2)
3)
Is there a special reciprocal form involving this term and another? For example:
\lx feta
\ps n
\sd Nkin
\ge sister_(m.s.)
\lf Group = feta-sar-naha
\le reciprocally brothers and sisters, referring to same
generation males and females of different kin groups that
link to a common grandparent

\lx dawe
\ps n
\sd Nkin
\ge WB
\re wife's brother ; brother-in-law
\de wife's brother, brother-in-law
\lf Group = tal-dawe
\le reciprocally brothers-in-law, referring to men of
different fv:noro who have married each other's sisters
4)
Are there variants or modifications of basic kin terms? For example, are there ways
to specify male and female forms?
\lx opo
\ps n
\sd Nkin
\ge PP ; CC
\re grandparent ; grandchild
\de grandparent, grandchild; signifies plus two or
minus two generations
\lf Male = opomhana
\le grandfather, grandson
\lf Female = opolfina
\le grandmother, granddaughter
5)
Can the kin term also be used in a verbal form, or in an extended sense? Consider
English 'he fathered another child', 'she mothered too much'. Child in many
Definitions need to accurately encompass the range of meaning and usage of the
headword. Translation equivalents are dangerously misleading. Consider ama
glossed as 'father' (but which actually includes all males of the first ascending
generation to ego in the clans of either parent); or ina glossed as 'mother' (but
which actually encompasses all females of the first ascending generation to ego in
the clans of either parent); or anat glossed as 'child' (but which actually includes
all offspring of the first descending generation to ego's classificatory brothers and
sisters).
7)
2)
3)
4)
Who uses them, and under what circumstances? Who does not or may not use
them, due to cultural norms or cultural taboos?
5)
6)
7)
Are they involved in ritual or commercial exchange? For example, there may be
mats, or cloth, or cooking pots that are exchanged in one direction by certain kin
relations at marriage or death. If they are used in ritual exchange, are there other
items that are always used to reciprocate in counter-exchange? What are they?
8)
Are there metaphors built around, or associated with these items of material
culture?
What does it look like? Check for size, shape, color, and texture.
2)
3)
4)
5)
Where is it found?
6)
Does it move?
7)
8)
9)
Who normally does this activity? Is the action normally associated with a restricted
segment of society, such as men, women, young girls? [Use \lf Nact = ].
2)
\lx hete
\ps vt
\sd Vcut
\ge cut
\de cut s.t. into sections for intended use
\lf Nug = kau bana
\le firewood
3)
4)
What location is normally associated with the activity? [Use \lf Nloc = (jungle,
village, gardens)].
5)
What preparatory activity is necessary before the action can be done? [Use \lf Prep
= ].
\lx smoke (meat)
\lf Prep = cut (meat) into strips
6)
What resulting thing or state is produced by the action? [Use \lf Nres = ; \lf Result
= (cooked)].
2)
3)
Does this state or process have a special form or idiom for representing an
emphasized or an extreme degree? Consider, for example, black, very black, jet
black (of things), pitch black (of the surrounding environment). Notice that the last
two represent an extreme degree of black and can be handled with \lf Max =. In
some languages the very black relationship is expressed by a normal adverb, or by
reduplication, or an affix. These can be handled in the grammatical introduction if
they fit the normal paradigm. If they are unpredictable, the form indicating an
intensified degree should be mentioned in the entry.
Javanese speech levels can be divided roughly into three: the highest, called
Kromo; the lowest called Ngoko; and a middle level called Kromo Madyo or just
Madyo. There are no clear boundaries between these levels, and Madyo is a
continuum between Kromo and Ngoko. The highest level, Kromo, is the refined
level, marked by a special vocabulary of somewhat more than a thousand items and
a few affixes for which there are special Kromo variants. Kromo is employed to
persons of high status. . . Ngoko is the unrefined level with which speakers choose
to address persons with whom they are familiar and persons who are not of high
status. Ngoko is marked by use of non-Kromo forms for the 1,000 or so items for
which there are special Kromo variants.
There are further substrata within the registers mentioned above. Kromo is additionally
marked by precise diction and slightly marked intonation. Nothofer (1982:291) notes that
Kromo and Kromo Inggil vocabulary shows less dialectal variation than Ngoko.
Buru has a special taboo register that is spoken in the part of the jungle called Garan that
has no villages, but takes two days to traverse by foot. In that region the taboo is that
nobody is permitted to speak the Buru language, hence the development of this entire
special register called Li Garan 'the language of Garan'. Most functors are the same as in
the common register, but many lexical items (nouns and verbs) have Garan-register
forms. These follow the phonotactics of the Buru language, but are different. For
example, Li Garan em-kise-n 'person, man' replaces the common Buru geba 'person,
man'. Kise normally means 'growing bald' or 'having a high forehead'. The special
language of Garan is described more fully in C. Grimes (1991) and Grimes and Maryott
(1994).
Speech registers, such as those in Javanese or Buru, can be handled in the same way as
described for dialect variation in 6.5. Thus, one can use the variant fields (\va,
\ve, \vn, \vr), the usage fields (\uv, \ue, \un, \ur), the restrictions fields (\ov, \oe, \on,
\or), the notes on sociolinguistics field (\ns), and, for registers, \lf SynR = (register synonym)
rather than \lf SynD = (dialectal synonym).
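To illustrate the distinction between \lf SynR = and \lf SynD =, here is a minimal Python sketch that pulls both kinds of synonym out of a single record. It is an illustration only, not part of MDF. The record is assembled from the Buru forms mentioned above (geba and its Li Garan counterpart em-kise-n); its exact field layout is an assumption, as is the function name.

```python
import re

# Matches lines such as "\lf SynR = em-kise-n".
LF_PATTERN = re.compile(r"^\\lf\s+(\w+)\s*=\s*(.+?)\s*$")


def synonyms_by_kind(record_text):
    """Group register (SynR) and dialectal (SynD) synonyms found in one record."""
    found = {}
    for line in record_text.splitlines():
        match = LF_PATTERN.match(line.strip())
        if match and match.group(1) in ("SynR", "SynD"):
            found.setdefault(match.group(1), []).append(match.group(2))
    return found


# Illustrative record; the field layout is assumed, the forms are from the text.
RECORD = r"""
\lx geba
\ps n
\ge person
\lf SynR = em-kise-n
"""

print(synonyms_by_kind(RECORD))
```

Keeping the two labels apart in the data lets a later formatting step print register synonyms and dialectal synonyms under different headings.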
Many languages use parallelisms in formal speech, ritual speech, poetry, ballads, or
prayers. These parallelisms tend to be of two basic types. In the first, the second member
of the pair means essentially the same as the first member (in this context), as is common
in Biblical Hebrew. In the second, the second member means approximately the opposite end
of a scale from the first member. These are provided for as \lf ParS = (same), and
\lf ParD = (different). See Fox (1971, 1974, 1975, 1977, 1982, 1988) and Moore (1993) for
more discussion and examples. An example of parallelism meaning the same is from a
Rotinese poem (Fox 1982:313):
Te leo mafo ai-la hiluk
Ma sao tua-la keko
Na, Suti, au o se
Ma, Bina, au o se
Fo au kokolak o se
Ma au dedeak o se
Tao neu nakabanik
Ma tao neu namahenak?
An example from Buru that mixes same and different in parallelisms describes hunting a
wounded pig.
Kami iko lepak
iko logok
hama saka
hama pao.
We ascended
we descended
we searched high
we searched low.
A better known example is based on parallelism from Biblical Hebrew in Psalms 139:7-10
(Jerusalem Bible).
Where shall I go to escape your spirit?
Where shall I flee from your presence?
If I scale the heavens you are there,
if I lie flat in Sheol, there you are.
If I speed away on the wings of the dawn,
if I dwell beyond the ocean,
even there your hand will be guiding me,
your right hand holding me fast.
In many languages, which words can pair with which other words is conventionalized in a
frozen or semi-frozen state, such that not just any two words can go together. For
example, in the Buru example above, lepak 'ascend' and logok 'descend' pair together
as opposites, but lepak 'ascend' and pao 'down' cannot. These distinctions should be
recorded in the lexicon.
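One way to picture what recording these conventional pairings buys is a small lookup that accepts only the pairs the lexicon lists. The Python sketch below is a hypothetical illustration. The lepak/logok pair comes from the discussion above; treating saka/pao as a recorded \lf ParD = pair is an assumption drawn from the hunting example, and the function names are invented for the sketch.

```python
# Conventional pairings keyed by the unordered pair of words.
# "ParD" marks members at opposite ends of a scale, "ParS" would mark same meaning.
PAIRINGS = {
    frozenset(("lepak", "logok")): "ParD",  # 'ascend' / 'descend', from the text
    frozenset(("saka", "pao")): "ParD",     # assumed pair, inferred from the poem
}


def pairing(word_a, word_b):
    """Return 'ParS' or 'ParD' if the two words are a recorded pair, else None."""
    return PAIRINGS.get(frozenset((word_a, word_b)))


def lf_fields(word):
    """Build the \\lf lines that would go in the entry for one member of a pair."""
    lines = []
    for pair, kind in PAIRINGS.items():
        if word in pair:
            (partner,) = pair - {word}
            lines.append(f"\\lf {kind} = {partner}")
    return lines


print(pairing("lepak", "logok"))
print(pairing("lepak", "pao"))
print(lf_fields("lepak"))
```

Because the lookup uses unordered pairs, the relation is recorded once and can be written into both entries, which keeps the two cross-references from drifting apart.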
9: Parts of speech
Form: In some cases the structural form of an entire form class distinguishes it
from other form classes. In Buru, prefixes can be distinguished from proclitics on
the basis of form: prefixes always take the shape eC, while proclitics can take
any V and are of the shape CV (Grimes 1991:60). Also in Buru, certain classes of
functors may be monosyllabic, but classes using content words (e.g. nouns and
verbs) are never monosyllabic.
b)
Function: When we talk about an entire form class, or the behavior of a single
lexeme in the syntax we usually refer to its function, or its range of functions
¹ In SHOEBOX the \ps field is provided for English parts of speech, and \pn for the national language
parts of speech. While the terminology between the two may be different (for example, \ps n = \pn kb),
the categories should be the same, because one is targeting the categories of the vernacular, not of
English or of the national language.
what it does, or how it (and things like it) behaves in different contexts. For
example, in many languages that have prepositions, the function of the class of
prepositions is to relate non-core arguments to the verb and to identify the semantic
role that argument is playing. The function of prepositions contrasts with the
function of verbs.² Schachter (1985:4) notes a preference for basing the
assignment of parts of speech classes on properties that are grammatical
rather than semantic. Thus, defining nominals as 'the head of grammatical
arguments in a clause' is preferable to defining them as 'words that name persons,
places, or things'.³
c)
Lexicographers may assign parts of speech on the basis of the gloss in the national
(or international) language, rather than on the syntactic behavior of the form class
in the language itself. In Buru, for example, we might be tempted to call saa an
'article' because it most commonly translates into English with the indefinite
article 'a'. However, in exploring the whole morphosyntactic network it becomes
clear that saa is a member of a closed class of what Grimes calls 'deictics' that
share a variety of formal, functional, and distributional properties (C. Grimes
1991:167-175).
² Except, of course, in the case of prepositional verbs or serial verbs where a verb functions as a
b)
Lexicographers tend to remain committed to the parts of speech labels that they
first assigned to a lexeme in the early analysis of a language (with associated
assumptions about the behavior of that part of speech), even after those labels are
shown to be inappropriate. Ideas about part of speech categories need to be refined
and updated in the development of a lexicon to reflect developments in the
understanding of the grammar.
c)
d)
When a lexeme can function in two or more classes (e.g. both nominally and
verbally, or as a preposition and a conjunction), lexicographers tend to assume that
it must be primarily one class, and only secondarily the other, assigning primacy on
the basis of external (etic) rather than internal (emic) criteria. This is the flaw of
the 'excluded middle'.
e)
f)
Lexicographers often fail to distinguish verbal subcategories that are relevant to the
language, assuming the only relevant primary division for verbs in all languages is
limited to transitive or intransitive.⁴ As described later in this chapter, the
fundamental division for some types of languages is more complex than a simple
binary distinction.
g)
Lexicographers often tag multiple pronominal sets with terminology that is not
appropriate to the type of language, such as using case terms (e.g. nominative-accusative or ergative-absolutive) for split-S languages or for pragmatically driven
⁴ At a recent lecture a world-renowned linguist reiterated the notion that all languages divide verbs into
two types: transitive and intransitive. This simplification encourages linguists and lexicographers to be
blinded to what distinctions languages actually are making where the fundamental divisions are more
complex, such as in split-S languages, and blinded to notions such as 'ambitransitive' and
'intradirective'.
catalog of the lexicon, presenting a serial list of lexemes isolated from natural speech
and organized around principles of retrievability of information.
That, together with ideas about what comprises a lexical entry, encourages linguists and
lexicographers to slip into incorrect application of the Aristotelian principle that:
This lexeme cannot be both A and not A at the same time.
In other words, the thinking goes, this lexeme cannot be, for example, both a noun and a
verb; therefore it must be primarily one and only secondarily the other (for example,
through a zero-derivation),⁶ or they must be two different lexemes. However, the problem
arises out of the artificial nature of the dictionary in trying to assign parts of speech to
lexemes in isolation. It is not the case in normal speech that a lexeme is functioning as
both a noun and a verb at the same time. Where a lexeme is functioning in more than one
category, it is either in different utterances, or in different syntactic slots within the same
utterance. We explore below two areas in which the conflict commonly arises.
9.3.1.1 Are they adpositions or conjunctions?
A problem often occurs in assigning parts of speech to certain types of functors that
operate in a variety of syntactic slots. For example, in English:
⁵ This fallacy was reinforced at another lecture by a well-known linguist with the statement '75% of the
world's languages are nominative-accusative and 25% are ergative-absolutive'. This characterization
blinds newcomers to well-documented language types such as split-S, active/non-active (~ stative-active),
and Philippine-type languages, which are numerically significant in the world's languages.
⁶ This view often surfaces at linguistic seminars in lively debate over whether lexeme X is primarily
category A or category B, and the implications for syntactic arguments that follow from that. The linear
nature of a dictionary forces sense A to precede sense B, and it is part of the conventional culture of
dictionary users to assume that the sense presented first is more basic (and for other reasons this makes
good lexicographic sense).
(1)
(3)
Tu dii, DISCOURSE
SENTENCE1. Tu SENTENCE2
CLAUSE1, tu CLAUSE2
[N tu N]Subject, Predicate
S V (O) tu NP
⁷ The flexibility of this ambivalent portion of the lexicon may also vary between dialects of the same
language. For example, Australian English can verbalize many words that are not able to be verbalized in
American English: 'She is flatting' (= 'She is renting a flat/apartment').
(4)
[verbal]
[nominal]
(5)
[verbal]
[nominal]
(6)
[verbal]
[nominal]
(7)
[verbal]
[nominal]
(8)
[verbal]
[nominal]
Some lexicographers are tempted to argue etymologically for the primacy of membership
in one form class over another, but unless there are clear synchronic derivational
processes, the arguments may be much more difficult to substantiate and tend to appeal to
elusive processes such as zero-derivation, which have no surface marking and which
assume the primacy of one part of speech over another. Where zero-derivation is
warranted, there must be surface evidence somewhere in the morphosyntactic networks of
the forms in question. Otherwise the claim of zero-derivation is simply linguistic hocus-pocus.
Like English and other languages, Malay also has a number of lexemes whose function is
distinguished only by its distribution within an utterance in an informal register,⁸ such as:
(9)  Orang-nya     jalan  di   jalan  situ.
     person-ANAPH  walk   LOC  path   DIST.LOC
⁸ Formal Malay would require derivational affixes such as ber-jalan for the verbal predicate use. One
could argue on the basis of formal Malay that there is simply affix ellipsis for informal Malay. But this
leaves at least two difficulties. First, what is the status of the unmarked base to which the verbal affixes
(e.g. ber-, meN--kan) attach in the first place? Secondly, how can we argue for the elision of affixes that
simply are not used in these contexts in informal Malay?
\lx jalan
\ps v
\ge go
\re go ; walk
\de go, walk
\ps n
\ge way
\re path ; trail ; road ; way
\de path, trail, road, way
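In the jalan record, one headword carries two \ps fields, and the fields that follow each \ps belong to that part of speech. The Python sketch below shows one possible way to read such a record into separate senses. It is an illustration under that grouping assumption, not a description of how MDF itself processes the file, and the function name is hypothetical.

```python
def split_senses(record_text):
    """Return (headword, senses), starting a new sense at every \\ps field."""
    headword, senses = None, []
    for line in record_text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # simplification: ignore blank and continuation lines
        marker, _, value = line[1:].partition(" ")
        value = value.strip()
        if marker == "lx":
            headword = value
        elif marker == "ps":
            senses.append({"ps": value})  # a new part of speech opens a new sense
        elif senses:
            senses[-1][marker] = value    # attach the field to the current sense
    return headword, senses


JALAN = r"""
\lx jalan
\ps v
\ge go
\re go ; walk
\de go, walk
\ps n
\ge way
\re path ; trail ; road ; way
\de path, trail, road, way
"""

headword, senses = split_senses(JALAN)
for sense in senses:
    print(headword, sense["ps"], sense["de"])
```

Neither sense is marked as derived from the other in this structure; the order simply follows the order of the \ps fields in the record.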
Paiwan
Verbal construction
Nominal interpretation
k<
m>an
kan-
n
k<in>an
kan-an
si-kan
CAUTION: The point is, the interpretation of these constructions as nominal or verbal
primarily nominal, verbal, or whatever. Such bound roots, with reference to their form
class membership, are sometimes called precategorials.⁹ For example, in Buru:
(11)
tea
tea-k
tea-n
ep-tea
(12)
mae
mae-t
mae-n
mae-k
(13)
bidu      (involving a cast-net)
bidu-k    to cast a cast-net
bidu-t    a cast-net
In the last example above, one could argue either that 1) the nominal form uses /t/ to
derive the instrument that is characteristically used to perform the action of the verb, or
2) the verbal form uses /k/ to derive a verb that is characteristically done using the noun.
Both are legitimate explanations in the derivational paradigms of the language.
For an academic audience, precategorials can be handled as bound root lexemes (e.g.
mae, tea, bidu) with the surface derivations as subentries. But for a local audience
this option is often not possible, since these bound root morphemes do not constitute a
minimal possible utterance. For such an audience, one can work with the community to
choose one derivation as the citation form [5.4.4] with the other forms as minor senses,
or else list each surface derivation as a separate lexeme. See 4.6 for extensive discussion
with examples on how to organize lexical information in these two ways. One way of
handling precategorials in MDF is as follows:
⁹ Adelaar (1985:223) defines precategorials for his study of Proto-Malayic as 'roots that do not occur in
isolation, that is, roots which only occur in derivations and in compounds'. For Buru I expand the
definition to include reduplication. E.g. pani-n 'wing', p-e-pani 'HAVE wing' (s.t. of which 'wing' is the
most salient feature), but never *[pani-] by itself. Some languages have a number of inherently
reduplicated roots which never occur in the unreduplicated form. These roots could also be considered
precategorials.
\lx bidu
\ps Rt
\ge cast_net
\re *    [No reversal]
\de cast-net
\se bidut
\ps n
\ge cast_net
\re cast-net ; net (for casting)
\de cast-net
\se biduk
\ps vi
\ge cast_net
\re cast-net ; net (use by casting)
\de use a cast-net
¹⁰ S in Dixon's system is the single argument of intransitive verbs. In a split-S system Actor and
Undergoer are encoded differently on intransitive verbs, hence the name split-S. Dixon (1979) does not
use Actor and Undergoer as primitives, but rather S, A, and O. We use the terms Actor and Undergoer
in the sense of the macroroles described by Foley and Van Valin (1984).
hit, kill, break, return); non-active verbs (Undergoer-oriented) are BE or BECOME type
verbs (e.g. dark, ripe, white, sick, hungry, big, small, die, good, bad).¹¹
(14)
hit him.        [Active transitive;        A V O]
ran.            [Active intransitive;      S_A V]
is sick him.    [Non-active, postposed S;  V S_U]
him is sick.    [Non-active, preposed S;   S_U V]
Split-S systems are fairly widespread within the Austronesian world, for example in
Aceh, North Sumatra (Durie 1985), and in many languages in eastern Indonesia, such as
Selaru, a language of southern Tanimbar (Coward 1990), and Dobel in the Aru Islands (J.
Hughes, 1991). Buru is split-S in its verbal semantics, but shows an incipient switch-reference system in its pronominal typology (C. Grimes 1991).
All split-S languages must minimally distinguish three types of verbs in the lexicon, not
just two, but dictionaries and wordlists published over the last century for split-S
languages in eastern Indonesia have failed to do so. For Buru Grimes abbreviates the
three types as vt (active transitive), vi (active intransitive), and vn (non-active verbs).
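The three-way distinction can be enforced with a simple check on \ps values. The Python sketch below is an illustration only. The labels vt, vi, and vn are the ones given above; the rule that any other \ps value beginning with "v" counts as an unclassified verb is an assumption, and the third entry in the sample list is a hypothetical placeholder rather than real data.

```python
# Verb labels taken from the text: active transitive, active intransitive, non-active.
VERB_LABELS = {"vt", "vi", "vn"}


def unclassified_verbs(entries):
    """Return headwords whose part of speech looks verbal but is not vt, vi, or vn."""
    flagged = []
    for headword, ps in entries:
        # Assumption: every verbal label starts with "v".
        if ps.startswith("v") and ps not in VERB_LABELS:
            flagged.append(headword)
    return flagged


# (headword, \ps value) pairs; "placeholder" is hypothetical.
ENTRIES = [("hete", "vt"), ("biduk", "vi"), ("placeholder", "v")]

print(unclassified_verbs(ENTRIES))
```

A check of this kind draws attention to entries still carrying an undifferentiated verb label from an early stage of analysis.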
9.3.2.2 Intradirective or quasi-reflexive verbs
An additional verb type shows up in many Austronesian languages in eastern Indonesia
and the Pacific. Active intransitive verbs tend to be verbs of motion or posture, such as
¹¹ The Selaru data and primary analysis are from Coward (1990). Some of the terminology and split-S
framework reflect Grimes' adaptation of Coward's material. We avoid the label 'stative-active' that is
widespread in the general literature for these types of languages, because the non-active verbs are
typically ambiguous in their internal aspectual interpretation as imperfective (process) or perfective
(state). The label 'stative' at this macro level is thus highly misleading (see discussion in C. Grimes
1991:93-108).
¹² Relational grammarians call the S_A type verbs unergative and the S_U type verbs unaccusative. While
there is nothing wrong with the terms for linguistic purposes, we do not recommend using these labels as
parts of speech categories in a published dictionary as they severely limit the audience of effective users.
go, return, stand, sit, in which the person doing the action is also the one undergoing the
action (their location or position is changed). For example, in I go, I am volitionally doing
something that results in my location being changed. There is only one semantic referent,
but some languages mark some (or all) active intransitive verbs of this sort as
morphologically transitive. In some literature on Oceanic languages these are referred to
as intradirective, or reflexive verbs (see Pawley 1973), and in other areas of the world as
quasi-reflexive verbs.
(15)
pina
female
ona-te
big-NOM
ria
inland
manahane.
outside
i-sipu-i,
3s-descend-3sU
i-eu-i
3s-go-3sU
Buru (archaic)
Kae  oli-m       beka.
2sA  return-2sU  first
¹³ Numerically indicated subclasses (e.g. Class I, Class II, Class III, etc.) seem to be very frustrating to
everybody except the linguist who assigned those labels. An alternative such as the actual affixes that
distinguish the subclasses in defined contexts broadens the audience of potential users (e.g. em-verbs,
eb-verbs, etc.). This kind of morphological subclass is conventionalized in Spanish dictionaries in the
citation form of verbs as -ar, -ir, or -er verbs.
(17) Da  ba   kaa  mangkau.      (NP object)
     3s  DUR  eat  cassava
     He is eating cassava.
(18) Da  ba   kaa-h.             (pronominal object)
     3s  DUR  eat-it
     He is eating it.
(19) Da  ba   kaa.               (object suppression)
     3s  DUR  eat
     He is eating.
Huma   di    em-kele.                        [predicative]
house  DIST  STAT-tall

puna  huma    em-kele.                       [attributive]
do    [house  STAT-tall]NP
'He made a pile house.' [Lit. 'a tall house']

Da  ba   haa  hede.                          [predicative]
3s  DUR  big  still
'He is still growing.'

Da  puna  huma    haa-t.                     [attributive]
3s  make  [house  big-NOM]NP

Kau   di    beha.                            [predicative]
wood  DIST  heavy

wada            kau    beha-t.               [attributive]
shoulder_carry  [wood  heavy-NOM]NP
14Nouns can also modify other nouns in Buru NPs, but behave quite differently from verb-derived
(23) Feten   boti   mohede.                  [predicative]
     millet  white  not_yet

     ego  labu-n      boti-t.                [attributive]
     get  [shirt-GEN  white-NOM]NP
To label as 'adjective' whatever translates into an English adjective fails to recognize
that in these languages the lexemes behave as verbs in the greater morphosyntactic networks
of the language.
9.4 Summary of \ps issues
Indicating parts of speech in the lexicon has been traditionally useful. The task of doing
so is often straightforward and uncomplicated, but there are many potential pitfalls. The
lexicographer must continually refine notions about parts of speech categories in a
language and update the lexicon as understanding of the grammar increases. Parts of
speech categories should be adequately defined to fit the language and make the
dictionary a useful and productive tool.
9.5 Checking paradigms (\pd)
Some languages have obligatory indexing on the verb for one or more core arguments.
For many Austronesian languages in the string of islands east of Bali, consonant-initial
verb stems are not inflected for person and number of the subject, but vowel-initial stems
are. In some languages, however, the paradigms are not complete for all possible
combinations (also noted generally by Zgusta 1971:122). For example, in Tetun of central
and east Timor (Therik and Grimes 1992), some verbs take the complete paradigm and
others are only partial; the citation form of all these verbs is the h- form. The
completeness of the paradigm can also vary across dialects.
Where there is inconsistency in the completeness or regularity of paradigms it is not
economical to indicate the complete paradigm for every verb, but only for those that
deviate from a norm. This information should be in the \pd or related fields. See 2.1.
PERSON   Complete Paradigms    Incomplete Paradigms
         eat      bring        look     wait     pass by
1s       kaa      kodi         karee    --       --
2s       maa      modi         maree    mein     mosi
3s       naa      nodi         naree    nein     nosi
1px      haa      hodi         haree    hein     hosi
1pi      haa      hodi         haree    hein     hosi
2p       haa      hodi         haree    hein     hosi
3p       raa      rodi         --       --       --
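
Finding the verbs that deviate from the norm can be automated once the inflected forms are in the database. The following is a minimal Python sketch, not part of MDF or SHOEBOX. It assumes the forms of each verb have been collected into a dictionary keyed by the person labels of the table; the name `missing_persons` and the data layout are illustrative only, and the alignment of the 'wait' forms follows the table as far as its layout can be recovered.

```python
# Person slots, in the order used in the Tetun table.
PERSONS = ["1s", "2s", "3s", "1px", "1pi", "2p", "3p"]


def missing_persons(forms):
    """Return the person slots for which a verb has no recorded form."""
    return [person for person in PERSONS if not forms.get(person)]


# Forms copied from the table; an absent key is a gap in the paradigm.
paradigms = {
    "eat": {"1s": "kaa", "2s": "maa", "3s": "naa", "1px": "haa",
            "1pi": "haa", "2p": "haa", "3p": "raa"},
    "wait": {"2s": "mein", "3s": "nein", "1px": "hein",
             "1pi": "hein", "2p": "hein"},
}

for verb, forms in paradigms.items():
    gaps = missing_persons(forms)
    status = "complete" if not gaps else "missing " + ", ".join(gaps)
    print(f"{verb}: {status}")
```

Only the verbs reported with gaps would then need a full paradigm spelled out in the \pd or related fields.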
ku-dengar-kan
1s-hear-VAL

ku-dengar-kan
1SG-hear-VAL

ku-dengar-kan
1.PERS.SING-hear-VAL
One issue for abbreviations and glossing is choosing between informal and formal
strategies. An informal strategy seeks to use the nearest translation equivalents in the
target language (e.g. English) for both content words and functors. Thus, an Austronesian
third singular genitive enclitic na could be variously glossed as '-its', '-his', '-hers', '-the',
and a free pronoun such as aku could be glossed as 'I', 'me', 'my', 'mine'. A more formal
strategy would seek to use consistent and possibly more technical terms for grammatical
functions, such as '-3sG' or '1s'.
Simons and Versaw (1987:236) observe:
The formal style uses abbreviations of technical terms for grammatical functions.
These abbreviations are in all upper case letters and do not include a terminating
period. For instance, one might use HAB rather than 'always' as the gloss for a
habitual verb aspect, or DEF rather than 'the' in glossing a definite article. The
use of upper case abbreviations to gloss functor morphemes is a fairly recent
practice among linguists but it has gained widespread acceptance and can now be
Pronouns: Since most Austronesian languages have pronominal clitics of one or two
letters (e.g. ku, mu, ng, m, n/na, etc.) it becomes important to use the shortest
abbreviation possible for personal deixis that is not ambiguous. By using lower case for
number we are then free to attach upper case grammatical or semantic tags. We suggest:
1s     (s = singular)
2s
3s
3sn    (non-human)
1d     (d = dual)
2d
3d
1pi    (i = inclusive)
1px    (x = exclusive)
2p     (p = plural)
3p

1sS    / 1sSBJ     [subject]
2sO    / 2sOBJ     [object]
3sG    / 3sGEN     [genitive]

1dP    / 1dPOS     [possessive]
2dH    / 2dHON     [honorific]
3dA    / 3dACT     [actor]
1piA   / 1piABS    [absolutive]
1pxE   / 1pxERG    [ergative]
2pD    / 2pDAT     [dative]
3pU    / 3pUG      [undergoer]
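
Because these abbreviations are built from a fixed set of pieces, glosses can be checked mechanically for consistency. The sketch below is one possible way to do that in Python; it is not part of MDF. The function name `parse_pronoun_tag` and the returned dictionary keys are assumptions made for illustration, and the pattern covers only the person, number, inclusive/exclusive, non-human, and upper-case role pieces suggested above.

```python
import re

# person digit, number (s/d/p), optional i/x/n, then an optional
# upper-case grammatical tag such as S, SBJ, G, GEN, U or UG.
PRONOUN_TAG = re.compile(r"^([123])([sdp])([ixn]?)([A-Z]*)$")

NUMBER = {"s": "singular", "d": "dual", "p": "plural"}
EXTRA = {"": None, "i": "inclusive", "x": "exclusive", "n": "non-human"}


def parse_pronoun_tag(tag):
    """Break an abbreviation such as '1piA' into its labelled parts."""
    match = PRONOUN_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a pronominal abbreviation: {tag!r}")
    person, number, extra, role = match.groups()
    return {
        "person": int(person),
        "number": NUMBER[number],
        "extra": EXTRA[extra],
        "role": role or None,
    }


for tag in ["1s", "3sn", "1piA", "3sGEN"]:
    print(tag, parse_pronoun_tag(tag))
```

Keeping number in lower case is what makes the pattern unambiguous: everything in upper case after the person and number can be read as a grammatical tag.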
Other functors: For grammatical categories it is good to use upper case with no period
(full stop). A period is superfluous.
(25) DUR     durative
     HAB     habitual
     CAUS    causative
For portmanteau morphemes (other than pronominal ones already taken care of like
1sPOSS) it is common to use a period [.]. (Another convention encountered in the
literature for portmanteau morphemes is the use of a colon [:], but a period is more
common).
(26) PRES.PROG    present progressive
     PST.PRF      past perfect
Hyphen [-] is, of course, the standard symbol for morpheme breaks. Among linguists
in general there is inconsistent use of plus [+] and equals [=] in which the symbols
sometimes appear to be there for principled reasons, and sometimes not. Plus [+] often
indicates a grammatical clitic that is phonologically bound. Equals [=] is used to indicate
reduplication of morphologically complex units (e.g. ep-tilo=ep-tilo).15
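
One practical benefit of using these symbols consistently is that a vernacular form and its gloss can be compared boundary by boundary. The following Python sketch illustrates the idea under the conventions just described (hyphen, plus, and equals as boundaries; a period inside a portmanteau gloss is not a boundary). The function names are hypothetical and are not part of MDF or SHOEBOX.

```python
import re

# Hyphen, plus and equals mark boundaries; a period (PRES.PROG) does not.
BOUNDARY = re.compile(r"([-+=])")


def split_morphemes(word):
    """Return (morpheme, following_boundary) pairs for a form or gloss."""
    parts = BOUNDARY.split(word)
    morphemes = parts[0::2]
    boundaries = parts[1::2] + [""]
    return list(zip(morphemes, boundaries))


def boundaries_match(form, gloss):
    """True if form and gloss have the same sequence of boundary symbols."""
    form_marks = [mark for _, mark in split_morphemes(form)]
    gloss_marks = [mark for _, mark in split_morphemes(gloss)]
    return form_marks == gloss_marks


print(split_morphemes("ep-tilo=ep-tilo"))
print(boundaries_match("ku-dengar-kan", "1s-hear-VAL"))
```

A mismatch does not say which line is wrong, only that the form and its gloss disagree about where the morpheme breaks fall.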
Standard non-technical abbreviations, of course, should be lower case: etc., arch., cf., alt., e.g.
Some publishers do not allow them to be used at all.
Source languages for loans, if we abbreviate them at all, should be the minimum
necessary for our purposes, using conventions widely accepted. E.g. Eng. rather than
Engl., Port. rather than Portug., Skt. rather than Sans.
Indefinite terms should follow a single pattern. We suggest (although the periods take up
extra space):
(27) s.t.    something
     s.o.    someone
     k.o.    kind of
There will be some overlap for which choices have to be made. Some sort themselves out
along the lines suggested above.
(28)
gen.
GEN
(29) COMP    completed/completive/complement/complementizer?
     CONT    continuative/contemplated/contiguous?
semantic domains in detail (e.g. kin terms, plants, cultivated plants, birds, fish), and to
publish a series of separate volumes on each of these topical domains along the way
toward publication of the complete dictionary. (See 6.4 for a discussion with
examples of the \sd, \th, and \is fields.)
This alternative strategy allows the compilers to foster and incorporate community
involvement along the way, and develop a community of readers who have a growing
ability to use reference-type materials. Furthermore, these topically oriented volumes feed
useful information to interested scholars, and demonstrate progress and competent work
to government officials and sponsoring agencies (that is, if these are not also hasty
dumps).
All primary work should be done in the main lexical database. If the information is
flagged consistently, at the appropriate time one can extract the selected information into
a separate database for processing through MDF by using the SHOEBOX FILTERS. For
example, to extract kin terms and terms related to social structure (clan head, village
head, etc.) one could use the following SHOEBOX FILTER (two examples are given, one
simple and the other more complex):
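
Outside SHOEBOX, the same kind of selection can be sketched in a few lines of Python. This is not a SHOEBOX FILTER and does not reproduce its syntax; it only illustrates the idea of extracting records by a consistently flagged field. It assumes a plain-text Standard Format file in which each record begins with \lx and the domain is flagged in \sd. The lexemes (`word1`, etc.) and the domain labels `Nkin`, `Nplant`, and `Nsocial` are hypothetical placeholders.

```python
def read_records(text):
    """Split Standard Format text into records, each starting at \\lx."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith("\\lx ") and current:
            records.append(current)
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append(current)
    return records


def field_values(record, marker):
    """Return the contents of every field in the record with this marker."""
    prefix = "\\" + marker + " "
    return [line[len(prefix):].strip()
            for line in record if line.startswith(prefix)]


def select_by_domain(records, wanted):
    """Keep the records whose \\sd field holds one of the wanted labels."""
    return [record for record in records
            if any(value in wanted for value in field_values(record, "sd"))]


# Hypothetical sample data, for illustration only.
SAMPLE = (
    "\\lx word1\n\\sd Nkin\n\n"
    "\\lx word2\n\\sd Nplant\n\n"
    "\\lx word3\n\\sd Nsocial\n"
)

for record in select_by_domain(read_records(SAMPLE), {"Nkin", "Nsocial"}):
    print("\n".join(record) + "\n")
```

The selection is only as good as the flagging: a record with a missing or misspelled \sd label is silently left out.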
Each topic covered in the introduction should be relevant to the dictionary and should be
expressed concisely. Elaboration of the information found in the introduction to the
dictionary should be included in a separate comprehensive grammar, an ethnography, and
perhaps a history. The relative ordering of presentation of various issues should involve
some creative thinking as to what information is more helpful.
If the dictionary is intended for publication in a linguistic journal, we recommend
contacting the editorial board as to their formatting and organizational requirements.
More specifically, the introduction should address the following:
1) Identify the primary audience and purpose for the dictionary. Also explain the
overall organization of the dictionary information (e.g. give the ordering of the
alphabet for the language). Give a total number of entries for the main dictionary
and for the finderlist.
2) Briefly describe the location of the language, the number of people in the ethnic
group, the number of speakers, and the regional context in which the language
group is located.
4) Provide a brief discussion about the language name and alternate names for the
language if this is a relevant issue.
7) Provide a brief sociolinguistic profile, including the dialects, the social registers,
the patterns of lexical taboo, different speech patterns across genders or ages, or
educated speaker usage, or whatever else will assist the users of the dictionary to
get a dynamic view of the language and correctly interpret the \ue (usage), \va
(variant), \oe (restrictions), \lf SynD (dialectal synonym), \lf SynR (register
synonym), \lf SynT (taboo synonym), and \lf SynL (assimilated loan synonym)
fields.
8) Provide maps in the introduction placing the language in its regional context, and a
dialect map to help the reader understand the information on dialectal variants. It is
surprising how many dictionaries of lesser known languages do not provide even a
simple context map.
English
Label
National Language
Label
\an
\bb
\bw
\ce
\cf
\cn
\cr
\de
\dn
\dr
\dt
\dv
\ec
\ee
\eg
\en
\er
\es
\et
\ev
\ge
\gn
\gr
\gv
\hm
\is
\lc
\le
\lf
\ln
\lr
Ant:
Read:
From:
Lawan:
Baca:
Pinjaman:
See:
Lihatlah:
antonym
bibliographical ref. for further reading
borrowed word (loan)
cross-reference gloss (English)
cross-reference
cross-reference gloss (national lang.)
cross-reference gloss (regional lang.)
definition/explication (English)
definition/explication (national lang.)
definition/explication (regional lang.)
date (entry last worked on)
definition/explication (vernacular)
etymology comment
encyclopedic information (English)
etymology gloss
encyclopedic info. (National lang.)
encyclopedic info. (Regional lang.)
etymology source
etymology (proto form)
encyclopedic info. (vernacular)
gloss (English)
gloss (national language)
gloss (regional language)
gloss (vernacular)
homonym/homophone
index of semantics
citation form (lexical citation)
gloss of \lf (English)
lexical functions
gloss of \lf (national language)
gloss of \lf (regional language)
[Regnl: ...]
[Melayu: ...]
[...]
Etym:
Asal:
(supplanted by \de)
(supplanted by \dn)
[Regnl: ...]
[Melayu: ...]
(subscripted)
Semantics:
Kelompok:
(various, see 7)
Field
Codes
\lt
\lx
\mn
\mr
\na
\nd
\ng
\np
\nq
\ns
\nt
\oe
\on
\or
\ov
\pc
\pd
\ph
\pl
\pn
\ps
\rd
\re
\rf
\rn
\rr
\sc
\sd
\se
\sg
\sn
\so
\st
\sy
\tb
Function
literally
lexeme (headword/lemma)
main entry form
morphology
notes (anthropology)
notes (discourse)
notes (grammar)
notes (phonology)
notes (questions for investigation)
notes (sociolinguistics)
notes (general)
only/restrictions (English)
only/restrictions (national language)
only/restrictions (regional language)
only/restrictions (vernacular)
picture [or graphic link]
paradigm
phonetic form (pronunciation)
plural form
part of speech (national language)
part of speech
reduplication form(s)
reversal (English)
reference to written source
(text or data notebook)
reversal (national language)
reversal (regional language)
scientific name
semantic domain
subentry
singular form
sense number
source
status (for editing or printing)
synonym
table (chart)
English
Label
Lit: ...
National Language
Label
Lit: ...
Lihatlah kata induk:
Morf:
[Antro: ...]
[Wacana: ...]
[Tata: ...]
[Fono: ...]
[Tanya: ...]
[Sosio: ...]
[Cat: ...]
Terbatas:
[...]
VerRestrict:
Pola:
Jamak:
Redup:
Redup:
re:1
re:
Ref:
rn:
rr:[Regnl: ...]
rn:
rr:[Melayu: ...]
your text
Golongan:
SD:
Sg:
)
[Source: ...]
[Status: ...]
Syn:
Tunggal:
)
[Dari: ...]
Searti:
1The reverse fields and word-level gloss fields are not designed for printing, but these labels are given so
that if the user wants to print these fields, they can be differentiated from the rest of the information in
the entry.
Field
Codes
\th
\ue
\un
\ur
\uv
\va
\ve
\vn
\vr
\we
\wn
\wr
\xe
\xg
\xn
\xr
\xv
\1d
\1e
\1i
\1p
\1s
\2d
\2p
\2s
\3d
\3p
\3s
\4d
\4p
\4s
Function
thesaurus
usage (English)
usage (national language)
usage (regional language)
usage (vernacular)
variant forms
variant (English gloss or comment)
variant (national language)
variant (regional language)
word-level gloss (English)
word-level gloss (national language)
word-level gloss (regional language)
example (English free translation)
example (gloss for interlinearizing)
example (national lang. free trans.)
example (regional lang. free trans.)
example (vernacular)
first person dual inflection
first person plural exclusive
first person plural inclusive
first person plural
first person singular
second person dual inflection
second person plural
second person singular
third person dual inflection
third person plural
third person singular
non-human or non-animate dual
non-human or non-animate plural
non-human or non-animate singular
English
Label
Thes:
Usage:
VerUsage:
Variant:
(...)
National Language
Label
Keluarga:
Kegunaan:
[...]
VerUsage:
Bentuk lain:
(...)
(...)
we:
wn:
wr:[Regnl: ...]
we:
wn:
wr:[Melayu: ...]
[...]
1d:
1d:
1px:
1j:
1pi:
1j:
1p:
1j:
1s:
1t:
2d:
2d:
2p:
2j:
2s:
2t:
3d:
3d:
3p:
3j:
3s:
3t:
3dn:
3dn:
3pn:
3jn:
3sn:
3tn:
Function
lexeme
homonym number
lexical citation
phonetic
\se
\ps
\pn
\sn
subentry
part of speech
part of speech-national language
sense number
\gv
\dv
\ge
gloss-vernacular
definition-vernacular
gloss-English
(supplanted by a \de)
English
National Language
Label
Label
(subscripted)
Field
Codes
\re
\we
\de
\gn
\rn
\wn
\dn
\gr
\rr
\wr
\dr
Function
English
Label
National Language
Label
reverse-English
word level gloss-English
definition-English
gloss-national language
reverse-national language
word level gloss-national language
definition-national language
gloss-regional lang. (with \gn)
reverse-regional lang. (with \rn)
word-level gloss-regional (with \wn)
definition-regional lang. (with \dn)
re:1
we:
re:
we:
rn:
wn:
rn:
wn:
[Regnl: ]
rr:[Regnl: ]
wr:[Regnl: ]
[Regnl: ]
[Melayu: ]
rr:[Melayu: ]
wr:[Melayu: ]
[Melayu: ]
\lt
\sc
literal meaning
scientific name
Lit: ...
Lit: ...
(no label, but text as underlined italics)
\rf
\xv
\xe
\xn
\xr
\xg
[...]
***(not supported by MDF)***
\uv
\ue
\un
\ur
\ev
\ee
\en
\er
\ov
\oe
\on
\or
usage-vernacular
usage-English
usage-national language
usage-regional (combines with \un)
encyclopedic-vernacular
encyclopedic-English
encyclopedic-national language
encyclopedic-regional language
only (restrictions)-vernacular
only (restrictions)-English
only (restrictions)-national language
only (restrictions)-regional (with \on)
VerUsage:
Usage:
(supplanted by a \dn)
VerRestrict:
Restrict:
VerUsage:
Kegunaan:
[...]
[ ] (brackets only)
VerRestrict:
Terbatas:
[]
1The reverse fields and word-level gloss fields are not designed for printing, but these labels are given so
that if the user wants to print these fields, they can be differentiated from the rest of the information in
the entry.
Field
Codes
\lf
\le
\ln
\lr
\sy
\an
\mr
\cf
\ce
\cn
\cr
\mn
\va
\ve
\vn
\vr
lexical function
lexical function-English
lexical function-national language
lexical function-regional language
synonym
antonym
morphemic representation
cross-reference
cross-reference-English gloss
cross-reference-national gloss
cross-reference-regional gloss
main entry form
variant form
variant comment-English
variant comment-national language
variant comment-regional language
English
National Language
Label
Label
(\lf label, e.g. Spec, becomes the label)
(combines with \lf)
(combines with \lf)
(combines with \lf)
Syn:
Searti:
Ant:
Lawan:
Morph:
Morf:
See:
Lihatlah:
(combines with \cf)
(combines with \cf)
(combines with \cf)
See main entry: Lihatlah kata induk:
Variant:
Bentuk lain:
(...)
(...)
(...)
\bw
\et
\eg
\es
\ec
\pd
\sg
\pl
\rd
\1s
\2s
\3s
\4s
\1d
\2d
\3d
\4d
\1p
\1e
\1i
\2p
\3p
\4p
borrowed word
etymology
etymology-gloss
etymology-source
etymology-comment
paradigm
singular form
plural form
reduplication
1st person singular
2nd person singular
3rd person singular
singular non-human/non-animate
1st person dual
2nd person dual
3rd person dual
dual non-human/non-animate
1st person plural-general
1st person plural-exclusive
1st person plural-inclusive
2nd person plural
3rd person plural
plural non-human/non-animate
From:
Pinjaman:
Etym:
Asal:
(combines with \et)
(combines with \et)
(combines with \et)
Prdm:
Pola:
Sg:
Tunggal:
Pl:
Jamak:
Redup:
Redup:
1s:
1t:
2s:
2t:
3s:
3t:
3sn:
3tn:
1d:
1d:
2d:
2d:
3d:
3d:
3dn:
3dn:
1p:
1j:
1px:
1j:
1pi:
1j:
2p:
2j:
3p:
3j:
3pn:
3jn:
Function
\tb
table
\sd
\is
\th
semantic domain
index of semantics
thesaurus
SD:
Semantics:
Thes:
\bb
\pc
bibliographic reference
picture
Read:
Baca:
(...) (parentheses, or a graphic link)
\nt
\np
\ng
\nd
\na
\ns
\nq
\so
\st
\dt
notes-general
notes-phonology
notes-grammar
notes-discourse
notes-anthropology
notes-sociolinguistics
notes-questions
source
status
datestamp (a SHOEBOX field)
[Note: ]
[Cat: ]
[Phon: ]
[Fono: ]
[Gram: ]
[Tata: ]
[Disc: ]
[Wacana: ]
[Anth: ]
[Antro: ]
[Socio: ]
[Sosio: ]
[Ques: ]
[Tanya: ]
[Source: ]
[Dari: ]
[Status: ] (only one for all languages)
Golongan:
Kelompok:
Keluarga:
agriculture
animal
boat related
body part
material culture
fish related
food related
government
house related
insect
instrument
kinship
locative noun
nature/meteorological
part of a larger whole
plant
noun of result
ritual
sickness/medicine
social relations (non-kin)
time
Vaffect
Vagri
Vbody
Vcarry
Vcog
Vcolor
Vcut
1For a detailed discussion of many of the verbal subtypes for English see Dixon (1991). His appendix
Veffect      verb of effect
Vemot        verb expressing emotion
Vevent       verb naming or characterizing a whole event
Vexchange    verb of exchange (give, receive, take, get)
Vhit         hitting verb
Vhold        holding verb
Vhunt        hunting related
Vmotion      verb of locomotion
Vposture     verb of posture or rest
Vrit         verb describing ritual
Vsee         verb of perception
Vsize        verb of dimension
Vsocial      verb expressing social relationship
Vspeak       speech-act verb
Vspeed       verb of speed
Vtouch       touching verb
Vvalue       verb expressing value
Vweath       weather verbs (rain, fog)
Vweight      verb expressing weight

ADJage       age
ADJbodily    bodily function
ADJcol       color adjective
ADJemot      emotion/human propensity
ADJphys      physical property (hard, clean, hot)
ADJsize      size/dimension
ADJspeed     speed
ADJtext      texture
ADJval       value (good, bad, nice)
See cautions about distinguishing between verbs and adjectives in chapter 9. See Dixon
(1991) for more ideas.
Variations of the above information can be chosen according to the aesthetics of the
compiler. Some alternate possibilities are as follows:
Option 1     Option 2     Option 3
Nagri        nAgri        Agriculture
Nbody        nBody        Body
Vcarry       vCarry       Carry
Vcut         vCut         Cut
ADJsize      adjSize      Size
ADJspeed     adjSpeed     Speed
Making dictionaries: a guide to lexicography and MDF
Antonym
Causal
Lexicalized compound using headword not easily handled by other
lexical functions
Counterpart (complement, conversive)
Degraded degree or state
Feeling or sensation associated with headword
Generic
Collective/group
Head or leader of group
Idiom
Material used to make headword
Superlative degree of headword
Diminished degree of headword
Actor noun
Benefactee noun
Deverbal noun
Instrumental noun
Goal of action
Locative noun
Undergoer noun
Parallelism representing Same as headword
Parallelism representing Different end of scale
Part of headword
Phase of headword
Preparatory activity
Consequence or resulting state
Conventionalized serial verb combination not clearly handled by other
lexical functions
Similar type at same level of hierarchy
Situation or activity typically associated with headword
Sound associated with headword
Specific (kind of, type of, species)
Beginning phase of headword (inceptive)
Stop
Syn
SynD
SynL
SynR
SynT
Unit
Vwhole
Whole
CLASS     Classifier
CMPAR     Comparative
CMPLR     Complementizer
CNJ       Conjunction
COND      Conditional
CONF      Confirmative
CONN      Connective
COP       Copula
DECL      Declarative
DEIC
DEM
DIR
EVID      Evidential
EXASP     Exasperative
EXIST     Existential
MDL       Modal
NEG       Negative
NEGimp    Negative imperative
NOM       Nominative
NOMR      Nominalizer
n         Noun
NUM       Number
PTCL      Particle
PART      Participle
PAUS      Pause word
PL        Plural
POSS/P    Possessive
POSSR     Possessor
POST      Postposition
PREP      Preposition
PRO       Pronoun/pronominal
PropN     Proper noun
          Query/Question/Interrogative
QNT       Quantifier
REC       Reciprocal
REL       Relative(izer)
RFLX      Reflexive
RLR       Relater
1For an alternative list and framework for organizing lexical data, see the SHOEBOX manual.
2Not a brand of computer.
Appendix E: Abbreviations
FOC       Focus marker
HORT      Hortative
ID        Idiom
IMP       Imperative
INTJ      Interjection
INT/Q     Interrogative
ITR       Intransitive(izer)
LIG       Ligature
LOC       Locative
TAM       Tense-Aspect-Mood
TIME      Time expression
TNS       Tense
TR        Transitive(izer)
v         Verb/verbal
vi        Intransitive verb
vm        Middle verb (non-agentive passive)
vn        Non-active verb
vp        Passive verb (agentive)
vr        Reflexive/quasi-reflexive/intradirective
vt        Transitive verb
vt/i      Ambitransitive verb

HON/H     Honorific
HUM       Human
i.e.      that is
IMM       Immediate
IMPRF     Imperfective
IMPRS     Impersonal
INAL      Inalienable
INAN      Inanimate
i/INC     Inclusive (1pi)
INCEP     Inceptive
INCHO     Inchoative
INDEF     Indefinite
INF       Infinitive
INST      Instrumental
IO        Indirect Object
IRR       Irrealis
IT        Iterative
JUSS      Jussive
k.o.      kind of

          Benefactive
CAUS      Causative
CESS      Cessative
CIRC      Circumstantial
COLL      Collective
COM       Comitative
COMP      Completive
CONC      Concessive
CONT      Continuative
DAT       Dative
DEF       Definite
DER       Derivational
DES       Desiderative
DIM       Diminutive
DIST      Distal
DISTB     Distributive
DO        Direct Object
DUB       Dubitative
DS        Different Subject
DUR       Durative
e.g.      for example
EMPH      Emphatic
ERG       Ergative
etc.      et cetera
e/EXC     Exclusive (1pe)
EXCLM     Exclamatory
FACT      Factitive
F/fem.    Feminine (3sF)
FIG       Figurative
FREQ      Frequentative
FUT       Future
GEN/G     Genitive (1sG)
GER       Gerund(ive)
HAB       Habitual
Lit.      Literally
MAN       Manner
M/masc.   Masculine (1sM)
MOD       Modifier
NARR      Narrative
NEC       Necessity
NFUT      Non-future
NHUM      Non-human
O/OBJ     Object (3sO)
OBL       Oblique
obs.      Obsolete
opp.      Opposite
OPT       Optative
PAT/P     Patient
PTT       Partitive
PASS      Passive
PAST      Past
PRF       Perfective
PERS      Personal
PIV       Pivot
PRES      Present
PROG      Progressive
PROX      Proximal
PURP      Purpose
QUOT      Quotative
REAL/R    Realis
RED       Reduplication
REF       Referential/Term of reference
REM       Remote
REP       Repetitive
RES       Resultative
sp.       Species
spp.      Species (plural)
s.o.      Someone
TEMP      Temporal
TOP       Topic
TOPR      Topicalizer
U / UG    Undergoer
s.t.      Something
S/SUBJ    Subject (2sS)
SPEC      Specific
SS        Same subject
STAT      Stative
SBJV      Subjunctive
SUP       Superlative
viz.      namely
VOC       Vocative
VOL       Volitional
VP        Verb Phrase
vs.       versus
Kinship:
B         brother
C         child
D         daughter
e         elder
F         father
(f.s.)    female speaking
H         husband
M         mother
(m.s.)    male speaking
P         parent
S         son
W         wife
y         younger
Z         sister
[This system allows combinations such as WBW 'wife's brother's wife', MB 'mother's
brother', eB(f.s.) 'elder brother (female speaking)'. These abbreviations are useful for
short interlinear glosses.]
Loan sources:
AM        Ambonese Malay
Ar.       Arabic
Bug.      Bugis
Btn.      Butonese (generic)
Du.       Dutch
Eng.      English
Fr.       French
Ger.      German
Ind.      Indonesian
Jap.      Japanese
Jav.      Javanese
KM        Kupang Malay
Mak.      Makassar
Mly       Malay
Port.     Portuguese
Skt.      Sanskrit
SM        Standard Malay
Sp.       Spanish
Sw.       Swahili
TM        Ternate Malay
Conventions:
*
**
[...]
/
.
=
~
sure to copy your original lexical database to floppies for safekeeping before going any
further.)
CAUTION: The CC table, UPDATE.CCT, assumes that your original lexical database
followed the guidelines included with the 0.9x versions of MDF. DO NOT use this CC
table on your database if it does not conform to the older 0.9x standards!
To use UPDATE.CCT type CC at the DOS prompt (what you type is bold):
C:\MDF>cc<ENTER>
The Consistent Changes program will display:
Consistent Change 7.4, 15May90 Copyright 1987-1990 SIL Inc.
Changes File? update.cct
Output File? newlex.db
Input File? lexicon.db                     (if that is the original name)
Next input file (<RETURN> if no more)?     (Press the <ENTER> key)
When you are asked for the input filename, give the name (and path, if needed) of your
lexical database. CC will not alter your lexical database in any way. Just be sure you don't
give the original lexical database name as the output filename. You can destroy your data
that way!
The output file should now be just like your original database, except that it has the updated
field markers and the new character formatting codes. But do not delete your original lexical
database until you are sure that the new file is accurate (a directory listing should show the
new file somewhat larger than the original). Also, be sure to tell SHOEBOX of the new
filename.
The following is a table depicting most of the changes that have been implemented in MDF
v.1.0. On the left are the old field markers while on the right are the replacements.
\le   >   \lx
\gi   >   \gn
\ri   >   \rn
\wi   >   \wn
\di   >   \dn
\xi   >   \xn
\ui   >   \un
\gm   >   \gr
\rm   >   \rr
\wm   >   \wr
\dm   >   \dr
\xm   >   \xr
\um   >   \ur
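
To make the effect of these renamings concrete, here is a minimal Python sketch of the marker substitution alone. It is not UPDATE.CCT and is no substitute for it: the CC table also handles the character formatting codes and the other conversions described in this appendix. The function name `rename_marker` is hypothetical, and the sample field contents are placeholders.

```python
import re

# Old 0.9x field markers on the left, their v.1.0 replacements on the right.
RENAMES = {
    "le": "lx", "gi": "gn", "ri": "rn", "wi": "wn", "di": "dn",
    "xi": "xn", "ui": "un", "gm": "gr", "rm": "rr", "wm": "wr",
    "dm": "dr", "xm": "xr", "um": "ur",
}

# A field marker is a backslash plus letters or digits at the line start.
MARKER = re.compile(r"^\\(\w+)")


def rename_marker(line):
    """Rename the field marker at the start of a line, if it is an old one."""
    return MARKER.sub(
        lambda m: "\\" + RENAMES.get(m.group(1), m.group(1)), line)


for old_line in ["\\le sample headword", "\\gi sample gloss", "\\ps n"]:
    print(rename_marker(old_line))
```

Each line is rewritten in a single pass, so a marker that has just been produced (for example \lx from \le) is never renamed a second time. This matters because \le is reused in v.1.0 with a new meaning.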
For the sake of consistency, \en (ethnographic notes) has been combined with \na (notes,
anthropology), and \sl (sociolinguistic notes) has been renamed to \ns (notes,
sociolinguistics).
\en   >   \na   (notes-anthropology)
\sl   >   \ns   (notes-sociolinguistics)
The earlier versions of MDF gave only one field marker for lexical relations (\lr). This was
recognized as inadequate, but at earlier stages of MDF development it was unclear as to how
people were encoding lexical relations on the computer. Grimes has documented how he and
others are using this system (see chapter 7 and C. Grimes 1987, 1994), and has suggested the
following field codes:
\lf   (lexical function)
\le   (lexical function gloss-English)
\ln   (lexical function gloss-national language)
\lr   (lexical function gloss-regional language)
The term 'lexical relations' was changed to 'lexical functions' to align it with the wider
literature and to allow \lr to consistently refer to regional glossing. Note that \le (the old
KEY field marker) is now used for English glossing of \lf. Be sure to convert all of your
old key field markers to \lx before implementing this feature. (If you use UPDATE.CCT
to convert your database, this is handled for you.)
The following gives an example of how lexical function field bundles are used.
\sy mlay
becomes
\lf Syn = mlay
\le
\ln
The user can then go through and fill in the \le and \ln fields at a later time (or leave them
blank if preferable).
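
The \sy conversion shown above can be sketched in Python as follows. This only mirrors the single example given in the text; it is not the UPDATE.CCT table, and the function name `synonym_to_bundle` is hypothetical.

```python
def synonym_to_bundle(line):
    """Turn an old '\\sy X' line into an '\\lf Syn = X' bundle.

    The \\le and \\ln gloss fields are left empty, to be filled in later.
    Any other line is returned unchanged.
    """
    if not line.startswith("\\sy "):
        return [line]
    value = line[len("\\sy "):].strip()
    return ["\\lf Syn = " + value, "\\le", "\\ln"]


print("\n".join(synonym_to_bundle("\\sy mlay")))
```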
By analogy the \cr cross-reference field has been converted to the following field
bundle:
\cf   (cross-reference)
\ce   (cross-reference gloss-English)
\cn   (cross-reference gloss-national language)
\cr   (cross-reference gloss-regional language)
There can be more than one bundle per entry, subentry, or sense. (Note that the bundles need
not use all of the fields.) UPDATE.CCT inserts a blank \ce and \cn field for every reference
in an old \cr field. For example:
-kw
my
-mw
your
-na
his
The use of etymology in the old MDF documentation was weak. It really addressed loan or
borrowed words rather than proto forms (which is what one would expect \et to refer to). So
the old \et has become \bw borrowed word.
\et   (etymology)
\eg   (etymology-gloss)
\es   (etymology-source)
\ec   (etymology-comment)
For example:
\et *tebel
\eg thick (dimension)
\es PANDW
\ec metathesis?
By default, this bundle will print out as:
Etym: *tebel thick (dimension).
But if you request to include the \es and \ec fields through the Change Settings menu
option, it will print out as:
Etym: *tebel thick (dimension) PANDW (metathesis?).
Do not forget to include the * in the \et field. Also, UPDATE.CCT will insert a blank
gloss (\eg) field for each old \pf it converts to \et.
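
The way the etymology bundle is assembled for printing can be illustrated with a short Python sketch. This is an approximation written to match the two printed examples above, not MDF's own formatting code. The function name and the `include_source` parameter (standing in for the Change Settings option) are assumptions, and typographic details such as fonts are not modeled.

```python
def format_etymology(et, eg="", es="", ec="", include_source=False):
    """Assemble the \\et bundle into one printed line.

    By default only the proto form (\\et) and its gloss (\\eg) are used;
    with include_source=True the source (\\es) and the comment (\\ec,
    in parentheses) are appended as well.
    """
    parts = ["Etym:", et]
    if eg:
        parts.append(eg)
    if include_source:
        if es:
            parts.append(es)
        if ec:
            parts.append("(" + ec + ")")
    return " ".join(parts) + "."


bundle = {"et": "*tebel", "eg": "thick (dimension)",
          "es": "PANDW", "ec": "metathesis?"}
print(format_etymology(**bundle))
print(format_etymology(**bundle, include_source=True))
```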
MDF will now format the \ph phonetic field with square brackets, so that:
\ph apa
will print as:
[apa]
The font associated with the data in the \ph field is determined by the PH style in the
MDFDICT.STY stylesheet. So, by changing the stylesheet, you can use a phonetic font for
this field. (The square brackets are not included in this PH style; they are formatted with the
standard font.) \ph can be used in relation to both \lx (lexeme) and \se (subentry).
We have added encyclopedic fields for those who want their lexicon to be more of a cultural
knowledge base. These fields are:
\ev   (encyclopedic-vernacular)
\ee   (encyclopedic-English)
\en   (encyclopedic-national)
\er   (encyclopedic-regional)
These are printed with no label (though the regional language field will be bracketed with
square brackets).
The Usage fields (\ue, \un, and \ur) now have a vernacular counterpart, \uv, for
monolingual dictionaries. The vernacular field is labeled as VerUsage:
Only fields (\ov, \oe, \on, and \or) have been added to denote semantic or grammatical
restrictions pertinent to the headword. This field is given the label Restrict:
A \mr morphemic representation field has been added to provide a morpheme-by-morpheme breakdown of polymorphemic lexemes. This field is given the label Morph:
A \lt literal field has been added for clarifying the literal meaning of idioms, etc. This field
is given the label Lit: It also adds single quotes around the meaning.
A \bb bibliography field has been added for recording bibliographical references to where
the lexeme is treated at greater length (grammatically or ethnographically). This field is given
the label Read:
A \pn (part of speech, national) field has been added to allow for specifying the part of
speech using labels found in national language dictionaries. MDF requires that the \pn field
follow the \ps field:
\ps n     (noun)
\pn kb    (the national abbreviation for noun)
If the order is reversed, MDF will not function properly. MDF will format the \pn field only
if you specify that the output is for a national audience. When a national audience is
specified, the \pn field will replace the \ps field. But if there is no \pn field or if it is empty,
the \ps field will be output for the national audience as for an English audience.
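
The selection rule just described can be summarized in a few lines of Python. This is a sketch of the stated behavior only, not MDF's implementation; the function name and the audience strings `"national"` and `"english"` are assumptions made for illustration.

```python
def part_of_speech_for(audience, ps, pn=None):
    """Pick the part-of-speech label to print for the given audience.

    For a national audience a non-empty \\pn replaces \\ps; in every
    other case (English audience, or \\pn missing or empty) \\ps is used.
    """
    if audience == "national" and pn and pn.strip():
        return pn.strip()
    return ps


print(part_of_speech_for("national", "n", "kb"))
print(part_of_speech_for("national", "n", ""))
print(part_of_speech_for("english", "n", "kb"))
```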
In the conjugation form fields, the glaring oversight of not including first-person inclusive
and exclusive fields is corrected. These are \1i and \1e, respectively. The field marker \1p is
still retained for those who work with languages that do not make this distinction. Also,
dual verb forms are now supported with \1d, \2d, \3d, \4d (non-animate, non-human).
The \vg vulgar field is no longer supported (it didn't work right, and it was too limited in
function). We suggest that the \ue (usage, English) or \st (status) fields could be
used for encoding this type of information. UPDATE.CCT converts the \vg field to \ue
Vulgar. If you wish to discard any vulgar entry, subentry, or sense from a printed copy, first
format the dictionary normally, and then use SEARCH (or EDIT FIND) to locate Vulgar. This
will allow you to delete them out of the final copy. (You will be able to do this more
accurately than with the old MDF program.)
for vernacular    (from lv:)
for English       (from le:)
for national      (from li:)
for regional      (from lm:)
These fonts are supported as character styles in the stylesheet, so they can be modified at any
time. The standard font is used in MDF for formatting most information fields (\rf, \lt, \pd,
\lf, \is, \th, \sd, \bw, \et, and \cf), as well as for punctuation. The labels used in MDF to
mark the different fields (like the See: for the cross-reference field) are all encoded with the
FL style (mnemonic for fontlabel). With this style, you can change all labels in your
dictionary to a different point size or font in one quick step.
Specifying underlined characters is now:
uc:   (from un:)
ui:   (from us:)
[Table: files and programs. File names are not shown. File types: DOC, STY, BAT, ICO, ANS, CCT, CTW, GLY, SAV. Purposes given: on-line Overview; for Overview; the MDF program; an icon you can use in Windows; creates the formatted dictionary (including versions for WORD v5.0, v5.5, and v6.0); creates the finderlist; part of the MDF settings file; MDF stylesheet for dictionary and lists; changing columns for dictionary and lists; printing MDF output on HP 4L, HP Deskjet, Toshiba 321SL, and Epson LQ series printers.]
[Table, continued. File names are not shown. File types: EXE, COM, GLY, TMP, DOC, OUT, SRT, MRG, REV. Six of the entries are marked "(for WINWORD)".]
MDFSAMPL.DB
MDFSAMPL.DOC
MDFSAMPL.ENG
LXFIELDS.DB
02-START.DOC
HP4.STY
UPDATE.CCT
SAGO.PCX
ANSQ.EXE
Bibliography
Adelaar, A. K. 1985. Proto-Malayic: the reconstruction of its phonology and parts of its lexicon
and morphology. Ph.D. dissertation. Rijksuniversiteit te Leiden. (Published 1992 as
Pacific Linguistics C119.)
Apresyan, Yu., Igor Mel'chuk, and A. K. Zholkovsky. 1970. Semantics and lexicography:
towards a new type of unilingual dictionary. In Ferenc Kiefer (ed). Studies in Syntax and
Semantics. Foundations of Language Supplemental Series 10:1–33. Dordrecht: D. Reidel.
Apresyan, Yu., Igor Mel'chuk, and A. K. Zholkovsky. 1973. Materials for an explanatory
combinatory dictionary of modern Russian. In Ferenc Kiefer (ed). Trends in Soviet theoretical
linguistics. Foundations of Language Supplemental Series 18:411–438. Dordrecht: D. Reidel.
Bartholomew, Doris A. and Louise C. Schoenhals. 1983. Bilingual dictionaries for indigenous
languages. Mexico, D.F.: SIL International.
Beekman, John. 1968. Eliciting vocabulary, meaning, and collocation. Notes on Translation
29:1–11. Dallas: SIL International. (Reprinted in Alan Healey (ed). 1975. Language
learner's field guide. Ukarumpa: SIL International. pp. 361–388).
Benson, Morton, Evelyn Benson, and Robert Ilson. 1986. Lexicographic description of English.
Philadelphia: John Benjamins.
Benson, Morton, Evelyn Benson, and Robert Ilson, compilers. 1986. The BBI combinatory
dictionary of English: a guide to word combinations. Philadelphia: John Benjamins.
Berlin, Brent, Dennis E. Breedlove, and Peter H. Raven. 1966. Folk taxonomies and biological
classification. Science 154:273–275.
Berlin, Brent, Dennis E. Breedlove, and Peter H. Raven. 1973. General principles of
classification and nomenclature in folk biology. American Anthropologist 75:214–242.
Berlin, Brent, Dennis E. Breedlove, and Peter H. Raven. 1974. Principles of Tzeltal plant
classification: an introduction to the botanical ethnography of a Mayan-speaking people of
highland Chiapas. New York: Academic Press.
Bolton, Rosemary. 1990. A preliminary description of Nuaulu phonology and grammar. M.A.
thesis, University of Texas at Arlington.
Bright, William. 1984. The editor's department. Language 60:692–693.
Bulmer, Ralph. 1967. Why is the cassowary not a bird? A problem of zoological taxonomy
among the Karam of the New Guinea Highlands. Man 2:5–25.
Bulmer, Ralph. 1970. Which came first, the chicken or the egg-head? In J. Pouillon and P. Maranda
(eds). Échanges et communications: mélanges offerts à Claude Lévi-Strauss à l'occasion
de son 60ème anniversaire. Paris: Mouton. pp. 1069–1091.
Burchfield, R. W. (ed). 1987. Studies in lexicography. Oxford: Clarendon Press.
Carter, Ronald. 1987. Vocabulary: applied linguistic perspectives. London: Allen & Unwin.
Casagrande, Joseph B. and Kenneth Hale. 1967. Semantic relationships in Papago
folk-definitions. In Dell Hymes and William Bittle (eds). Studies in southwestern
ethnolinguistics. The Hague: Mouton and Co. pp. 165–193.
Clark, Eve V. and Herbert H. Clark. 1979. When nouns surface as verbs. Language 55/4:767–811.
Clynes, Adrian. 1989. Speech styles in Javanese and Balinese. M.A. thesis, Australian National
University.
Comrie, Bernard. 1981. Language universals and linguistic typology. Oxford: Blackwell.
Comrie, Bernard and Norval Smith. 1977. Lingua descriptive studies: questionnaire. Lingua
42:1–72.
Conklin, Harold. 1962. Lexicographical treatment of folk taxonomies. In Fred W. Householder
and Sol Saporta (eds). Problems in lexicography. pp. 119–141.
Coward, David F. 1990. An introduction to the grammar of Selaru. M.A. thesis, University of
Texas at Arlington.
Coward, David F. 1992ms. Recommended Maluku lexical database standards. Ambon: SIL International.
Crystal, David. 1985. A dictionary of linguistics and phonetics. 2nd edition. Oxford: Basil
Blackwell.
Davis, Daniel W. and John S. Wimbish. 1993. The Linguist's SHOEBOX. Waxhaw: SIL
International.
Dixon, R. M. W. 1979. Ergativity. Language 55:59–138.
Dixon, R. M. W. 1982. Where have all the adjectives gone? and other essays in semantics and
syntax. Amsterdam: Mouton.
Dixon, R. M. W. 1988. A grammar of Boumaa Fijian. Chicago: University of Chicago Press.
Dixon, R. M. W. 1991. A new approach to English grammar, on semantic principles. Oxford:
Clarendon Press.
Dixon, R. M. W. 1994. Ergativity. Cambridge Studies in Linguistics 69. Cambridge: University Press.
Durie, Mark. 1985. A grammar of Achehnese: on the basis of a dialect of north Aceh.
Verhandelingen van het Koninklijk Instituut voor Taal-, Land- en Volkenkunde 112.
Cinnaminson, N.J.: Foris Publications.
Ferrell, Raleigh. 1982. Paiwan Dictionary. Pacific Linguistics C73.
Fillmore, Charles J. 1968. Lexical entries for verbs. Foundations of Language 4:373–393.
Foley, William A. and Robert D. van Valin, Jr. 1984. Functional syntax and universal grammar.
Cambridge Studies in Linguistics 38. Cambridge: University Press.
Fox, James J. 1971. Semantic parallelism in Rotinese ritual language. Bijdragen tot de Taal-,
Land- en Volkenkunde 127:215–255.
Fox, James J. 1974. Our ancestors spoke in pairs: Rotinese views of language, dialect, and code.
In Richard Bauman and Joel Sherzer (eds). Explorations in the ethnography of speaking.
Cambridge: University Press. pp. 65–85.
Fox, James J. 1975. On binary categories and primary symbols: some Rotinese perspectives. In R.
Willis (ed). The interpretation of symbolism. ASA Studies 3:99–132. London: Malaby
Press.
Fox, James J. 1977. Roman Jakobson and the comparative study of parallelism. In C. H. van
Schooneveld and D. Armstrong (eds). Roman Jakobson: echoes of his scholarship. Lisse:
Peter de Ridder Press. pp. 59–90.
Fox, James J. 1982. The Rotinese chotbah as a linguistic performance. Pacific Linguistics
C76:311–318.
Fox, James J. 1988. Introduction. In James J. Fox (ed). To speak in pairs: essays on the ritual
languages of eastern Indonesia. Cambridge: University Press. pp. 1–28.
Fox, James J. (ed). 1988. To speak in pairs: essays on the ritual languages of eastern Indonesia.
Cambridge: University Press.
Frake, Charles O. 1962. The ethnographic study of cognitive systems. In Anthropology and
human behavior. Washington D.C.: Anthropological Society of Washington. pp. 28–41.
Franklin, Karl. 1992. Lexicography considerations for Tok Pisin. Paper presented at the Congress
of the Linguistic Society of Papua New Guinea, September 1992. Madang.
Givón, Talmy. 1984. Syntax: a functional-typological introduction, Vol. 1. Amsterdam: John
Benjamins.
Givón, Talmy. 1990. Syntax: a functional-typological introduction, Vol. 2. Amsterdam: John
Benjamins.
Gleason, H.A. Jr. 1962. The relation of lexicon and grammar. In Householder and Saporta (eds).
Problems in lexicography. pp. 85–102.
Grace, George. 1981. An essay on language. Columbia, S.C.: Hornbeam Press.
Grace, George. 1987. The linguistic construction of reality. Sydney: Croom Helm.
Grimes, Barbara Dix. 1991. The development and use of Ambonese Malay. Pacific Linguistics
A81:83–123.
Grimes, Barbara F. (ed). 1992. Ethnologue: languages of the world. 12th edition. Dallas: SIL
International.
Grimes, Charles E. 1987. Mapping a culture through networks of meaning. Notes on Linguistics
39:25–46.
Grimes, Charles E. 1991. The Buru language of eastern Indonesia. Ph.D. dissertation. Canberra:
Australian National University.
Grimes, Charles E. 1992. Refining parts of speech in the lexicon. Paper presented at 1992 Asia
International Lexicography Conference, October 1992. Manila.
Grimes, Charles E. 1994. Mapping semantic relationships in the lexicon using lexical functions.
Notes on Linguistics 65:5–25.
Grimes, Charles E. and Kenneth Maryott. 1994. Named speech registers in Austronesian
languages. In Tom Dutton and Darrell T. Tryon (eds). Language contact and change in
the Austronesian world. Trends in Linguistics Studies and Monographs 77. Berlin:
Mouton de Gruyter. pp. 275–319.
Grimes, Joseph E. 1980a. Huichol life form classification: I: Animals. Anthropological
Linguistics 22:187–200.
Grimes, Joseph E. 1980b. Huichol life form classification: II: Plants. Anthropological
Linguistics 22:264–274.
Grimes, Joseph E. 1989. Information dependencies in lexical subentries. In M. W. Evens (ed).
Relational models of the lexicon: representing knowledge in semantic networks. Cambridge:
University Press. pp. 167–182.
Grimes, Joseph E. 1990. Inverse lexical functions. In J. Steele (ed). Meaning–Text Theory:
linguistics, lexicography, and implications. University of Ottawa Press, Ottawa. pp. 350–364.
Grimes, Joseph E. 1992. Lexical functions across languages. In Proceedings of the International
Workshop on The Meaning–Text Theory, 27 July–3 August 1992. Darmstadt, Germany.
pp. 123–131.
Grimes, Joseph E. 1987ms. A field guide to words: relations and linkages in the lexicon. Dallas:
SIL International.
Grimes, Joseph E. and Barbara F. Grimes. 1993. Ethnologue language family index. Dallas: SIL
International.
Grimes, José, and others. 1981. El Huichol: apuntes sobre el léxico. Department of Modern
Languages and Linguistics, Cornell University, Ithaca, NY. [Out of print, reissued as
ERIC document ED 210 901].
Haiman, John. 1980. Dictionaries and encyclopaedias. Lingua 50:329–357.
Halliday, M. A. K. 1961. Categories of the theory of grammar. Word 17:241–292.
Hartmann, Reinhard R. K. (ed). 1983. Lexicography: principles and practice. London: Academic
Press.
Hartmann, Reinhard R. K. (ed). 1986. The history of lexicography. Philadelphia: John Benjamins.
Hashimoto, Mantaro J. 1977. The Newari language: a classified lexicon of its Bhadgaon dialect.
Tokyo: Institute for the Study of Languages and Cultures of Asia and Africa.
Horne, Elinore Clark. 1974. Javanese-English dictionary. New Haven: Yale University Press.
Householder, F.W. and Sol Saporta (eds). 1962. Problems in lexicography. Bloomington:
Indiana University Research Center in Anthropology, Folklore and Linguistics.
Hughes, Jock. 1991ms. Dobel, a language of the Aru Islands. Ambon: Pattimura University and
SIL International.
Ilson, Robert (ed). 1987. A spectrum of lexicography. Philadelphia: John Benjamins.
Jacobson, Marc R. 1986. Philippine dictionaries on computer. Manila: SIL International.
Lakoff, George. 1987. Women, fire, and dangerous things: what categories reveal about the
mind. Chicago: University of Chicago Press.
Lakoff, George and Mark Johnson. 1980. Metaphors we live by. Chicago: University of Chicago
Press.
Lakoff, George and Mark Turner. 1989. More than cool reason: a field guide to poetic
metaphor. Chicago: University of Chicago Press.
Landau, Sidney I. 1989. Dictionaries: the art and craft of lexicography. Cambridge: University
Press.
Langacker, Ronald W. (ed). 1977–1984. Studies in Uto-Aztecan grammar, Vols. 1–4. Dallas:
SIL International and University of Texas at Arlington.
Leed, Richard L. and Alexander D. Nakhimovsky. 1979. Lexical functions and language
learning. Slavic and East European Journal 23(1):104–113. [Revised in J. Steele (ed).
1990. Meaning–Text Theory: linguistics, lexicography, and implications. Ottawa:
University of Ottawa Press. pp. 365–375].
Lehmann, Christian. 1982. Directions for interlinear morphemic translations. Folia Linguistica
16:199–224.
Louw, Johannes P. and Eugene A. Nida (eds). 1988. Greek–English lexicon of the New
Testament based on semantic domains. New York: United Bible Societies.
McKeon, Richard (ed). 1941. The basic works of Aristotle. New York: Random House.
Mel'chuk, Igor. 1973. Towards a linguistic meaning–text model. In Ferenc Kiefer (ed). Trends
in Soviet theoretical linguistics. Foundations of Language Supplemental Series 18:35–57.
Dordrecht: D. Reidel.
Mel'chuk, Igor. 1982. Lexical functions in lexicographic description. In Proceedings of the
Eighth Annual Meeting of the Berkeley Linguistics Society. Berkeley: Department of Slavic
Languages and Literatures, University of California. pp. 427–444.
Mel'chuk, Igor. 1989. Explanatory Combinatorial Dictionary and Learners' Dictionaries. SEAMEO
Regional Language Centre, Occasional Papers No. 45. Singapore: RELC.
Mel'chuk, Igor and Nikolaj V. Pertsov. 1986. Surface syntax of English: a formal model within
the meaning–text framework. Philadelphia: John Benjamins.
Mel'chuk, Igor and Alain Polguère. 1987. A formal lexicon in meaning–text theory (or how to do
lexica with words). Computational Linguistics 13(3/4):261–275.
Mel'chuk, Igor and A.K. Zholkovsky. 1970. Towards a functioning meaning–text model of
language. Linguistics 57:10–47.
Mel'chuk, Igor and A.K. Zholkovsky. 1984. Explanatory combinatorial dictionary of modern
Russian. Vienna: Wiener Slawistischer Almanach.
Mel'chuk, Igor and A.K. Zholkovsky. 1988. The Explanatory Combinatorial Dictionary. In M. W.
Evens (ed). Relational models of the lexicon: representing knowledge in semantic networks.
Cambridge: University Press. pp. 41–74.
Moore, Bruce R. Doublets in the New Testament. Dallas: SIL International.
Mosel, Ulrike. 1991. Markedness theory and the distinction of major word classes in Samoan.
Seminar presented at the Australian National University. Canberra.
Murdock, George, and others. 1982. Outline of cultural materials. 5th revision. New Haven,
Connecticut: Human Relations Area Files, Inc.
Steele, James (ed). 1990. Meaning–Text theory: linguistics, lexicography, and implications.
Ottawa: University of Ottawa Press.
Simons, Gary F. 1979. Language variation and limits to communication. Ithaca, N.Y.:
Department of Modern Languages and Linguistics, Cornell University.
Simons, Gary F. and Larry Versaw. 1987. How to use IT: a guide to interlinear text processing.
Dallas: SIL International.
Svenson, B. 1992. Practical lexicography: principles and methods of dictionary making. Oxford:
Oxford University Press.
Taumoefolau, Melenaite. 1991. Verbal senses of concrete nouns in Tongan. Paper presented at
the Sixth International Conference on Austronesian Linguistics, May 1991. Honolulu,
Hawaii.
Therik, Tom and Charles E. Grimes. 1992ms. Baria Ulu: a Tetun text. Canberra: Australian
National University.
Tomaszczyk, Jerzy, and Barbara Lewandowska-Tomaszczyk (eds). 1990. Meaning and
lexicography. Philadelphia: John Benjamins.
Vonen, Arnfinn M. 1991. Hunting for nouns and verbs in Samoan. Seminar presented at the
Australian National University, 22 November 1991. Canberra.
Vonen, Arnfinn M. 1992. Nominalisations in Tokelau. Seminar presented at the Australian National
University, 15 May 1992. Canberra.
Weinreich, Uriel. 1962. Lexicographic definitions in descriptive semantics. In Householder and
Saporta (eds). Problems in lexicography. pp. 25–44.
Wierzbicka, Anna. 1980. Lingua mentalis: the semantics of natural language. New York:
Academic Press.
Wierzbicka, Anna. 1985. Lexicography and conceptual analysis. Ann Arbor: Karoma Publishers.
Wierzbicka, Anna. 1986. What's in a noun? (Or: How do nouns differ in meaning from adjectives?)
Studies in Language 10(2):353–389.
Wierzbicka, Anna. 1988. The semantics of grammar. Studies in Language Companion Series 18.
Amsterdam: John Benjamins.
Wierzbicka, Anna. 1991. Cross-cultural pragmatics: the semantics of human interaction. Trends in
Linguistics Studies and Monographs 53. Berlin: Mouton de Gruyter.
Wierzbicka, Anna. 1992. Semantics, culture, and cognition: universal human concepts in
culture-specific configurations. Oxford: Oxford University Press.
Wierzbicka, Anna. to appear-a. Adjectives vs. verbs: the iconicity of part of speech membership.
In M. Landsberg (ed). Proceedings of a symposium on iconicity. Zagreb.
Wierzbicka, Anna. to appear-b. Back to definitions: cognition, semantics, and lexicography. In
Lexicographica 8.
Wimbish, John S. 1989. Shoebox: a data management program for the field linguist. Waxhaw:
SIL International.
Wolff, John, and Soepomo Poedjosoedarmo. 1982. Communicative codes in Central Java.
Ithaca, N.Y.: Southeast Asia Program, Cornell University.
Wurm, Stephen A. and B. Wilson. English finderlist of reconstructions in Austronesian
languages (post-Brandstetter). Pacific Linguistics C33.
Zgusta, Ladislav. 1971. Manual of lexicography. The Hague: Mouton.
Zgusta, Ladislav (ed). 1980. Theory and method in lexicography: Western and non-western
perspectives. Columbia, S.C: Hornbeam Press.
Zgusta, Ladislav. 1988. Lexicography today: an annotated bibliography of the theory of
lexicography. Tübingen: Max Niemeyer Verlag.
Index
A
abbreviations................. 15, 24, 37, 43, 124, 172, 175, 180, 195
abstract terms................................................... 68
academic audience................... 68, 140, 165, 180
acknowledgments .......................................... 181
active intransitive........................................... 167
active transitive.............................................. 167
active verbs .................................................... 166
activities......................................................... 130
activities and events....................................... 151
Actor ...................................................... 152, 166
Actor noun ..................................................... 127
actors................................................................ 21
Adelaar .......................................................... 165
adjectives ......................... 15, 160, 170, 171, 192
adpositions............................................. 161, 162
Adult .............................................................. 130
affixes ...................................... 51, 103, 159, 163
agent............................................................... 152
all-purpose fields ............................................. 21
alphabetizing................................ 67, 89, 93, 104
alternate pronunciations .................................. 23
ambiguity ....................................................... 112
ambitransitive ................................................ 169
ambivalent category....................................... 163
anaphoric pointers ......................................... 107
animals............................................. 68, 141, 144
Ant ......................................................... 133, 134
anthropologist ................................................ 137
Anti ................................................................ 133
antonym ......................................................... 122
antonyms............ 21, 22, 101, 102, 132, 133, 202
applying a style in WORD............................... 65
Apresyan................................................ 121, 123
archaic words................................................... 40
archiving dying languages ............................... 67
Aristotle ......................................................... 137
artifacts ............................................................ 73
associated activities ............................... 143, 145
asterisk....................................................... 17, 42
attributive....................................................... 170
audience..................... 68, 77, 104, 157, 178, 187
AUTOEXEC.BAT....................................... 2, 55
automated reverse indexing............................. 13
B
backslash codes ................................................. 9
back-up .............................................................. 5
Bartholomew and Schoenhals ........ 19, 105, 106, 107, 115, 157
basic field markers .......................................... 16
basic set of fields............................................. 76
basic strategies ................................................ 67
beginning of a dictionary project .................. 177
Benefactee ..................................................... 128
Berlin, Breedlove and Raven ........................ 142
bibliographical references ................. 27, 93, 205
bilingual..................................................... 41, 71
bilingual dictionaries........ 15, 16, 60, 67, 70, 71, 105, 114, 117, 148, 158
Birds .............................................................. 146
body part terms...... 67, 68, 69, 96, 115, 148, 191
Bolton ............................................................ 168
borrowed words............... 24, 113, 153, 178, 204
botanists........................................... 73, 137, 141
botany .............................................................. 19
both a noun and a verb .................................. 161
bound morphemes ................. 13, 42, 81, 95, 165
bound roots.................... 14, 86, 93, 95, 104, 164
Bright............................................................. 173
Bulmer ........................................................... 142
bundles ............................................................ 21
C
candidates for headwords................................ 99
Cap ................................................................ 133
carrying verbs .................................. 74, 115, 116
Casagrande and Hale..................................... 142
categories of information in a lexical entry..... 92
categorization ................................................ 157
category labels............................................... 158
Caus............................................................... 131
Causal ............................................................ 131
causative ........................................................ 153
compromise ..................................................... 84
Computer Assisted Related Language Adaptation [CARLA] programs ............... 118
computer software manual ................................ 3
computerized graphic ...................................... 27
computerized lexical database................... 70, 74
Comrie ........................................................... 173
conceptual correspondence ........................... 140
concordance................................................... 109
confer............................................................... 22
conjunctions .................................. 160, 161, 162
Conklin .......................................................... 142
connotative meaning........................................ 39
Conseq........................................................... 128
consequence .................................................. 128
consistency in labeling .................. 8, 15, 83, 175
Consistent Changes [CC] program................ 200
content words ................................................ 164
contexts............................................................ 36
contextual meaning ....................................... 115
contrastive patterns........................................ 158
Conv .............................................................. 132
conventionalized knowledge ................... 80, 101
converse......................................................... 132
Convert-to-Word [CTW] program .................. 58
co-occurrence restrictions ............................. 105
core arguments .............................................. 171
corpus of natural texts ..................................... 73
corrupted file ..................................................... 5
Counterpart............................................ 132, 134
counting headwords......................................... 67
Coward .......................................................... 167
Cpart ...................................................... 132, 134
cross-reference ....... 4, 14, 21, 22, 23, 49, 64, 67, 79, 82, 83, 94, 100, 119, 125, 126, 139, 180, 202
Crystal ............................................................. 39
cultural items ................................................. 150
cultural-linguistic units...................... 84, 99, 101
customize....................................................... 134
customize the output........................................ 13
customized output ........................................... 13
customized primary sort sequences................. 93
cutting verbs .................................... 74, 116, 126
D
data management ............................................. 67
data notebooks................................................. 19
database format.................................................. 9
database structure .................................... 7, 9, 89
database template............................... 75, 76, 122
data-gathering methods.................................... 72
Date.................................................................. 29
decayed state.................................................. 131
default audience............................................... 68
default configuration ....................................... 54
default sort order ............................................. 58
definitions ................... 16, 17, 18, 19, 36, 38, 39, 40, 41, 45, 70, 71, 105, 114, 137, 138, 150, 164
Degrad ........................................................... 131
deictics..................................................... 38, 159
denotative meaning............................ 39, 45, 155
department of education .................................. 84
description ................................................. 16, 18
deteriorated state............................................ 131
determining parts of speech........................... 158
deverbal noun ................................................ 128
diacritics ........................................................ 179
dialect information................................. 117, 120
dialect map..................................................... 179
dialect names ....................... 20, 22, 24, 119, 120
dialect variants......................... 23, 118, 119, 179
dialectal synonyms ................ 119, 124, 155, 179
dialectal variants.............................................. 23
dialects........................................... 119, 171, 179
dictionaries ...................................................... 60
dictionary................ 4, 5, 7, 9, 14, 16, 28, 42, 43, 66, 67, 68, 69, 77, 84, 118, 161, 177, 180
dictionary of a related language ...................... 73
dictionary users................................................ 39
dictionary-making.............................................. 7
different audiences .......................................... 13
different classes of notes ................................. 28
different distributional networks ................... 118
different meanings ......................................... 118
different purposes ............................................ 36
different senses ...... 107, 109, 110, 111, 112, 114
differentiae............................................... 40, 137
diglot...................................... 15, 34, 64, 71, 199
digraphs ........................... 6, 7, 58, 89, 93, 94, 95
diminished degree.......................................... 131
directionals ...................................................... 38
disadvantages................................................... 89
discarded.......................................................... 17
discarding fields............................................... 57
discourse particle........................................... 162
disease ............................................................. 68
distinguishing usage restrictions ..................... 23
distribution .................... 117, 120, 159, 162, 164
division breaks................................................. 95
Dixon..................................... 166, 169, 170, 191
dot on the screen.............................................. 58
dot-matrix printer ............................................ 63
double quotes................................................... 52
dual .................................................................. 25
duplicate glosses................................................ 9
Durie.............................................................. 167
E
edible plants .................................................... 68
editorial changes.............................................. 89
em-dash ........................................................... 45
emic ............................................................... 140
emic units ...................................................... 100
emic unity........................................................ 36
emic vernacular categories .............................. 27
emotion words ................................................. 74
emotions .......................................................... 74
empty \lf fields ................................................ 21
encyclopedic fields.......................... 18, 135, 205
encyclopedic information.......... 20, 39, 137, 200
English finderlist ............................................. 53
enhancements and changes to MDF.............. 199
entry......................... 16, 17, 21, 28, 42, 180, 203
Equip ............................................................. 133
ergative .......................................................... 166
ethnobotanists................................................ 137
ethnographic information ................................ 20
ethnographic notes......................................... 201
ethnographic sketch....................................... 180
ethnolinguistic pride........................................ 69
etic ................................................................. 140
etic checklist.................................................... 27
etymology.................. 24, 70, 153, 163, 178, 204
events............................................................. 152
example sentences ... 16, 19, 67, 70, 71, 105, 199
examples extracted from texts......................... 19
examples from dictionaries ............................... 4
excessive duplication ...................................... 43
exclude certain fields................................. 56, 57
exclude entries................................................. 28
exclude from the reversed finderlists .............. 42
exclude part of speech..................................... 61
excluding example sentences .......................... 57
F
false polysemy ............................................... 114
fast searches....................................................... 7
fauna .......................... 19, 73, 137, 140, 141, 142
Feel ................................................................ 132
Female ........................................................... 126
Ferrell ............................................................ 164
field codes.................................................... 1, 13
field markers .................................................. 183
field researchers............................................... 60
FIESTA.......................................................... 109
figurative sense.............................................. 102
files and programs ......................................... 207
files created.................................................... 208
filter ............................................................... 122
Filters................................................................. 8
Fin.................................................................. 132
Final phase..................................................... 132
final punctuation.............................................. 15
financial resources........................................... 70
finderlists ........... 5, 17, 41, 43, 56, 60, 61, 64, 67
finding words................................................... 72
first gloss ......................................................... 17
fish ............................................... 19, 67, 68, 147
fish names ...................................................... 115
fixed format ..................................................... 25
floppy drive.................................................. 2, 55
flora............................ 19, 73, 137, 140, 141, 142
Foley & Van Valin ........................................ 166
Foley and Van Valin.............................. 116, 173
folk etymologies ............................................ 110
folk taxonomies ................. 25, 27, 125, 126, 138
footers .............................................................. 63
form ............................................................... 158
form class....................................................... 157
formalism......................................................... 39
Format dictionary ................................ 53, 56, 57
formatted dictionary .................................. 10, 54
formatted output .............................................. 10
formatting .................................................... 9, 67
Fox......................................................... 103, 156
Frake.............................................................. 142
free disk space ................................................. 55
free translation................................................. 19
free-form fields.......................................... 49, 51
from the beginning ............................................ 4
fully edited....................................................... 29
function.......................................................... 158
functors.............. 41, 51, 155, 161, 162, 172, 173
fv: .......................................... 10, 49, 50, 51, 206
G
Gen ................................................................ 125
gender .............................................................. 25
general audience.............................................. 69
general note ..................................................... 28
generic ........... 25, 27, 68, 80, 125, 126, 139, 151
generic-specific ....................... 21, 112, 125, 126
genus........................................................ 40, 137
Givón ............................................. 157, 169, 173
gloss................................... 16, 17, 18, 36, 67, 90
gloss fields....................... 16, 36, 37, 38, 41, 187
glossary...................................... 67, 69, 209, 210
glossary files................................................ 2, 54
glosses ............................................................. 70
glossing strategies ........................................... 36
goal ................................................................ 128
government authorities.................................... 66
gradation........................................................ 132
grammatical introduction ...................... 153, 162
grammatical paradigm..................................... 25
grammatical particles ...................................... 37
grammatical restrictions .......................... 21, 105
graphics format type........................................ 28
Grimes and Maryott ...................................... 155
Grimes, B.D................................................... 113
Grimes, B.F. .................................................. 179
Grimes, C........................ 96, 116, 121, 123, 155,
......................... 159, 162, 167, 168, 170, 201
Grimes, J........................ 122, 123, 124, 142, 193
Grimes, J. and B.F. Grimes ........................... 179
Group............................................................. 133
group exploration .......................................... 142
H
Halliday ......................................................... 164
hanging indents................................................ 30
hard copy printout.............................................. 5
Hashimoto........................................................ 27
Head............................................................... 133
headers............................................................. 63
headword .................... 13, 14, 16, 18, 19, 22, 40,
.............................. 67, 73, 79, 89, 92, 96, 99,
................................. 101, 105, 125, 150, 205
helps file .......................................................... 13
hierarchical structure of an entry..................... 45
high frequency words ...................................... 40
historical and comparative linguistics ........... 113
historical reconstructions............................... 154
historically related ......................................... 112
homograph ....................................................... 14
homonym number ............................................ 58
homonym numbers .............................. 23, 57, 58
homonyms............. 14, 22, 45, 58, 61, 83, 93, 94,
................. 109, 110, 111, 113, 162, 163, 180
homonymy ..................................... 107, 109, 115
homonymy, partial......................................... 169
homophone .................................................. 9, 14
Horne ............................................... 40, 154, 155
housekeeping field..................................... 19, 24
housekeeping fields ......................................... 28
housekeeping information ... 8, 29, 67, 75, 89, 93
houses ...................................................... 74, 150
HRAF............................................................... 27
Hughes ........................................................... 167
Human Relations Area Files............................ 27
hyperonym ..................................................... 125
hyponym ........................................................ 126
Javanese........................................................... 40
joining underline ............................................. 17
Jump feature .............................................. 7, 144
jumping to nonadjacent entries ....................... 89
jungle plants .................................................... 68
K
key field..................................... 13, 58, 104, 201
keyboard conventions........................................ 3
keyboard setup............................................. 2, 54
kin terms .............. 38, 67, 68, 115, 148, 149, 177
kinship ..................................... 89, 150, 180, 198
knowledge bank............................................... 20
L
Lakoff ............................................................ 142
Landau ................................................. 9, 77, 115
Langacker ...................................................... 173
language code .................................................. 50
language community........................................ 66
language learners ........................................... 118
language of parallelism.................................. 103
large print job .................................................. 63
Lead ............................................................... 133
learn the language and culture......................... 72
Lehmann ........................................................ 173
lemma .............................................................. 13
lexeme.... 13, 19, 38, 67, 100, 101, 106, 161, 205
lexeme-based ........................... 78, 79, 82, 83, 84
lexeme-oriented ......................................... 77, 78
lexical associations ........................................ 121
lexical citation form................................... 14, 61
lexical database............... 5, 9, 13, 54, 60, 67, 71,
............................... 73, 75, 84, 103, 118, 142
lexical entry ............................. 15, 43, 61, 73, 92
lexical functions ......... 16, 20, 21, 106, 110, 121,
......................................... 123, 134, 135, 193
lexical networks............... 74, 101, 110, 121, 141
lexical relations...................................... 121, 201
lexical roots ................................................... 164
lexical sets of similar words ............................ 72
lexical universals ....................................... 39, 40
lexicalized...................................................... 101
lexicalized circumlocutions ........................... 125
lexicalized compounds .................................. 130
lexicographers.................................................. 15
lexicography .................................................. 3, 7
lexicon ............................................................. 67
LEXICON.DB ............................................. 6, 54
life forms ....................................... 138, 140, 141
limitations ........................................................ 54
lingua franca .............................. 18, 72, 113, 153
linguistic analysis ............................................ 75
Liqu................................................................ 132
literally............................................................. 19
literature........................................................... 27
loan sources ................................................... 198
loan synonym......................................... 124, 179
loans......................................... 24, 113, 124, 153
local audience .......................... 69, 104, 105, 165
local audiences................................................. 77
M
MACROS............................. 52, 76, 122, 204, 209
Magn.............................................................. 130
main entry............................................ 19, 22, 23
major word classes .......................................... 40
Male............................................................... 126
Maluku Dictionary Formatter........................ 200
Manif ............................................................. 132
mapping lexical networks................................ 21
margins ............................................................ 63
Mat ................................................................ 129
Material ................................................. 129, 150
material culture................................................ 73
material world ............................................... 150
mature phase.................................................. 141
Max................................................................ 130
maximalist ..................................................... 137
McKeon......................................................... 137
MDF fields ...................................................... 13
MDF files .......................................................... 1
MDF output ..................................................... 29
MDFDICT.ANS .............................................. 94
MDF-prompted options................................... 56
MDFSAMPL.DB ........................................ 4, 53
meaning ........... 18, 36, 38, 39, 67, 114, 115, 121
meaning-centric ............................................... 77
meaning-oriented............................................. 79
medicines......................................................... 68
Mel'čuk ....................................... 121, 123, 124
menu options ......................................... 9, 53, 56
metaphors ...................................................... 151
metathesis ................................................ 24, 154
Min ................................................................ 131
minimal entries.......................................... 67, 74
minimalist...................................................... 137
minor entries........................................ 23, 41, 42
minor sense.............................................. 16, 165
N
Nact................................................................ 127
naive user......................................................... 15
national audience ............................................. 15
national government ........................................ 69
national language.............. 18, 20, 34, 49, 50, 54,
.. 64, 69, 72, 90, 113, 120, 153, 154, 158, 187
national language dictionaries ......................... 15
O
odd-even running footers................................. 58
On-line helps ..................................................... 4
Only................................................................. 21
order of fields ...................................... 4, 13, 187
Organization .................................................. 133
original lexical database.................................. 53
orthographic conventions ........................ 52, 179
output file ........................................................ 58
over-differentiated......................................... 141
P
paradigms................................... 23, 25, 171, 175
parallelisms............................................ 134, 156
paraphrase test ............................................... 110
ParD ............................................................... 134
ParS................................................................ 134
parse words.................................................... 105
parsing ............................................................. 73
Part................................................................. 129
part of speech..... 15, 40, 45, 62, 67, 75, 109, 115
partial homonymy.................................. 108, 163
particles............................................................ 51
parts of speech ... 16, 37, 109, 157, 159, 175, 195
part-whole........................................ 21, 112, 141
path .................................................... 1, 2, 54, 55
patient ............................................................ 152
Pawley ................ 71, 72, 74, 100, 101, 103, 114,
......................................... 115, 137, 150, 168
PCX ................................................................. 27
perfective ....................................................... 167
periphrastic causative .................................... 153
Perm............................................................... 131
Phase...................................................... 130, 141
phonetic ......................................................... 204
phonetic fonts .................................................. 14
phonetic form................................................... 14
phonotactically similar .................................. 134
photograph ....................................................... 27
phrasal lexemes ......................... 13, 73, 100, 102
phrasal units..................................................... 67
phrases ............................................................. 49
physical characteristics.................. 141, 142, 146
picture .............................................................. 28
picture books ................................................... 73
picture in entry................................................. 27
plain space ....................................................... 17
plant names .................................................... 115
plants.................................. 67, 74, 141, 142, 177
plural................................................................ 25
Plus ................................................................ 130
Poedjosoedarmo ............................................ 154
poetic text ...................................................... 134
political considerations.................................... 83
polymorphemic ........ 79, 81, 82, 83, 84, 179, 205
polymorphemic forms.......................... 14, 47, 81
polysemy........................ 107, 109, 114, 115, 148
polysynthetic language .................................... 99
portmanteau morphemes................................ 174
Q
qualities ......................................................... 152
quality control ................................................... 8
quasi-reflexive verbs ............................. 167, 168
Quit.................................................................. 62
R
range of functions.......................................... 158
range of meaning ......... 18, 67, 70, 109, 115, 150
Range sets............................................ 8, 15, 175
raw SHOEBOX form ...................................... 29
S
safekeeping........................................................ 5
same meaning ................................................ 118
sample database........................................... 4, 54
sample file ....................................................... 53
scale....................................................... 132, 133
Schachter ....................................... 157, 159, 170
scholarly audience ................................... 70, 104
scientific name..................................... 19, 50, 73
scientific nomenclature ................... 73, 140, 141
scientific taxonomy ....................................... 140
scope.............................................................. 162
screen prompts................................................... 9
search and retrieval ....................................... 122
secondary sort character .................................. 95
secondary sort order ........................................ 93
semantic arrangement...................................... 77
semantic categories ................................. 26, 115
semantic domain............... 26, 27, 37, 68, 73, 74,
......................................... 115, 175, 177, 191
semantic primitives ................................... 39, 40
semantic shift................................. 113, 117, 154
semantically bleached senses .......................... 99
semantically complex things ........................... 40
semantically related entries ............................. 27
sensation ........................................................ 132
sense .......................... 9, 17, 19, 21, 28, 180, 203
sense discrimination ...................................... 105
sense number ....................................... 16, 45, 61
sense numbers ................................................. 45
sentence number .............................................. 19
separate dictionaries ........................................ 69
separate publications ....................................... 71
separate volumes ........................................... 177
Seq................................................................. 130
sequence of key strokes..................................... 3
Serial.............................................................. 129
serial verbs..................................................... 159
sets of similar words........................................ 73
several researchers .......................................... 28
shared meaning.............................. 109, 110, 111
shared semantic thread .................................. 109
SHOEBOX ............. 9, 13, 26, 53, 56, 57, 58, 60,
... 73, 75, 76, 89, 93, 115, 122, 142, 144, 175
SHOEBOX's Jump feature ............................. 82
SHOEBOX datestamp..................................... 29
SHOEBOX Filters............................. 27, 90, 177
SHOEBOX interlinear function ...................... 17
Sim................................................................. 126
similar ............................................................ 126
Simons ........................................................... 117
Simons and Versaw ....................................... 172
simple morphemes ......................................... 100
simple reversals ............................................... 67
Sing................................................................ 133
singular ............................................................ 25
Sit................................................................... 130
situations........................................................ 130
sketch in a notebook ........................................ 27
slide.................................................................. 27
small caps .......................................................... 3
social usage.................................................... 120
sociolinguistics ...................................... 120, 155
Son ................................................................. 132
sort ................................................................... 14
sort order.......................................................... 58
sort sequences.................................................. 93
SORT.EXE ...................................................... 55
sorting ........................................................ 89, 93
Sound ............................................................. 132
source language ............................................... 24
source of data................................................... 28
space-semicolon-space .................................... 37
spacing integrity .............................................. 17
Spec ............................................................... 126
special characters............................................. 52
special classes of entries................................ 137
special registers ............................................. 154
specialized dictionaries.................................... 67
species.............................................. 40, 126, 137
specifics ......................................... 126, 139, 151
speech register name........................................ 22
speech registers.............................. 125, 154, 155
speech-act verbs....................................... 74, 115
speed in interlinearizing .................................. 17
spelling variants............................................. 120
split document files.......................................... 59
split intransitive ............................................. 166
split-S..................................................... 160, 166
sponsoring agencies....................................... 177
SRT.EXE ......................................................... 58
standard field codes ......................................... 53
standard format markers .................................... 9
Starosta, Pawley, and Reid ............................ 164
Start................................................................ 131
starter list ....................................... 191, 193, 195
State ............................................................... 132
T
tables in an entry ............................................. 25
taboo synonym ...................................... 125, 179
taboos ............................................................ 145
taxonomy ................................................... 19, 27
team of compilers ............................................ 28
technical definitions .................................. 18, 70
technical jargon ......................................... 40, 68
TED.COM ....................................................... 56
Template...................................................... 8, 76
U
unaccusative .................................................. 167
undergoer ................................. 21, 127, 152, 166
underline .......................................................... 17
underline bold.................................................. 50
underline character .......................................... 50
underline code ................................................. 51
underline italic................................................. 50
underlining affixes........................................... 51
underlying forms.............................................. 22
underlying roots............................................... 22
unergative ...................................................... 167
unergative-unaccusative ................................ 166
unformatted...................................................... 25
unifying definition ......................................... 115
uninitiated user .............................................. 157
Unit ................................................................ 133
unknown fields .......................................... 29, 56
unstructured text files ...................................... 89
unwanted fields.................................................. 9
UPDATE.CCT....................................... 199, 201
UPPER CASE.................................................... 3
V
variant................ 24, 42, 117, 120, 124, 155, 179
variant forms ................................................... 23
varieties ......................................... 142, 144, 146
variety of output options ................................... 4
verb class ......................................................... 25
verbal subclasses ........................................... 166
vernacular .................................................. 20, 49
vernacular categories....................................... 26
vernacular definition ....................................... 41
vernacular explanations................................... 16
version of WORD............................................ 54
visual examples ............................................... 29
vocabulary ....................................................... 67
vulgar............................................................. 206
Vwhole .......................................................... 129
W
Whole ............................................................ 129
Wierzbicka .............. 40, 115, 157, 162, 164, 170
Windows users .................................................. 2
WINWORD................................................. 1, 54
Wolff and Poedjosoedarmo........................... 154
WORD............. 1, 3, 9, 53, 54, 58, 61, 63, 64, 95
word class...................................................... 157
WORD-for-DOS.............................................. 54
WORD-for-WINDOWS.................................. 54
word-level gloss ............................ 17, 18, 19, 38
wordlists .......................................................... 72
writing a good definition ................................. 39
Wurm and Wilson ........................................... 24
Y
your word processor .............................. 3, 54, 55
Z
zero-derivation....................................... 161, 163
Zgusta .................................... 108, 115, 157, 171
zoologists......................................... 73, 137, 141
zoology ............................................................ 19