
Characteristics of Korean Personal Names

Sungwon Kim
Department of Library and Information Science, Chungnam National University, Daehak-ro 99, Yuseong-ku,
Daejeon, Korea. E-mail: sungwonk@cnu.ac.kr
Seongyun Cho
Department of Digital Media, Anyang University, 37-22 Samduk-ro, Manan-ku, Anyang, Kyunggi-do, Korea.
E-mail: scho@anyang.ac.kr

Korea, along with Asia at large, is producing more and
more valuable academic materials. Furthermore, the
demand for academic materials produced in non-Western societies is increasing among English-speaking
users. In order to search among such material, users rely
on keywords such as author names. However, Asian
nations such as Korea and China have markedly different
methods of writing personal names from Western naming
traditions. Among these differences are name components, structure, writing customs, and distribution of
surnames. These differences influence the Anglicization
of Asian academic researchers' names, often leading to
them being written in various fashions, unlike Western
personal names. These inconsistent formats can often
lead to difficulties in searching and finding academic
materials for Western users unfamiliar with Korean and
Asian personal names. This article presents methods for
precisely understanding and categorizing Korean personal names in order to make academic materials by
Korean authors easier to find for Westerners. As such,
this article discusses characteristics particular to Korean
personal names and furthermore analyzes how the personal names of Korean academic researchers are currently being written in English.

Introduction
Names clearly have immense value to their owners. Not
only do names provide a way of differentiating and perceiving individuals, they help form one's sense of identity and self
(Thompson, 2006). Due to the development of information
and telecommunication technology, academic communication has expanded and globalized in various academic fields.
While Korean researchers, among others, are increasingly
publishing their research findings in English, American
researchers are increasingly demanding materials produced
by foreign researchers or written in foreign languages
(OCLC, 2008). Furthermore, a survey revealed that most of
the 20 million books housed in America's 50 or so specialized
Asian libraries are written in non-English languages or by
foreign authors (CEAL, 2011). As the number of foreign-produced or foreign-language documents increases, the
American demand for such documents is also increasing.
Accordingly, a more exact system of searching for and distinguishing between these documents is more necessary than
ever.

Received March 7, 2012; revised June 7, 2012; accepted July 13, 2012
© 2012 ASIS&T. Published online 6 December 2012 in Wiley Online
Library (wileyonlinelibrary.com). DOI: 10.1002/asi.22781

Traditionally in Western libraries, the author's name is
one of the main entries in a catalog and therefore a primary
access point. Although the author's name is an important
element in searching for and differentiating documents,
these searches and differentiations are often imperfect
(Smalheiser & Torvik, 2009). Asian authors face even more
difficulties than Western authors due to the fact that written
Asian names are more prone to inconsistencies and ambiguities. According to Fuller's (1989) and Kim's (2001)
research on the consistency of writing authors' personal
names, Western authors are more consistent in their written
names. We can further attribute this to traditional name-writing disparities between the two cultures. Furthermore,
Asian authors follow their native naming formats rather than
Western formats when Anglicizing their names. Ultimately,
searching for documents by author name can be an unreliable process, all the more so when it comes to Asian and
Korean authors (Hu, 2000).
These problems are being addressed in several ways. The
fact that such various efforts are being made to improve
exact identification points to the importance of distinguishing individuals through names. This article offers a basic
foundation to such efforts by analyzing the characteristics of
Korean personal names and by examining the current trends
of how these characteristics are translated in the process of
Anglicization, focusing on the case of Korean academic
researchers. The characteristics of Korean personal names
would include name components, structure, and writing
formats. The names of 192,336 academic researchers were
analyzed in order to determine the writing formats of
Korean personal names.

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 64(1):86–95, 2013

Prior Research
The ability to identify the author of a particular article or
all articles published by a particular author is a fundamental
issue in the field of information science. Although ostensibly
simple issues, these are in fact central, major problems that
have yet to be solved. Different approaches to author name
disambiguation have been developed across various fields
and can be organized into three broad types.
Within the library field, authority control is being performed in order to disambiguate author names, where
instances of one author being published under differing
names are aligned under one heading chosen as an authoritative form that is then linked to the various alternative name
forms (Hu, Lo, & Tam, 2004; Library of Congress, 2012;
Loesch, 2011; Mak, 2011; Maxwell, 2002; Naito, 2004;
Scoville, Johnson, & McConnell, 2003; Tillett, 2002).
Meanwhile, within the computer science field algorithms are
being developed to identify single authors published under
various differently written names and different authors with
identical names (Anderson & Carballo, 2001; Binongo,
2003; Feitelson, 2004; Han, Zha, & Giles, 2005). Furthermore, the academic and publishing worlds have experienced
problems in differentiating individuals and, as an alternative,
identify researchers through separate distinguishing IDs and
profiles (Dervos, Samaras, Evangelidis, Hyvärinen, &
Asmanidis, 2006; Elsevier, 2006; Thomson Reuters, 2006).
Individual organizations and publishers are attempting to
assign identification numbers, analogous to social security
numbers, to individual researchers as an alternative way to
solve issues of author name ambiguity. However, as these
efforts are being pursued independently, authors would be
assigned separate IDs from each publisher and organization
and thus the various IDs for each author could cause further
ambiguity problems. As a solution, the Open Researcher &
Contributor ID (ORCID) is working to improve efficient
academic communications by establishing a central registry
of single unique identifiers for individual researchers, a database that would be linked to other author ID schemes from
various organizations (ORCID, 2012).
Prior research regarding Korean names can be roughly
divided into the fields of library and information science,
computer science, and linguistics. Research in the field of
library and information science is primarily concerned with
the issues of authority control and indexing non-English
materials. Research regarding authority control usually
refers to the importance of collaboration and deduction in
methods of authority control (Naito, 2004; Takahashi,
2005). Naito (2004) introduced a collaborative project
between Korea, China, and Japan wherein the three countries would utilize collective authority records for authors
from all three nations. Takahashi (2005) pinpointed flaws
and suggested improvements drawn from his experience as a
non-Korean with Korean Author-Name Authority. Here we
can refer to research regarding authority control (Hu, 2000;
Hu et al., 2004) for Chinese names, which share similar
characteristics with Korean names, as well as the recent
methodology review project (Riemer & Schreur, 2012) for
aligning PCC (Program for Cooperative Cataloging) authority files and RDA (Resource Description and Access). The
Indexer published a series covering the problems in indexing
various non-English materials written in languages such as
East Asia's Korean, Chinese, and Japanese, in which they
offered one perspective into the process of formatting non-English names, including Korean (Akhtar, 2007; "Indexing
personal names," 2006–2008).
In the field of computer science, author name disambiguation is a widely researched topic within Internet environments and various bibliographic databases (Smalheiser &
Torvik, 2009). Yet most of this research regarding name
disambiguation is focused on the topic as a whole, and it is
difficult to find studies that deal specifically with Korean
names. It has been reported that at least 10 different systems
of author name disambiguation have been proposed over the
past few years (Elliot, 2010). Although the various systems
proposed in the computer science field differ in their specific
methodology, they are primarily based on machine learning,
while the approaches can be divided into supervised and
unsupervised models (Pereira et al., 2009; Smalheiser &
Torvik, 2009). The machine learning involved in supervised
approaches (Han, Giles, Zha, Li, & Tsioutsiouliklis, 2004;
Torvik, Weeber, Swanson, & Smalheiser, 2005) takes input
in the form of pairs of articles that serve as training examples
labeled as either positive (author match) or negative (not
author match); unsupervised approaches (Han et al., 2005;
Yin, Han, & Yu, 2007) do not use such labeled training
examples. For this reason, supervised approaches generally
perform better (Smalheiser & Torvik, 2009). Within the
computer science field, research specifically focused on
Korean names primarily deals with automatic transliteration, with various studies being done on improving transliteration performance (Hong, Kim, Lee, & Rim, 2009;
Song & Park, 2011).
Other areas of research regarding Anglicization of Korean
names include studies about standardizing and optimizing
Korean name Anglicization, studies which are intimately
related to the research discussed in this article. Linguistic
research regarding Korean name Anglicization formats has
largely been performed by Korean researchers. Kim (2008)
suggested that the problems with Anglicizing Korean language, including names, required that Anglicization be considered a form of translation, thus offering a new basis for
understanding Anglicization. Kim (2006) considered the
problems with the phonetically based McCune-Reischauer
system, commonly used when cataloging Korean materials in
American libraries, and proposed several methods for
improvement. Kim (2001) analyzed the consistency of Anglicization in the names of Korean researchers and compared
the name order and format of the Anglicized names of 263
researchers of English literature with the names of 818 other
citizens. Chai (2009) proposed an order and format for Anglicized Korean names designed to be easily understood by
foreign readers. Kim and Kim (2012) grouped the diverse
Romanization formats of Korean personal names into patterns and
recommended the format they judged best for the Romanization of
Korean personal names.

Significance of Anglicizing Non-English Personal Names
The surest way to completely convey Korean personal
names is to write them in their indigenous Korean language.
However, Korean names written in Korean would be impossible for foreign readers to understand, while it would be
equally impossible for foreigners to search and distinguish
between Korean written names. Hence, it is necessary for
Korean personal names to be expressed in the reader's language. Writing personal names in forms other than their
native languages is essentially a type of translation. The
process of translating words between languages can be
divided into three categories: (1) zero translation, where the
usage of a word in one language is directly adopted into
another; (2) phonetic translation, where the spelling or pronunciation of the source language (SL) is expressed in the
alphabet of the target language (TL); and (3) meaning translation, where a word of the SL is replaced with a word from
the TL with an identical meaning.
With Korean proper nouns, including personal names,
meaning translations are fundamentally impossible and thus
phonetic translations are most appropriate. Phonetic translations are most often applied to proper names in a source
language where meaning translations are inapplicable,
such as personal names or local place names (Kim, 2008).
Phonetic translations offer insight into pronunciation and
spelling but present little information about the forms and
meanings of applicable words.

Characteristics of Korean Personal Names


The structure and writing formats of Korean personal
names have various differences from those of Western personal names. This article focuses on these differences in
examining the characteristics of Korean personal names.
First, personal names differ in their components. Korean
personal names are comprised of given names (Gn) and
surnames (Sn), without the commonplace component of a
middle name. Occasionally, names written as "Sung Won
Kim" or in the "Gn Gn Sn" format can be confused with the
Western tradition of middle names, when it is rather the
given name being written in two separate syllables. Furthermore, immigrants, international students, and other
Koreans with extensive experience abroad sometimes facilitate easy communication with the local populace by adding
a given name in the local language, such as "Jessica Jihye
Kim" or "Michelle Heyon Kim." Although in this case the
Korean personal name functions as a middle name, this is
not common, and names in the pure Korean language do not
have middle names.
Second, the structural characteristic of Korean personal
names offers insight into typological standards. The typical Korean personal name consists of a single-syllable surname and a two-syllable given
name. Thirteen two-syllable surnames, including Namgung,
Hwangbo, Jegal, and Sagong, do
exist, but these are exceptional cases applying to approximately 0.094% of the Korean population (Korea, Statistics
Korea, 2010), while the majority of Korean personal names
follow the format of single-syllable surnames.
Third, the standards mentioned above also include variability in spacing. Western names are uniformly spaced
between given name, middle name, and surname. By comparison, the three syllables of a Korean name can be written
as all attached or spaced. Nevertheless, this variability in
spacing does not create problems in distinguishing individuals, due to the fact that the first syllable and then the latter
two are invariably recognized as the surname and given
name. Furthermore, the 13 rare two-syllable last names are
already familiar to the Korean population at large and cause
few problems in distinguishing given names and surnames.
However, because variable spacing is usually carried over directly into the
Anglicization of Korean names, inconsistencies in separating the two syllables of the given name lead to difficulties in identifying Anglicized names.
Fourth, Korean personal names, like Chinese and Japanese personal
names, are characteristically written with the surname before
the given name. Meanwhile, with the exception of a few
nations such as Hungary, Western names are mostly written
with the given name before the surname. The Anglicization of Korean names is governed by the official rules titled "Romanization of
Korean" (Korea, Ministry of Culture, Sports and Tourism,
2000). According to those rules, the Anglicization of names
should follow the order of surname before given name and
attach the two syllables of the given name. These rules
directly apply Korean name formats to Anglicization and
oppose the Western formats of given name before surname,
therefore causing confusion among Western readers.
Fifth, another characteristic of Korean names is that
women do not change their names, regardless of marital
status. In Western culture the prevalent format is for women
to change their surnames from an unmarried maiden name
to their husband's surname (Patterson, 2009). In Korean
culture, women continue to use their patrilineal surnames
even after marriage rather than adopting married names.
Certain expatriates or international students may choose to
follow Western customs and adopt married names,
but this is unusual in Korean traditions.
Sixth, Korean surnames are much fewer in number
than Western surnames; therefore, the discernibility of
these names differs greatly. Great Britain has more than
1.45 million registered surnames (Cheshire, Longley, &
Singleton, 2010, p. 403), while Finland has approximately
75,000.¹ The number of surnames will differ between languages and regions, but Western surnames can be estimated at
tens of thousands to a million per nation. By comparison,
the Korean population census lists 286 individual surnames
(Korea, Statistics Korea, 2010). Furthermore, when surnames are written in Korean
rather than distinguished by their Chinese characters, this is
reduced to about 180 different names (Lee, Song, & Fourser,
2011). For example, the various Chinese characters for Koo
and for Noh are each reduced to a single
Korean spelling and pronunciation. Accordingly, the Anglicized transliteration of the
Korean spelling or the transcription of the Korean pronunciation would be treated as identical names and impossible
to differentiate. The numerical differences between these
surnames influence their individual distinguishability. The
most frequent surname among the American and
British populations is Smith, accounting for
0.881% and 1.121%, respectively, or approximately 1 in every 100
people.² The most common surname in Finland is Hansen,
forming an estimated 0.7% of the entire population (Aksnes,
2008). In comparison, the five most frequent Korean surnames are as follows: Kim 21.6%, Lee 14.8%, Park
8.5%, Choi 4.7%, and Chung 4.3%, adding up to about
54% of the entire population, while the top 37 most frequent
surnames collectively account for more than 90% of the
population (Korea, Statistics Korea, 2010). The distribution
of top 50 surnames in the Korean and American population
is diagrammed in Figure 1. Thus, we can determine that
individual differentiation between Korean personal names is
difficult on the basis of surnames alone; individual
differentiation therefore relies on the complete conveyance of
a combined given name and surname.
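
The concentration figures quoted above can be checked with a few lines of arithmetic. The sketch below only re-adds the percentages given in this section; the variable names are illustrative and not part of the original study.

```python
# Shares (%) of the five most frequent Korean surnames, as quoted in the text.
korean_top5 = {"Kim": 21.6, "Lee": 14.8, "Park": 8.5, "Choi": 4.7, "Chung": 4.3}
top5_share = round(sum(korean_top5.values()), 1)

# Share (%) of Smith in the American population, as quoted in the text.
smith_us = 0.881

# How many times more common Kim is in Korea than Smith is in the United States.
kim_vs_smith = korean_top5["Kim"] / smith_us

print(f"Top five Korean surnames cover {top5_share}% of the population")
print(f"Kim is roughly {kim_vs_smith:.0f} times as concentrated as Smith")
```

The comparison makes the practical consequence concrete: a surname-only query for a Korean author returns a far larger share of the population than the same query for the most common American surname.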

Thus far, this article has introduced some fundamental
characteristics of Korean personal names. As noted, the differences between the Anglicization of Korean names and the
typical Western naming formats lead to difficulties in identifying and distinguishing individuals when searching for
Korean names.

Diversity in the Anglicization of Korean Personal Names
The diversity in Anglicized Korean names stems from
two main sources. The first is the diversity resulting from a
Korean name being Anglicized in various different ways,
whereas the second is diversity in writing formats.
Diversity in Spelling
The second most frequent Korean surname, Lee, can
be spelled in a variety of ways including I, Lee, Leigh, Li,
Lie, Ree, Rei, Rey, Rhee, Rhi, Rhie, Rhye, Rhyie, Ri, Yee,
and Yi, according to personal preference. According to a
national survey of passport applications, the surnames
Kwon and Yu have been Anglicized with over
100 different spellings (Lee, Song, & Fourser, 2011). The
vastly diverse Anglicized spellings for identical Korean
names stem from different individual preferences in transliteration and transcription. The conventions for Anglicizing
the Korean language can be broadly differentiated into one
system devised by Korean linguistic scholars and another
system devised by foreign academic researchers. The system
devised by Korean linguists is nationally accepted as the
most representative standard for the Anglicization of the
Korean language. The system has been revised several times
over the years. Therefore, different generations would use different systems of Anglicization, leading
to further inconsistencies. The foreign-devised Anglicization systems have included (1) the so-called Victorian
system developed by early Western missionaries; (2) the
Yale system collaboratively developed after extensive linguistic research by Korean and American scholars; and (3)
the McCune-Reischauer system considered to be versatile
and still widely used across Western academic fields
(McCune & Reischauer, 1939; Rutt, 1972). With such a
variety of systems, individuals can utilize any one of several
spellings. However, since the practice of spelling Korean
names in English is based solely on personal preference, this
article will exclude the spelling issue from its analysis.
Diversity in Writing Formats

FIG. 1. Top 50 surname distribution in Korean and American populations
(accumulated percentages). (Color figure can be viewed in the online issue,
which is available at wileyonlinelibrary.com.)

¹ http://koti.mbnet.fi/pasenka/names/about.htm
² http://www.britishsurnames.co.uk/surnames/SMITH

Diversity in the actual formats in which Anglicized
Korean names are written further contributes to the disparities of Anglicization. The diversity in formats again depends
on the components of surname and given name and the two
syllables of the given names. As previously mentioned,
Korean names are conventionally written with the surname
before the given name. Accordingly, Anglicized Korean
names can be found written in the Korean (surname given
name) format or in the Western (given name surname)
format; in the former case the surname will sometimes be
followed by a comma (surname, given name) in order to
clarify the inversion of the typical Western format. These are
the three primary formats in which Anglicized Korean names
are ordered. Less conventionally, some cases will follow the
Western order while inserting a comma after the given name
(given name, surname). However, these instances are clerical
errors that comprise a mere 0.65% of the analyzed data.
Table 1 organizes the ordering formats found in Anglicized
Korean names, classified as Eastern and Western orders.
TABLE 1. The ordering formats of Anglicized Korean personal names.

No.  Format  Description
1    Gn Sn   Western given name-surname order
2    Gn, Sn  Western given name-surname order (separated by comma)
3    Sn Gn   Eastern (Korean) surname-given name order
4    Sn, Gn  Eastern (Korean) surname-given name order (separated by comma)

Sn: surname; Gn: given name (normally consisting of two syllables).
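
The 0.65% share attributed to the "given name, surname" order can be reproduced from the frequencies reported in Table 3. The following is a minimal check, assuming that format numbers 7 through 12 are the six comma-separated Western-order formats, as the Table 3 entries indicate.

```python
# Frequencies of the six "Gn, Sn" formats (format nos. 7-12) as listed in Table 3.
comma_after_given_name = {7: 345, 8: 107, 9: 148, 10: 217, 11: 401, 12: 32}
total_names = 192_336  # total number of analyzed researcher names

count = sum(comma_after_given_name.values())
share = 100 * count / total_names

print(f"{count} of {total_names} names ({share:.2f}%) use the 'Gn, Sn' order")
```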

The other major factor contributing to disparities is the
different formats in writing the two syllables of the given
name. The primary formats consist of (1) attaching the two
syllables, (2) dividing the two syllables into separate components, and (3) connecting the two syllables with a hyphen.
In each form, the capitalization of the second syllable can
differ as well. The writing formats for Anglicizing the two
syllables of the given name are organized in Table 2.
TABLE 2. The writing formats of a two-syllable Korean given name.

No.  Format  Examples  Description
1    Gn-Gn   Sung-Won  The syllables of the given name are hyphenated, meant to distinguish syllables
2    Gn-gn   Sung-won
3    GnGn    SungWon   The syllables of the given name are attached
4    Gngn    Sungwon
5    Gn Gn   Sung Won  The syllables of the given name are separated
6    Gn gn   Sung won

Gn: syllable starts with a capital letter; gn: syllable starts with a lowercase letter.

As the tables above indicate, the various components of
Korean names can lead to much diversity in Anglicization.
The four variations of writing order and the six variations of
writing format together create 24 possible forms of Anglicization. The complete list of all 24 variations is included in
Table 3.
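
The 24 forms follow mechanically from crossing the four orders of Table 1 with the six given-name styles of Table 2. The sketch below enumerates them. The numbering rule it uses (format no. = 6 × (order no. − 1) + style no.) is an assumption inferred from the format numbers printed in Table 3, not a rule stated in the text, and the names `ORDERS`, `STYLES`, and `format_pattern` are illustrative.

```python
from itertools import product

# Name orders from Table 1 and given-name styles from Table 2, in table order.
ORDERS = ["Gn Sn", "Gn, Sn", "Sn Gn", "Sn, Gn"]
STYLES = ["Gn-Gn", "Gn-gn", "GnGn", "Gngn", "Gn Gn", "Gn gn"]

def format_pattern(order: str, style: str) -> str:
    """Substitute a two-syllable style for the given-name slot of an order."""
    return order.replace("Gn", style, 1)

# Assumed numbering: 6 * (order no. - 1) + style no. (zero-based indices below).
FORMATS = {
    6 * o + s + 1: format_pattern(order, style)
    for (o, order), (s, style) in product(enumerate(ORDERS), enumerate(STYLES))
}

for number, pattern in sorted(FORMATS.items()):
    print(number, pattern)
```

Under this assumption, format 4 comes out as "Gngn Sn" and format 17 as "Sn Gn Gn", matching the two most frequent rows of Table 3.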
TABLE 3. Format distribution of Korean researchers' Anglicized names.

Ranking  Format      Format no.  Frequency  Proportion (%)  Example
1        Gngn Sn     4           44,720     23.25           Sungwon Kim
2        Sn Gn Gn    17          32,360     16.82           Kim Sung Won
3        Sn, Gn-Gn   19          30,543     15.88           Kim, Sung-Won
4        Gn Gn Sn    5           14,084     7.32            Sung Won Kim
5        Gn-Gn Sn    1           13,782     7.17            Sung-Won Kim
6        Sn, Gn Gn   23          9,865      5.13            Kim, Sung Won
7        GnGn Sn     3           7,087      3.68            SungWon Kim
8        Sn Gn gn    18          6,815      3.54            Kim Sung won
9        Sn Gngn     16          5,493      2.86            Kim Sungwon
10       Sn Gn-Gn    13          5,236      2.72            Kim Sung-Won
11       Sn, Gngn    22          4,786      2.49            Kim, Sungwon
12       Sn, Gn-gn   20          4,354      2.26            Kim, Sung-won
13       Sn, GnGn    21          2,790      1.45            Kim, SungWon
14       Sn Gn-gn    14          2,430      1.26            Kim Sung-won
15       Sn GnGn     15          2,336      1.21            Kim SungWon
16       Gn-gn Sn    2           2,153      1.12            Sung-won Kim
17       Gn gn Sn    6           1,566      0.81            Sung won Kim
18       Sn, Gn gn   24          686        0.36            Kim, Sung won
19       Gn Gn, Sn   11          401        0.21            Sung Won, Kim
20       Gn-Gn, Sn   7           345        0.18            Sung-Won, Kim
21       Gngn, Sn    10          217        0.11            Sungwon, Kim
22       GnGn, Sn    9           148        0.08            SungWon, Kim
23       Gn-gn, Sn   8           107        0.06            Sung-won, Kim
24       Gn gn, Sn   12          32         0.02            Sung won, Kim
Total                            192,336    100.00

Methods
This article examines both the structural and conventional characteristics of Korean personal names and analyzes the practical trends visible in the current practices of
Anglicizing the names of Korean researchers. The process
of analysis involved surveying the Anglicized names of
Korean researchers registered with the National Research
Foundation (NRF) of Korea, examining these collected data,
and classifying the results by type.
Data Collection and Clearing
In order to assemble data regarding the personal names of
Korean academic researchers, the Korean and Anglicized
names of 192,336 Korean researchers were collected from
the database at the NRF, Korea's leading research administration. The NRF database requires Anglicized names as a
mandatory field for registered researchers, who personally
input their information. The process does not distinguish the
surname and given name into separate fields, giving the
researcher control over how his or her name will be formatted. Thus we could collect information conducive to the
purpose of observing how individual Korean academic
researchers choose to Anglicize their names.
The Korean names of the Korean academic researchers
were included in the analyzed data due to the fact that it is
often impossible to distinguish between the different components of a Korean name in their Anglicized form. For
instance, a name written as "Jung Woo Ri" can be read as an
Sn Gn Gn format or a Gn Gn Sn format, but the Anglicization alone makes the intended order impossible to discern. Because the Korean name field follows the universal
Korean Sn Gn order, the Anglicized surname and given-name components can be identified with the Korean names
for reference.

As previously mentioned, English names are a mandatory
field in this database; however, the form of entry is unregulated, and as such some researchers entered names in
forms unsuitable for this analysis. These entries were excluded
from the analyzed data. Approximately 1,000 entries were
omitted, including cases where only two initials were
entered, names that were not spaced, and names of non-Korean
researchers. Furthermore, occasional errors such as a single
entry occupying more than one space were corrected or
clarified in the process of data clearing. The refined data
consist of two fields, Korean names and Anglicized names,
as exemplified in Figure 2.
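
The exclusion rules described above can be sketched as a small filter. This is a hypothetical illustration, not the procedure actually used: the function names, the whitespace normalization, and the test for a non-Korean researcher (a Korean-name field that is not written entirely in Hangul syllables) are all assumptions.

```python
import re

def is_hangul(text: str) -> bool:
    """True if every character is a precomposed Hangul syllable (U+AC00 to U+D7A3)."""
    return bool(text) and all("\uac00" <= ch <= "\ud7a3" for ch in text)

def clean_record(korean_name: str, anglicized: str):
    """Return a normalized Anglicized name, or None if the record is excluded."""
    if not is_hangul(korean_name.replace(" ", "")):
        return None  # assumption: treated as a non-Korean researcher
    # Collapse repeated whitespace so tokens are separated by a single blank.
    name = re.sub(r"\s+", " ", anglicized).strip()
    tokens = name.replace(",", " ").split()
    if len(tokens) < 2:
        return None  # surname and given name are not spaced apart
    if all(len(token.rstrip(".")) == 1 for token in tokens):
        return None  # initials only, such as "S. K."
    return name
```

For example, `clean_record("\uae40\uc131\uc6d0", "Kim  Sung Won")` would return `"Kim Sung Won"`, while an entry such as `"S. K."` or `"KimSungwon"` would be dropped.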

FIG. 2. Sample data. (Color figure can be viewed in the online issue,
which is available at wileyonlinelibrary.com.)

FIG. 3. Flow chart for classifying Anglicized name format. (Color figure
can be viewed in the online issue, which is available at
wileyonlinelibrary.com.)

Designing the Analyzer for Formats of Anglicized Korean Names
The refined data of the researchers' names were then
processed using an analyzer developed to classify the
formats of Anglicized names. The analyzer was designed to
compare Korean names and Anglicized names and enable
distinction between the Anglicized formats. The processing
algorithm and procedure are as follows: (1) as Korean names
follow a fixed surname-given name order, the algorithm
deduces English letters to be used in Anglicization using
Korean names as reference; (2) the selected letters are compared to the researchers' self-entered Anglicization and the
name order is inferred; (3) the Anglicized surname component is distinguished; (4) the format of the remaining given
name components is analyzed and identified; and (5) the
name order and the component format are combined and
identified as an Anglicization format. The complete algorithm for the Anglicization classification procedure is illustrated in Figure 3.
Analyzing the Anglicization Formats of Korean Names
Classifying Anglicized Korean names requires two
analyses, separately handling the name component order
and the format of the components. First the orders of the
Anglicized names are classified. The name order is discerned by comparing the Anglicized names to the fixed
order of the Korean names. As the unchanging surname-given name Korean order dictates that the first syllable will be
the surname, the first syllable is converted to the appropriate English letters and the result is compared to
the English name data to determine the name order. In order
to do so, the syllables of the Korean name are divided
into initial, medial, and final phonemes and each one is
matched to the appropriate English alphabet letters, after
which the Anglicized first and last syllables are compared to the corresponding syllables from the English name data,
and the surname and subsequent name order are determined from this comparison. The first and last syllables of
Anglicized names are examined because the surname of a
Korean name can only ever be the first or last syllable in any
form of Anglicization, never the middle.
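
A minimal sketch of the two ideas in this step follows. Splitting a syllable into initial, medial, and final phonemes uses the standard Unicode arithmetic for precomposed Hangul syllables. The mapping from phonemes to candidate English letters is not given in the text, so the sketch takes the candidate surname spellings as an input (`surname_spellings`, a hypothetical parameter) rather than inventing a table. The function names are likewise illustrative.

```python
HANGUL_BASE = 0xAC00  # first precomposed Hangul syllable in Unicode
HANGUL_LAST = 0xD7A3  # last precomposed Hangul syllable

def decompose(syllable: str):
    """Split one Hangul syllable into (initial, medial, final) jamo.

    Unicode arranges the syllables as 19 initials x 21 medials x 28 finals,
    where final index 0 means that there is no final consonant.
    """
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError(f"not a Hangul syllable: {syllable!r}")
    initial, rest = divmod(code - HANGUL_BASE, 21 * 28)
    medial, final = divmod(rest, 28)
    return (
        chr(0x1100 + initial),
        chr(0x1161 + medial),
        chr(0x11A7 + final) if final else "",
    )

def surname_position(tokens, surname_spellings):
    """Return "first", "last", or None by checking only the outer tokens."""
    candidates = {spelling.lower() for spelling in surname_spellings}
    first_hit = tokens[0].lower() in candidates
    last_hit = tokens[-1].lower() in candidates
    if first_hit and not last_hit:
        return "first"
    if last_hit and not first_hit:
        return "last"
    return None  # no match or ambiguous; further checks would be needed
```

With comma-free tokens, `surname_position(["Sung", "Won", "Kim"], ["Kim", "Gim"])` would report the surname in the last position, which corresponds to the Western order.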
After determining the placement of the surname, the
name components can be divided into separate fields and the
formats of the components can be defined. The name component formats can be differentiated through elements such
as hyphens, blanks, and capitalization into six separate classifications, as previously demonstrated in Table 2. Once the
format of the name components has been decided and
combined with the name order, the name can be classified in
one of the 24 formats listed in Table 3. The completed analysis results are listed with distinguished surname and name
components, name order, and format number, as exemplified
in Figure 4.
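
The second analysis, combining the given-name style with the name order, can be sketched as follows. The sketch assumes that the surname's English spelling has already been identified in the order step, and it reuses the numbering rule inferred from Table 3 (format no. = 6 × (order no. − 1) + style no.). Both assumptions, and the function names, are illustrative rather than the authors' implementation.

```python
import re

# Style numbers as listed in Table 2.
STYLE_NUMBERS = {"Gn-Gn": 1, "Gn-gn": 2, "GnGn": 3, "Gngn": 4, "Gn Gn": 5, "Gn gn": 6}

def given_name_style(given: str) -> str:
    """Classify a two-syllable given name into one of the six Table 2 styles."""
    for separator in ("-", " "):
        if separator in given:
            second = given.split(separator, 1)[1]
            return f"Gn{separator}{'Gn' if second[:1].isupper() else 'gn'}"
    # No separator: an uppercase letter after the first letter marks "GnGn".
    return "GnGn" if re.search(r"[A-Z]", given[1:]) else "Gngn"

def format_number(anglicized: str, surname: str) -> int:
    """Combine name order and given-name style into a Table 3 format number."""
    has_comma = "," in anglicized
    tokens = anglicized.replace(",", " ").split()
    if tokens[0].lower() == surname.lower():
        order_no = 4 if has_comma else 3  # "Sn, Gn" or "Sn Gn"
        given = " ".join(tokens[1:])
    else:
        order_no = 2 if has_comma else 1  # "Gn, Sn" or "Gn Sn"
        given = " ".join(tokens[:-1])
    return 6 * (order_no - 1) + STYLE_NUMBERS[given_name_style(given)]
```

Applied to the Table 3 examples, `format_number("Sungwon Kim", "Kim")` gives 4 and `format_number("Kim, Sung-Won", "Kim")` gives 19.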

FIG. 4. Sample results. (Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.)

TABLE 4. Format characteristics used in verification.

                                 Number of**   Appearance order of**
Pair no.  Format no.  Format*    H  C  B  U    H  C  B                 Additional methods of verification
1         6           Gn gn Sn   0  0  2  2    N  N  1                 Upper & lower case: ULU (F6), UUL (F18)
          18          Sn Gn gn   0  0  2  2    N  N  1
2         5           Gn Gn Sn   0  0  2  3    N  N  1                 Compare Sn components with Korean surname list (consists of 180 surnames)
          17          Sn Gn Gn   0  0  2  3    N  N  1
3         10          Gngn, Sn   0  1  1  2    N  1  2                 Compare no. of letters before and after comma
          22          Sn, Gngn   0  1  1  2    N  1  2
4         9           GnGn, Sn   0  1  1  3    N  1  2                 Compare no. of uppercase before and after comma
          21          Sn, GnGn   0  1  1  3    N  1  2

*Sn: surname; Gn: given name, syllable starts with capital letter; gn: given name, syllable starts with lowercase.
**B: blank, C: comma, H: hyphen, U: uppercase letter, N: no appearance, 1/2/3: appearing order of components.

Verifying the Anglicization Formats of Korean Names


The accuracy of these classifications can be verified
through factors such as appearance orders and characteristic
symbols included in Anglicization such as commas,
hyphens, spaces, and capitalization. The majority of Anglicization formats can be reevaluated in accordance with their
inclusion of particular symbols or arrangements. Certain
formats will include all of the characteristics involved in
verification. The formats that these characteristics influence
are diagrammed in Table 4 by each identifying characteristic, verifying the precision of the format classifications. In
Table 4, Formats 6 and 18 share all the characteristics used
to distinguish formats. Accordingly, in order to disambiguate
these two formats, the capitalization of each syllable has
been additionally examined. Thus the pair can be distinguished as Format 6 (Upper-lower-Upper) and Format 18
(Upper-Upper-lower). The other pairs can be distinguished
according to the methods presented in Table 4.
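
The verification characteristics in Table 4 can be computed directly from a name string. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that components are separated by single blanks and that "uppercase" means any uppercase letter in the string.

```python
def symbol_profile(name):
    """Count hyphens (H), commas (C), blanks (B), and uppercase letters (U),
    and rank the separators that occur by their first appearance."""
    symbols = {"H": "-", "C": ",", "B": " "}
    counts = {key: name.count(sym) for key, sym in symbols.items()}
    counts["U"] = sum(ch.isupper() for ch in name)
    # Separators that do not occur keep the value "N" (no appearance).
    present = sorted((name.find(sym), key) for key, sym in symbols.items() if sym in name)
    order = {key: "N" for key in symbols}
    for rank, (_, key) in enumerate(present, start=1):
        order[key] = rank
    return counts, order


def case_pattern(name):
    """Capitalization of each blank-separated component, e.g. 'ULU' or 'UUL'."""
    return "".join("U" if part[0].isupper() else "L" for part in name.split())


def comma_sides(name):
    """Numbers of letters and of uppercase letters before and after the comma."""
    before, after = name.split(",", 1)
    letters = (sum(c.isalpha() for c in before), sum(c.isalpha() for c in after))
    uppers = (sum(c.isupper() for c in before), sum(c.isupper() for c in after))
    return letters, uppers


print(symbol_profile("Gil dong Hong"))   # same counts as "Hong Gil dong"
print(case_pattern("Gil dong Hong"), case_pattern("Hong Gil dong"))
print(comma_sides("Gildong, Hong"), comma_sides("Hong, GilDong"))
```

The first two placeholder names have identical symbol profiles and differ only in their case patterns, which is why capitalization serves as the additional check for that pair.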
Data Analysis Results
Overall, the most frequently used format in Korean
researchers' Anglicized names is Format 4, Gngn Sn, utilized by 23.25% of the surveyed researchers. The name
order and component style of this format are fully congruent with typical Western naming formats. The second most
frequently used format is Format 17, Sn Gn Gn, which
16.82% of researchers used. The name order of this format
is identical to the order of Korean names, while the two
syllables of the given name are separated, as is commonly
practiced in Korean naming patterns. This format could
lead to confusion over surname placement when read by
non-Koreans, while the first syllable of the given name
could be misread as a middle name. The third most frequently used format is Format 19, Sn, Gn-Gn, used by
15.88% of researchers. Although the name order of this
format is ostensibly identical to the typical Korean order,
the inclusion of a comma to imply inversion indicates congruence with the Western order. In terms of the given name
format, the two syllables of the given name are connected
with a hyphen, indicating differentiation between the syllables. Thus this format reflects aspects of both Korean and
Western naming formats. These three formats comprise
55.95% of the analyzed data, whereas the remaining
formats were each utilized by less than 10% of the surveyed researchers. Table 3 demonstrates the analyzed data
results and the format distribution of the collected and
verified data.


Order of Surname and Given Name Components


The percentage of Anglicized personal names formatted
in the given name-surname (Gn Sn) order is high, at
43.36%. Meanwhile, 27.57% used the Korean name order
but inserted a comma between the two components (Sn, Gn).
This format includes a comma to imply an inversion of the
Western naming order; hence, if we interpret these entries
to be congruent with the Western naming order, the percentage of Western-ordered Anglicized names rises to 70.93%.
Another 28.42% of Anglicized names were formatted in
the Korean surname-given name order (Sn Gn), whereas
the patently incorrect given name-comma-surname (Gn,
Sn) format was utilized by only
0.65% of researchers. The distribution of the various name
component orders is demonstrated in Table 5.
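
The proportions quoted above follow from the Table 5 frequencies, which sum to the 192,336 researchers in the sample. The short Python check below recomputes them; the variable and function names are illustrative.

```python
# Frequencies of the four name orders as reported in Table 5.
name_order_counts = {"Gn Sn": 83_392, "Gn, Sn": 1_250, "Sn Gn": 54_670, "Sn, Gn": 53_024}

total = sum(name_order_counts.values())


def pct(count):
    """Share of the whole sample, in percent, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {order: pct(n) for order, n in name_order_counts.items()}
# "Sn, Gn" is read as an inverted Western order, so it is added to "Gn Sn".
western = pct(name_order_counts["Gn Sn"] + name_order_counts["Sn, Gn"])

print(total)
print(shares)
print(western)
```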

TABLE 5. Distribution of name order.

| Name order | Frequency | Proportion (%) | Notes |
|---|---|---|---|
| Gn Sn | 83,392 | 43.36 | Formats 1–6 |
| Gn, Sn | 1,250 | 0.65 | Formats 7–12 |
| Sn Gn | 54,670 | 28.42 | Formats 13–18 |
| Sn, Gn | 53,024 | 27.57 | Formats 19–24 |
| Total | 192,336 | 100 | |

Format of the Two-Syllable Given Name Components

In all, 35.13% of Anglicized names format the two syllables of the given name component by attaching them, whereas 34.22% separate the syllables and 30.65% hyphenate the two. Evidently, the three syllabic formats of the given name are close to equal in usage. Although attaching the two syllables conforms with Western naming conventions, cases with syllables spaced or hyphenated in accordance with the Korean order comprised 64.87% of the researchers. These ratios are organized in Table 6.

TABLE 6. Distribution of two-syllable given name formats.

| Format of given name | Frequency | Proportion (%) | Notes | Group frequency | Group proportion (%) |
|---|---|---|---|---|---|
| Gn-Gn | 49,906 | 25.95 | Formats 1, 7, 13, 19 | 58,950 | 30.65 |
| Gn-gn | 9,044 | 4.70 | Formats 2, 8, 14, 20 | | |
| GnGn | 12,361 | 6.43 | Formats 3, 9, 15, 21 | 67,577 | 35.13 |
| Gngn | 55,216 | 28.71 | Formats 4, 10, 16, 22 | | |
| Gn Gn | 56,710 | 29.48 | Formats 5, 11, 17, 23 | 65,809 | 34.22 |
| Gn gn | 9,099 | 4.73 | Formats 6, 12, 18, 24 | | |
| Total | 192,336 | 100 | | 192,336 | 100 |

Format of the Second Syllable of the Given Name

The second syllable of the given name can determine different formats based on its capitalization. As Table 7 demonstrates, 61.86% of researchers capitalized the second syllable of their given names. In particular, among names whose syllables are hyphenated or separated in accordance with Korean naming formats, 84.66% of hyphenated names (the share of "Gn-Gn" among "Gn-Gn" and "Gn-gn" in Table 6) and 86.17% of separated names (the share of "Gn Gn" among "Gn Gn" and "Gn gn" in Table 6) featured capitalized second syllables. In short, Anglicized names modeled after Korean naming patterns had a higher ratio of capitalized given name second syllables than names that followed Western naming formats. In comparison, typically Western-formatted names, where the two syllables of the given name were attached and regarded as one component, had an 81.7% rate (the share of "Gngn" among "Gngn" and "GnGn" in Table 6) of un-capitalized given name second syllables. These trends are organized in Table 7.

TABLE 7. Capitalization of the second syllable of the given name.

| Second syllable of given name | Frequency | Proportion (%) | Notes |
|---|---|---|---|
| Capitalized | 118,977 | 61.86 | Formats 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23 |
| Un-capitalized | 73,359 | 38.14 | Formats 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24 |
| Total | 192,336 | 100 | |

Discussion and Conclusion

This research is intended to further understanding of the particulars of Korean personal names and the formats present in Korean academic researchers' Anglicized names, make the process of searching and distinguishing by author names more precise and efficient, and allow information professionals to more precisely describe Korean personal names while building library catalogs and metadata and carrying out authority control. Such analyses contribute to examining the current practices of Anglicizing the names of Korean researchers. Our research provides a concrete, tangible analysis of the immediate trends of Anglicization among 192,336 Korean researchers as well as insight into how the theoretical Anglicization formats have been adopted.


FIG. 5. Examples of personal name authority records from VIAF (Virtual International Authority File). (Color figure can be viewed in the online issue,
which is available at wileyonlinelibrary.com.)

Overall, the Anglicization format of Gngn Sn, which
completely conforms to the Western naming format, is the
most frequently used format among Korean researchers.
However, the majority of the Anglicized names reflect the
characteristics of the original Korean names either in the
name order or the component style. These two factors, name
order and component style, both exacerbate ambiguity in
Anglicized names, and they point in different directions: 70.93%
of the names adopted Western name orders, while
64.87% utilized formats modeled on the Korean tradition
in formatting the two syllables of the given name.
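
The given-name figures can be recomputed from the Table 6 frequencies. The following Python check is a small sketch with illustrative names; it groups the six given-name styles into hyphenated, attached, and separated forms and also recovers the overall share of capitalized second syllables reported in Table 7.

```python
# Frequencies of the six given-name styles as reported in Table 6.
style_counts = {
    "Gn-Gn": 49_906, "Gn-gn": 9_044,   # hyphenated
    "GnGn": 12_361, "Gngn": 55_216,    # attached
    "Gn Gn": 56_710, "Gn gn": 9_099,   # separated
}
total = sum(style_counts.values())


def pct(count):
    """Share of the whole sample, in percent, rounded to two decimals."""
    return round(100 * count / total, 2)


hyphenated = style_counts["Gn-Gn"] + style_counts["Gn-gn"]
attached = style_counts["GnGn"] + style_counts["Gngn"]
separated = style_counts["Gn Gn"] + style_counts["Gn gn"]

group_shares = {
    "hyphenated": pct(hyphenated),
    "attached": pct(attached),
    "separated": pct(separated),
}
# Hyphenated and separated forms keep the two syllables visibly distinct.
korean_style = pct(hyphenated + separated)
# Styles whose second syllable is capitalized end in "Gn".
capitalized = pct(sum(n for style, n in style_counts.items() if style.endswith("Gn")))

print(total, group_shares, korean_style, capitalized)
```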
To be more specific, these research results can be used as
basic material for various approaches to name disambiguation. Within the library and information science field, these
results can also be referred to when determining a heading
format for personal name authority records. The current
state of personal name authority records often features conflicting heading formats according to the source of origin, as
exemplified in Figure 5 (VIAF, 2012).
The International Federation of Library Associations
(IFLA) suggests that the "choice of entry-word [for personal
names] is determined as far as possible by agreed usage in
the country of which the author is citizen" (Chaplin &
Anderson, 1967, p. ix). With this statement in mind, this
research can be used to determine what this "agreed usage"
would entail. In short, this research could consolidate
"agreed usage" of entry format by determining the most
commonly used name formats among Koreans. If the
empirical data in this research were used to consolidate the
heading in authority records, it would minimize the data
composition required for Alternative Name Forms contained in the 4XX data fields and thus improve the efficiency of authority control development. In addition, if the
Korean name characteristics outlined in this article are properly understood, we could prevent the errors in authority
record headings that stem from confusing the name elements, as seen in the final heading in Figure 5.
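
To illustrate how a single heading relates to its alternative forms, the Python sketch below writes one name in any of the 24 formats and lists the remaining forms as candidate variants. It is a hypothetical illustration: the format numbering is taken from the notes to Tables 5 and 6, the function names are invented for this example, "Hong Gildong" is a placeholder name, and no particular cataloging system or record syntax is implied.

```python
# Name orders in blocks of six formats: Gn Sn; Gn, Sn; Sn Gn; Sn, Gn (Table 5 notes).
ORDER_TEMPLATES = ["{g} {s}", "{g}, {s}", "{s} {g}", "{s}, {g}"]
# (separator, capitalize second syllable) for Gn-Gn, Gn-gn, GnGn, Gngn, Gn Gn, Gn gn.
GIVEN_STYLES = [("-", True), ("-", False), ("", True), ("", False), (" ", True), (" ", False)]


def render(format_no, surname, syl1, syl2):
    """Write a surname and two given-name syllables in format 1 to 24."""
    if not 1 <= format_no <= 24:
        raise ValueError("format_no must be between 1 and 24")
    block, position = divmod(format_no - 1, 6)
    separator, capitalize_second = GIVEN_STYLES[position]
    second = syl2.capitalize() if capitalize_second else syl2.lower()
    given = syl1.capitalize() + separator + second
    return ORDER_TEMPLATES[block].format(g=given, s=surname.capitalize())


def heading_and_variants(heading_format, surname, syl1, syl2):
    """Return the chosen heading and the 23 other written forms of the same name."""
    heading = render(heading_format, surname, syl1, syl2)
    variants = [render(n, surname, syl1, syl2) for n in range(1, 25) if n != heading_format]
    return heading, variants


heading, variants = heading_and_variants(4, "hong", "gil", "dong")
print(heading)
print(variants[:5])
```

Choosing the heading format from observed usage means the forms that researchers actually use most often need not be recorded separately for each author.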
Furthermore, these results could be used in the computer
science field in developing automatic name disambiguation
algorithms. The various name formats organized in this
article could serve as basic materials for developing a rule
base for Korean name disambiguation as well as machine
learning for author differentiation.
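
One simple rule of this kind maps every written form of a name to the same comparison key. The Python sketch below is an assumption-laden illustration rather than a tested disambiguation method: `SAMPLE_SURNAMES` is a tiny hypothetical stand-in for a full Korean surname list, the function name is invented, and two different people with the same name would still share a key.

```python
import re

# Hypothetical sample; a real rule base would use a complete surname list.
SAMPLE_SURNAMES = {"kim", "lee", "park", "choi", "hong", "cho"}


def match_key(name):
    """Map an Anglicized Korean name to a (surname, given name) key
    that ignores name order, separators, and capitalization."""
    cleaned = " ".join(name.split())
    if "," in cleaned:
        left, right = (side.strip() for side in cleaned.split(",", 1))
        if left.lower() in SAMPLE_SURNAMES:
            surname, given = left, right
        else:
            surname, given = right, left
    else:
        tokens = cleaned.split(" ")
        # The surname is either the first or the last component.
        if tokens[0].lower() in SAMPLE_SURNAMES:
            surname, given = tokens[0], " ".join(tokens[1:])
        else:
            surname, given = tokens[-1], " ".join(tokens[:-1])
    # Drop blanks and hyphens inside the given name.
    given = re.sub(r"[\s-]", "", given)
    return surname.lower(), given.lower()


forms = ["Gildong Hong", "Hong, Gil-Dong", "Hong Gil Dong", "Gil dong Hong", "Hong, GilDong"]
print({match_key(form) for form in forms})
```

Keys of this kind only group spelling variants; separating distinct authors who share a name requires additional evidence such as affiliation or coauthors.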
It is our hope that with improved cultural understanding
of traditional Korean naming patterns as well as their many
possible transliterations along with a knowledge of the
actual name transliterations commonly used among scholars
today, librarians, computer programmers, and researchers
will be better able to identify and catalog Korean author
information correctly.
Acknowledgments
We thank the National Research Foundation of Korea
(NRF) for supplying the data that made this research possible, as well as the anonymous reviewers for their helpful
comments.
References
Akhtar, N. (2007). Indexing Asian names. The Indexer, 25(4), C3:12–14.
Aksnes, D.W. (2008). When different persons have an identical author name: How frequent are homonyms? Journal of the American Society for Information Science and Technology, 59(5), 838–841.
Anderson, J.D., & Carballo, J.P. (2001). The nature of indexing: How humans and machines analyze messages and texts for retrieval. Part II: Machine indexing, and the allocation of human versus machine effort. Information Processing and Management, 37(2), 255–277.
Binongo, J.G.N. (2003). Who wrote the 15th Book of Oz? An application of multivariate analysis to authorship attribution. Chance, 16(2), 9–17.
CEAL. (2011). Volume holdings of East Asian materials in North American institutions as of June 30, 2011. Council on East Asian Libraries (CEAL) Statistics Database. Retrieved from http://lib.ku.edu/ceal/php/tblview_adv/tblview_adv3.php
Chai, M.H. (2009). Order and format of the Romanized Korean personal names: Focusing on non-Korean perspectives. Korean Journal of English Language and Linguistics, 9(4), 583–607.
Chaplin, A.H., & Anderson, D. (1967). Names of persons: National usage for entry in catalogues. Sevenoaks, Kent, UK: International Federation of Library Associations.
Cheshire, J.A., Longley, P.A., & Singleton, A.D. (2010). The surname regions of Great Britain. Journal of Maps, 6(1), 401–409.
Dervos, D.A., Samaras, N., Evangelidis, G., Hyvärinen, J., & Asmanidis, Y. (2006). The universal author identifier system (UAI_Sys). Proceedings of the 1st International Scientific Conference, eRA: The Contribution of Information Technology in Science, Economy, Society and Education. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.6707&rep=rep1&type=pdf
Elliot, S. (2010). A survey of author name disambiguation: 2004 to 2010. Library Philosophy and Practice, Paper 473. Retrieved from http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1491&context=libphilprac
Elsevier. (2006, June 13). Author identifier. Retrieved from http://www.info.sciverse.com/scopus/scopus-in-detail/tools/authoridentifier
Feitelson, D.G. (2004). On identifying name equivalences in digital libraries. Information Research, 9(4), 1–17. Retrieved from http://informationr.net/ir/9-4/paper192.html
Fuller, E.E. (1989). Variation in personal names in works represented in the catalog. Cataloging & Classification Quarterly, 9(3), 75–95.
Han, H., Giles, C.L., Zha, H., Li, C., & Tsioutsiouliklis, K. (2004). Two supervised learning approaches for name disambiguation in author citations. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL '04) (pp. 296–305). New York: Association for Computing Machinery.
Han, H., Zha, H., & Giles, C.L. (2005). Name disambiguation in author citations using a K-way spectral clustering method. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL '05) (pp. 334–343). New York: Association for Computing Machinery.
Hong, G.W., Kim, M.J., Lee, D.G., & Rim, H.C. (2009). A hybrid approach to English-Korean name transliteration. Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS '09) (pp. 108–111). Stroudsburg, PA: Association for Computational Linguistics. Retrieved from http://dl.acm.org/citation.cfm?id=1699733
Hu, J.J. (2000). Transactional analysis: Problems on cataloging Chinese names. Illinois Libraries, 82(4), 251–260. Retrieved from http://www.lib.niu.edu/2000/il0004251.html
Hu, L., Lo, P., & Tam, O. (2004). Chinese name authority control in Asia: An overview. Cataloging & Classification Quarterly, 39(1), 465–488.
Indexing personal names 1–4. (2006–2008). Centrepiece 1–4 [Special section]. The Indexer, 25(2–4), 26(2). Retrieved from http://www.theindexer.org/index.php?option=com_content&task=%20view&id=85&Itemid=56
Kim, H.S. (2001). Unity and consistency in the Romanization of Korean personal names. Korean Journal of English Language and Linguistics, 1(3), 417–435.
Kim, J.W. (2008). Korean orthography of loanwords and Romanization of Korean from the viewpoint of translation. The Journal of Translation Studies, 9(2), 67–93.
Kim, S. (2006). Romanization in cataloging of Korean materials. Cataloging & Classification Quarterly, 43(2), 53–76.
Kim, S.W., & Kim, J.W. (2012). A research on the format for Romanization of Korean personal name. Journal of Information Management, 43(2), 199–222.
Korea, Ministry of Culture, Sports and Tourism. (2000). Romanization of Korean. Retrieved from http://www.korean.go.kr/eng/roman/roman.jsp
Korea, Statistics Korea. (2010). Results of the 2010 population and housing census. Retrieved from http://kosis.kr/
Lee, S.U., Song, C., & Fourser, R.J. (2011). A research on the policy for Romanization of Korean surnames. Retrieved from http://www.prism.go.kr/homepage/researchsearch/organ/retrieveOrganLeft.do?detail_id=1371000-201100133&flag=organ&levelUseYn=N
Library of Congress. (2012). Library of Congress names: Authorities & vocabularies. Retrieved from http://id.loc.gov/authorities/names.html
Loesch, M.F. (2011). The virtual international authority file. Technical Services Quarterly, 28(2), 255–256.
Mak, L. (2011). Issues of personal name authority control in a retrospective cataloging project. Technical Services Quarterly, 28(2), 160–168.
Maxwell, R.L. (2002). Maxwell's guide to authority work. Chicago: American Library Association.
McCune, G.M., & Reischauer, E.O. (1939). The Romanization of the Korean language: Based on its phonetic structure. Seoul: YMCA Press.
Naito, E. (2004). Names of the Far East: Japanese, Chinese, and Korean authority control. Cataloging & Classification Quarterly, 38(3/4), 251–268.
OCLC. (2008). Online catalogs: What users and librarians want. Retrieved from http://www.oclc.org/us/en/reports/onlinecatalogs/default.htm
ORCID. (2012). Welcome to ORCID. Retrieved from http://about.orcid.org/
Patterson, M.J. (2009, May 27). Gender scholar studies hyphenation as a cultural practice. Focus. Retrieved from http://news.rutgers.edu/focus/issue.2009-05-26.7980804654/article.2009-05-27.9875911567
Pereira, D.A., Ribeiro-Neto, B., Ziviani, N., Laender, A.H.F., Gonçalves, M.A., & Ferreira, A.A. (2009). Using web information for author name disambiguation. Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '09) (pp. 49–58). New York: Association for Computing Machinery.
Riemer, J.J., & Schreur, P.E. (2012, February 14). The future of undifferentiated personal name authority records and other implications for PCC authority work. Retrieved from http://www.loc.gov/aba/pcc/Undiff%20Personal%20NARs%20Discussion%20Paper%20March%202012.doc
Rutt, R. (1972). About the Romanization of Korean. Korea Journal, 12(5), 20–25. Retrieved from http://www.ekoreajournal.net/issue/view_pop.htm?Idx=881
Scoville, C.L., Johnson, E.D., & McConnell, A.L. (2003). When A. Rose is not A. Rose: The vagaries of author searching. Medical Reference Services Quarterly, 22(4), 1–11.
Smalheiser, N., & Torvik, V. (2009). Author name disambiguation. In B. Cronin (Ed.), Annual Review of Information Science and Technology (Vol. 43, pp. 287–313). Medford, NJ: Information Today.
Song, H.J., & Park, S.B. (2011). English-Korean machine transliteration by combining statistical model and web search. In S.I. Ao, O. Castillo, C. Douglas, D.D. Feng, & J.A. Lee (Eds.), Proceedings of the International MultiConference of Engineers and Computer Scientists 2011 (IMECS 2011) (Vol. 1, pp. 1–6). Hong Kong: Newswood Limited.
Takahashi, N. (2005). The present state and the problem of the headings of Korean author-name authority file in NACSIS-CAT: Analysis of type of the character of author names and author profiles on the publications of Korea and Japan. Journal of Japan Society of Library and Information Science, 51(1), 15–24.
Thompson, R. (2006). Bilingual, bicultural, and binominal identities: Personal name investment and the imagination in the lives of Korean Americans. Journal of Language, Identity & Education, 5(3), 179–208.
Thomson Reuters. (2006, June 8). Thomson Scientific announces development of full suite of authorship tools. Retrieved from http://www.prnewswire.com/news-releases/thomson-scientific-announces-development-of-full-suite-of-authorship-tools-56000357.html
Tillett, B.B. (2002). A virtual international authority file. Workshop on Authority Control among Chinese, Korean and Japanese Languages (CJK Authority 3) (pp. 117–139). Tokyo: NII. Retrieved from http://www.nii.ac.jp/publications/CJK-WS/cjk3-08a.pdf
Torvik, V.I., Weeber, M., Swanson, D.R., & Smalheiser, N.R. (2005). A probabilistic similarity metric for Medline records: A model for author name disambiguation. Journal of the American Society for Information Science and Technology, 56(2), 140–158.
Virtual International Authority File. (2012). VIAF. Retrieved from http://viaf.org/
Yin, X., Han, J., & Yu, P.S. (2007). Object distinction: Distinguishing objects with identical names by link analysis. Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE 2007) (pp. 1242–1246). New York: IEEE.


