Sungwon Kim
Department of Library and Information Science, Chungnam National University, Daehak-ro 99, Yuseong-ku,
Daejeon, Korea. E-mail: sungwonk@cnu.ac.kr
Seongyun Cho
Department of Digital Media, Anyang University, 37-22 Samduk-ro, Manan-ku, Anyang, Kyunggi-do, Korea.
E-mail: scho@anyang.ac.kr
Introduction
Names clearly have immense value to their owners. Not
only do names provide a way of differentiating and perceiving individuals, they help form one's sense of identity and self
(Thompson, 2006). Due to the development of information
and telecommunication technology, academic communication has expanded and globalized in various academic fields.
While Korean researchers, among others, are increasingly
publishing their research findings in English, American
Received March 7, 2012; revised June 7, 2012; accepted July 13, 2012
© 2012 ASIS&T. Published online 6 December 2012 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asi.22781
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 64(1):86–95, 2013
Prior Research
The ability to identify the author of a particular article or
all articles published by a particular author is a fundamental
issue in the field of information science. Although ostensibly
simple issues, these are in fact central, major problems that
have yet to be solved. Different approaches to author name
disambiguation have been developed across various fields
and can be organized into three broad types.
Within the library field, authority control is being performed in order to disambiguate author names, where
instances of one author being published under differing
names are aligned under one heading chosen as an authoritative form that is then linked to the various alternative name
forms (Hu, Lo, & Tam, 2004; Library of Congress, 2012;
Loesch, 2011; Mak, 2011; Maxwell, 2002; Naito, 2004;
Scoville, Johnson, & McConnell, 2003; Tillett, 2002).
Meanwhile, within the computer science field algorithms are
being developed to identify single authors published under
various differently written names and different authors with
identical names (Anderson & Carballo, 2001; Binongo,
2003; Feitelson, 2004; Han, Zha, & Giles, 2005). Furthermore, the academic and publishing worlds have experienced
problems in differentiating individuals and, as an alternative,
identify researchers through separate distinguishing IDs and
profiles (Dervos, Samaras, Evangelidis, Hyvärinen, &
Asmanidis, 2006; Elsevier, 2006; Thomson Reuters, 2006).
Individual organizations and publishers are attempting to
assign identification numbers, analogous to social security
numbers, to individual researchers as an alternative way to
solve issues of author name ambiguity. However, as these
efforts are being pursued independently, authors would be
assigned separate IDs from each publisher and organization
and thus the various IDs for each author could cause further
ambiguity problems. As a solution, the Open Researcher &
Contributor ID (ORCID) initiative is working to make academic
communication more efficient by establishing a central registry
of unique identifiers for individual researchers, a database that would be linked to other author ID schemes from
various organizations (ORCID, 2012).
Prior research regarding Korean names can be roughly
divided into the fields of library and information science,
computer science, and linguistics. Research in the field of
library and information science is primarily concerned with
the issues of authority control and indexing non-English
materials. Research regarding authority control usually
refers to the importance of collaboration and deduction in
methods of authority control (Naito, 2004; Takahashi,
2005). Naito (2004) introduced a collaborative project
between Korea, China, and Japan wherein the three countries would utilize collective authority records for authors
75,000.¹ The number of surnames will differ between languages and regions, but Western surnames can be estimated at
several tens of thousands to a million per nation. By comparison,
the Korean population census lists 286 individual surnames
(Korea, Statistics Korea, 2010). Furthermore, when the names are
written in Korean script rather than in Chinese characters, this is
reduced to about 180 different names (Lee, Song, & Fourser,
2011). For example, the various Chinese characters for Koo ( ) and Noh ( ) are reduced to the single
Korean spelling and pronunciation of Koo ( ) and Noh ( ). Accordingly, the Anglicized transliteration of the
Korean spelling or the transcription of the Korean pronunciation would be treated as identical names and impossible
to differentiate. The numerical differences between these
surnames influence their individual distinguishability. The
most frequent surname per capita among the American and
British populations is Smith, comprising a respective
0.881% and 1.121%, or approximately 1 in every 100
people.² The most common surname in Norway is Hansen,
forming an estimated 0.7% of the entire population (Aksnes,
2008). In comparison, the five most frequent Korean surnames are as follows: Kim 21.6%, Lee 14.8%, Park
8.5%, Choi 4.7%, and Chung 4.3%, adding up to about
54% of the entire population, while the top 37 most frequent
surnames collectively account for more than 90% of the
population (Korea, Statistics Korea, 2010). The distribution
of the top 50 surnames in the Korean and American populations
is diagrammed in Figure 1. Thus, individual differentiation
between Korean personal names is difficult when based solely
on surnames; it therefore relies on the complete conveyance of
a combined given name and surname.
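The concentration described above can be made concrete with a little arithmetic. The following minimal Python sketch uses only the shares quoted in the text; it sums the five most frequent Korean surname shares and estimates the chance that two randomly chosen people share one of those surnames (the sum of squared shares). That figure is only a lower bound on the overall probability, since the remaining surnames are ignored, and the Smith comparison uses only the 0.881% American figure. The variable and function names are illustrative and do not come from the study.

```python
# Surname shares quoted in the text, expressed as fractions of the population.
KOREAN_TOP5 = {"Kim": 0.216, "Lee": 0.148, "Park": 0.085, "Choi": 0.047, "Chung": 0.043}
US_SMITH = 0.00881  # share of Smith in the American population


def same_surname_lower_bound(shares):
    """Probability that two random people both carry the same listed surname."""
    return sum(p * p for p in shares)


top5_share = sum(KOREAN_TOP5.values())
korean_bound = same_surname_lower_bound(KOREAN_TOP5.values())
smith_bound = same_surname_lower_bound([US_SMITH])

print(f"Top-five Korean surname share: {top5_share:.1%}")
print(f"Korean same-surname lower bound: {korean_bound:.4f}")
print(f"Smith-only same-surname bound: {smith_bound:.6f}")
```

Under these assumptions, the Korean lower bound is roughly 0.08, several orders of magnitude above the Smith-only figure, which illustrates why a surname alone identifies so little.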
¹ http://koti.mbnet.fi/pasenka/names/about.htm
² http://www.britishsurnames.co.uk/surnames/SMITH
No.  Format  Description
1    Gn Sn   Western given name-surname order
2    Gn, Sn  Western given name-surname order (separated by comma)
3    Sn Gn   Eastern (Korean) surname-given name order
4    Sn, Gn  Eastern (Korean) surname-given name order (separated by comma)
No.  Format
1    Gn-Gn
2    Gn-gn
3    GnGn
4    Gngn
5    Gn Gn
6    Gn gn
Gn: syllable starts with a capital letter; gn: syllable starts with a lowercase letter.
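Combining the four name orders with the six given-name formats yields 24 composite formats. The numbering used in the later tables appears to follow format no. = 6 x (order no. - 1) + given-name format no.; this rule is inferred from the format lists in the table notes rather than stated explicitly in the text. A minimal Python sketch of that scheme, with hypothetical names, follows.

```python
# Name-order templates (order no. 1-4) and given-name formats (given no. 1-6).
NAME_ORDERS = ["Gn Sn", "Gn, Sn", "Sn Gn", "Sn, Gn"]
GIVEN_FORMATS = ["Gn-Gn", "Gn-gn", "GnGn", "Gngn", "Gn Gn", "Gn gn"]


def format_number(order_no, given_no):
    """Composite format number under the inferred numbering rule."""
    return 6 * (order_no - 1) + given_no


def format_pattern(order_no, given_no):
    """Substitute the given-name format for 'Gn' in the name-order template."""
    template = NAME_ORDERS[order_no - 1]
    return template.replace("Gn", GIVEN_FORMATS[given_no - 1])


# Enumerate all 24 composite formats, keyed by format number.
ALL_FORMATS = {
    format_number(order_no, given_no): format_pattern(order_no, given_no)
    for order_no in range(1, 5)
    for given_no in range(1, 7)
}

for number in sorted(ALL_FORMATS):
    print(number, ALL_FORMATS[number])
```

For instance, name order 3 (Sn Gn) combined with given-name format 5 (Gn Gn) gives format 17, Sn Gn Gn.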
TABLE 3.
Ranking  Format      Format no.  Frequency  Proportion (%)  Example
1        Gngn Sn     4           44,720     23.25           Sungwon Kim
2        Sn Gn Gn    17          32,360     16.82           Kim Sung Won
3        Sn, Gn-Gn   19          30,543     15.88           Kim, Sung-Won
4        Gn Gn Sn    5           14,084     7.32            Sung Won Kim
5        Gn-Gn Sn    1           13,782     7.17            Sung-Won Kim
6        Sn, Gn Gn   23          9,865      5.13            Kim, Sung Won
7        GnGn Sn     3           7,087      3.68            SungWon Kim
8        Sn Gn gn    18          6,815      3.54            Kim Sung won
9        Sn Gngn     16          5,493      2.86            Kim Sungwon
10       Sn Gn-Gn    13          5,236      2.72            Kim Sung-Won
11       Sn, Gngn    22          4,786      2.49            Kim, Sungwon
12       Sn, Gn-gn   20          4,354      2.26            Kim, Sung-won
13       Sn, GnGn    21          2,790      1.45            Kim, SungWon
14       Sn Gn-gn    14          2,430      1.26            Kim Sung-won
15       Sn GnGn     15          2,336      1.21            Kim SungWon
16       Gn-gn Sn    2           2,153      1.12            Sung-won Kim
17       Gn gn Sn    6           1,566      0.81            Sung won Kim
18       Sn, Gn gn   24          686        0.36            Kim, Sung won
19       Gn Gn, Sn   11          401        0.21            Sung Won, Kim
20       Gn-Gn, Sn   7           345        0.18            Sung-Won, Kim
21       Gngn, Sn    10          217        0.11            Sungwon, Kim
22       GnGn, Sn    9           148        0.08            SungWon, Kim
23       Gn-gn, Sn   8           107        0.06            Sung-won, Kim
24       Gn gn, Sn   12          32         0.02            Sung won, Kim
Total                            192,336    100.00
FIG. 2. Sample data. (Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.)
FIG. 3. Flow chart for classifying Anglicized name format. (Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.)
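The flow chart itself is not reproduced here, so the following Python sketch should not be read as the authors' procedure. It is one plausible way to assign a format number to an Anglicized name, assuming that the surname is already known, that the given name has two syllables, and that format numbers follow the rule 6 x (order no. - 1) + given-name format no. suggested by the tables. The function name `classify_format` is hypothetical.

```python
def classify_format(name, surname):
    """Return the format number (1-24) of an Anglicized two-syllable Korean name.

    Assumes the surname is known in advance; raises ValueError otherwise.
    """
    # Step 1: determine the name order from the comma and the surname position.
    if "," in name:
        first, second = (part.strip() for part in name.split(",", 1))
        if first == surname:
            order_no, given = 4, second          # Sn, Gn
        elif second == surname:
            order_no, given = 2, first           # Gn, Sn
        else:
            raise ValueError(f"surname {surname!r} not found in {name!r}")
    else:
        tokens = name.split()
        if tokens and tokens[0] == surname:
            order_no, given = 3, " ".join(tokens[1:])    # Sn Gn
        elif tokens and tokens[-1] == surname:
            order_no, given = 1, " ".join(tokens[:-1])   # Gn Sn
        else:
            raise ValueError(f"surname {surname!r} not found in {name!r}")
    if not given:
        raise ValueError(f"no given name found in {name!r}")

    # Step 2: determine the given-name format from its separator and capitals.
    if "-" in given:
        second_syllable = given.split("-", 1)[1]
        given_no = 1 if second_syllable[:1].isupper() else 2   # Gn-Gn / Gn-gn
    elif " " in given:
        second_syllable = given.split(" ", 1)[1]
        given_no = 5 if second_syllable[:1].isupper() else 6   # Gn Gn / Gn gn
    else:
        has_inner_capital = any(ch.isupper() for ch in given[1:])
        given_no = 3 if has_inner_capital else 4               # GnGn / Gngn

    return 6 * (order_no - 1) + given_no


for example in ["Sungwon Kim", "Kim Sung Won", "Kim, Sung-Won", "Sung won, Kim"]:
    print(example, "->", classify_format(example, "Kim"))
```

The sketch depends on knowing the surname: without it, a comma-free name such as "Kim Sung Won" gives no surface cue as to which token is the surname.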
FIG. 4. Sample results. (Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.)
TABLE 4.
                                 Number of**      Order of appearance**
Pair no.  Format no.  Format*    H   C   B   U    H   C   B
1         6           Gn gn Sn   0   0   2   2    N   N   1
1         18          Sn Gn gn   0   0   2   2    N   N   1
2         5           Gn Gn Sn   0   0   2   3    N   N   1
2         17          Sn Gn Gn   0   0   2   3    N   N   1
3         10          Gngn, Sn   0   1   1   2    N   1   2
3         22          Sn, Gngn   0   1   1   2    N   1   2
4         9           GnGn, Sn   0   1   1   3    N   1   2
4         21          Sn, GnGn   0   1   1   3    N   1   2
*Sn: surname; Gn: given name, syllable starts with a capital letter; gn: given name, syllable starts with a lowercase letter.
**B: blank, C: comma, H: hyphen, U: uppercase letter, N: no appearance, 1/2/3: appearing order of components.
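The surface features named in the legend (counts of blanks, commas, hyphens, and uppercase letters, plus the order in which the separators first appear) can be computed directly from a name string. The Python sketch below is an illustrative reading of that legend, not the authors' implementation; the function name `signature` and the tuple layout (H, C, B, U counts followed by H, C, B order) are assumptions. It shows that the paired formats produce identical signatures, so these features alone cannot tell them apart.

```python
def signature(name):
    """Return (H, C, B, U counts, then H, C, B order of first appearance).

    Order values are 1, 2, 3 by first position in the string, or "N" when
    the separator does not appear at all.
    """
    counts = {
        "H": name.count("-"),
        "C": name.count(","),
        "B": name.count(" "),
        "U": sum(ch.isupper() for ch in name),
    }
    first_positions = {"H": name.find("-"), "C": name.find(","), "B": name.find(" ")}
    present = sorted((pos, key) for key, pos in first_positions.items() if pos >= 0)
    order = {key: "N" for key in first_positions}
    for rank, (_, key) in enumerate(present, start=1):
        order[key] = rank
    return (counts["H"], counts["C"], counts["B"], counts["U"],
            order["H"], order["C"], order["B"])


# Example names for format pairs that differ only in surname position.
pairs = [
    ("Sung won Kim", "Kim Sung won"),   # formats 6 and 18
    ("Sung Won Kim", "Kim Sung Won"),   # formats 5 and 17
    ("Sungwon, Kim", "Kim, Sungwon"),   # formats 10 and 22
    ("SungWon, Kim", "Kim, SungWon"),   # formats 9 and 21
]
for left, right in pairs:
    print(left, "|", right, "| same signature:", signature(left) == signature(right))
```

Resolving such pairs therefore requires information beyond punctuation and capitalization, such as recognizing which token is a surname.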
order and component style of this format are fully congruent with typical Western naming formats. The second most
frequently used format is Format 17, Sn Gn Gn, which
16.82% of researchers used. The name order of this format
is identical to the order of Korean names, while the two
syllables of the given name are separated, as is commonly
practiced in Korean naming patterns. This format could
lead to confusion over surname placement when read by
non-Koreans, while the first syllable of the given name
could be misread as a middle name. The third most frequently used format is Format 19, Sn, Gn-Gn, used by
15.88% of researchers. Although the name order of this
format is ostensibly identical to the typical Korean order,
the inclusion of a comma to imply inversion indicates congruence with the Western order. In terms of the given name
format, the two syllables of the given name are connected
with a hyphen, indicating differentiation between the syllables. Thus this format reflects aspects of both Korean and
Western naming formats. These three formats comprise
55.95% of the analyzed data, whereas the remaining
formats were each utilized by less than 10% of the surveyed researchers. Table 3 presents the format distribution of
the collected and verified data.
TABLE 5.
Name order  Frequency  Proportion (%)  Notes
Gn Sn       83,392     43.36           Format 1–6
Gn, Sn      1,250      0.65            Format 7–12
Sn Gn       54,670     28.42           Format 13–18
Sn, Gn      53,024     27.57           Format 19–24
Total       192,336    100

TABLE 7.
Second syllable of given name  Frequency  Proportion (%)  Notes
Capitalized                    118,977    61.86           Format 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23
Un-capitalized                 73,359     38.14           Format 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24
Total                          192,336    100
In all, 35.13% of Anglicized names attach the two syllables of the given name,
whereas 34.22% separate them with a space and 30.65% hyphenate them. Evidently, the three syllabic formats of the given
name are close to equal in usage. Although attaching the two
syllables conforms to Western naming conventions, formats that
space or hyphenate the syllables, in keeping with Korean
practice, account for 64.87% of the researchers. These
ratios are organized in Table 6.
TABLE 6.
Format of given name  Frequency  Proportion (%)  Notes              Combined frequency  Combined proportion (%)
Gn-Gn                 49,906     25.95           Format 1,7,13,19   58,950              30.65
Gn-gn                 9,044      4.70            Format 2,8,14,20
GnGn                  12,361     6.43            Format 3,9,15,21   67,577              35.13
Gngn                  55,216     28.71           Format 4,10,16,22
Gn Gn                 56,710     29.48           Format 5,11,17,23  65,809              34.22
Gn gn                 9,099      4.73            Format 6,12,18,24
Total                 192,336    100                                192,336             100
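The breakdowns by name order, given-name format, and capitalization can be recomputed from the per-format frequencies reported in Table 3. The Python sketch below does so, assuming the numbering rule format no. = 6 x (order no. - 1) + given-name format no. that the table notes suggest; the variable names are illustrative.

```python
from collections import Counter

# Frequency of each format number, as reported in Table 3.
FREQ = {
    4: 44720, 17: 32360, 19: 30543, 5: 14084, 1: 13782, 23: 9865,
    3: 7087, 18: 6815, 16: 5493, 13: 5236, 22: 4786, 20: 4354,
    21: 2790, 14: 2430, 15: 2336, 2: 2153, 6: 1566, 24: 686,
    11: 401, 7: 345, 10: 217, 9: 148, 8: 107, 12: 32,
}

by_order = Counter()   # keyed by name order no. (1-4)
by_given = Counter()   # keyed by given-name format no. (1-6)
for format_no, frequency in FREQ.items():
    by_order[(format_no - 1) // 6 + 1] += frequency
    by_given[(format_no - 1) % 6 + 1] += frequency

total = sum(FREQ.values())
# Odd given-name formats (Gn-Gn, GnGn, Gn Gn) capitalize the second syllable.
capitalized = sum(freq for given_no, freq in by_given.items() if given_no % 2 == 1)

print("Total:", total)
for order_no in sorted(by_order):
    print(f"Order {order_no}: {by_order[order_no]} ({by_order[order_no] / total:.2%})")
for given_no in sorted(by_given):
    print(f"Given format {given_no}: {by_given[given_no]} ({by_given[given_no] / total:.2%})")
print(f"Second syllable capitalized: {capitalized} ({capitalized / total:.2%})")
```

Recomputing the aggregates this way is a quick consistency check: the four name-order frequencies and the six given-name frequencies each sum to 192,336.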
FIG. 5. Examples of personal name authority records from VIAF (Virtual International Authority File). (Color figure can be viewed in the online issue,
which is available at wileyonlinelibrary.com.)