Chapter 8
Mapping Science
A critical part of scientific activity is to discern how a new idea is related to what
we know and what may become possible. As new scientific publications arrive at a rate that rapidly outpaces our capacity for reading, analyzing, and
synthesizing scientific knowledge, we need to augment ourselves with information
that can guide us through the rapidly growing intellectual space effectively. In this
chapter, we address some fundamental issues concerning what information
may serve as early signs of potentially valuable ideas. In particular, we are interested
in information that is routinely available and derivable upon the publication of a
scientific paper without assuming the availability of additional information such as
its usage and citations.
8.1 System Perturbation and Structural Variation

Many phenomena in the world share essential properties of a complex adaptive
system (CAS). Complex adaptive systems are a special type of complex system.
The study of CAS focuses on complex, emergent, and macroscopic properties of
the system. John H. Holland defines a CAS as a system that has a large number of
components that interact, adapt, or learn. These components are often called agents.
The most important properties of a CAS are a large population
of agents, non-linear and dynamic interactions between agents, open and blurred
boundaries, a constant flow of energy to maintain its organization, autonomous
agents, and self-organizing mechanisms such as feedback.
In this chapter, we introduce a conceptualization of science as a complex adaptive
system and propose a theory that may have the potential of identifying early signs
of transformative ideas in science. We will demonstrate how the CAS perspective
can be used to detect information that triggers transformative and holistic changes
to the system.
C. Chen, Mapping Scientific Frontiers: The Quest for Knowledge Visualization,
DOI 10.1007/978-1-4471-5128-9_8, © Springer-Verlag London 2013
8.1.1 Early Signs

Detecting early signs of potentially valuable ideas has theoretical and practical
implications. For instance, peer reviews of new manuscripts and new grant proposals
are under a growing pressure of accountability for safeguarding the integrity of
scientific knowledge and optimizing the allocation of limited resources (Chubin
1994; Chubin and Hackett 1990; Hayrynen 2007; Hettich and Pazzani 2006).
Long-term strategic science and technology policies require visionary thinking and
evidence-based foresight into the future (Cuhls 2001; Martin 2010; Miles 2010). In
foresight exercises on identifying future technology, experts' opinions were found
to be overly optimistic in hindsight (Tichy 2004). The increasing specialization
in today's scientific community makes it unrealistic to expect an expert to have a
comprehensive body of knowledge concerning multiple key aspects of a subject
matter, especially in interdisciplinary research areas.
The value, or perceived value, of an idea can be quantified in many ways. For
example, the value of a good idea can be measured by the number of lives
it has saved, the number of jobs it has created, or the amount of revenue it
has generated. In the intellectual world, the value of a good idea can be measured
by the number of other ideas it has inspired or the amount of attention it has
drawn. In this chapter, we are concerned with identifying patterns and properties of
information that can tell us something about the potential values of ideas expressed
and embodied in scientific publications. The citation count of a scientific publication is
the number of times other scientific publications have referenced it.
Using citations to guide the search for relevant scientific ideas by way of association,
known as citation indexing, was pioneered by Eugene Garfield in the 1950s
(Garfield 1955). It is a general consensus that citation behavior can be motivated
by both scientific and non-scientific reasons (Bornmann and Daniel 2006). Citation
counts have been used as an indicator of intellectual impact on subsequent research.
There have been debates over the nature of citations and whether positive, negative,
and self-citations should all be treated equally. Nevertheless, even a negative citation
makes it clear that the referenced work cannot be simply ignored.
Researchers have searched for other clues that may inform us about the potential
impact of a newly published scientific paper, especially clues that can be readily
extracted from routinely available information at the time of publication instead
of waiting for download and citation patterns to build up over time. Factors such
as the track record of authors, the prestige of authors' institutions, and the prestige of
the journal in which an article is published are among the most promising ones
that can provide an assurance of the quality of the article to an extent (Boyack
et al. 2005; Hirsch 2007; Kostoff 2007; van Dalen and Kenkens 2005; Walters
2006). The common assumption central to approaches in this category is that great
researchers tend to continuously deliver great work and, in a similar vein, an
article published in a high impact journal is also likely to be of high quality itself.
On the one hand, these approaches avoid the reliance on data that may not be readily
available upon the publication of an article and thus free analysts from constraints
due to the lack of download and citation data. On the other hand, the sources of
information used in these approaches are indirect to the new ideas reported in
scientific publications. By analogy, we give credit to an individual based on
his/her credit history instead of assessing the risk of the current transaction directly.
In such approaches, we will not be able to know where precisely the novelty of an
idea is coming from. We will not be able to know whether similar ideas have been
proposed in the past.
Many studies have addressed factors that could explain or even predict future
citations of a scientific publication (Aksnes 2003; Hirsch 2007; Levitt and Thelwall
2008; Persson 2010). For example, is a paper's citation count last year a good
predictor for new citations this year? Is the number of downloads a good predictor
of citations? Is it true that the more references a paper cites, the more citations
it will receive later on? Similarly, the potential role of prestige, or the Matthew
Effect coined by Robert Merton, has been commonly investigated, ranging from
the prestige of authors to the prestige of journals in which articles are published
(Dewett and Denisi 2004). However, many of these factors are loosely and indirectly
coupled with the conceptual and semantic nature of the underlying subject matter of
concern. We refer to them as extrinsic factors. In contrast, intrinsic factors have direct
and profound connections with the intellectual content and structure. One example
of an intrinsic factor is concerned with the structural variation of a field of study.
A notable example is the work by Swanson on linking previously disjoint bodies
of knowledge, such as the connection between fish oil and Raynaud's syndrome
(Swanson 1986a).
Researchers have made various attempts to characterize future citations and
identify emerging core articles (Shibata et al. 2007; Walters 2006). Shibata et al.,
for example, studied citation networks in two subject areas, Gallium Nitride and
Complex Networks, and found that while past citations are a good predictor of near-future citations, betweenness centrality is correlated with citations in the longer
term.
Upham et al. (2010) studied the role of cohesive intellectual communities,
or schools of thought, in promoting and constraining knowledge creation. They
analyzed publications on management and concluded that it is significantly beneficial for new knowledge to be a part of a school of thought and that the most influential
position within a school of thought is in the semi-periphery of the school. In
particular, boundary-spanning research positioned at the semi-periphery of a school
would attract attention from other schools of thought and receive the most citations
overall. Their study used a zero-inflated negative binomial regression (ZINB).
Negative binomial regression models have been used to predict the expected mean
patent citations (Fleming and Bromiley 2000). Hsieh (2011) studied inventions as
a combination of technological features. In particular, the closeness of features
plays an interesting role. Neither overly related nor loosely related features are
good candidates for new inventions. Useful inventions arise with rightly positioned
features where the cost of synthesis is minimized.
Takeda and Kjikawa (2010) reported three stages of clustering in citation
networks. In the first stage, core clusters are formed, followed by the formation
of peripheral clusters and the continuous growth of the core clusters. Finally,
the core clusters' growth becomes predominant again. Buter et al. (2011) studied
the emergence of an interdisciplinary research area from fields that did not show
interdisciplinary connections before. They used journal subject categories as a proxy
for fields and citations as a measure of interdisciplinary connection.
Lahiri et al. addressed how structural changes of a network may influence the
spread of information over the network (Lahiri et al. 2008). Although they did not
study bibliographic networks per se, their study indicates that predictions made about
how information spreads over a network are sensitive to structural changes of the
network. This observation underlines the importance of taking structural change into
account in the development of metrics based on topological properties of networks.
Leydesdorff (2001) raised questions (p. 146) that are closely related to what we
are addressing: "How does the new text link up to the literature, and what is its
impact on the network of previously existing relations?" He took a quite different
approach and analyzed word occurrences in scientific papers from an information-theoretic perspective. In his approach, the publication of a paper is perceived as an
event that may lead to the reduction of uncertainty involved in the current state of
knowledge. He devised diagrams that depict pathways of how a particular paper
improves the efficiency of communication. Although the information-theoretic
approach and our structural variation approach currently operate on different units of
analysis with distinct theoretical underpinnings, both share the fundamental concern
of changes introduced by newly published scientific papers on the existing body of
knowledge.
As shown above, many studies in the literature have addressed factors that may
influence citations. The value of our work is the introduction of the structural
variation paradigm along with computational metrics that can be integrated into interactive exploration systems to better understand precisely the impact of individual
links made by a new article.
8.1.2 A Structural Variation Model

There is a recurring theme from a diverse body of work on creativity. A major
form of creative work is to bridge previously disjoint bodies of knowledge. Notable
studies include the work of Ronald S. Burt in sociology (Burt 2004), Donald
Swanson in information science (Swanson 1986a), and conceptual blending as a
theoretical framework for exploring human information integration (Fauconnier and
Turner 1998). We have been developing an explanatory and computational theory
of transformative discovery based on criteria derived from structural and temporal
properties (Chen 2011; Chen et al. 2009).
In the history of science, there are many examples of how new theories revolutionized the contemporary knowledge structure. For example, the 2005 Nobel Prize
in medicine was awarded for the discovery of Helicobacter pylori, a bacterium that
was not believed to be possible to find in the human gastric system (Chen et al. 2009).
Fig. 8.1 An overview of the structural variation model
In literature-based discovery, Swanson discovered a previously unnoticed linkage
between fish oil and Raynaud's syndrome (Swanson 1986a). In terrorism research,
before the September 11 terrorist attacks, it was widely believed that only those
who directly witness a traumatic scene or directly experience a trauma could have
the risk of post-traumatic stress disorder (PTSD); however, later research has shown
that people may develop PTSD symptoms even by simply watching the coverage of
a traumatic scene on TV (Chen 2006). In drug discovery, one of the major challenges
is to find new compound structures effectively in the vast chemical space that satisfy
an array of constraints (Lipinski and Hopkins 2004). In mapping scientific frontiers
(Chen 2003) and studies in science of science (Price 1965), it would be particularly
valuable if scientists, funding agencies, and policy makers can have tools that may
assist them to assess the novelty of ideas in terms of their conceptual distance from
the contemporary domain knowledge. In these and many more scenarios, a common
challenge for coping with a constantly changing environment is to estimate the
extent to which the structure of a network should be updated in response to newly
available information (Fig. 8.1).
The basic assumption in the structural variation approach is that the extent
of a departure from the current intellectual structure is a necessary condition
for a potentially transformative idea in science. In other words, a potentially
transformative idea needs to bring changes to the existing structure of knowledge
in the first place. In order to measure the degree of structural variation introduced
by a scientific article, the intellectual structure at a particular moment of time needs
to be represented in such a way that structural changes can be computationally
detected and manually verifiable. Bibliographic networks can be computationally
derived from scientific publications. Research in scientometrics and citation analysis
routinely uses citation and co-citation networks as a proxy of the underlying
intellectual structure. Here we will focus on using several types of co-citation and
co-occurrence networks as the representation of a baseline network.
A network represents how a set of entities are connected. Entities are represented
as nodes, or vertices, in the network. Their connections are represented as links, or
edges. Relevant entities in our context include several types of information that can
be computationally extracted from a scientific article, such as references cited by the
article, authors and their affiliations, the journal in which the article is published, and
keywords in the article. We will limit our discussions to networks that are formed
with a single type of entities, although networks of multiple types of entities are
worth considering once we establish a basic understanding of structural variations
in networks of a single type of entities.
Once the type of entities is chosen, the nature of the interconnectivity between
entities is to be specified to form a network. Networks of co-occurring entities
represent a wide variety of types of connectivity. A network of co-occurring words
represents how words are related in terms of whether and how often they appear
in the vicinity of each other. Co-citation networks of entities such as references,
authors, and journals can be seen as a special case of co-occurring networks. For
example, co-citation networks of references are networks of references that appear
together in the bodies of scientific papers; that is, these references are co-cited.
Networks of co-cited references represent more specific information than networks of co-cited authors because references of different articles by the same author
would be lumped together in a network of co-cited authors. Similarly, networks
of co-cited references are more specific than networks of co-cited journals. We
refer to such differences in specificity as the granularity of networks. Measurements
of structural variation need to take the granularity factor into account because it is
reasonable to expect that networks at different levels of granularity would lead to
different measures of structural variations.
Another decision to be made about a baseline network is a sampling issue. Taking
a particular year as a vantage point for looking into the past, how far back should we
consider in the construction of a baseline network that would adequately represent
the underlying intellectual structure? Does the network become more accurate if
we go back further into the past? Will it be more efficient if we limit it to the most
recent years that really matter the most? Given articles published in a particular year
$Y$, the baseline network represents the intellectual structure using information from
articles published up to year $Y - 1$. Two types of baseline networks are investigated
here: ones using a moving window of a fixed size $[Y - k, Y - 1]$ and ones using the
entire history $(Y_0, Y - 1]$, where $Y_0$ is the earliest year of publication for records in
the given dataset.
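The two kinds of baseline network can be sketched in a few lines. This is a minimal illustration only, assuming each record is a pair of publication year and cited reference identifiers; the function name `baseline_cocitation`, the `window` parameter, and the toy records are hypothetical and not part of the original study.

```python
from collections import Counter
from itertools import combinations


def baseline_cocitation(articles, year, window=None):
    """Count co-citation links among references cited by articles before `year`.

    articles: iterable of (publication_year, list_of_cited_reference_ids).
    window:   k for the moving window [year - k, year - 1];
              None means the entire history up to year - 1.
    Returns a Counter mapping an unordered reference pair to its co-citation count.
    """
    first = None if window is None else year - window
    links = Counter()
    for pub_year, refs in articles:
        if pub_year >= year:
            continue  # the baseline only uses articles published up to year - 1
        if first is not None and pub_year < first:
            continue  # older than the moving window allows
        # Each unordered pair of distinct references cited together is one co-citation.
        for a, b in combinations(sorted(set(refs)), 2):
            links[(a, b)] += 1
    return links


# Hypothetical toy records: (year, cited references).
articles = [
    (2001, ["r1", "r2", "r3"]),
    (2003, ["r2", "r3"]),
    (2004, ["r3", "r4"]),
    (2005, ["r1", "r4"]),  # published in year Y itself, so excluded from the baseline
]
full_history = baseline_cocitation(articles, 2005)
moving_window = baseline_cocitation(articles, 2005, window=2)
```

The moving window trades completeness for recency: the 2001 record contributes to `full_history` but not to `moving_window`.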
8.1.3 Structural Variation Metrics

We expect that the degree of structural variation introduced by a new article can
offer prospective information because of the boundary spanning mechanism. If an
article introduces novel links that span the boundaries of different topics, then we
expect this signifies its potential in taking the intellectual structure for a new turn.
Given a baseline network, structural variations can be measured based on
information provided by a particular article. We will introduce three metrics of
structural variation. Each metric quantifies the degree of change in the baseline
network introduced by information provided by an article. No usage data is involved
in the measurement. The three metrics are modularity change rate, inter-cluster
linkage, and centrality divergence. The definitions of the first two metrics depend
on a partition of the baseline network, but the third one does not. A partition
of a network decomposes the network into non-overlapping groups of nodes. For
example, clustering algorithms such as spectral clustering can be used to partition a
network.
The theoretical underpinning of the structural variation is that scientific discoveries, at least a subset of them, can be explained in terms of boundary spanning,
brokerage, and synthesis mechanisms in an intellectual space (Chen et al. 2009).
This conceptualization generalizes the principle of literature-based discovery pioneered by Swanson (1986a, b), which assumes that connections between previously
disparate bodies of knowledge are potentially valuable. In Swanson's famous ABC
model, the relationships A-B and B-C are known in the literature. The potential
relationship A-C becomes a candidate that is subject to further scientific investigation
(Weeber 2003). Our conceptualization is more generic in several ways. First,
in the ABC model, the A-C relation changes an indirect connection to a direct
connection, whereas our structural variation model makes no assumption about
any prior relations at all. Second, in the ABC model, the scope of consideration is
limited to relationships involving three entities. In contrast, our structural variation
model takes a wider context into consideration and addresses the novelty of a
connection that links groups of entities as well as connections linking individual
entities. Because of the broadened scope of consideration, it becomes possible to
search for candidate connections more effectively. In other words, given a set of
entities, the size of the search space of potential connections can be substantially
reduced if additional constraints are applicable for the selection of candidate
connections. For example, the structural hole theory developed in social network
analysis emphasizes the special potential of nodes that are strategically positioned
to form brokerage, or boundary spanning, links and create good ideas (Burt 2004;
Chen et al. 2009).
8.1.3.1 Modularity Change Rate (MCR)

Given a partition of a network, i.e. a configuration of clusters, the modularity of
the network measures the degree of interconnectivity among the groups of nodes
identified by the partition. If different clusters are loosely connected, then the
overall modularity would be high. In contrast, if clusters are interwoven, then
the modularity would be low. We follow Newman's algorithm (Newman 2006)
to calculate the modularity with reference to a cluster configuration generated by
spectral clustering (Chen et al. 2010; von Luxburg 2006). Suppose the network G is
partitioned by a partition $C$ into $k$ clusters such that $G = c_1 + c_2 + \dots + c_k$. $Q(G, C)$
is defined as follows, where $m$ is the total number of edges in the network $G$, $n$ is
the number of nodes in $G$, and $A_{ij}$ is the adjacency matrix entry for nodes $n_i$ and $n_j$. $\delta(c_i, c_j)$ is known as Kronecker's delta. It is 1 if nodes
$n_i$ and $n_j$ belong to the same cluster and 0 otherwise. $\deg(n_i)$ is the degree of node
$n_i$. The range of $Q(G, C)$ is between $-1$ and 1.
$$Q(G, C) = \frac{1}{2m} \sum_{i,j=0}^{n} \delta(c_i, c_j) \left[ A_{ij} - \frac{\deg(n_i)\,\deg(n_j)}{2m} \right]$$
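The modularity definition can be checked on a small graph. The following is a minimal sketch, assuming an undirected, unweighted network without self-loops and a partition supplied as a plain dictionary; the function name and the toy graph of two triangles joined by one bridge are illustrative assumptions.

```python
def modularity(edges, cluster_of):
    """Q(G, C) computed as the double sum over node pairs in the same cluster.

    edges:      iterable of (u, v) pairs, undirected, no self-loops.
    cluster_of: dict mapping every node to its cluster label.
    """
    nodes = list(cluster_of)
    adjacent = set()
    for u, v in edges:
        adjacent.add((u, v))
        adjacent.add((v, u))
    m = len(adjacent) // 2  # each undirected edge is stored in both directions
    degree = {v: 0 for v in nodes}
    for u, _ in adjacent:
        degree[u] += 1
    total = 0.0
    for i in nodes:
        for j in nodes:
            if cluster_of[i] != cluster_of[j]:
                continue  # Kronecker's delta is 0 across clusters
            a_ij = 1.0 if (i, j) in adjacent else 0.0
            total += a_ij - degree[i] * degree[j] / (2.0 * m)
    return total / (2.0 * m)


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_cluster = {v: "A" for v in range(6)}
q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
```

Putting every node into a single cluster gives a modularity of zero, which is a quick sanity check on any implementation of the formula.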
The modularity of a network is a measure of the overall structure of the network.
Its range is between $-1$ and 1. The Modularity Change Rate of a scientific paper
measures the relative structural change due to the information from the published
paper with reference to a baseline network. For each article a, and a baseline network
Gbaseline , we define the Modularity Change Rate (MCR) as follows:
$$MCR(a) = \frac{Q(G_{baseline}, C) - Q(G_{baseline} \oplus G_a, C)}{Q(G_{baseline}, C)} \times 100$$
where $G_{baseline} \oplus G_a$ is the baseline network updated with information from the article
a. For example, suppose reference nodes ni and nj are not connected in a baseline
network of co-cited references but they are co-cited by article a, a new link between
ni and nj will be added to the baseline network. In this way, the article changes the
structure of the baseline network.
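A minimal sketch of the Modularity Change Rate follows. It assumes that the article only co-cites references already present in the baseline partition, that the partition is kept fixed, and that the baseline modularity is non-zero; the per-cluster form of modularity used here is algebraically equivalent to the double sum. The function names and the toy graph are hypothetical.

```python
from itertools import combinations


def modularity(edges, cluster_of):
    """Q(G, C) in per-cluster form: sum over clusters of e_c/m - (d_c/2m)^2."""
    edges = {frozenset(e) for e in edges}  # undirected, no self-loops assumed
    m = len(edges)
    within, degree_sum = {}, {}
    for e in edges:
        u, v = tuple(e)
        for x in (u, v):
            degree_sum[cluster_of[x]] = degree_sum.get(cluster_of[x], 0) + 1
        if cluster_of[u] == cluster_of[v]:
            within[cluster_of[u]] = within.get(cluster_of[u], 0) + 1
    return sum(within.get(c, 0) / m - (d / (2.0 * m)) ** 2
               for c, d in degree_sum.items())


def modularity_change_rate(baseline_edges, cluster_of, cited_refs):
    """Relative change (in percent) of modularity after adding the article's co-citations."""
    base = {frozenset(e) for e in baseline_edges}
    # References co-cited by the article are linked pairwise in the updated network.
    new_links = {frozenset(p) for p in combinations(set(cited_refs), 2)}
    q_base = modularity(base, cluster_of)
    q_updated = modularity(base | new_links, cluster_of)
    return (q_base - q_updated) / q_base * 100.0


# Toy baseline: two triangles joined by one bridge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
mcr_novel = modularity_change_rate(edges, clusters, [0, 5])     # new between-cluster link
mcr_existing = modularity_change_rate(edges, clusters, [0, 1])  # link already in the baseline
```

In this toy case the novel between-cluster link lowers modularity from 5/14 to 1/4, while re-citing an existing pair leaves the unweighted network unchanged.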
Intuitively, adding a new link anywhere in a network should not increase the
modularity of the network. It should either reduce it or leave it intact. However, the
change of modularity is not monotonic, as one might initially expect. In fact, it
depends on where the new link is added and how the network is structured. Adding
a link may reduce the contribution of some clusters to the modularity, but it may
increase the contribution of other clusters in the network. Thus, the overall modularity
change is not monotonic.
Without loss of generality, assume that an article adds one link at a time
to a given baseline network. If the new link connects two distinct clusters, then
it has no effect on the corresponding term in the updated modularity because by
definition $\delta_{ij} = 0$ and the corresponding term becomes 0. Such a link is illustrated
by the dashed link $e_{5,10}$ in the top diagram in Fig. 8.2. The new link $e_{ij}$ will increase
the degree of nodes $i$ and $j$ by one, i.e. $\deg(i)$ will become $\deg(i) + 1$. The total
number of edges $m$ will increase to $m + 1$. A simple calculation at the bottom
of Fig. 8.2 shows that terms in the modularity formula involving blue links will
decrease from their previous values. However, if the network has clusters such as
$C_A$ with no changes in node degrees, then the corresponding values of terms of
lines in red will increase from their previous values as the denominator increases
from $2m$ to $2(m + 1)$. In summary, the updated modularity may increase as well
as decrease, depending on the structure of the network and where the new link
is added. With this particular definition of modularity, between-cluster links are
always associated with a zero valued term in the overall modularity formula due
to Kronecker's delta. What we see in the change of modularity is a combination
of results from several scenarios that are indirectly affected by the newly added link.
We will introduce our next metric to reflect the changes in terms of between-cluster
links directly.
Fig. 8.2 Scenarios that may increase or decrease individual terms in the modularity metric
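The two scenarios can be written out as a short worked check. This is a sketch under stated assumptions: a single new link $e_{ij}$ is added, the pair under consideration is not the new link itself, and $T_{uv} = A_{uv} - \deg(u)\deg(v)/2m$ denotes one within-cluster term (the symbol $T_{uv}$ is introduced here only for illustration).

```latex
% Pair (u, v) in a cluster whose node degrees are unchanged by the new link:
T'_{uv} - T_{uv}
  = \deg(u)\deg(v)\left(\frac{1}{2m} - \frac{1}{2(m+1)}\right)
  = \frac{\deg(u)\deg(v)}{2m(m+1)} \ge 0

% Pair (i, v), v \ne j, where node i gains the new link:
T'_{iv} - T_{iv}
  = \frac{\deg(i)\deg(v)}{2m} - \frac{(\deg(i)+1)\deg(v)}{2(m+1)}
  = \frac{\deg(v)\,\bigl(\deg(i) - m\bigr)}{2m(m+1)} \le 0
```

The second difference is non-positive because no node in a simple graph can have a degree larger than the number of edges $m$, so terms touched by the new link shrink while untouched terms grow.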
8.1.3.2 Cluster Linkage (CL)

The Cluster Linkage (CL) measures the overall structural change introduced by an
article a in terms of new connections added between clusters. Its definition assumes
a partition of the network. We introduce a function of edges $\lambda(c_i, c_j)$, which is the
opposite of $\delta_{ij}$ used in the modularity definition. The value of $\lambda_{ij}$ is 1 for an edge
across distinct clusters $c_i$ and $c_j$. It will be 0 for edges within a cluster. $\lambda_{ij}$ will allow
us to concentrate on between-cluster links and ignore within-cluster links, which is
the opposite of how the modularity metric is defined. The new metric Linkage is the
sum of all the weights of between-cluster links $e_{ij}$ divided by $K$, the total number
of clusters in the network. Linking to itself is not allowed, i.e. we assume $e_{ii} = 0$
for all nodes. Using link weights makes the metric sensitive to links that strengthen
connections between clusters in addition to novel links that make unprecedented
connections between clusters.
It is possible to take into account the size of clusters that a link is connecting so that connections between larger-sized clusters become more prominent
in the measurement. For example, one option is to multiply each $e_{ij}$ by
$\sqrt{size(c_i) \cdot size(c_j)} / \max_k(size(c_k))$. Here we define the metric without such
modifications for the sake of simplicity. Suppose $C$ is a partition of $G$, the Linkage
metric is defined as follows:
$$Linkage(G, C) = \frac{\sum_{i \neq j}^{n} \lambda_{ij}\, e_{ij}}{K}$$
$$\lambda_{ij} = \begin{cases} 0, & n_i \in c_j \\ 1, & n_i \notin c_j \end{cases}$$
The Cluster Linkage is defined as the difference in Linkage before and after new
between-cluster links are added by an article $a$.
$$CL(a) = \Delta Linkage(a) = Linkage(G_{baseline} \oplus G_a, C) - Linkage(G_{baseline}, C)$$
$Linkage(G + \Delta G)$ is always greater than or equal to $Linkage(G)$. Thus, CL is
non-negative.
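The Cluster Linkage can be sketched as follows. The sketch assumes that each undirected link is counted once in the sum (one reading of the definition), that link weights are co-citation counts, and that the partition is fixed; the function names and toy weights are hypothetical.

```python
def linkage(weights, cluster_of):
    """Sum of the weights of between-cluster links divided by the number of clusters K.

    weights: dict mapping frozenset({u, v}) to the link weight e_uv,
             with each undirected link stored once.
    """
    k = len(set(cluster_of.values()))
    between = sum(w for pair, w in weights.items()
                  if len({cluster_of[n] for n in pair}) == 2)
    return between / k


def cluster_linkage(baseline, article_links, cluster_of):
    """Difference in Linkage after the article's links are added to the baseline."""
    updated = dict(baseline)
    for pair, w in article_links.items():
        updated[pair] = updated.get(pair, 0) + w
    return linkage(updated, cluster_of) - linkage(baseline, cluster_of)


clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
baseline = {
    frozenset({0, 1}): 2,
    frozenset({1, 2}): 1,
    frozenset({3, 4}): 1,
    frozenset({2, 3}): 1,  # the only between-cluster link in the baseline
}
article = {
    frozenset({0, 5}): 1,  # novel between-cluster link
    frozenset({2, 3}): 1,  # strengthens an existing between-cluster link
    frozenset({0, 1}): 1,  # within-cluster link, ignored by the metric
}
cl = cluster_linkage(baseline, article, clusters)
cl_within_only = cluster_linkage(baseline, {frozenset({0, 1}): 1}, clusters)
```

Because weights are used, the strengthened link (2, 3) contributes as much to the score as the novel link (0, 5) in this example.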
8.1.3.3 Centrality Divergence ($C_{KL}$)

The Centrality Divergence metric measures the structural variation caused by an
article $a$ in terms of the divergence of the distribution of betweenness centrality
$C_B(v_i)$ of nodes $v_i$ in the baseline network. This definition does not involve any
partitions of the network. If $n$ is the total number of nodes, the degree of structural
change $C_{KL}(G, a)$ can be defined in terms of the K-L divergence.
$$C_{KL}(G_{baseline}, a) = \sum_{i=0}^{n} p_i \log \frac{p_i}{q_i}$$
$$p_i = C_B(v_i, G_{baseline})$$
$$q_i = C_B(v_i, G_{updated})$$
For nodes where $p_i = 0$ or $q_i = 0$, we reset them to a small number $10^{-6}$ to avoid
$\log(0)$.
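A minimal sketch of the Centrality Divergence follows. It assumes an unweighted, undirected network, uses unnormalized betweenness values computed with Brandes' algorithm (a swapped-in, standard technique, not necessarily the one used in the original study), and applies the formula literally without rescaling the centrality values into probabilities. All function names and the toy graph are illustrative.

```python
from collections import deque
from math import log


def betweenness(nodes, edges):
    """Unnormalized betweenness centrality via Brandes' algorithm (BFS version)."""
    nbrs = {v: set() for v in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    cb = {v: 0.0 for v in nodes}
    for s in nodes:
        order = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0.0 for v in nodes}  # number of shortest paths from s
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1.0, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in nbrs[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):  # accumulate dependencies from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: c / 2.0 for v, c in cb.items()}


def centrality_divergence(nodes, baseline_edges, new_edges, eps=1e-6):
    """Sum of p_i * log(p_i / q_i) with zero centralities reset to eps."""
    p = betweenness(nodes, baseline_edges)
    q = betweenness(nodes, list(baseline_edges) + list(new_edges))
    total = 0.0
    for v in nodes:
        p_i = p[v] if p[v] > 0 else eps
        q_i = q[v] if q[v] > 0 else eps
        total += p_i * log(p_i / q_i)
    return total


# Toy example: a path 0-1-2-3 that an article closes into a 4-cycle.
nodes = [0, 1, 2, 3]
path = [(0, 1), (1, 2), (2, 3)]
before = betweenness(nodes, path)
after = betweenness(nodes, path + [(0, 3)])
divergence = centrality_divergence(nodes, path, [(0, 3)])
unchanged = centrality_divergence(nodes, path, [])
```

Closing the path into a cycle removes the brokerage advantage of the two middle nodes, which is what the divergence picks up; an article that adds no links yields zero.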
8.1.4 Statistical Models

We constructed negative binomial (NB) and zero-inflated negative binomial (ZINB)
models to validate the role of structural variation in predicting future citation counts
of scientific publications. The negative binomial distribution is generated by a
sequence of independent Bernoulli trials. Each trial is either a success with a
probability of $p$ or a failure with a probability of $(1 - p)$. The terminology
of success and failure in this context does not necessarily represent any practical
preferences. The random number of successes $X$ before encountering a predefined
number of failures $r$ has a negative binomial distribution:
$$X \sim NB(r, p)$$
One can adapt this definition to describe a wide variety of count events. Citation
counts belong to a type of count events with an over-dispersion, i.e. the variance is
greater than the mean. NB models are commonly used in the literature to study this
type of count events. Two types of dispersion parameters are used in the literature,
$\theta$ and $\alpha$, where $\theta\alpha = 1$.
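The over-dispersion property can be verified numerically from the definition. This is a small sketch, assuming the parameterization stated in the text (successes with probability $p$ counted before the $r$-th failure); the values of `r` and `p` are arbitrary illustrations.

```python
from math import comb


def nb_pmf(k, r, p):
    """P(X = k): k successes, each with probability p, before the r-th failure."""
    return comb(k + r - 1, k) * p ** k * (1 - p) ** r


r, p = 3, 0.6
support = range(400)  # the tail beyond this point is numerically negligible here
mean = sum(k * nb_pmf(k, r, p) for k in support)
variance = sum((k - mean) ** 2 * nb_pmf(k, r, p) for k in support)
```

Under this parameterization the mean is $rp/(1-p)$ and the variance is $rp/(1-p)^2$, so the variance always exceeds the mean, which a Poisson model cannot accommodate.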
Zero-inflated count models are commonly used to account for excessive zero
counts (Hilbe 2011; Lambert 1992). Zero-inflated models include two sources of
zero citations: the point mass at zero $I_{\{0\}}(y)$ and the count component with a count
distribution $f_{count}(counts)$ such as negative binomial or Poisson (Zeileis et al. 2011).
The probability of observing a zero count is inflated with probability $\pi = f_{zero}(\text{zero citations})$.
$$f_{zeroinflated}(citations) = \pi \cdot I_{\{0\}}(citations) + (1 - \pi) \cdot f_{count}(citations)$$
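The mixture can be illustrated with a negative binomial count component. This is a minimal sketch with a fixed inflation probability; in a fitted model that probability would itself depend on covariates, which is not shown here. The names `zinb_pmf` and `inflation`, and all numeric values, are illustrative assumptions.

```python
from math import comb


def nb_pmf(k, r, p):
    """Negative binomial count component: k successes before the r-th failure."""
    return comb(k + r - 1, k) * p ** k * (1 - p) ** r


def zinb_pmf(k, inflation, r, p):
    """Point mass at zero mixed with the negative binomial count component."""
    point_mass = 1.0 if k == 0 else 0.0
    return inflation * point_mass + (1.0 - inflation) * nb_pmf(k, r, p)


inflation, r, p = 0.3, 3, 0.6
zero_nb = nb_pmf(0, r, p)
zero_zinb = zinb_pmf(0, inflation, r, p)
total = sum(zinb_pmf(k, inflation, r, p) for k in range(400))
```

Zeros therefore come from two sources: the point mass and the count distribution itself, which is why the zero probability of the mixture exceeds that of the plain count model.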
ZINB models are increasingly used in the literature to model excessive occurrences of zero citations (Fleming and Bromiley 2000; Upham et al. 2010). The report
of a ZINB model consists of two parts: the count model and the zero-inflated model.
One way to test whether a ZINB model is superior to a corresponding NB model is
known as the Vuong test. The Vuong test is designed to test the null hypothesis that
the two models are indistinguishable. Akaike's Information Criterion (AIC) is also
commonly used to evaluate the goodness of a model. Models with lower AIC scores
are regarded as better models.
We illustrate the model using global citation counts of scientific publications
recorded in the Web of Science. NB models are defined as follows using log as the
link function.
Global citations ~ Coauthors + Modularity Change Rate + Cluster Linkage +
Centrality Divergence + References + Pages
Global citations is the dependent variable. Coauthors is a factor of three levels of
1, 2, and 3. Level 3 is assigned to articles with three or more coauthors. Coauthors is
an indirect indicator of the extent to which an article synthesizes ideas from different
areas of expertise represented by each coauthor.
Three structural variation metrics are included as covariates in generalized
linear models, namely Modularity Change Rate (MCR), Cluster Linkage (CL), and
Centrality Divergence ($C_{KL}$). According to our theory of creativity, groundbreaking
ideas are expected to cause strong structural variations. If global citation counts
provide a reasonable proxy of recognition of intellectual contributions in a
scientific community, we would expect that at least some of the structural variation
metrics will have statistically significant main effects on global citations.
The number of cited references and the number of pages are commonly reported
in the literature as good predictors of citations. In order to compare the effects of
structural variation with these commonly reported extrinsic properties of scientific
publications, References and Pages are included in the models. Our theory offers
a simpler explanation why the more references a paper cites, the more citations it
appears to get. Due to the boundary spanning synthetic mechanism, an article needs
to explain multiple parts and how they can be innovatively connected. This process
will result in citing more references than an article that covers a narrower range of
topics. Review papers by their nature belong to this category.
It is known that articles published earlier tend to have more citations than articles
published later. The exposure time of an article is included in the NB models in
terms of a logarithmically transformed year of publication of an article.
An intuitive way to interpret coefficients in NB models is to use incidence rate
ratios (IRRs) estimated by the models. For example, if Coauthors has an IRR of
1.5, it means that as the number of coauthors increases by one the global citation
counts would be expected to increase by a factor of 1.5, i.e. to 1.5 times the previous value, while
holding other variables in the model constant. In our models, we will particularly
examine statistically significant IRRs of structural variation metrics.
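With a log link, an IRR is the exponential of a model coefficient. The following sketch illustrates this reading with made-up coefficients; the intercept, the variable names, and the numeric values are hypothetical and not taken from the fitted models.

```python
from math import exp, log


def incidence_rate_ratio(coefficient):
    """IRR of a predictor in a log-link count model."""
    return exp(coefficient)


def expected_count(intercept, coefficients, values):
    """Expected count: exp(intercept + sum of coefficient * value)."""
    linear = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return exp(linear)


# Hypothetical coefficients chosen so that Coauthors has an IRR of 1.5.
coefs = {"coauthors": log(1.5), "references": 0.0125}
base = expected_count(0.2, coefs, {"coauthors": 1, "references": 30})
plus_one = expected_count(0.2, coefs, {"coauthors": 2, "references": 30})
ratio = plus_one / base
```

The ratio of the two expected counts equals the IRR regardless of the values at which the other variables are held, which is what makes IRRs convenient to report.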
Zero-inflated negative binomial models (ZINB) use the same set of variables.
The count model of ZINB is identical to the NB model described above. The zero-inflated model of ZINB uses the same set of variables to predict the excessive zeros.
We found little in the literature about good predictors of zeros in a comparable
context. We choose to include all six variables in the zero-inflated model to
provide a broader view of the zero-generating process. ZINBs are defined as follows:
Global citations ~ Coauthors + Modularity Change Rate + Cluster Linkage +
Centrality Divergence + References + Pages
Zero citations ~ Coauthors + Modularity Change Rate + Cluster Linkage +
Centrality Divergence + References + Pages
Fig. 8.3 The structure of the system before the publication of the ground breaking paper by Watts
Fig. 8.4 The structure of the system after the publication of Watts 1998
8.1.5 Complex Network Analysis (1996–2004)

Figures 8.3 and 8.4 illustrate how the system adapts to the publication of the
groundbreaking paper by Watts and Strogatz (1998). The network was derived from
5,135 articles published on small-world networks between 1990 and 2010. The network
of 205 references and 1,164 co-citation links is divided into 12 clusters with a
modularity of 0.6537 and a mean silhouette of 0.811. The red lines are made by the
top-15 articles measured by the centrality variation rate. Only major cluster
labels are shown in the figure. Dashed lines in red are novel connections made
by Watts and Strogatz (1998) at the time of its publication. The article has the
highest scores in Cluster Linkage and CKL, 5.43 and 1.14, respectively. The
figure offers a visual confirmation that the article was indeed making
boundary-spanning connections. Recall that the data set was constructed by expanding the
seed article based on forward citation links. These boundary-spanning links provide
empirical evidence that the groundbreaking paper was connecting two groups of
clusters. The emergence of Cluster #8 complex network was the consequence of the
impact.
Table 8.1 summarizes the results of five NB regression models with different
types of networks. They have an average dispersion parameter of 0.5270, which
is equivalent to an alpha of 1.8975. Coauthors has an average IRR of 1.3278.
References has an average IRR of 1.0126. Pages has an average IRR of 0.9714.
The effects of the three variables are consistent and stable across the five types of
networks. In contrast, the effects of structural variations are less stable. On the other
hand, structural variations appear to have a stronger impact on global citations than
other more commonly studied measures such as Coauthors and References. For
example, CL has an IRR of 3.160 in networks of co-cited references and an IRR of
1.33 × 10^8 in networks of noun phrases. IRRs that are greater than 1.0 predict an
increase of global citations.
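
The averages quoted above can be re-derived from the per-network values in Table 8.1. The short check below is only an arithmetic sketch; the variable names are illustrative, and the only relationship assumed is the one stated in the text, namely that alpha is the reciprocal of the dispersion parameter.

```python
# Values read from Table 8.1, one per type of network
coauthors = [1.306, 1.298, 1.326, 1.359, 1.350]
references = [1.013, 1.013, 1.013, 1.012, 1.012]
pages = [0.970, 0.971, 0.971, 0.973, 0.972]
dispersion = [0.5284, 0.5258, 0.5150, 0.5282, 0.5375]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

print(round(mean(coauthors), 4))   # 1.3278
print(round(mean(references), 4))  # 1.0126
print(round(mean(pages), 4))       # 0.9714
print(round(mean(dispersion), 4))  # 0.527
print(round(1 / 0.5270, 4))        # 1.8975, the alpha quoted in the text
```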
We have found statistical evidence of the boundary-spanning mechanism. An
article that introduces novel connections between clusters of co-cited references is
likely to become highly cited subsequently. In addition, we have found that the
IRRs of Cluster Linkage are more than twice as much as the IRRs of Coauthors
and References. This finding provides a more fundamental explanation of why the
number of references cited by an article appears to be a good predictor of its future
citations as found in many previous studies. As a result, the structural variation
paradigm clarifies why a number of extrinsic features appear to be associated with
high citations.
A distinct characteristic of the structural variation approach is the focus on the
potential connection between the degree of structural variation introduced by an
article and its future impact. The analytic and modeling procedure demonstrated
here is expected to serve as an exemplar for subsequent studies along this line of
research. More importantly, the focus on the underlying mechanisms of scientific
activity is expected to provide additional insights and practical guidance for
scientists, sociologists, historians, and philosophers of scientific knowledge.
There are many new challenges and opportunities ahead. For example, how
common is the boundary-spanning mechanism in scientific discoveries overall?
What are the other major mechanisms and how do they interact with the boundary-spanning mechanism? There are other potentially valuable techniques that we have
not utilized in the present study, including topic modeling, citation context analysis,
survival analysis and burst detection. In short, a lot of work is to be done and this is
an encouraging start.
Table 8.1 Negative binomial regression models (NBs) of Complex Network Analysis (1996–2004) at five different levels of granularity of units of analysis
Data source: Complex Network Analysis (1996–2004), top 100 records per time slice, 2-year sliding window

Unit of analysis            Reference            Keyword              Noun phrase          Author               Journal
Relation                    Co-citation          Co-occurrence        Co-occurrence        Co-citation          Co-citation
Offset (exposure)           log2(Year)           log2(Year)           log2(Year)           log2(Year)           log2(Year)
Number of citing articles   3,515                3,072                3,254                3,271                3,271

Incidence Rate Ratios (IRRs) in NB models of global citations, with p-values in parentheses
Coauthors                   1.306 (0.000)        1.298 (0.000)        1.326 (0.000)        1.359 (0.000)        1.350 (0.000)
Modularity change rate      1.083 (0.025)        1.038 (0.086)        1.047 (0.305)        1.055 (0.276)        1.060 (0.180)
Weighted cluster linkage    3.160 (0.000)        0.205 (0.095)        1.33 × 10^8 (0.000)  2.879 (0.000)        1.204 (0.049)
Centrality divergence       0.343 (0.184)        3.679 (0.023)        1.534 (0.665)        23.400 (0.000)       7.620 (0.000)
Number of references        1.013 (0.000)        1.013 (0.000)        1.013 (0.000)        1.012 (0.000)        1.012 (0.000)
Number of pages             0.970 (0.000)        0.971 (0.000)        0.971 (0.000)        0.973 (0.000)        0.972 (0.000)

Dispersion parameter        0.5284               0.5258               0.5150               0.5282               0.5375
-2 log-likelihood           31,771               28,331               29,491               29,506               29,613
AIC                         31,787               28,347               29,508               29,522               29,629

References involves the least amount of ambiguity with the finest granularity, whereas the other four types of units introduce ambiguity at various levels
Models constructed with units of higher ambiguity are slightly improved in terms of Akaike's Information Criterion (AIC)

Fig. 8.5 The structural variation method is applied to a set of patents related to cancer research.
The star marks the position of a patent (US6537746). The red lines show where the boundary-spanning
connections were made by the patent. Interestingly, the impacted clusters are about recombination

Figure 8.5 shows the structural variation approach applied to the study
of the potential of patents. The patent US6537746 is ranked high on the structural
variation scale. Its position is marked by a star. The areas where the patent made
boundary-spanning links are clusters #88 and #83, both labeled as recombination.
The map shows that multiple streams of innovation have moved away from the
course of older streams.
We conclude that structural variation is an essential aspect of the development of
scientific knowledge and it has the potential to reveal the underlying mechanisms
of the growth of scientific knowledge. The focus on the underlying mechanisms
of knowledge creation is the key to the predictive potential of the structural
variation approach. The theory-driven explanatory and computational approach sets
an extensible framework for detecting and tracking potentially creative ideas and
gaining insights into challenges and opportunities in light of the collective wisdom.
8.2 Regenerative Medicine

The Nobel Prize in Physiology or Medicine 2012 was announced on October 8,
2012. The award was shared by Sir John B. Gurdon and Shinya Yamanaka for
the discovery that mature cells can be reprogrammed to become pluripotent. The
potential of a cell to differentiate into different cell types is known as the potency
of the cell. Simply speaking, a differentiation process refers to how a cell is divided
into new cells. Cells in the next generation, in general, become more specialized
than their parent generation. Cells with the broadest range of potential can produce
all kinds of cells in an organism. This potential is called totipotency. The next
level of potency is called pluripotency, from the Latin plurimus, meaning "very many".
A pluripotent cell can differentiate into more specialized cells. In contrast,
a unipotent cell can differentiate into only one cell type.
Prior to the work of Gurdon and Yamanaka, it was generally believed that the
path of cell differentiation is irreversible in that the potency of a cell becomes more
and more limited in generations of differentiated cells. Induced pluripotent stem
cells (iPS cells) result from a reprogramming of the natural differentiation. Starting
with a non-pluripotent cell, human intervention can reverse the process so that the
non-pluripotent cell could regain a more generic potency.
John B. Gurdon discovered in 1962 that the DNA of a mature cell may still have
all the information needed to develop all cells in a frog. He modified an egg cell of
a frog by replacing its immature nucleus with the nucleus from a mature intestinal
cell. The modified egg cell developed into a normal tadpole. His work demonstrated
that the specialization of cells is reversible. Shinya Yamanaka's discovery was made
more than 40 years later. He found out how mature cells in mice could be artificially
reprogrammed to become induced pluripotent stem cells.
8.2.1 A Scientometric Review

On August 25, 2011, more than a year before the 2012 Nobel Prize was
announced, I received an email from Emma Pettengale. She is the Editor of a
peer-reviewed journal Expert Opinion on Biological Therapy (EOBT). The journal
provides expert reviews of recent research on emerging biotherapeutic drugs and
technologies. She asked if I would be interested in preparing a review of emerging
trends in regenerative medicine using CiteSpace and she would give me 3 months
to complete the review.
EOBT is a reputable journal with an impact factor of 3.505 according to the
Journal Citation Report (JCR) compiled by Thomson Reuters in 2011. Emma's
invitation was an unusual one. The journal is a forum for experts to express their
opinions on emerging trends but I am not a specialist in regenerative medicine at
all. Although CiteSpace has been used in a variety of retrospective case studies,
including terrorism, mass extinctions, string theory, and complex network analysis,
we were able to find independent reviews of most of the case studies to cross validate
our results or contact domain experts to verify specific patterns. The invitation was
both challenging and stimulating. We would be able to analyze emerging trends in
a rapidly advancing field with CiteSpace. Most importantly, we wanted to find out
if we can limit our source of information exclusively to patterns that are obviously
identified by CiteSpace.
Regenerative medicine is a rapidly growing and fast-moving interdisciplinary
field of study, involving stem cell research, tissue engineering, biomaterials, wound
healing, and patient-specific drug discovery (Glotzbach et al. 2011; Polak 2010;
Polykandriotis et al. 2010). The potential of reprogramming patients' own cells
for biological therapy, tissue repairing and regeneration is critical to regenerative
medicine. It has been widely expected that regenerative medicine will revolutionize
medicine and clinical practices far beyond what is currently possible. Mesenchymal
Stem Cells (MSCs), for example, may differentiate into bone cells, fat cells, and
cartilage cells. Skin cells can be reprogrammed into induced pluripotent stem cells
(iPSCs). The rapid advance of the research has also challenged many previous
assumptions and expectations. Although iPSCs resemble embryonic stem cells in
many ways, comparative studies have found potentially profound differences (Chin
et al. 2009; Feng et al. 2010; Stadtfeld et al. 2010).
The body of the relevant literature grows rapidly. The Web of Science has 4,295
records between 2000 and 2011 based on a topic search of the term "regenerative
medicine" in titles, abstracts, or indexing terms. If we include records that are
relevant to regenerative medicine, but do not use the term "regenerative medicine"
explicitly, the number could be as much as ten times higher. Stem cell research plays a
substantial role in regenerative medicine. There are over two million publications on
stem cells on Google Scholar. There are 167,353 publications specifically indexed
as related to stem cell research in the Web of Science. Keeping abreast of the fast-moving body of literature is critical not only because new discoveries emerge from
a diverse range of areas but also because new findings may fundamentally alter the
collective knowledge as a whole (Chen 2012).
In fact, a recent citation network analysis (Shibata et al. 2011) identified future
core articles on regenerative medicine based on their positions in a citation network
derived from 17,824 articles published before the end of 2008. In this review, we
demonstrate a scientometric approach and use CiteSpace to delineate the structure
and dynamics of the regenerative medicine research. CiteSpace is specifically
designed to facilitate the detection of emerging trends and abrupt changes in
scientific literature. Our study is unique in several ways. First, our dataset contains
relevant articles published between 2000 and 2011. We expect that it will reveal
more recent trends that emerged within the last 3 years. Second, we use a citation index-based expansion to construct our dataset, which is more robust than defining a
rapidly growing field with a list of pre-defined keywords. Third, emerging trends
are identified based on indicators computed by CiteSpace without domain experts'
intervention or prior working knowledge of the topic. This approach makes the
analysis repeatable with new data and verifiable by different analysts.
CiteSpace is used to generate and analyze networks of co-cited references based
on bibliographic records retrieved from the Web of Science. An initial topic search
for "regenerative medicine" resulted in 4,295 records published between 2000 and
2011. After filtering out less representative record types such as proceedings papers
and notes, the dataset was reduced to 3,875 original research articles and review
articles.
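
The basic idea behind a network of co-cited references can be sketched as follows. This is not CiteSpace's implementation, which involves further steps such as time slicing and node selection; it only counts, under a simplified and hypothetical data layout, how often two references appear together in the reference list of the same citing article.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited together by the same article."""
    counts = Counter()
    for refs in reference_lists:
        # Each unordered pair is counted once per citing article.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical citing articles, each given as the list of references it cites
citing = [
    ["Thomson 1998", "Pittenger 1999", "Takahashi 2006"],
    ["Takahashi 2006", "Thomson 1998"],
    ["Pittenger 1999"],
]
links = cocitation_counts(citing)
print(links[("Takahashi 2006", "Thomson 1998")])  # 2
```

The resulting counts can serve as link weights between cited references, which is the kind of network that is subsequently divided into clusters.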
Fig. 8.6 Major areas of regenerative medicine
The 3,875 records do not include relevant publications if the term "regenerative
medicine" does not explicitly appear in the titles, abstracts, or index terms. We
expanded the dataset by citation indexing. If an article cites at least one of the 3,875
records, then the article will be included in the expanded dataset based on the
assumption that citing a regenerative medicine article makes the citing article relevant
to the topic. The citation index-based expansion resulted in 35,963 records,
consisting of 28,252 (78.6 %) original articles and 7,711 (21.4 %) review articles. The
range of the expanded set remains 2000–2011. Thus the analysis focuses on
the development of regenerative medicine over the last decade. The 35,963-article
dataset is used in the subsequent analysis. Incorrect citation variants to the two
highly visible references, a 1998 landmark article by Thomson et al. (1998) and a
1999 article by Pittenger (Pittenger et al. 1999), were corrected prior to the analysis.
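
The citation index-based expansion described above amounts to a simple set operation. The sketch below assumes a hypothetical mapping from each citing article to the records it cites; whether the seed records themselves are kept in the expanded set is not stated in the text, so retaining them here is an assumption.

```python
def expand_by_citation(seed_ids, citations):
    """Return the seed records plus every article that cites at least one seed record.

    citations maps a citing article id to the ids it cites (hypothetical layout).
    """
    seeds = set(seed_ids)
    expanded = set(seeds)  # assumption: seed records remain in the expanded set
    for article, cited in citations.items():
        if seeds & set(cited):
            expanded.add(article)
    return expanded

# Hypothetical example: "a" and "c" each cite a seed record, "b" does not.
seeds = ["s1", "s2"]
citations = {"a": ["s1", "x"], "b": ["x"], "c": ["s2"]}
print(sorted(expand_by_citation(seeds, citations)))  # ['a', 'c', 's1', 's2']
```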
8.2.2 The Structure and Dynamics

Figure 8.6 shows a visualization of the literature relevant to regenerative medicine.
This visualization provides an overview of major milestones in history. The
concentrations of colors indicate the chronological order of the development. For example,
cluster #12 mesenchymal stem cell was one of the earlier focuses of the research,
followed by #20 human embryonic stem cell, and then followed by the latest and
current #32 induced pluripotent stem cell. The patches of red rings in #32 indicate
this area is rapidly expanding as suggested by citation bursts.

Table 8.2 Major clusters of co-cited references

Cluster ID  Size  Silhouette  Label (TFIDF)                Label (LLR)                     Label (MI)               Year Ave.
9           97    0.791       Evolving concept             Mesenchymal stem cell           Cardiac progenitor cell  1999
17          71    0.929       Somatic control              Drosophila spermatogenesis      Drosophila               1994
            67    0.980       Mcf-7 cell                   Intestinal-type gastric cancer  Change                   2001
12          62    0.891       Midkine                      Human embryonic stem cell       Dna                      2002
5           53    0.952       Grid2ip gene                 Silico                          Gastric cancer           2002
19          42    0.119       Bevacizumab                  Combination                     Cartilage                2004
7           40    0.960       Monogenic disease treatment  Induced pluripotent stem cell   Clinic                   2008
15          25    0.930       Tumorigenic melanoma cell    Cancer stem cell                Cancer prevention        2003

Clusters are referred to in terms of the labels selected by LLR
Table 8.2 lists eight major clusters by their size, i.e. the number of members in
each cluster. Clusters with few members tend to be less representative than larger
clusters because small clusters are likely to be formed by the citing behavior of
a small number of publications. The quality of a cluster is also reflected in terms
of its silhouette score, which is an indicator of its homogeneity or consistency.
Silhouette values of homogeneous clusters tend to be close to 1. Most of the clusters
are highly homogeneous, except Cluster #19 with a low silhouette score of 0.119.
Each cluster is labeled by noun phrases from titles of citing articles of the cluster
(Chen et al. 2010).
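
The silhouette score mentioned above can be illustrated with the standard definition, in which each item compares its average distance to its own cluster (a) with its average distance to the nearest other cluster (b). The sketch assumes a generic distance matrix and toy data; the similarity measure that CiteSpace actually uses is not specified here.

```python
def silhouette(i, labels, dist):
    """Silhouette of item i given a list of cluster labels and a distance matrix."""
    own = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
    if not own:
        return 0.0  # common convention for singleton clusters
    a = sum(dist[i][j] for j in own) / len(own)
    b = min(
        sum(dist[i][j] for j, lab in enumerate(labels) if lab == other) / labels.count(other)
        for other in set(labels) if other != labels[i]
    )
    denom = max(a, b)
    return 0.0 if denom == 0 else (b - a) / denom

def cluster_silhouette(cluster, labels, dist):
    """Mean silhouette of the members of one cluster."""
    members = [i for i, lab in enumerate(labels) if lab == cluster]
    return sum(silhouette(i, labels, dist) for i in members) / len(members)

# Toy data: four points on a line, forming two well-separated pairs.
positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in positions] for p in positions]
print(cluster_silhouette(0, [0, 0, 1, 1], dist))  # close to 1: homogeneous cluster
print(cluster_silhouette(0, [0, 1, 0, 1], dist))  # negative: poorly formed cluster
```

A cluster whose members are closer to other clusters than to each other ends up with a low or negative score, which is the situation flagged for Cluster #19.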
The average year of publication of a cluster indicates its recentness. For example,
Cluster #9 on mesenchymal stem cells (MSCs) has an average year of 1999. The most
recently formed cluster, Cluster #7 on induced pluripotent stem cells (iPSCs), has an
average year of 2008.
Cluster #7 contains numerous nodes with red rings of citation bursts. The
visualized network also shows highly burst terms found in the titles and abstracts
of citing articles to the major clusters. For example, the terms stem-cell-renewal and
germ-line-stem-cells are not only used when articles cite references in Cluster #17
drosophila spermatogenesis, but their use also went through a period of rapid increase.
Similarly, the term induced-pluripotent-stem-cells is a burst term associated with Cluster
#7, which is consistently labeled as induced pluripotent stem cell by a different
selection mechanism, the log-likelihood ratio test (LLR). We will particularly focus
on Cluster #7 in order to identify emerging trends in regenerative medicine.
Cluster #7 is the most recently formed cluster. We selected the ten most cited
references in this cluster and ten citing articles (see Table 8.3).
Table 8.3 Cited references and citing articles of Cluster #7 on iPSCs
Cluster #7 induced pluripotent stem cell

Cited references
Cites  Author (Year) Journal, Volume, Page
1,841  Takahashi K (2006) Cell, v126, p663
1,583  Takahashi K (2007) Cell, v131, p861
1,273  Yu JY (2007) Science, v318, p1917
762    Okita K (2007) Nature, v448, p313
640    Wernig M (2007) Nature, v448, p318
615    Park IH (2008) Nature, v451, p141
501    Nakagawa M (2008) Nat Biotechnol, v26, p101
445    Okita K (2008) Science, v322, p949
391    Maherali N (2007) Cell Stem Cell, v1, p55
348    Stadtfeld M (2008) Science, v322, p945

Citing articles
Coverage % of the ten citing articles: 65, 68, 73, 73, 73, 73, 77, 77, 80, 95
Author (Year) Title
Archacka, Karolina (2010) induced pluripotent stem cells: hopes, fears and visions
Yoshida, Yoshinori (2010) recent stem cell advances: induced pluripotent stem cells for disease modeling and stem cell-based regeneration
Rashid, S. Tamir (2010) induced pluripotent stem cells: alchemist's tale or clinical reality?
Kun, Gabriel (2010) gene therapy, gene targeting and induced pluripotent stem cells: applications in monogenic disease treatment
Robbins, Reiesha D. (2010) inducible pluripotent stem cells: not quite ready for prime time?
Lowry, William E. (2010) roadblocks en route to the clinical application of induced pluripotent stem cells
Stadtfeld, Matthias (2010) induced pluripotency: history, mechanisms, and applications
Kiskinis, Evangelos (2010) progress toward the clinical application of patient-specific pluripotent stem cells
Masip, Manuel (2010) reprogramming with defined factors: from induced pluripotency to induced transdifferentiation
Sommer, Cesar A. (2010) experimental approaches for the generation of induced pluripotent stem cells
Table 8.4 Most cited references

Citation counts  References                                         Cluster #
2,486            Pittenger MF, 1999, Science, v284, p143            9
2,223            Thomson JA, 1998, Science, v282, p1145             12
2,102            Reya T, 2001, Nature, v414, p105 [Review]          15
1,841            Takahashi K, 2006, Cell, v126, p663                7
1,583            Takahashi K, 2007, Cell, v131, p861                7
1,273            Yu JY, 2007, Science, v318, p1917                  7
1,145            Jain RK, 2005, Science, v307, p58                  19
1,061            Jiang YH, 2002, Nature, v418, p41                  9
1,030            Evans MJ, 1981, Nature, v292, p154                 12
945              Al-Hajj M, 2003, P Natl Acad Sci USA, v100, p3983  15

The most cited article in this cluster, Takahashi 2006 (Takahashi and Yamanaka
2006), demonstrated how pluripotent stem cells can be directly generated from
mouse somatic cells by introducing only a few defined factors as opposed to
transferring nuclear contents to oocytes, or egg cells. Their work is a major
milestone. The second most cited reference (Takahashi et al. 2007), from the same
group of researchers, further advanced the state-of-the-art by demonstrating how
differentiated human somatic cells can be reprogrammed into pluripotent stem cells
using the same factors identified in their previous work. As it turns out, the work
represented by the two highly ranked papers was awarded the 2012 Nobel Prize in
Medicine.
Cluster #7 consists of 40 co-cited references. The 10 selected citing articles are
all published in 2010. They cited 65–95 % of these references. The one that has the
highest citation coverage of 95 % is an article by Stadtfeld et al. Unlike works that
aim to refine and improve the ways to produce iPSCs, their primary concern was
whether iPSCs are equivalent, molecularly and functionally, to blastocyst-derived
embryonic stem cells. The Stadtfeld article itself belongs to the cluster. Other citing
articles also seem to question some of the fundamental assumptions or call for more
research before further clinical development in regenerative medicine.
The most cited articles are usually regarded as the landmarks due to their
groundbreaking contributions (See Table 8.4). Cluster #7 has 3 articles in the top 10
landmark articles. Each of Clusters #9, #12, and #15 has two. The most cited article
in our dataset is Pittenger MF (1999) with 2,486 citations, followed by Thomson
JA (1998) with 2,223 citations. The third one is a review article by Reya T (2001).
Articles at the 4th–6th positions are all from Cluster #7, namely Takahashi K (2006),
Takahashi K (2007), and Yu JY (2007). These three are also the more recent articles
on the list, suggesting that they have inspired intense interest in induced pluripotent
stem cells.
A citation burst has two attributes: the intensity of the burst and how long the
burst status lasts. Table 8.5 lists references with the strongest citation bursts across
the entire dataset during the period of 2000–2011. The first four articles with strong
citation bursts are from Cluster #7 on iPSCs. Interestingly, one 2009 article (again
in Cluster #7) and one 2010 article (in Cluster #8, a small cluster) are detected to
have considerable degrees of citation burst. The leader of the group that authored
the top two references was awarded the 2012 Nobel Prize in Medicine.

Table 8.5 References with the strongest citation bursts

Citation bursts  References                               Cluster #
124.73           Takahashi K, 2006, Cell, v126, p663      7
121.36           Takahashi K, 2007, Cell, v131, p861      7
81.37            Yu JY, 2007, Science, v318, p1917        7
71.24            Okita K, 2008, Science, v322, p949       7
66.23            Meissner A, 2008, Nature, v454, p766     13
63.12            Vierbuchen T, 2010, Nature, v463, p1035  8
62.54            Zhou HY, 2009, Cell Stem Cell, v4, p381  7

Table 8.6 Structurally and temporally significant references

Sigma      Burst   Centrality  Citations  References                               Cluster #
377340.46  124.73  0.11        1,841      Takahashi K, 2006, Cell, v126, p663      7
29079.18   37.38   0.32        202        Bjornson CRR, 1999, Science, v283, p534  9
195.15     121.36  0.04        1,583      Takahashi K, 2007, Cell, v131, p861      7
58.91      81.37   0.05        1,273      Yu JY, 2007, Science, v318, p1917        7
15.97      19.53   0.15        130        Kiger AA, 2000, Nature, v407, p750       17

The Sigma metric measures both structural centrality and citation burstness of a
cited reference. If a reference is strong in both measures, it will have a higher Sigma
value than a reference that is only strong in one of the two measures.
As shown in Table 8.6, the pioneering iPSCs article by Takahashi (2006) has
the highest Sigma of 377340.46, which means it is structurally essential and
inspirational in terms of its strong citation burst. The second highest work by this
measure is a 1999 article in Science by Bjornson et al. (1999). They reported an
experiment in which neural stem cells were found to have a wider differentiation
potential than previously thought because they evidently produced a variety of blood
cell types.
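
How the two measures combine can be sketched numerically. The formula used below, Sigma = (centrality + 1) ^ burstness, is an assumption here because the text does not spell out the definition; it is the form commonly given for CiteSpace's Sigma. Since the centrality values in Table 8.6 are rounded to two decimals, the computed values only approximate the tabulated ones.

```python
def sigma(centrality: float, burstness: float) -> float:
    """Sigma computed as (centrality + 1) ** burstness (assumed definition)."""
    return (centrality + 1.0) ** burstness

# (reference, centrality, burst) taken from Table 8.6
rows = [
    ("Takahashi 2006", 0.11, 124.73),
    ("Bjornson 1999", 0.32, 37.38),
    ("Kiger 2000", 0.15, 19.53),
]
for name, centrality, burst in rows:
    print(name, sigma(centrality, burst))
```

Under this form a reference with zero centrality or zero burstness has a Sigma of exactly 1, so a high value requires strength in both measures.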
8.2.3 System-Level Indicators

The modularity of a network measures the degree to which nodes in the network
can be divided into a number of groups such that nodes within the same group are
more tightly connected than nodes in different groups. The collective intellectual
structure of the knowledge of a scientific field can be represented as associated
networks of co-cited references. Such networks evolve over time. Newly published
articles may introduce profound structural variation or have little or no impact on
the structure.
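
The effect of a boundary-spanning link on modularity can be illustrated with a toy network. The sketch assumes the standard Newman-Girvan definition of modularity for an undirected, unweighted graph; the text does not state which variant CiteSpace computes, and the example graph is hypothetical.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q of an undirected, unweighted graph.

    edges is a list of (u, v) pairs; community maps each node to its group.
    """
    m = len(edges)
    internal = {}  # number of edges inside each group
    degree = {}    # sum of node degrees in each group
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())

# Two triangles joined by a single link
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
before = modularity(edges, groups)
after = modularity(edges + [("a", "e")], groups)  # one new boundary-spanning link
print(before > after)  # True
```

A newly added link between the two groups lowers Q, which is the kind of structural change that shows up as a dip in the modularity of a co-citation network.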
Fig. 8.7 The modularity of the network dropped considerably in 2007 and even more in 2009,
suggesting that some major structural changes took place in these 2 years in particular
Figure 8.7 shows the change of modularity of networks over time. Each network
is constructed based on a 2-year sliding window. The number of publications per
year increased considerably. It is noticeable that the modularity dipped in 2007 and
bounced back to the previous level before it dropped even deeper in 2009. Based
on this observation, it is plausible that groundbreaking works appeared in 2007 and
2009. We will therefore specifically investigate potential emerging trends in these
2 years.
Which publications in 2007 would explain the significant decrease of the
modularity of the network formed based on publications prior to 2007? If a 2007
publication has a subsequent citation burst, then we expect that this publication
played an important role in changing the overall intellectual structure. Eleven
publications in 2007 are found to have subsequent citation bursts (Table 8.7).
Notably, Takahashi 2007 and Yu 2007 top the list. Both of them represent pioneering
investigations of reprogramming human body cells to iPSCs. Both of them have
current citation bursts since 2009. Other articles on the list address the pluripotency
of stem cells related to human cancer, including colon cancer and pancreatic cancer.
Two review articles on regenerative medicine and tissue repair are published in 2007
with citation bursts since 2010. These observations suggest that the modularity
change in 2007 is an indication of an emerging trend in the human induced
pluripotent stem cells research. The trend is current and active as shown by the
number of citation bursts associated with publications in 2007 alone.
Table 8.7 Articles published in 2007 with subsequent citation bursts in descending order of local citation counts
Range (2000–2011)

References                          Local citations  Burst   Duration   Title
Takahashi et al. (2007)             1,583            121.36  2009–2011  Induction of pluripotent stem cells from adult human fibroblasts by defined factors
Yu et al. (2007)                    1,273            81.37   2009–2011  Induced pluripotent stem cell lines derived from human somatic cells
Wernig et al. (2007)                640              26.70   2008–2009  In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state
O'Brien et al. (2007)               438              18.13   2008–2009  A human colon cancer cell capable of initiating tumour growth in immunodeficient mice
Ricci-Vitiani et al. (2007)         427              8.83    2008–2009  Identification and expansion of human colon-cancer-initiating cells
Li et al. (2007)                    299              9.78    2008–2008  Identification of pancreatic cancer stem cells
Mikkelsen et al. (2007)             283              19.59   2010–2011  Genome-wide maps of chromatin state in pluripotent and lineage-committed cells
Laflamme et al. (2007)              265              16.48   2010–2011  Cardiomyocytes derived from human embryonic stem cells in pro-survival factors enhance function of infarcted rat hearts
Gimble et al. (2007) [R]            247              25.19   2010–2011  Adipose-derived stem cells for regenerative medicine
Phinney and Prockop (2007) [R]      229              16.52   2010–2011  Concise review: mesenchymal stem/multipotent stromal cells: the state of transdifferentiation and modes of tissue repair - current views
Khang et al. (2007) [In Korean]     90               35.25   2008–2009  Recent and future directions of stem cells for the application of regenerative medicine

If the modularity change in 2007 indicates an emerging trend in human iPSCs
research, what caused the even more profound modularity change in 2009? The
cluster that is responsible for the 2009 modularity change is Cluster #7 induced
pluripotent stem cell (iPSC). On the one hand, the cluster contains Takahashi 2006
and Takahashi 2007, which pioneered the human iPSCs trend. On the other hand,
the cluster contains many recent publications. The average year of publication of
the articles in this cluster is 2008. Therefore, we examine the members of this
cluster closely, especially focusing on 2009 publications.
The impact of Takahashi 2006 and Takahashi 2007 is so profound that their
citation rings would overshadow all other members in Cluster #7. After excluding
the display of their overshadowing citation rings, it becomes apparent that this
cluster is full of articles with citation bursts, which are shown as citation rings in
red. We labeled the ones published in 2009 and also two 2008 articles and one 2010
article (Fig. 8.2 and Table 8.8).
The pioneering reprogramming methods introduced by Takahashi 2006 and
Takahashi 2007 modify adult cells to obtain properties similar to embryonic stem
cells using a cancer-causing oncogene c-Myc as one of the defined factors and a
virus to deliver the genes into target cells (Nakagawa et al. 2008). It was shown
later on that c-Myc is not needed. The use of viruses as the delivery vehicle raised
safety concerns about its clinical implications in regenerative medicine because viral
integration into the target cells' genome might activate or inactivate critical host genes.
Searching for virus-free techniques motivated a series of such studies, led by an
article (Okita et al. 2008) that appeared on October 9, 2008.
What many of these 2009 articles have in common appears to be the focus on
improving previous techniques of reprogramming human somatic cells to regain a
pluripotent state. It was realized that the original method used to induce pluripotent
stem cells has a number of possible drawbacks associated with the use of viral
reprogramming factors. Several subsequent studies investigated alternative ways
to induce pluripotent stem cells with lower risks or improved certainty. These
articles were published within a short period of time. For instance, Woltjen 2009
demonstrated a virus-independent simplification of induced pluripotent stem cell
production. On March 26, 2009, Yu et al.'s article demonstrated that reprogramming
human somatic cells can be done without genomic integration or the continued
presence of exogenous reprogramming factors. On April 23, 2009, Zhou et al.'s
article demonstrated how to avoid using exogenous genetic modifications by
delivering recombinant cell-penetrating reprogramming proteins directly into target
cells. Soldner 2009 reported a method without using viral reprogramming factors.
Kaji 2009 reported a virus-free pluripotency induction method. On May 28, 2009, Kim
et al.'s article introduced a method of direct delivery of reprogramming proteins.
Vierbuchen 2010 is one of the few most recent articles that are found to have
citation bursts. The majority of the 2009 articles with citation bursts focused
on reprogramming human somatic cells to an undifferentiated state. In contrast,
Vierbuchen 2010 expanded the scope of reprogramming by demonstrating the
possibility of converting fibroblasts to functional neurons directly (Fig. 8.8).
Table 8.8 Articles published in 2009 with citation bursts
Range (2000–2011)

References                Local citations  Title
Woltjen et al. (2009)     320              piggyBac transposition reprograms fibroblasts to induced pluripotent stem cells
Yu et al. (2009)          300              Human induced pluripotent stem cells free of vector and transgene sequences
Zhou et al. (2009)        293              Generation of induced pluripotent stem cells using recombinant proteins
Soldner et al. (2009)     288              Parkinson's disease patient-derived induced pluripotent stem cells free of viral reprogramming factors
Kaji et al. (2009)        284              Virus-free induction of pluripotency and subsequent excision of reprogramming factors
Kim et al. (2009a, b)     235              Generation of human induced pluripotent stem cells by direct delivery of reprogramming proteins
Ebert et al. (2009)       211              Induced pluripotent stem cells from a spinal muscular atrophy patient
Kim et al. (2009b)        194              Oct4-induced pluripotency in adult neural stem cells
Vierbuchen et al. (2010)  193              Direct conversion of fibroblasts to functional neurons by defined factors
Lister et al. (2009)      161              Human DNA methylomes at base resolution show widespread epigenomic differences
Chin et al. (2009)        158              Induced pluripotent stem cells and embryonic stem cells are distinguished by gene expression signatures
Discher et al. (2009)     149              Growth factors, matrices, and forces combine and control stem cells
Hong et al. (2009)        138              Suppression of induced pluripotent stem cell generation by the p53-p21 pathway
Slaughter et al. (2009)   97               Hydrogels in regenerative medicine

Burst (values as listed): 31.68, 43.71, 43.14, 45.39, 51.93, 31.87, 63.12, 41.91, 56.03, 46.71, 53.94, 62.54, 59.97, 52.65
Burst Duration (values as listed): 2010–2011, 2010–2011, 2010–2011, 2010–2011, 2010–2011, 2009–2011, 2010–2011, 2010–2011, 2010–2011, 2009–2011, 2010–2011, 2010–2011, 2010–2011, 2009–2011

Fig. 8.8 Many members of Cluster #7 are found to have citation bursts, shown as citation rings in
red. Chin MH 2009 and Stadtfeld M 2010 at the bottom area of the cluster represent a theme that
differs from other themes of the cluster
8.2.4 Emerging Trends

Two articles of particular interest appear at the lower end of Cluster #7, Chin et al.
(2009) and Stadtfeld et al. (2010). Chin et al.'s article has 158 citations within
the dataset. A citation burst has been detected for Chin 2009 since 2010. Chin et al.
questioned whether induced pluripotent stem cells (iPSCs) are indistinguishable
from embryonic stem cells (ESCs). Their investigation suggested that iPSCs should
be considered a unique subtype of pluripotent cell.
The co-citation network analysis has identified several articles that cite the work
by Chin et al. In order to establish whether Chin et al. represents the beginning of
a new emerging trend, we inspect these citing articles listed in Table 8.9. Stadtfeld
2010 is itself the most cited of these citing articles, with 134 citations. Like Chin
et al., Stadtfeld 2010 addresses the question of whether iPSCs are molecularly and
functionally equivalent to blastocyst-derived embryonic stem cells. Their work
identified the role of the Dlk1-Dio3 gene cluster in association with the level of induced
pluripotency. In other words, these studies focus on mechanisms that govern induced
pluripotency, which can be seen as a trend distinct from the earlier trend of
improving reprogramming techniques. Table 8.9 includes two review articles cited
by Stadtfeld 2010.
Table 8.9 Articles that cite Chin et al.'s 2009 article (Chin et al. 2009) and their citation counts as of November 2011
Article | Citations | Title
Stadtfeld et al. (2010) | 134 | Aberrant silencing of imprinted genes on chromosome 12qF1 in mouse induced pluripotent stem cells
Boland et al. (2009) | 109 | Adult mice generated from induced pluripotent stem cells
Feng et al. (2010) | 72 | Hemangioblastic derivatives from human induced pluripotent stem cells exhibit limited expansion and early senescence
Kiskinis and Eggan (2010) [R] | 59 | Progress toward the clinical application of patient-specific pluripotent stem cells
Laurent et al. (2011) | 48 | Dynamic changes in the copy number of pluripotency and cell proliferation genes in human ESCs and iPSCs during reprogramming and time in culture
Bock et al. (2011) | 31 | Reference maps of human ES and iPS cell variation enable high-throughput characterization of pluripotent cell lines
Zhao et al. (2011) | 22 | Immunogenicity of induced pluripotent stem cells
Boulting et al. (2011) | 17 | A functionally characterized test set of human induced
Young (2011) [R]a | 16 | Control of the embryonic stem cell state
Ben-David and Benvenisty (2011) [R]a | 11 | The tumorigenicity of human embryonic and induced
[R] Review articles
a Cited by Stadtfeld et al. (2010)
The new emerging trend is concerned with the equivalence of iPSCs and their
human embryonic stem cell counterparts in terms of their short- and long-term
functions. The new trend has critical implications for the therapeutic potential of
iPSCs. In addition to the works by Chin et al. and Stadtfeld et al., an article
published on August 2, 2009 by Boland et al. (2009) reported an investigation of
mice derived entirely from iPSCs. Another article (Feng et al. 2010), which appeared on
February 12, 2010, investigated abnormalities such as limited expansion and early
senescence found in human iPSCs. The Stadtfeld 2010 article (Stadtfeld et al. 2010)
we discussed earlier appeared on May 13, 2010.
Some of the more recent citing articles of Chin et al. focused on providing
resources for more stringent evaluative and comparative studies of iPSCs. On
January 7, 2011, an article (Laurent et al. 2011) reported a study of genomic
stability and abnormalities in pluripotent stem cells and called for frequent genomic
monitoring to assure phenotypic stability and clinical safety. On February 4, 2011,
Bock et al. (2011) published genome-wide reference maps of DNA methylation
and gene expression for 20 previously derived human ES lines and 12 human iPS
cell lines. In a more recent article (Boulting et al. 2011) published on February 11,
2011, Boulting et al. established a robust resource that consists of 16 iPSC lines and
a stringent test of differentiation capacity.
iPSCs are characterized by their self-renewal and versatile ability to differentiate
into a wide variety of cell types. These properties are invaluable for regenerative
medicine. However, the same properties also make iPSCs tumorigenic or cancer
prone. In a review article published in April 2011, Ben-David and Benvenisty (2011)
reviewed the tumorigenicity of human embryonic stem cells and iPSCs. Zhao et al.
challenged a generally held assumption concerning the immunogenicity of iPSCs in an
article (Zhao et al. 2011) published on May 13, 2011. The immunogenicity of iPSCs has
clinical implications for therapeutically valuable cells derived from
patient-specific iPSCs.
In summary, a series of more recent articles have re-examined several fundamental
assumptions and properties of iPSCs with more profound considerations of their
clinical and therapeutic implications for regenerative medicine (Patterson et al.
2012) (Fig. 8.9).
Fig. 8.9 A network of the regenerative medicine literature shows 2,507 co-cited references cited
by top 500 publications per year between 2000 and 2011. The work associated with the two labelled
references was awarded the 2012 Nobel Prize in Medicine
8.2.5 Lessons Learned

The analysis of the literature of regenerative medicine and a citation-based
expansion has outlined the evolutionary trajectory of the collective knowledge over
the last decade and highlighted the areas of active pursuit. Emerging trends and
patterns identified in the analysis are based on computational properties selected by
CiteSpace, which is designed to facilitate sense-making tasks of scientific frontiers
based on relevant domain literature.
Regenerative medicine is a fascinating and fast-moving subject. As
information scientists, we have demonstrated a scientometric approach to tracking
the advance of the collective knowledge of a dynamic scientific community. The
approach taps into what experts in the domain have published in the literature and
uses information and computational techniques to discern patterns and trends
at various levels of abstraction, namely, cited references and clusters of co-cited
references.
Based on the analysis of structural and temporal patterns of citations and
co-citations, we have identified two major emerging trends. The first one started in
2007 with pioneering works on human induced pluripotent stem cells (iPSCs),
including subsequently refined and alternative techniques for reprogramming. The
second one started in 2009 with an increasingly broad range of examinations
and re-examinations of previously unchallenged assumptions with clinical and
therapeutic implications for regenerative medicine, including tumorigenicity and
immunogenicity of iPSCs. It is worth noting that this expert opinion is based solely
on scientometric patterns revealed by CiteSpace, without prior working experience
in the regenerative medicine field.
The referential expansion of the original topic search of regenerative medicine
has revealed a much wider spectrum of intellectual dynamics. The visual analysis of
the broader domain outlines the major milestones throughout the extensive period
of 2000–2011. Several indicators and observations converge on the critical and
active role of Cluster #7 on iPSCs. By tracing interrelationships along citation links
and citation bursts, visual analytic techniques of scientometrics are able to guide
our attention to some of the most vibrant and rapidly advancing research fronts
and identify the strategic significance of various challenges addressed by highly
specialized technical articles. The number of review articles on relevant topics is
rapidly increasing, which is also a sign that the knowledge of regenerative medicine
has been advancing rapidly. We expect that visual analytic tools such as those we
utilized in this review will play a more active role as a supplement to traditional
review and survey articles. Visual analytic tools can be valuable in finding critical
developments in the vast amount of newly published studies.
The key findings of regenerative medicine and related research over the last
decade have shown that regenerative medicine has become more and more feasible
in many areas and that it will ultimately revolutionize clinical and healthcare
practice and many aspects of our society. On the other hand, the challenges
ahead are enormous. The biggest challenge is probably related to the fact that
human beings are a complex system in that a local perturbation may lead to
unpredictable consequences in other parts of the system, which in turn may affect
the entire system. The state of the art in science and medicine has a long way to
go to handle such complex systems in a holistic way. Suppressing or activating a
seemingly isolated factor may have unforeseen consequences.
The two major trends identified in this review have distinct research agendas as
well as different perspectives and assumptions. In our opinion, the independence
of such trends at a strategic level is desirable at the initial stages of these emerging
trends so as to maximize the knowledge gain that is unlikely to be achieved by a
single line of research alone. In the long run, more trends are expected to emerge,
probably from the least expected perspectives. Existing trends may be accommodated by
new levels of integration. We expect that safety and uncertainty will remain
the central concerns of regenerative medicine.
8.3 Retraction
The reproducibility of the results in a scientific article is a major cornerstone
of science. If fellow scientists follow the procedure described in a scientific
publication, would they be able to reproduce the same results in the original
publication? If not, why not? The publication of a scientific article is subject to the
scrutiny of fellow scientists, the authors' own institutions, and everyone who may
be concerned, including patients, physicians, and regulatory bodies responsible for guidelines.
The retraction of a scientific article is a formal action that is taken to purge the
article from the scientific literature on the ground that the article in question is not
trustworthy and therefore disqualified to be part of the intellectual basis of scientific
knowledge. Retraction is a self-correction mechanism of the scientific community.
Scientific articles can be retracted for a variety of reasons, ranging from
self-plagiarism and editorial errors to scientific misconduct, which may include fabrication
and falsification of data and results. The consequences of these diverse types
of mistakes differ. Some are easier to detect than others. For example, clinical
studies contaminated by fabricated data or results may directly risk the safety
of patients, whereas publishing a set of valid results simultaneously in multiple
journals is unethical but less likely to harm patients directly. On the
one hand, some retracted articles may remain controversial even after their
retraction. For example, Lancet partially retracted a 1998 paper (Wakefield et al.
1998) that suggested a possible link between a combination of vaccines against
measles, mumps, and rubella and autism. The ultimate full retraction of the Lancet
article did not come until 2010. On the other hand, the influence of other retracted
articles may come to an end more abruptly after their retraction, for example, the
fabricated stem cell clone by Woo-Suk Hwang (Kakuk 2009).
The rate of retraction from the scientific literature appears to be increasing. For
example, retractions in MEDLINE were found to have increased sharply since 1980,
and reasons for retraction included errors or non-reproducible findings (40 %),
research misconduct (28 %), redundant publication (17 %) and unstated/unclear
(5 %) (Wager and Williams 2011). We verified the increase of retraction in PubMed
on 3/29/2012. As shown in Fig. 8.10, the total number of annual publications
in PubMed increased from slightly more than 543,000 articles in 2001 to more
than 984,000 articles in 2011. The increase is remarkably steady, by about 45,000
new articles per year. The rate of retracted articles is calculated as the number of
eventually retracted articles published in a year divided by the total number of
articles published in the same year in PubMed. The rate of retraction is the number
of retraction notices issued each year divided by the total number of publications in
PubMed in the same year. The retraction rate in 2001 was 0.00005. It has doubled
three times since then, in 2003, 2006, and 2011. The retraction rate
in 2011 was 0.00046. Figure 8.10 shows that the number of retracted articles per
year peaked in 2006. The blue line is the retraction rate, which is growing fast. The
red line is the actual number of retracted articles. Although fewer recent
articles have been retracted so far than the 2006 peak number, we expect that this is in part
due to a delay in recognizing potential flaws in newly published articles. We will
quantify the extent of such delays later in a survival analysis.
Fig. 8.10 The rate of retraction is increasing in PubMed (As of 3/29/2012)
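The arithmetic behind these figures can be checked with a short sketch. It uses only the rounded values quoted in the text (543,000 and 984,000 publications; retraction rates of 0.00005 and 0.00046); the function names are illustrative and are not part of any tool used in the study.

```python
import math


def retraction_rate(retraction_notices: int, total_publications: int) -> float:
    """Retraction notices issued in a year divided by publications in that year."""
    if total_publications <= 0:
        raise ValueError("total_publications must be positive")
    return retraction_notices / total_publications


def number_of_doublings(rate_start: float, rate_end: float) -> float:
    """How many times rate_start must double to reach rate_end."""
    return math.log2(rate_end / rate_start)


def mean_annual_increase(total_start: int, total_end: int, years: int) -> float:
    """Average yearly growth in the number of publications."""
    return (total_end - total_start) / years


# Rounded values quoted in the text (2001 and 2011).
doublings = number_of_doublings(0.00005, 0.00046)
growth = mean_annual_increase(543_000, 984_000, 2011 - 2001)
print(f"doublings 2001-2011: {doublings:.2f}")
print(f"mean annual increase: {growth:,.0f} articles")
```

With these rounded inputs the rate grows by a factor of about 9, slightly more than three doublings, and the average growth is roughly 44,000 articles per year, which is consistent with the statements above.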
On the one hand, the increasing awareness of mistakes in scientific studies
(Naik 2011), especially due to the publicity of high-profile cases of retraction and
fraud (Kakuk 2009; Service 2002), has led to a growing body of studies
of retractions. On the other hand, the study of retracted articles, the potential risks
that these articles may bring to the scientific literature in the long run, and actions
that could be taken to reduce such risks remain relatively underrepresented, given the
urgency, possible consequences, and policy implications of the issue. We will
address some common questions concerning retracted articles. In particular, we
introduce a visual analytic framework and a set of tools that can be used to facilitate
situation awareness tasks at macroscopic and microscopic levels.
At the macroscopic level, we will focus on questions concerned with retracted
articles in a broader context of the rest of scientific literature. Given a retracted
article, which areas of the scientific literature are affected? Where are the articles
that directly cited the retracted article? Where are the articles that may be related
to the retracted article indirectly?
Table 8.10 The number of retractions found in major sources of scientific publications (As of 3/29/2012)
Sources | Items | Document type | Search criteria
PubMed a | 2,073 | Retracted article | Retracted publication [pt]
PubMed a | 2,187 | Retraction notice | Retraction of publication [pt]
Web of Science (1980–present) | 1,775 | Retracted article | Title contains (Retracted article.)
Web of Science (1980–present) | 1,734 | Retraction notice | Title contains (Retraction of vol)
Google Scholar | 219 | Retracted article | Allintitle: retracted article
Elsevier Content Syndication (CONSYN) | 659 | Retracted article (full text) | Title: Retracted article
a http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=DetailsSearch&Term=%22retracted+publication%22%5Bpublication+type%5D
At the microscopic level, we will focus on questions concerned with post-retraction
citations to a retracted article. Are citations prior to retractions distinguishable
from post-retraction citations, quantitatively and qualitatively?
PubMed is the largest publicly available resource of the scientific literature
with the most extensive coverage of scientific publications in medicine and related
disciplines. Each PubMed record has an attribute called Publication Type [pt]. The
retraction of an article is officially announced in a retraction notice. The publication
type of the retraction notice is "Retraction of Publication". The retracted article's
publication type is updated to "Retracted Publication". PubMed provides a list of
special queries, including one for retracted publications.¹
The Web of Science, compiled by Thomson Reuters, has a field called Document
Type. The values of Document Type include Article, Review, Correction, and a few
other types. The Document Type of Correction² is used for retractions
as well as corrections of other types such as additions and errata. The title of the
retraction notice consists of the title of the retracted article and a phrase "(Retraction
of ...)" so that the title is self-sufficient for identifying the retracted article. The title of
a retracted article in the Web of Science is also modified with a phrase to mark the
fact that it is retracted. For example, the Wakefield paper is shown with the phrase
"(Retracted article. See vol 375, pg 445, 2010)".
In Google Scholar, retracted articles are identified with a prefix of
RETRACTED ARTICLE to their title. In advanced Scholar search, one can
limit the search to all the records with the phrase in the title.
Table 8.10 summarizes the number of retractions found in major sources of
scientific publications as of 3/29/2012. The search on PubMed covers all the years
available, whereas the search on the Web of Science is limited by the coverage of
our institutional subscription (1980–present).
¹ http://www.ncbi.nlm.nih.gov/PubMed?term=retracted+publication+[pt]
² Correction: Correction of errors found in articles that were previously published and which have been made known after that article was published. Includes additions, errata, and retractions. http://images.webofknowledge.com/WOKRS51B6/help/WOS/hs_document_type.html
8.3.1 Studies of Retraction

A retraction sends a strong signal to the scientific community that retracted articles
are not trustworthy and that they should be effectively purged from the literature. Studies
of retraction are often limited to formally retracted articles. It is a common belief
that many more articles should have been retracted (Steen 2011). On the other hand,
it has been noted that retraction should be reserved for scientific misconduct, whereas
correction is a more appropriate term for withdrawing articles with technical errors
(Sox and Rennie 2006). We outline some of the representative studies of retraction
as follows in terms of how they addressed several common questions.
Time to retraction: How long does it take on average for a scientific publication
to be retracted? Does the time to retraction differ between senior and junior
researchers?
Post-retraction citations: Does the retraction of an article influence how the article
is cited, quantitatively and qualitatively? How soon can one detect the decrease
of citations after retraction?
Cause of concern: How was an eventually retracted article noticed in the first place?
Are there any early signs that one can watch for to safeguard the integrity of
scientific publications?
Reasons for retraction: What are the most common reasons for retraction?
How are these common causes distributed? Should they be treated equally or
differently as far as retraction is concerned?
Deliberate or accidental: Do scientists simply make mistakes in good faith, or do
some of them intend to cheat through deliberate misconduct?
Table 8.11 outlines some of the most representative and commonly studied
aspects of retraction, including corresponding references of individual studies.
Several studies found that on average it took about 2 years to retract a scientific
publication and that it took even longer for articles for which senior researchers
were responsible. Time to retraction was studied in particular in a survival
analysis by Trikalinos et al. (2008). Based on retractions made in top-cited
high-impact journals, it was found that the median survival time of eventually retracted
articles was 28 months. In addition, it was found that it took much longer to retract
articles authored by senior researchers, i.e. professors, lab directors, or researchers
with more than 5 years of publication records, than those authored by junior ones.
Post-retraction citations were studied at different time points after retraction,
ranging from the next calendar year and 1 year after retraction to 3 years after
retraction. In general, citation counts tend to decrease after a retraction, but there
are outlier citing articles that are apparently unaware of a retraction after 2–3 years.
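A minimal sketch of how such counts could be tabulated is given below. It assumes that the only inputs are the publication years of the citing articles and the year of retraction; the function names and the example data are hypothetical and do not come from any of the studies cited above.

```python
from collections import Counter


def split_citations(citing_years, retraction_year):
    """Count citations made before, during, and after the retraction year."""
    counts = Counter(before=0, same_year=0, after=0)
    for year in citing_years:
        if year < retraction_year:
            counts["before"] += 1
        elif year == retraction_year:
            counts["same_year"] += 1
        else:
            counts["after"] += 1
    return dict(counts)


def citations_by_lag(citing_years, retraction_year):
    """Map lag in years relative to retraction (negative = earlier) to citation counts."""
    lags = Counter(year - retraction_year for year in citing_years)
    return dict(sorted(lags.items()))


# Hypothetical citing years for an article retracted in 2006.
example_years = [2003, 2004, 2004, 2005, 2006, 2007, 2007, 2009]
print(split_citations(example_years, 2006))
print(citations_by_lag(example_years, 2006))
```

The lag table makes it possible to compare the time points used in different studies, for example the next calendar year (lag 1) versus 3 years after retraction (lag 3).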
Irreproducibility and an unusually high level of productivity are among the most
common causes of initial concern. For example, Jan Hendrik Schön fabricated
17 papers in 2 years in Science and in Nature. He produced a new paper every
8 days at his peak (Steen 2011). Irreproducibility can be further explained in terms
of an array of specific reasons, including types of errors and deliberate
misconduct. It has been argued that, pragmatically speaking, fabricating data and
results is perceived to be much more harmful than plagiarizing a description or an
expression. For example, some researchers distinguish data plagiarism from text
plagiarism and treat data plagiarism as scientific misconduct (Steen 2011).
A sign that may differentiate deliberate fraudulent behavior from a good-faith
mistake is whether it happens repeatedly with the same researcher. A higher rate
of repeat offenders was indeed found in fraudulent papers than in erroneous papers
(Steen 2011).
Table 8.11 Major aspects of retraction
Attributes of retraction | Findings and references
Time to retraction (months) | 28 months (mean) (Budd et al. 1998); Fraudulent 28.41 months (mean), Erroneous 22.72 months (mean) (Steen 2011); 28 months (median), Senior researchers implicated 79 months, junior researchers implicated 22 months (Trikalinos et al. 2008); case study (Korpela 2010)
Post-retraction citations (lag time) | 1 year after retraction (Budd et al. 1998); 3 years after (Neale et al. 2007); next calendar year (Pfeifer and Snodgrass 1990)
Cause of concern | Irreproducibility, unusually high level of productivity (Budd et al. 1998; Steen 2011)
Reasons for retraction | Scientific misconduct, irreproducibility, errors (Wager and Williams 2011)
Types of errors | Errors in method, data or sample; duplicated publication; text plagiarism (Budd et al. 1998)
Types of misconduct | Identified or presumed; fraud, fabrication, falsification, data plagiarism (Budd et al. 1998; Neale et al. 2007; Steen 2011)
Deliberate or accidental | A higher rate of repeat offenders found in fraudulent papers than erroneous papers (Steen 2011)
Sources of the literature | PubMed/MEDLINE (Budd et al. 1998; Neale et al. 2007; Steen 2011)
Studies of retraction have almost exclusively focused on the literature of medicine,
where the stakes are high in terms of the safety of patients. PubMed and the Web
of Science are the major resources used in these studies. Analysts in these studies
typically searched for retracted articles and analyzed the content of retraction
notices as well as other types of information. Most of these studies appear to rely
on labor-intensive procedures with limited or no support for visual analytic tasks.
Several potentially important questions have not been adequately addressed due to
such constraints.
8.3.1.1 k-Degree Post-retraction Citation Paths

An article may cite a retracted article without its authors being aware of the retraction.
Citing articles of this type may compromise the integrity of the scientific literature. Studies
of retraction so far have essentially focused on first-degree citing articles, i.e. articles that
directly cited a retracted article. Citation counts and whether the citers were evidently
aware of the status of retracted articles are the most commonly studied
topics.
Given a published article $a_{t_0}$, retracted or not, a citation path between a
subsequently published article $a_{t_k}$ and the original article can be defined in terms
of pairwise citation relations as follows: $a_{t_0} \leftarrow a_{t_1} \leftarrow \cdots \leftarrow a_{t_k}$,
where $\leftarrow$ denotes a direct citation reference, $t_i < t_j$ if $i < j$, and the length of
each segment of the path is minimized. In other words, $a_{t_i} \leftarrow a_{t_{i+1}}$ means
$a_{t_{i+1}}$ has no direct citation to any of the articles on the path prior to $a_{t_i}$.
The length of a citation path is the number
of direct citation links included in the path. Existing studies of citations to retracted
articles are essentially limited to citation paths that contain one step only. Longer
citation paths originating from a retracted article have not been studied. It is clear
that the retraction of the first article is equivalent to the removal of the first article
from a potentially still growing path such as $a_{t_0} \leftarrow a_{t_1} \leftarrow \cdots \leftarrow a_{t_k}$, because newly
published articles may unknowingly cite the last article $a_{t_k}$ without questioning the
validity of the potentially risky path. By k-degree post-retraction citation analysis,
we introduce a study of such paths formed by k pairwise direct citation links as in
$a_{t_0} \leftarrow a_{t_1} \leftarrow \cdots \leftarrow a_{t_k}$.
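The definition can be illustrated with a small sketch. It reads the minimality condition as "each article on the path cites its immediate predecessor and no earlier article on the path", and it uses a breadth-first search to assign each citing article the length of its shortest citation path back to the starting article. Both the reading and the names used here (`cited_by`, `cites`, `citation_degrees`, `is_minimal_path`) are assumptions made for illustration, not the implementation used in the study.

```python
from collections import deque


def citation_degrees(cited_by, source):
    """Breadth-first search over 'is cited by' links.

    cited_by maps an article to the articles that cite it directly.
    Returns {article: k}, where k is the shortest citation path length
    from the article back to source.
    """
    degree = {source: 0}
    queue = deque([source])
    while queue:
        article = queue.popleft()
        for citer in cited_by.get(article, ()):
            if citer not in degree:
                degree[citer] = degree[article] + 1
                queue.append(citer)
    del degree[source]
    return degree


def is_minimal_path(path, cites):
    """Check that each article cites its predecessor and nothing earlier on the path.

    path is ordered from the original article to the latest citing article;
    cites maps an article to the set of articles it cites directly.
    """
    for i in range(1, len(path)):
        references = cites.get(path[i], set())
        if path[i - 1] not in references:
            return False
        if any(earlier in references for earlier in path[: i - 1]):
            return False
    return True


# Hypothetical example: R is retracted; A cites R; B cites A; C cites A and B.
cited_by = {"R": ["A"], "A": ["B", "C"], "B": ["C"]}
cites = {"A": {"R"}, "B": {"A"}, "C": {"A", "B"}}
print(citation_degrees(cited_by, "R"))            # A is 1st degree, B and C are 2nd degree
print(is_minimal_path(["R", "A", "C"], cites))    # minimal
print(is_minimal_path(["R", "A", "B", "C"], cites))  # not minimal: C also cites A
```

In this toy network, C is a second-degree citing article even though a longer chain through B exists, because C cites A directly.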
8.3.1.2 Citation Networks Involving Retracted Articles

Over the recent years, tremendous advances have been made in scientometrics
(Boyack and Klavans 2010; Leydesdorff 2001; Shibata et al. 2007; Upham et al.
2010), science mapping (Chen 2006; Cobo et al. 2011; Small 1999; van Eck
and Waltman 2010), and visual analytics (Pirolli 2007; Thomas and Cook 2005).
Existing studies of citations to retracted articles have not yet incorporated these
relatively new and more powerful techniques. Conversely, researchers who have access
to the new generation of analytic tools have not applied these tools to the analysis
of citation networks involving retracted articles.
8.3.1.3 Citation Context

It is important to find out how much a citing article's authors know about the
current status of a retracted article when they refer to it. Previous
studies have shown that this is not always clear in the text. A retracted article may have
been cited by hundreds of subsequently published articles. Manually examining
individual citation instances is time consuming and cognitively demanding. It is
an even more challenging task for analysts to synthesize emergent patterns from
individual citation instances and discern changes in how a retracted article
has been cited over an extensive period of time, because retracted
articles are known to be cited continuously for a long time after the retraction.
Table 8.12 Survival analysis of time to retraction
Statistic | Estimate | Std. error | 95 % confidence interval, lower bound | 95 % confidence interval, upper bound
Mean a | 2.578 | 0.066 | 2.448 | 2.707
Median | 2.000 | 0.052 | 1.898 | 2.102
a Estimation is limited to the largest survival time if it is censored
The provision of full text articles would make it possible to study the context
of citations to a retracted article with computational tools. It would also make it
possible to study higher-level patterns of citations and how they change over time
with reference to retraction events.
We address these three questions and demonstrate how visual analytic methods
and tools can be developed and applied to the study of citation networks and citation
contexts involving retracted articles. There are many other issues that are important
to study but we decide to focus on the ones that are relatively fundamental.
8.3.2 Time to Retraction

In the Web of Science, the title of a retracted article includes a suffix of "Retracted
article". As of 3/30/2012, there were 1,775 records of retracted articles. The distribution
of the 1,775 retracted articles since 1980 shows that retractions appear to have
peaked in 2007 with 254 retracted articles recorded in the Web of Science alone.
On the other hand, it might be still too soon to rule out the possibility of more
retrospective retractions.
It is relatively straightforward to calculate how long on average an article lasts
between its publication and its retraction. The time
of retraction of an article is commonly retrievable from the amended title of the article. For
example, if the title of an article published in 2010 is followed by a clause in the
form of (Retracted article. See vol. 194, pg. 447, 2011), then we know that the
article was retracted in 2011. We loaded the data into a built-in relational database
of CiteSpace and used the substring function in SQL to extract the year of retraction
from the title by counting backwards, i.e. substring(title, -5, 4). We found that the
mean time to retraction is 2.57 years, or about 31 months, based on the retraction time
of the 1,721 retracted articles, excluding 54 records with no retraction date. The
median time to retraction is 2 years, i.e. 24 months (See Table 8.12).
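The extraction step can be sketched outside the database as follows. This is an illustrative Python equivalent, not the SQL used in CiteSpace: it assumes that an amended title ends with a four-digit year followed by a closing parenthesis, and the helper names and sample records are hypothetical.

```python
import re
from statistics import mean, median

# A four-digit year immediately before the final closing parenthesis.
YEAR_AT_END = re.compile(r"(\d{4})\)\s*$")


def retraction_year(title):
    """Return the retraction year embedded at the end of an amended title, or None."""
    match = YEAR_AT_END.search(title)
    return int(match.group(1)) if match else None


def years_to_retraction(records):
    """records: iterable of (amended_title, publication_year) pairs.

    Records without a recognizable retraction year are skipped, mirroring
    the exclusion of records with no retraction date.
    """
    lags = []
    for title, publication_year in records:
        year = retraction_year(title)
        if year is not None:
            lags.append(year - publication_year)
    return lags


# Hypothetical records in the style of the amended titles described above.
sample = [
    ("An example title (Retracted article. See vol. 194, pg. 447, 2011)", 2010),
    ("Magnetic carbon (Retracted article. See vol 440, pg 707, 2006)", 2001),
    ("A title with no retraction date (Retracted article.)", 2008),
]
lags = years_to_retraction(sample)
print(lags, mean(lags), median(lags))
```

A regular expression is used here instead of a fixed offset so that titles without a trailing year are skipped rather than misread.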
Figure 8.11 shows a plot of the survival function of retraction. The probability
of surviving retraction drops rapidly over the first few years after publication.
In other words, the majority of retractions took place within the first few years.
For an eventually retracted article, the probability of surviving 4 years
is below 0.2.
Fig. 8.11 The survival function of retraction. The probability of surviving retraction for 4 years
or more is below 0.2
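A survival function of this kind can be sketched as follows, assuming that every article in the sample has an observed retraction time, i.e. there is no censoring. In that case the Kaplan-Meier estimate reduces to the empirical fraction of articles still unretracted after t years. The data below are hypothetical and are not the Web of Science records analyzed here.

```python
def survival_function(times_to_retraction):
    """Empirical survival function S(t) = fraction of articles with time > t.

    Assumes no censoring: every article has an observed time to retraction.
    Returns {t: S(t)} for each distinct observed time t.
    """
    n = len(times_to_retraction)
    if n == 0:
        raise ValueError("at least one observation is required")
    return {
        t: sum(1 for x in times_to_retraction if x > t) / n
        for t in sorted(set(times_to_retraction))
    }


# Hypothetical times to retraction in years for ten articles.
times = [0, 1, 1, 2, 2, 2, 3, 4, 5, 8]
for t, s in survival_function(times).items():
    print(f"S({t}) = {s:.1f}")
```

If some articles had not yet been retracted at the time of analysis, their times would be censored and a proper Kaplan-Meier estimator would be needed instead of this simplification.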
8.3.3 Retracted Articles in Context

Table 8.13 lists the ten most highly cited retracted articles in the Web of Science.
The 1998 Lancet paper by Wakefield et al. has the highest citation count, 740. The
least cited of the ten has 366 citations. Four papers on the list were published in
Science and two in Lancet. In the rest of this section, we will primarily focus on these
high-profile retractions in terms of their citation contexts at both macroscopic and
microscopic levels.
We are interested in depicting the context of retracted articles in a co-citation
network of a broadly defined and relevant set of scientific publications. First, we
retrieved 29,756 articles that cited 1,584 retracted articles in the Web of Science.
We use CiteSpace to generate a co-citation network based on the collective citation
behavior of the 29,756 articles between 1998 and 2011. The top 50 % most cited
references were included to the formation of the co-citation network with an upper
limit of 3,000 references per year. The resultant network contains 7,217 references
and 155,391 co-citation links. A visualization of the co-citation network is generated
Table 8.13 The ten most highly cited retracted articles

Lead author   Citations  Journal          Publication-retraction  Title (retraction notice)
Wakefield AJ  740        Lancet           1998-2010               Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children (See vol 375, pg 445, 2010)
Reyes M       727        Blood            2001-2009               Purification and ex vivo expansion of postnatal human marrow mesodermal progenitor cells (See vol. 113, pg. 2370, 2009)
Fukuhara A    659        Science          2005-2007               Visfatin: A protein secreted by visceral fat that mimics the effects of insulin (See vol 318, pg 565, 2007)
Nakao N       618        Lancet           2003-2009               Combination treatment of angiotensin-II receptor blocker and angiotensin-converting-enzyme inhibitor in non-diabetic renal disease (COOPERATE): a randomised controlled trial (See vol. 374, pg. 1226, 2009)
Chang G       512        Science          2001-2006               Structure of MsbA from E-coli: A homolog of the multidrug resistance ATP binding cassette (ABC) transporters (See vol 314, pg 1875, 2006)
Kugler A      492        Nature Medicine  2000-2003               Regression of human metastatic renal cell carcinoma after vaccination with tumor cell-dendritic cell hybrids (See vol. 9, p. 1221, 2003)
Rubio D       433        Cancer Research  2005-2010               Spontaneous human adult stem cell transformation (See vol. 70, pg. 6682, 2010)
Gowen LC      391        Science          1998-2003               BRCA1 required for transcription-coupled repair of oxidative DNA damage (See vol 300, pg 1657, June 13 2003)
Hwang WS      375        Science          2004-2006               Evidence of a pluripotent human embryonic stem cell line derived from a cloned blastocyst (See vol 311, pg 335, 2006)
Makarova TL   366        Nature           2001-2006               Magnetic carbon (See vol 440, pg 707, 2006)
Fig. 8.12 An overview of co-citation contexts of retracted articles. Each dot is a reference of an
article. Red dots indicate retracted articles. The numbers in front of labels indicate their citation
ranking. Potentially damaging retracted articles are in the middle of an area that is otherwise free
from red dots
and overlaid with the top ten most cited retracted articles as well as other highly
cited articles without retractions (see Fig. 8.12). Each dot in the visualization
represents an article cited by the set of 29,756 citing articles. The dots in red
are retracted articles. Lines between dots are co-citation links. The color of a
co-citation link indicates the earliest time a co-citation between two articles was made. The
earliest times are in blue; more recent times are in yellow and orange. The size of a dot,
or a disc, is proportional to the citation count of the corresponding cited article.
The top ten most cited retracted articles are labeled in the visualization. Retracted
articles are potentially more damaging if they are located in the middle of a group of
densely co-cited articles. In contrast, isolated red dots are relatively less damaging. This
type of visualization is valuable for highlighting how deeply a retracted article is
embedded in the scientific literature.
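The counting that underlies a co-citation network of this kind can be sketched as follows. This is a simplified illustration and not the procedure implemented in CiteSpace: it omits the per-year selection of top cited references, and the function names and toy reference lists are hypothetical.

```python
from collections import Counter
from itertools import combinations

def citation_counts(reference_lists):
    """Count how many citing articles cite each reference (node size)."""
    counts = Counter()
    for refs in reference_lists:
        counts.update(set(refs))
    return counts

def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited together (link weight)."""
    pair_counts = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(refs)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Each inner list is the reference list of one hypothetical citing article.
toy_reference_lists = [["A", "B", "C"], ["A", "B"], ["B", "C", "D"]]
nodes = citation_counts(toy_reference_lists)
links = cocitation_counts(toy_reference_lists)
```

Here `nodes` gives the disc sizes and `links` the co-citation links; a retracted reference with many heavy links sits in the middle of a densely co-cited area.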
Figure 8.13 shows a close-up view of the visualization shown in Fig. 8.12. The
retracted article by Nakao N et al. on the left, for example, has a sizable red disc,
indicating its numerous citations. Its position on a densely connected island of other
articles indicates its relevance to a significant topic. Hwang WS (slightly to the right)
and Potti A at the lower right corner of the image have similar citation context
profiles. More profound impacts are likely to be found in interconnected citation
contexts of multiple retracted articles.
Figure 8.14 shows an extensive representation of the citation context of the
retracted 2003 article by Nakao et al. First, 609 articles that cited the Nakao paper
were identified in the Web of Science. Next, 9,656 articles were retrieved because
Fig. 8.13 Red dots are retracted articles. Labeled ones are highly cited. Clusters are formed by
co-citation strengths
Fig. 8.14 An extensive citation context of a retracted 2003 article by Nakao et al. The co-citation
network contains 27,905 cited articles between 2003 and 2011. The black dot in the middle of
the dense network represents the Nakao paper. Red dots represent 340 articles that directly cited
the Nakao paper (there are 609 such articles in the Web of Science). Cyan dots represent 2,130 of
the 9,656 articles that bibliographically coupled with the direct citers
they have at least one reference in common with the 609 direct citing articles. The top
6,000 most cited references per year between 2003 and 2011 were chosen to form
a co-citation network of 27,905 references and 2,162,018 co-citation links. The
retracted Nakao paper is shown as the black dot in the middle of the map. The red
dots are 340 direct citers of the total of 609 available in the Web of Science. The cyan
dots share common references with the direct citers, not necessarily the retracted
article. The labels are the most cited articles in this topic area, which are not retracted
articles themselves.
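The expansion from direct citers to bibliographically coupled articles can be sketched as follows. This is a minimal illustration of the idea, assuming each article is available as a mapping from an identifier to its reference list; the function name and the toy records are hypothetical.

```python
def coupled_articles(direct_citers, other_articles):
    """Return ids of articles that share at least one reference with any direct citer.

    Both arguments map an article id to the list of references it cites.
    """
    pooled_refs = set()
    for refs in direct_citers.values():
        pooled_refs.update(refs)
    return {
        article
        for article, refs in other_articles.items()
        if article not in direct_citers and pooled_refs.intersection(refs)
    }

# Hypothetical records: "RETRACTED" stands for the retracted article.
toy_direct_citers = {"c1": ["RETRACTED", "r1"], "c2": ["RETRACTED", "r2", "r3"]}
toy_others = {"x1": ["r1", "r9"], "x2": ["r8"], "x3": ["r3"]}
coupled = coupled_articles(toy_direct_citers, toy_others)
```

Note that `x1` and `x3` are coupled with the direct citers without citing the retracted article themselves, which corresponds to the cyan dots described above.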
8.3.4 Autism and Vaccine

The most cited retracted article among all the retracted articles in the Web of Science
is the 1998 Lancet article by Wakefield et al. A citation burst of 0.05 was detected
for this article. The article was partially retracted in 2004 and fully retracted in 2010.
The Lancet's retraction notice in February 2010 noted that several elements of the
1998 paper are incorrect, contrary to the findings of an earlier investigation, and that
the paper made false claims of an approval of the local ethics committee.
In order to find out what exactly was said when researchers cited the controversial
article, we studied citation sentences, which are the sentences that contain references
to the Wakefield paper. A set of full text articles was obtained from Elsevier's
Content Syndication (ConSyn), which contains 3,359 titles of scholarly journals
and 6,643 non-serial titles. Since the Wakefield paper is concerned with a claimed
causal relation between a combined MMR vaccine and autism, we searched for full
text journal articles on autism and vaccine in ConSyn and found 1,250 relevant
articles. The Wakefield paper was cited by 156 of these 1,250 full text articles.
A total of 706 citation sentences were found in the 156
citing articles. We used the Lingo clustering method provided by Carrot2, an open
source framework for building search clustering engines,3 to cluster these citation
sentences into 69 clusters.
Figure 8.15 is a visualization of the 69 clusters formed by 706 sentences that
cited the 1998 Lancet paper. The visualization is called Foam Tree in Carrot. See
Chap. 9 for more details on Carrot. Clusters with the largest areas represent the
most prominent clusters of phrases used when researchers cited the 1998 paper. For
example, inflammatory bowel disease, mumps and rubella, and association between
MMR vaccine and autism are the central topics of the citations. These topics indeed
characterize the role of the retracted Lancet paper, although in this study we did
not differentiate positive and negative citations. Identifying the orientation of an
instance of citation from a citation context, for example, the citing sentence and
its surrounding sentences, is a very challenging task even for an intelligent reader
because the position of the argument becomes clear only when a broader context is
taken into account, for example, after reading the entire paragraph in many cases.
In addition to aggregating citation sentences into clusters at a higher level of
abstraction, we further developed a timeline visualization that depicts
year-by-year flows of topics to help analysts discern changes associated with
3 http://project.carrot2.org/
Fig. 8.15 69 clusters formed by 706 sentences that cited the 1998 Wakefield paper
Fig. 8.16 Divergent topics in a topic-transition visualization of the 1998 Wakefield et al. article
citations to the retracted article. The topic-flow visualization was constructed as
follows. First, we divide the citation sentences into groups defined by their publication time. Citation sentences made in each year are clustered into topics. Similarities
between topics in adjacent years are computed in terms of the overlapping topic
terms between them. Topic flows connect topics in adjacent years that meet a
user-defined similarity threshold (see Fig. 8.16).
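The construction of topic flows described above can be sketched as follows. The text only states that similarity is computed from overlapping topic terms; the use of the Jaccard coefficient here is our assumption for illustration, and the function names, topic labels, and terms are hypothetical.

```python
def jaccard(terms_a, terms_b):
    """Overlap of two term sets: |A and B| / |A or B| (an assumed measure)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def topic_flows(topics_by_year, threshold):
    """Connect topics in adjacent years whose term overlap meets the threshold.

    topics_by_year maps a year to {topic label: list of topic terms}.
    Returns a list of ((year, label), (next year, label), similarity).
    """
    flows = []
    years = sorted(topics_by_year)
    for prev_year, next_year in zip(years, years[1:]):
        if next_year - prev_year != 1:
            continue  # only adjacent years are linked
        for p_label, p_terms in topics_by_year[prev_year].items():
            for n_label, n_terms in topics_by_year[next_year].items():
                similarity = jaccard(p_terms, n_terms)
                if similarity >= threshold:
                    flows.append(((prev_year, p_label), (next_year, n_label), similarity))
    return flows

# Hypothetical topics and terms for two adjacent years.
toy_topics = {
    1998: {"topic A": ["autism", "bowel", "disease"],
           "topic B": ["mmr", "vaccine", "safety"]},
    1999: {"topic C": ["mmr", "vaccine", "rubella"]},
}
flows = topic_flows(toy_topics, threshold=0.3)
```

Raising the threshold leaves only the strongest transitions in the flow map; lowering it reveals weaker connections at the cost of more clutter.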
Each topic in the flow map can be characterized as a convergent, divergent,
or steady topic. A convergent topic in a particular year is defined
Table 8.14 Specific sentences that cite the eventually retracted 1998 Lancet paper by Wakefield
et al.

Year of citation  Ref  Sentence
1998              1    The report by Andrew Wakefield and colleagues confirms the clinical
                       observations of several paediatricians, including myself, who have
                       noted an association between the onset of the autistic spectrum and
                       the development of disturbed bowel habit
1998                   Looking at the ages of the children in Wakefield's study, it seems that
                       most of them would have been at an age when they could well have
                       been vaccinated with the vaccine that has since been withdrawn
1998                   We are concerned about the potential loss of confidence in the mumps,
                       measles, and rubella (MMR) vaccine after publication of Andrew
                       Wakefield and colleagues' report (Feb 28, p 637), in which these
                       workers postulate adverse effects of measles-containing vaccines
1998                   We were surprised and concerned that the Lancet published the paper
                       by Andrew Wakefield and colleagues in which they alluded to an
                       association between MMR vaccine and a nonspecific syndrome,
                       yet provided no sound scientific evidence
2001              34   In 1998, Wakefield et al. [34] have published a second paper including
                       two ideas: that autism may be linked to a form of inflammatory
                       bowel disease and that this new syndrome is associated with
                       measles-mumps-rubella (MMR) immunization
2007                   Vaccine scares in recent years have linked MMR vaccination with
                       autism and a variety of bowel conditions, and this has had an
                       adverse impact on MMR uptake [5]
2007                   When comparing MMR uptake rates before (1994-1997) and after
                       (1999-2000) the 1998 Wakefield et al. article [5] it is seen that
                       prior to 1998 Asian children had the highest uptake
2010                   This addresses a concern raised by a now-retracted article by
                       Wakefield et al. and adds to the body of evidence that has failed to
                       show a relationship between measles vaccination and autism (1, 2)
in terms of the number of related topics in the previous year. The convergent topic
sums up elements from multiple previously separated topics. In 1999, the topic of
Rubella MMR Vaccination is highlighted by an explicit label because it is associated
with several distinct topics in 1998. In 2004, the year Lancet partially retracted
the Wakefield paper, the prominent convergent topic was Developmental Disorders.
The visualization shows that numerous distinct topics in 2003 converged into
the convergent topic in 2004. We expect that this type of topic-flow visualization
can enable new ways of analyzing and studying the dynamics of topic transitions in
specific citations to a particular article.
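One way to label topics from a set of topic flows is sketched below. Only the convergent case is defined in the text (in terms of the number of related topics in the previous year). The mirrored definition of a divergent topic, the `min_links` cut-off, and the rule that convergence takes precedence are our assumptions for illustration; the topic labels are hypothetical.

```python
from collections import Counter

def classify_topics(flows, min_links=2):
    """Label each topic as convergent, divergent, or steady.

    flows is a list of (source topic, target topic) pairs between adjacent years.
    """
    incoming = Counter(target for _, target in flows)
    outgoing = Counter(source for source, _ in flows)
    roles = {}
    for topic in set(incoming) | set(outgoing):
        if incoming[topic] >= min_links:
            roles[topic] = "convergent"   # fed by several topics of the previous year
        elif outgoing[topic] >= min_links:
            roles[topic] = "divergent"    # assumed: feeds several topics of the next year
        else:
            roles[topic] = "steady"
    return roles

# Hypothetical flows: three topics in one year feed a single topic in the next.
toy_flows = [
    ((1998, "topic A"), (1999, "topic D")),
    ((1998, "topic B"), (1999, "topic D")),
    ((1998, "topic C"), (1999, "topic D")),
    ((1999, "topic D"), (2000, "topic E")),
]
roles = classify_topics(toy_flows)
```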
Table 8.14 lists examples of sentences that cited the 1998 Lancet paper by
Wakefield et al. For example, as early as 1998, researchers were concerned about the
lack of sound scientific evidence to support the claimed association between MMR
vaccine and inflammatory bowel disease. The adverse impact on MMR uptake is
also evident in these citation sentences. Many more analytic tasks may become
feasible with this type of text and pattern-driven analyses at multiple levels of
granularity.
Using visualization and science mapping techniques, we have demonstrated
that many high-profile retracted articles belong to vibrant lines of research. Such
complex attachments make it even more challenging to restore the validity of
the scientific literature in a timely manner. We introduced a set of novel
and intuitive tools to facilitate the analysis and exploration of the influence of
a retracted article in terms of how it is specifically cited in the scientific
literature. We have demonstrated that topic-transition visualizations derived from
citation sentences can bridge the cognitive and conceptual gap between macroscopic
patterns and microscopic individual instances. The topic flow of citation sentences is
characterized in terms of convergent and divergent topics, which serve as conceptual
touchstones for analysts to discern the dynamics of topic transitions associated with
the perceived role of a retracted article.
8.3.5 Summary
The perceived risk introduced by retracted articles alone is only the tip of the iceberg.
Many high-profile retracted articles are interwoven deeply with the scientific
literature and in many cases they are embedded in fast-moving significant lines of
research. It is essential to raise the awareness that much of the potential damage
introduced by a retracted article is hidden and likely to grow quietly for a long time
after the retraction via indirect citations. The original awareness of the invalidity of
a retracted article may be lost in subsequent citations. New tools and services are
needed so that researchers and analysts can easily verify the status of a citation
genealogy to ensure that the current status of the origin of the genealogy is clearly
understood. Such tools should become part of the workflow of journal editors and
publishers.
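As a rough illustration of what such a tool might compute, the following sketch traces a citation genealogy outward from a retracted article with a breadth-first search. It is not an existing service; the function name, the data layout, and the toy graph are hypothetical.

```python
from collections import deque

def citation_genealogy(cited_by, retracted_id):
    """Return {article id: citation distance from the retracted article}.

    cited_by maps an article id to the ids of the articles that cite it.
    Distance 1 marks a direct citer; larger values mark indirect citations.
    """
    distance = {retracted_id: 0}
    queue = deque([retracted_id])
    while queue:
        current = queue.popleft()
        for citer in cited_by.get(current, ()):
            if citer not in distance:  # first visit gives the shortest distance
                distance[citer] = distance[current] + 1
                queue.append(citer)
    del distance[retracted_id]
    return distance

# Hypothetical citation graph: "R" is the retracted article.
toy_cited_by = {"R": ["P1", "P2"], "P1": ["P3"], "P2": ["P3"], "P3": ["P4"]}
genealogy = citation_genealogy(toy_cited_by, "R")
```

Articles at distance 2 or more never cite the retracted article directly, which is where the awareness of its invalidity is most easily lost.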
From a visual analytic point of view, it is essential to bring in more techniques
and tools that can support analytic and sense-making tasks over dynamic and
unstructured information and allow analysts and researchers to move back and forth
freely across multiple levels of analytic and decision-making tasks. The ability to
trailblaze evidence and arguments through an evolving space of knowledge is a
critical step toward creating scientific knowledge and maintaining a trustworthy
documentation of the collective intelligence.
8.4 Global Science Maps and Overlays

Science mapping has made remarkable advances in the past decade. Powerful
techniques have become increasingly accessible to researchers and analysts. In this
section, we present some of the most representative efforts towards generating maps
of science. At the highest level, the goal is to identify how scientific disciplines
are interrelated, for example, how medicine and physics are connected, what topics
are shared by chemistry and geology, and how federal funding is distributed across the
landscape of disciplines. Drawing a boundary line for a discipline is challenging;
drawing a boundary line for a constantly evolving discipline is even more so. We
will highlight some recent examples of how researchers deal with such challenges.
8.4.1 Mapping Scientific Disciplines

Derek de Solla Price was probably the first person to anticipate that the Science
Citation Index (SCI) may contain the information for revealing the structure of
science. Price suggested that the appropriate units of analysis would be journals and
that aggregations of journals by journal-journal citations would reveal the disciplinary
structure of science. An estimate mentioned in (Leydesdorff and Rafols 2009)
sheds light on the density of a science map at the journal level. Among the 6,164
unique journals in the 2006 SCI, there were only 1,201,562 pairs of journal citation
relations out of the possible 37,994,896 connections. In other words, the density
of the global science structure is 3.16 %.4 How stable is such a structure at the
level of journal? How volatile is the structure of science at the document level or a
topic level? Where are the activities concentrated or distributed with reference to a
discipline, an institution, or an individual?
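The density figure quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes a directed graph in which a journal may also cite itself, since 37,994,896 equals 6,164 squared; the function name is ours.

```python
def directed_density(n_nodes, n_links, allow_self_links=True):
    """Share of possible directed links that are actually present."""
    if allow_self_links:
        possible = n_nodes * n_nodes
    else:
        possible = n_nodes * (n_nodes - 1)
    return n_links / possible

# Figures quoted in the text for the 2006 SCI.
density = directed_density(6164, 1201562)
density_percent = round(density * 100, 2)
```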
A widely seen global map of science is the UCSD map, depicting 554 clusters
of journals and how they are interconnected as sub-disciplines of science (see
Fig. 8.17). The history of the UCSD map is described in (Borner et al. 2012).
The map was first created by Richard Klavans and Kevin Boyack in 2007 for the
University of California San Diego (UCSD). The source data for the map was
a combination of Thomson Reuters Web of Science (2001-2004) and Elsevier's
Scopus (2001-2005). Similarities between journals were computed in 18 different
ways to form matrices of journal-journal connections. These matrices were then
combined to form a single network of 554 sub-disciplines in terms of clusters of
journals. The layout of the map was generated using the 3D Fruchterman-Reingold
layout function in Pajek. The spherical map was then unfolded to a 2D map on a
flat surface with a Mercator projection. Each cluster was manually labeled based on
journal titles in the cluster. The 2D version of the map was further simplified to a
1D circular map, the circle map. The 13 labeled regions were ordered using factor
analysis. The circle map is used in Elsevier's SciVal Spotlight.
The goal of the UCSD map was to provide a base map for research evaluation.
With 554 clusters, it provides more categories than the subject categories of the Web
of Science. While the original goal was for research evaluation, the map is being
used as a base map to superimpose overlays of additional information in systems
such as Sci2 and VIVO.5 Soon after the creation of the UCSD map, Richard Klavans
4 Assume this is a directed graph of 6,164 journals.
5 http://ivl.cns.iu.edu/km/pres/2012-borner-portfolio-analysis-nih.pdf
Fig. 8.17 The UCSD map of science. Each node in the map is a cluster of journals. The clustering
was based on a combination of bibliographic couplings between journals and between keywords.
Thirteen regions are manually labeled (Reproduced with permission)
and Kevin Boyack came to the conclusion that research evaluation requires maps
with clusters at the article level rather than at the journal level.
The UCSD map was generated for UCSD to show their research strengths and
competencies. Although the discipline-level map characterizes the global structure
of scientific literature, much more detail is necessary to quantify research
strengths at UCSD. A similar procedure was applied to generate an article-level
map as opposed to a journal-level map. Clusters of articles were calculated
based on co-citations. In addition to the discipline-level circle map, the paper-level
clustering provides much more detailed classification information. In contrast to the
554 journal clusters, the paper-level clustering of co-cited references identified over
84,000 clusters, which are called paradigms (Fig. 8.18).
In a 2009 Scientometrics paper (Boyack 2009), Boyack described how a
discipline-level map can be used for collaboration. He collected 1.35 million
papers from 7,506 journals and 1,206 conference proceedings. These papers contain
29.23 million references. Similarities between references were calculated in terms
of bibliographic coupling. These reference-level similarities were then aggregated to
obtain similarities between journals. For each journal, the top 15 most similar journals in terms of bibliographic coupling were retained for generating the final map.
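The pruning step, in which only the most similar neighbours of each journal are kept, can be sketched as follows. This is a generic top-k filter written for illustration, not Boyack's code; the function name, the nested-dictionary layout, and the toy similarities are hypothetical.

```python
import heapq

def keep_top_k(similarity, k=15):
    """Keep, for each journal, only its k most similar journals.

    similarity maps a journal to {other journal: similarity score}.
    """
    pruned = {}
    for journal, neighbours in similarity.items():
        top = heapq.nlargest(k, neighbours.items(), key=lambda item: item[1])
        pruned[journal] = dict(top)
    return pruned

# Hypothetical similarities; k=2 is used only to keep the example small.
toy_similarity = {
    "J1": {"J2": 0.9, "J3": 0.1, "J4": 0.5},
    "J2": {"J1": 0.9, "J3": 0.4},
}
pruned = keep_top_k(toy_similarity, k=2)
```

Because each journal is pruned independently, a link may survive in one direction only; how such asymmetric links are treated afterwards is a separate design decision.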
The map layout step served two purposes: one is to optimize the arrangement
of the journals so that the distance between journals on the map is proportional to
Fig. 8.18 Areas of research leadership for China. Left: A discipline-level circle map. Right:
A paper-level circle map embedded in a discipline circle map. Areas of research leadership are
located at the average position of corresponding disciplines or paradigms. The intensity of the
nodes indicates the number of leadership types found, Relative Publication Share (RPS), Relative
Reference Share (RRS), or state-of-the art (SOA) (Reprinted from Klavans and Boyack 2010 with
permission)
their dissimilarity; the other is to group individual journals into clusters based on
the distance generated by the layout process.
The map layout was made using the VxOrd algorithm, which ignores long-range
links in its layout process. The proximity of nodes in the resultant graph layout
was used to identify clusters using a modified single-linkage clustering algorithm.
In single linkage, the distance between two clusters is computed as the distance
between the two closest elements in the two clusters. The resultant map contains
812 clusters of journals and conference proceedings (See Fig. 8.19). The map was
used as a base map for a variety of overlays. In particular, the presence of an
institution can be depicted with this map. A cluster with a clear circle contains
journal papers only. In contrast, a cluster with a shaded circle contains proceedings
papers. As shown in the map, the majority of proceedings papers are located between
computer science (CS) and Physics. Disciplines such as Virology are almost entirely
dominated by journal papers.
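The single-linkage criterion can be sketched on layout coordinates as follows. This is plain single-linkage clustering with a distance cut-off, not the modified algorithm used for the map; the threshold, the function names, and the toy coordinates are assumptions for illustration.

```python
from math import dist

def single_linkage_distance(cluster_a, cluster_b):
    """Distance between the two closest elements of two clusters."""
    return min(dist(p, q) for p in cluster_a for q in cluster_b)

def single_linkage_clusters(points, threshold):
    """Merge clusters until no two clusters are within the threshold."""
    clusters = [[p] for p in points]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if single_linkage_distance(clusters[i], clusters[j]) <= threshold:
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return clusters

# Hypothetical 2D layout coordinates of five journals.
toy_points = [(0, 0), (0, 1), (0, 2), (5, 5), (5, 6)]
clusters = single_linkage_clusters(toy_points, threshold=1.0)
```

The first three points form one cluster although the two end points are 2 units apart, which illustrates the chaining behaviour typical of single linkage.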
More recently, Klavans and Boyack created a new global map of science based on
Scopus 2010. The new Scopus 2010 map is a paper-level map, representing 116,000
clusters of 1.7 million papers (See Fig. 8.20). The Scopus 2010 map is hybrid in
that clusters were generated from citations and the layout was done based on text
similarity. The similarities between clusters were calculated based on words from
titles and abstracts of papers in each cluster using the Okapi BM25 text similarity.
The clustering step did not use a hybrid similarity based on both text and citation
simultaneously. For each cluster, 5-15 clusters with the strongest connections were
retained. Labels of clusters were manually added.
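For readers unfamiliar with Okapi BM25, the following sketch scores a set of documents against a list of query terms. It uses a commonly cited form of the formula with the parameter values k1 = 1.2 and b = 0.75, which are our assumptions and not necessarily the settings used for the Scopus 2010 map. In the map, each cluster would play the role of a document made of the words from its titles and abstracts; the toy documents below are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # number of documents containing each term
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / (tf + length_norm)
        scores.append(score)
    return scores

# Hypothetical word lists standing in for three clusters.
toy_docs = [
    ["stem", "cell", "therapy"],
    ["magnetic", "carbon"],
    ["stem", "cell", "line", "cloned"],
]
scores = bm25_scores(["stem", "cell"], toy_docs)
```

The first document scores higher than the third because both contain the query terms once, but the first is shorter.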
Fig. 8.19 A discipline-level map of 812 clusters of journals and proceedings. Each node is a
cluster. The size of a node represents the number of papers in the cluster (Reprinted from Boyack
2009 with permission)
As described earlier in the book for geographic base maps
and thematic overlays, global maps of scientific disciplines provide a convenient
base map to depict additional thematic features. Figure 8.21 shows an example of
adding a thematic overlay to the Scopus 2010 base map. The overlay superimposes
a layer of orange dots on clusters in the Scopus 2010 map. The orange dots mark the
papers that acknowledged the support of grants from the National Cancer Institute
(NCI). The overlay provides an intuitive overview of the scope of NCI grants in the
context of research areas.
8.4.2 Interdisciplinarity and Interactive Overlays

In parallel to the efforts we introduced earlier, researchers have been developing another promising approach to generate global science maps and use them
to facilitate the analysis of issues concerning interrelated disciplines and the
interdisciplinarity of a research program.
Fig. 8.20 The Scopus 2010 global map of 116,000 clusters of 1.7 million articles (Courtesy of
Richard Klavans and Kevin Boyack, reproduced with permission)
Ismael Rafols, a researcher at Science and Technology Policy Research (SPRU)
at the University of Sussex in England, Alan Porter, a professor at the Technology
Policy and Assessment Center of Georgia Institute of Technology in the U.S.A.,
and Loet Leydesdorff, a professor in the Amsterdam School of Communication
Research (ASCoR) at the University of Amsterdam, The Netherlands, have been
studying interdisciplinary research, especially topics that pose profound societal
challenges such as climate change and the diabetes pandemic. Addressing such
societal challenges requires communication and integration of different bodies
of knowledge, both from disparate parts of academia and from social stakeholders.
Interdisciplinary research involves a great deal of cognitive diversity. How can
we measure and convey such cognitive diversity to researchers and evaluators
in individual disciplines? Rafols, Porter, and Leydesdorff developed what they
called the science overlay mapping method to study a number of issues concerning
interdisciplinary research (Rafols et al. 2010).
Figure 8.22 shows a global science overlay base map. Each node represents
a Web of Science Category. Loet Leydesdorff provides a set of tools that one
can use to generate an overlay on the base map. One of the earlier papers on
Fig. 8.21 An overlay on the Scopus 2010 map shows papers that acknowledge NCI grants
(Courtesy of Kevin Boyack, reproduced with permission)
science overlay maps, a paper published in February 2009 (Leydesdorff and Rafols
2009), was featured as a fast breaking paper by Thomson Reuters ScienceWatch
in December 2009.6 Fast breaking papers are publications that have the largest
percentage increase in citations in their field from one bimonthly update to the next.
The overlay method has two steps: (1) creating a global map of science as the
base map and (2) superimposing a specific set of publications, for example, from a
given institution or topic. Along with the method, the researchers have made a set
of tools available so that anyone can use them to generate their own
science overlay maps. The toolkit is freely available.7
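The second step of the overlay method can be sketched as follows. This is our own minimal illustration and not the toolkit mentioned above: it assumes the base map is available as category coordinates and that each publication record lists its categories. All names and values are hypothetical.

```python
from collections import Counter

def build_overlay(base_map, records):
    """Count the publications of a document set per base-map category.

    base_map maps a category name to its (x, y) position on the base map.
    records is a list of category lists, one list per publication.
    """
    counts = Counter()
    for categories in records:
        # Categories that are not on the base map are ignored.
        counts.update(c for c in set(categories) if c in base_map)
    return {
        category: {"xy": base_map[category], "publications": n}
        for category, n in counts.items()
    }

# Hypothetical base-map positions and publication records.
toy_base_map = {"Clinical Med": (0.2, 0.8), "Chemistry": (0.6, 0.4)}
toy_records = [["Clinical Med"], ["Clinical Med", "Chemistry"], ["Unknown"]]
overlay = build_overlay(toy_base_map, toy_records)
```

A drawing routine could then scale the node of each category by its `publications` count while leaving the base-map positions untouched.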
6 http://archive.sciencewatch.com/dr/fbp/2009/09decfbp/09decfbpLeydET/
7 http://www.leydesdorff.net/overlaytoolkit

[Fig. 8.22 region labels: Agri Sci; Ecol Sci; Geosciences; Infectious Diseases; Environ Sci & Tech; Clinical Med; Mech Eng; Chemistry; Materials Sci; Biomed Sci; Psychological Sci.; Physics; Health & Social Issues; Computer Sci; Clinical Psychology; Math Methods; Social Studies; Business & MGT; Econ Polit & Geography]
Fig. 8.22 A global science overlay base map. Nodes represent Web of Science Categories. Grey
links represent degree of cognitive similarity (Reprinted from Rafols et al. 2010 with permission)
A collection of interactive science overlay maps is maintained on a web site.8
These interactive maps allow us to explore how disciplines are related and how
individual publications from an organization are distributed across the landscape.
Figure 8.23 is a screenshot of one of the interactive maps. The mouse-over feature
highlights GSK's publications associated with the discipline of clinical medicine in
circled red dots.
Initially, the science overlay map was based only on the Science Citation Index
(SCI). The Social Science Citation Index (SSCI) was incorporated in later versions.
In spite of well-known inaccuracies in the assignment of articles to the Web of
Science Categories, Rafols and Leydesdorff have shown in a series of publications
that the overall structure is quite robust to changes in classifications, to the degree of
aggregation (using journals rather than subject categories), and over the time period
studied so far (2006-2010).
In the overlay step, an overlay map superimposes the areas of activity of a
given source of publications, for example, an organization or team, as seen from
its publication and referencing practices, on top of the global science base map.
One can use any document set downloaded from the Web of Science and use it as
8 http://idr.gatech.edu/maps.php
Fig. 8.23 An interactive science overlay map of GlaxoSmithKline's publications between 2000
and 2009. The red circles are GSK's publications in clinical medicine (shown when the mouse hovers over the
Clinical Medicine label) (Reprinted from Rafols et al. 2010 with permission, available at
http://idr.gatech.edu/usermapsdetail.php?id=61)
an overlay. The strength of this overlay approach is that one can easily identify the
activity of an institution whose references spread over multiple disciplinary regions
as well as that of an institution with a much more focused disciplinary profile.
The flexibility of the science overlay maps has been demonstrated in studies
of interdisciplinarity of fields over time (Porter and Rafols 2009), comparing
departments, universities and R&D bases of large corporations (Rafols et al. 2010),
and tracing the diffusion of research topics over science (Leydesdorff and Rafols
2011). Figure 8.24 shows a more recent base map generated by Loet Leydesdorff in
VOSViewer.
8.4.3 Dual-Map Overlays

Many citation maps are designed to show either the sources or the targets of citations
in a single display but not both. The primary reason is that a representation with
a mix of citing and cited articles may considerably increase the complexity of
its structure and dynamics. There does not seem to be a clear gain if we combine
them together in a single view. Although it is conceivable that a combined structure
may be desirable in situations such as a heated debate, researchers are in general
more concerned with differentiating various arguments before considering how to
combine them.
Fig. 8.24 A similarity map of JCR journals shown in VOSViewer
The Butterfly designed by Jock Mackinlay and his colleagues at Xerox shows
both ends in the same view, but the focus is at the individual paper level rather than
at a macroscopic level of thousands of journals (Mackinlay et al. 1995). Eugene
Garfield's HistCite depicts direct citations in the literature. However, as the number
of citations increases, the network tends to become cluttered, which is a common
problem of network representations.
We introduce a dual-map overlay design that depicts both the citing overlay
and the cited overlay maps in the same view. The dual-map overlay has several
advantages over a single overlay map. First, it represents a citation instance
completely. One can see where it originates and where it points to at a glance.
Second, it makes it easy to compare patterns of citations made by distinct groups of
authors, for example, authors from different organizations, or authors from the same
organization at different points of time. Third, it opens up more research questions
that can be addressed in new ways of analysis. For example, it becomes possible
to study the interdisciplinarity at both source and target sides. It becomes possible
to track the movements of scientific frontiers in terms of their footprints in both
base maps.
The construction of a dual-map base shares its initial steps with the construction of
a single base map but differs in the later
steps. Once the coordinates are available for both citing and cited matrices of
journals, a dual-map overlay can be constructed. It is not necessary to have cluster
information, but additional functions are possible if cluster information is available.
In the rest of the description, we assume that at least one set of clusters is available
Fig. 8.25 The Blondel clusters in the citing journal map (left) and the cited journal map (right).
The overlapping polygons suggest that the spatial layout and the membership of clusters still
contain a considerable amount of uncertainty. Metrics calculated based on the coordinates need
to take the uncertainty into account
for each matrix. In this example, clusters are obtained by applying the Blondel
clustering algorithm. Figure 8.25 is a screenshot of the dual-map display, containing
a base map of citing journals (left) and a base map of cited journals (right).
For each journal in the citing network, its cluster membership is stored with
the journal along with its coordinates. The coordinates may be obtained from a
network visualization program such as VOSViewer, Gephi, or Pajek. Members of
each cluster are painted in the map with the same color.
A number of overlays can be added to the dual-map base. Each overlay requires
a set of bibliographic records that contain citation information, such as the records
retrieved from the Web of Science. The smallest set may contain a single article.
There is no limit to the size of the largest set. With journal overlay maps, each
citation instance is represented by an arc from its source journal in the citing base
map to its target journal in the cited base map. Arcs from the same set are displayed
in the same color chosen by the user so that citation patterns from distinct sets can
be distinguished by their unique colors.
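The mapping from bibliographic records to citation arcs can be sketched as follows. This is a simplified illustration of the data involved, assuming each record has already been reduced to its source journal and the journals it cites. The function name, the dictionary layout, and the toy coordinates are hypothetical.

```python
def citation_arcs(records, citing_coords, cited_coords, color):
    """Turn one set of records into arcs between the two base maps.

    records is a list of (source journal, list of cited journals).
    citing_coords and cited_coords map journal names to (x, y) positions
    on the citing and the cited base map, respectively.
    """
    arcs = []
    for source_journal, cited_journals in records:
        if source_journal not in citing_coords:
            continue  # the source journal is not on the citing base map
        for target_journal in cited_journals:
            if target_journal in cited_coords:
                arcs.append({
                    "start": citing_coords[source_journal],
                    "end": cited_coords[target_journal],
                    "color": color,  # one color per set of records
                })
    return arcs

# Hypothetical journals and coordinates.
toy_citing = {"J-A": (1.0, 2.0)}
toy_cited = {"J-X": (7.0, 3.0), "J-Y": (8.0, 1.0)}
toy_set = [("J-A", ["J-X", "J-Y", "J-Z"]), ("J-B", ["J-X"])]
arcs = citation_arcs(toy_set, toy_citing, toy_cited, color="blue")
```

Calling the function once per set of records, each with its own color, yields the arcs needed to compare citation patterns of distinct groups in the same view.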
Figure 8.26 shows a dual-map display of citations found in publications of two
iSchools between 2003 and 2012. The citation arcs made by the iSchool at Drexel
University are colored in blue, whereas the arcs made by the School of Information
Studies at Syracuse are in magenta. At a glance, the blue arcs on the upper part of
the map suggest that Drexel researchers published in these areas, whereas Syracuse
researchers made few publications in these areas. The dual-map overlay shows that
Drexel researchers not only published in the areas that correspond to mathematics
and systems journals; their publications in journals in other areas are
also influenced by journals related to systems, computing, and mathematics. The
overlapping arcs in the lower half of the map indicate that the two institutions share
their core journals in terms of where they publish.
Fig. 8.26 Citation arcs from the publications of Drexel's iSchool (blue arcs) and Syracuse School
of Information Studies (magenta arcs) reveal where they differ in terms of both intellectual bases
and research frontiers
Fig. 8.27 h-index papers (cyan) and citers to CiteSpace (red)
As one more example, Fig. 8.27 shows a comparison between two sets of records.
One is a set of papers on the h-index (green, mostly appearing in the upper half),
and the other is a set of papers citing the 2006 JASIST paper on CiteSpace II,
mostly originating from the lower right part of the base map of citing journals. This
image shows that research on the h-index is widespread, is published especially in physics
journals (Blondel cluster #5), and cites journals in similar categories. In contrast,
papers citing CiteSpace II are concentrated in a few journals, but they cite journals in
a wide range of clusters.
In summary, global science maps provide base maps that enable interactive
overlays. Dual-map overlays display the citing and cited journals in the same view,
which makes it easier to compare the citation behaviors of different groups in terms
of their source journals and target journals.
References
Aksnes DW (2003) Characteristics of highly cited papers. Res Eval 12(3):159–170
Ben-David U, Benvenisty N (2011) The tumorigenicity of human embryonic and induced
pluripotent stem cells. Nat Rev Cancer 11(4):268–277. doi:10.1038/nrc3034
Bjornson CRR, Rietze RL, Reynolds BA, Magli MC, Vescovi AL (1999) Turning brain into blood:
a hematopoietic fate adopted by adult neural stem cells in vivo. Science 283(5401):534–537
Bock C, Kiskinis E, Verstappen G, Gu H, Boulting G, Smith ZD et al (2011) Reference maps of
human ES and iPS cell variation enable high-throughput characterization of pluripotent cell
lines. Cell 144(3):439–452
Boland MJ, Hazen JL, Nazor KL, Rodriguez AR, Gifford W, Martin G et al (2009) Adult mice generated from induced pluripotent stem cells. Nature 461(7260):91–94. doi:10.1038/nature08310
Börner K, Klavans R, Patek M, Zoss AM, Biberstine JR, Light RP et al (2012) Design and update
of a classification system: the UCSD map of science. PLoS One 7(7):e39464
Bornmann L, Daniel H-D (2006) What do citation counts measure? A review of studies on citing
behavior. J Doc 64(1):45–80
Boulting GL, Kiskinis E, Croft GF, Amoroso MW, Oakley DH, Wainger BJ et al (2011) A
functionally characterized test set of human induced pluripotent stem cells. Nat Biotechnol
29(3):279–286. doi:10.1038/nbt.1783
Boyack KW (2009) Using detailed maps of science to identify potential collaborations. Scientometrics 79(1):27–44
Boyack KW, Klavans R (2010) Co-citation analysis, bibliographic coupling, and direct citation:
which citation approach represents the research front most accurately? J Am Soc Info Sci
Technol 61(12):2389–2404
Boyack KW, Klavans R, Ingwersen P, Larsen B (2005) Predicting the importance of current papers.
Paper presented at the proceedings of the 10th international conference of the International
Society for Scientometrics and Informetrics. Retrieved from https://cfwebprod.sandia.gov/
cfdocs/CCIM/docs/kwb rk ISSI05b.pdf
Budd JM, Sievert M, Schultz TR (1998) Phenomena of retraction: reasons for retraction and
citations to the publications. JAMA 280:296–297
Burt RS (2004) Structural holes and good ideas. Am J Sociol 110(2):349–399
Buter R, Noyons E, Van Raan A (2011) Searching for converging research using field to field
citations. Scientometrics 86(2):325–338
Chen C (2003) Mapping scientific frontiers: the quest for knowledge visualization. Springer,
London
Chen C (2006) CiteSpace II: detecting and visualizing emerging trends and transient patterns in
scientific literature. J Am Soc Info Sci Technol 57(3):359–377
Chen C (2011) Turning points: the nature of creativity. Springer, New York
Chen C (2012) Predictive effects of structural variation on citation counts. J Am Soc Info Sci
Technol 63(3):431–449
Chen C, Chen Y, Horowitz M, Hou H, Liu Z, Pellegrino D (2009) Towards an explanatory and
computational theory of scientific discovery. J Informetr 3(3):191–209
Chen C, Ibekwe-SanJuan F, Hou J (2010) The structure and dynamics of co-citation clusters: a
multiple-perspective co-citation analysis. J Am Soc Info Sci Technol 61(7):1386–1409
Chin MH, Mason MJ, Xie W, Volinia S, Singer M, Peterson C et al (2009) Induced pluripotent stem
cells and embryonic stem cells are distinguished by gene expression signatures. Cell Stem Cell
5(1):111–123
Chubin DE (1994) Grants peer-review in theory and practice. Eval Rev 18(1):20–30
Chubin DE, Hackett EJ (1990) Peerless science: peer review and U.S. science policy. State
University of New York Press, Albany
Cobo MJ, Lopez-Herrera AG, Herrera-Viedma E, Herrera F (2011) Science mapping software
tools: review, analysis, and cooperative study among tools. J Am Soc Info Sci
Technol 62(7):1382–1402
Cuhls K (2001) Foresight with Delphi surveys in Japan. Technol Anal Strateg Manag
13(4):555–569
Dewett T, Denisi AS (2004) Exploring scholarly reputation: it's more than just productivity.
Scientometrics 60(2):249–272
Discher DE, Mooney DJ, Zandstra PW (2009) Growth factors, matrices, and forces combine and
control stem cells. Science 324(5935):1673–1677
Ebert AD, Yu J, Rose FF, Mattis VB, Lorson CL, Thomson JA et al (2009) Induced pluripotent
stem cells from a spinal muscular atrophy patient. Nature 457(7227):277–280.
doi:10.1038/nature07677
Fauconnier G, Turner M (1998) Conceptual integration networks. Cognit Sci 22(2):133–187
Feng Q, Lu S-J, Klimanskaya I, Gomes I, Kim D, Chung Y et al (2010) Hemangioblastic
derivatives from human induced pluripotent stem cells exhibit limited expansion and early
senescence. Stem Cells 28(4):704–712
Fleming L, Bromiley P (2000) A variable risk propensity model of technological risk taking. Paper
presented at the applied statistics workshop. Retrieved from
http://courses.gov.harvard.edu/gov3009/fall00/fleming.pdf
Garfield E (1955) Citation indexes for science: a new dimension in documentation through
association of ideas. Science 122(3159):108–111
Gimble JM, Katz AJ, Bunnell BA (2007) Adipose-derived stem cells for regenerative medicine.
Circ Res 100(9):1249–1260
Glotzbach JP, Wong VW, Gurtner GC, Longaker MT (2011) Regenerative medicine. Curr Probl
Surg 48(3):148–212
Hayrynen M (2007) Breakthrough research: funding for high-risk research at the Academy of
Finland. The Academy of Finland, Helsinki
Hettich S, Pazzani MJ (2006) Mining for proposal reviewers: lessons learned at the National
Science Foundation. Paper presented at the KDD'06
Hilbe JM (2011) Negative binomial regression, 2nd edn. Cambridge University Press, Cambridge
Hirsch JE (2007) Does the h index have predictive power? Proc Natl Acad Sci
104(49):19193–19198
Hong H, Takahashi K, Ichisaka T, Aoi T, Kanagawa O, Nakagawa M et al (2009) Suppression of induced pluripotent stem cell generation by the p53–p21 pathway. Nature 460(7259):1132–1135.
doi:10.1038/nature08235
Hsieh C (2011) Explicitly searching for useful inventions: dynamic relatedness and the costs of
connecting versus synthesizing. Scientometrics 86(2):381–404
Kaji K, Norrby K, Paca A, Mileikovsky M, Mohseni P, Woltjen K (2009) Virus-free induction of
pluripotency and subsequent excision of reprogramming factors. Nature 458(7239):771–775.
doi:10.1038/nature07864
Kakuk P (2009) The legacy of the Hwang case: research misconduct in biosciences. Sci Eng Ethics
15:545–562
Khang G, Kim SH, Kim MS, Rhee JM, Lee HB (2007) Recent and future directions of stem cells
for the application of regenerative medicine. Tissue Eng Regen Med 4(4):441–470
Kim D, Kim C-H, Moon J-I, Chung Y-G, Chang M-Y, Han B-S et al (2009a) Generation of human
induced pluripotent stem cells by direct delivery of reprogramming proteins. Cell Stem Cell
4(6):472–476
Kim JB, Sebastiano V, Wu G, Arauzo-Bravo MJ, Sasse P, Gentile L et al (2009b) Oct4-induced
pluripotency in adult neural stem cells. Cell 136(3):411–419
Kiskinis E, Eggan K (2010) Progress toward the clinical application of patient-specific pluripotent
stem cells. J Clin Invest 120(1):51–59
Klavans R, Boyack KW (2010) Toward an objective, reliable and accurate method for measuring
research leadership. Scientometrics 82:539–553
Korpela KM (2010) How long does it take for scientific literature to purge itself of fraudulent
material? The Breuning case revisited. Curr Med Res Opin 26:843–847
Kostoff R (2007) The difference between highly and poorly cited medical articles in the journal
Lancet. Scientometrics 72:513–520
Laflamme MA, Chen KY, Naumova AV, Muskheli V, Fugate JA, Dupras SK et al (2007)
Cardiomyocytes derived from human embryonic stem cells in pro-survival factors enhance
function of infarcted rat hearts. Nat Biotechnol 25(9):1015–1024. doi:10.1038/nbt1327
Lahiri M, Maiya AS, Sulo R, Habiba, Berger-Wolf TY (2008) The impact of structural changes on
predictions of diffusion in networks. Paper presented at the 2008 IEEE international conference
on data mining workshops (ICDMW'08). Retrieved from http://compbio.cs.uic.edu/mayank/
papers/LahiriMaiyaSuloHabibaBergerWolf ImpactOfStructuralChanges08.pdf
Lambert D (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing.
Technometrics 34:1–14
Laurent LC, Ulitsky I, Slavin I, Tran H, Schork A, Morey R et al (2011) Dynamic changes in the
copy number of pluripotency and cell proliferation genes in human ESCs and iPSCs during
reprogramming and time in culture. Cell Stem Cell 8(1):106–118
Levitt J, Thelwall M (2008) Patterns of annual citation of highly cited articles and the prediction
of their citation ranking: a comparison across subjects. Scientometrics 77(1):41–60
Leydesdorff L (2001) The challenge of scientometrics: the development, measurement, and self-organization of scientific communications. Universal-Publishers, Boca Raton
Leydesdorff L, Rafols I (2009) A global map of science based on the ISI subject categories. J Am
Soc Info Sci Technol 60(2):348–362
Leydesdorff L, Rafols I (2011) Local emergence and global diffusion of research technologies: an
exploration of patterns of network formation. J Am Soc Info Sci Technol 62(5):846–860
Li C, Heidt DG, Dalerba P, Burant CF, Zhang L, Adsay V et al (2007) Identification of pancreatic
cancer stem cells. Cancer Res 67(3):1030–1037
Lipinski C, Hopkins A (2004) Navigating chemical space for biology and medicine.
Nature 432(7019):855–861
Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J et al (2009)
Human DNA methylomes at base resolution show widespread epigenomic differences. Nature
462(7271):315–322. doi:10.1038/nature08514
Mackinlay JD, Rao R, Card SK (1995) An organic user interface for searching citation links. Paper
presented at the SIGCHI'95
Martin BR (2010) The origins of the concept of foresight in science and technology: an insider's
perspective. Technol Forecast Soc Change 77(9):1438–1447
Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G et al (2007)
Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature
448(7153):553–560. doi:10.1038/nature06008
Miles I (2010) The development of technology foresight: a review. Technol Forecast Soc Change
77(9):1448–1456
Naik G (2011) Mistakes in scientific studies surge. Wall Street J. Retrieved March 16 2012, from
http://online.wsj.com/article/SB10001424052702303627104576411850666582080.html
Nakagawa M, Koyanagi M, Tanabe K, Takahashi K, Ichisaka T, Aoi T et al (2008) Generation of
induced pluripotent stem cells without Myc from mouse and human fibroblasts. Nat Biotechnol
26(1):101–106. doi:10.1038/nbt1374
Neale AV, Northrup J, Dailey R, Marks E, Abrams J (2007) Correction and use of biomedical
literature affected by scientific misconduct. Sci Eng Ethics 13:5–24
Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA
103(23):8577–8582
O'Brien CA, Pollett A, Gallinger S, Dick JE (2007) A human colon cancer cell capable of
initiating tumour growth in immunodeficient mice. Nature 445(7123):106–110.
doi:10.1038/nature05372
Okita K, Nakagawa M, Hyenjong H, Ichisaka T, Yamanaka S (2008) Generation of mouse induced
pluripotent stem cells without viral vectors. Science 322(5903):949–953
Patterson M, Chan DN, Ha I, Case D, Cui Y, Handel BV et al (2012) Defining the nature of human
pluripotent stem cell progeny. Cell Res 22(1):178–193
Persson O (2010) Are highly cited papers more international? Scientometrics 83(2):397–401
Pfeifer MP, Snodgrass GL (1990) The continued use of retracted, invalid scientific literature. J Am
Med Assoc 263:1420–1423
Phinney DG, Prockop DJ (2007) Concise review: mesenchymal stem/multipotent stromal cells:
the state of transdifferentiation and modes of tissue repair – current views. Stem Cells
25(11):2896–2902
Pirolli P (2007) Information foraging theory: adaptive interaction with information. Oxford
University Press, Oxford
Pittenger MF, Mackay AM, Beck SC, Jaiswal RK, Douglas R, Mosca JD et al (1999) Multilineage
potential of adult human mesenchymal stem cells. Science 284(5411):143–147
Polak DJ (2010) Regenerative medicine. Opportunities and challenges: a brief overview. J R Soc
Interface 7:S777–S781
Polykandriotis E, Popescu LM, Horch RE (2010) Regenerative medicine: then and now – an update
of recent history into future possibilities. J Cell Mol Med 14(10):2350–2358
Porter AL, Rafols I (2009) Is science becoming more interdisciplinary? Measuring and mapping
six research fields over time. Scientometrics 81(3):719–745
Price DD (1965) Networks of scientific papers. Science 149:510–515
Rafols I, Porter AL, Leydesdorff L (2010) Science overlay maps: a new tool for research policy
and library management. J Am Soc Info Sci Technol 61(9):1871–1887
Ricci-Vitiani L, Lombardi DG, Pilozzi E, Biffoni M, Todaro M, Peschle C et al (2007) Identification and expansion of human colon-cancer-initiating cells. Nature 445(7123):111–115.
doi:10.1038/nature05384
Service RF (2002) Bell Labs fires star physicist found guilty of forging data. Science 298:30–31
Shibata N, Kajikawa Y, Matsushima K (2007) Topological analysis of citation networks to discover
the future core articles. J Am Soc Info Sci Technol 58(6):872–882
Shibata N, Kajikawa Y, Takeda Y, Sakata I, Matsushima K (2011) Detecting emerging research
fronts in regenerative medicine by the citation network analysis of scientific publications.
Technol Forecast Soc Change 78:274–282
Slaughter BV, Khurshid SS, Fisher OZ, Khademhosseini A, Peppas NA (2009) Hydrogels in
regenerative medicine. Adv Mater 21(32–33):3307–3329
Small H (1999) Visualizing science by citation mapping. J Am Soc Inf Sci 50(9):799–813
Soldner F, Hockemeyer D, Beard C, Gao Q, Bell GW, Cook EG et al (2009) Parkinson's
disease patient-derived induced pluripotent stem cells free of viral reprogramming factors. Cell
136(5):964–977
Sox HC, Rennie D (2006) Research misconduct, retraction, and cleansing the medical literature:
lessons from the Poehlman case. Ann Intern Med 144:609–613
Stadtfeld M, Apostolou E, Akutsu H, Fukuda A, Follett P, Natesan S et al (2010) Aberrant silencing
of imprinted genes on chromosome 12qF1 in mouse induced pluripotent stem cells. Nature
465(7295):175–181. doi:10.1038/nature09017
Steen RG (2011) Retractions in the scientific literature: do authors deliberately commit research
fraud? J Med Ethics 37:113–117
Swanson DR (1986a) Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspect
Biol Med 30:7–18
Swanson DR (1986b) Undiscovered public knowledge. Libr Q 56(2):103–118
Takahashi K, Yamanaka S (2006) Induction of pluripotent stem cells from mouse embryonic and
adult fibroblast cultures by defined factors. Cell 126(4):663–676
Takahashi K, Tanabe K, Ohnuki M, Narita M, Ichisaka T, Tomoda K et al (2007) Induction of
pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131(5):861–872
Takeda Y, Kajikawa Y (2010) Tracking modularity in citation networks. Scientometrics 83(3):783
Thomas J, Cook K (2005) Illuminating the path, the research and development agenda for visual
analytics. IEEE CS Press, Los Alamitos
Thomson JA, Itskovitz-Eldor J, Shapiro SS, Waknitz MA, Swiergiel JJ, Marshall VS et al (1998)
Embryonic stem cell lines derived from human blastocysts. Science 282(5391):1145–1147
Tichy G (2004) The over-optimism among experts in assessment and foresight. Technol
Forecast Soc Change 71(4):341–363
Trikalinos NA, Evangelou E, Ioannidis JPA (2008) Falsified papers in high-impact journals were
slow to retract and indistinguishable from nonfraudulent papers. J Clin Epidemiol 61:464–470
Upham SP, Rosenkopf L, Ungar LH (2010) Positioning knowledge: schools of thought and new
knowledge creation. Scientometrics 83:555–581
van Dalen HP, Henkens K (2005) Signals in science: on the importance of signaling in gaining
attention in science. Scientometrics 64(2):209–233
van Eck NJ, Waltman L (2010) Software survey: VOSviewer, a computer program for bibliometric
mapping. Scientometrics 84(2):523–538
Vierbuchen T, Ostermeier A, Pang ZP, Kokubu Y, Sudhof TC, Wernig M (2010) Direct conversion of fibroblasts to functional neurons by defined factors. Nature 463(7284):1035–1041.
doi:10.1038/nature08797
von Luxburg U (2006) A tutorial on spectral clustering. Retrieved from http://www.kyb.mpg.de/fileadmin/
user upload/files/publications/attachments/Luxburg07 tutorial 4488%5b0%5d.pdf
Wager E, Williams P (2011) Why and how do journals retract articles? An analysis of Medline
retractions 1988–2008. J Med Ethics 37:567–570
Wakefield AJ, Murch SH, Anthony A, Linnell J, Casson DM, Malik M et al (1998) Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children
(Retracted article. See vol 375, pg 445, 2010). Lancet 351(9103):637–641
Walters GD (2006) Predicting subsequent citations to articles published in twelve crime-psychology journals: author impact versus journal impact. Scientometrics 69(3):499–510
Watts DJ, Strogatz SH (1998) Collective dynamics of 'small-world' networks. Nature
393(6684):440–442
Weeber M (2003) Advances in literature-based discovery. J Am Soc Info Sci Technol
54(10):913–925
Wernig M, Meissner A, Foreman R, Brambrink T, Ku M, Hochedlinger K et al (2007) In vitro
reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 448(7151):318–324.
doi:10.1038/nature05944
Woltjen K, Michael IP, Mohseni P, Desai R, Mileikovsky M, Hamalainen R et al (2009)
piggyBac transposition reprograms fibroblasts to induced pluripotent stem cells. Nature
458(7239):766–770. doi:10.1038/nature07863
Young RA (2011) Control of the embryonic stem cell state. Cell 144(6):940–954
Yu J, Vodyanik MA, Smuga-Otto K, Antosiewicz-Bourget J, Frane JL, Tian S et al (2007) Induced
pluripotent stem cell lines derived from human somatic cells. Science 318(5858):1917–1920
Yu J, Hu K, Smuga-Otto K, Tian S, Stewart R, Slukvin II et al (2009) Human induced pluripotent
stem cells free of vector and transgene sequences. Science 324(5928):797–801
Zeileis A, Kleiber C, Jackman S (2011) Regression models for count data in R. Retrieved from http://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf
Zhao T, Zhang Z-N, Rong Z, Xu Y (2011) Immunogenicity of induced pluripotent stem cells.
Nature 474(7350):212–215. doi:10.1038/nature10135
Zhou H, Wu S, Joo JY, Zhu S, Han DW, Lin T et al (2009) Generation of induced pluripotent stem
cells using recombinant proteins. Cell Stem Cell 4(5):381–384
