http://www.ucd.ie/library/bibliometrics
bibliometrics@ucd.ie
Bibliometrics -
an introduction
© UCD Library 2010 Version 3 Mar 2010
Bibliometrics: an overview
• Research impact can be measured in many ways: quantitative approaches
include publication counts, amount of research income, number of PhD
students, size of research group, number of PI projects, views and downloads
of online outputs, number of patents and licenses obtained, and others.
• The ability to apply these measures, and their importance in the overall
assessment of research, varies from field to field.
What and Why
• Bibliometrics are ways of measuring patterns of authorship, publication,
and the use of literature.
• The pressure to use bibliometrics stems from the quantitative nature of the
results, which could be argued to be an advantage. It also holds out the
possibility of an efficiency advantage, producing a variety of statistics quite
quickly in comparison with the resource-intensive nature of peer review of
the quality and innovation of intellectual thought.

• Any move to use these bibliometric approaches as proxy indicators of the
impact or quality of published research is highly controversial, even in those
disciplines where citation analysis "works", in that much research output is
indexed in the main citation data sources.

• Bibliometric analysis has formed one part of the local UCD Research
Excellence Framework strategy, looking at the impact of the research at
institutional and unit level.

• The 2 other key areas where bibliometrics are commonly used are journal
ranking and the assessment of individual researchers, both covered in later
sections.

"In our view, a quality judgment on a research unit or institute can only be
given by peers, based on a detailed insight into content and nature of the
research conducted by the group …. impact and scientific quality are by no
means identical concepts."
– Bibliometric Study of the University College Dublin 1998-2007, CWTS,
February 2009
The building blocks
A source dataset
• Collecting the citation information is a huge task, and all sources are highly
selective – only 5 Schools at UCD, for example, have 80% or more of their
substantive research outputs indexed in the ISI citation indexes (CI).
• The main source datasets are those of ISI, SCOPUS, and Google Scholar,
plus subject-specialist options in some fields.
• Each collects the citation information from the articles in a select range of
publications only – the overlap between the content of these sources has
been shown in comparative studies to be quite modest.
Publication counts
Publication counts measure productivity but arguably not impact: 28% of the
8,077 items of UCD research from 1998-2007 indexed in the ISI Citation
Indexes were not cited other than by self-citations, and overall as much as
90% of the papers published in scientific journals are never cited. The free
web-based SCImago product provides an easy graphical presentation of such
citation data for each journal title.
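The distinction between raw citation counts and citations excluding self-citations can be made concrete in a few lines. This is a minimal sketch, not drawn from the original guide: the paper records and author names are invented, and a real analysis would pull them from a citation database export.

```python
# Hedged sketch: estimating the share of papers that are uncited once
# self-citations are excluded. All data below are invented for illustration.

def uncited_share(papers):
    """Fraction of papers whose only citations (if any) are self-citations.

    Each paper is a dict with 'authors' (a set of names) and 'citing'
    (a list of author-sets, one per citing paper).
    """
    def externally_cited(paper):
        # A citation counts as a self-citation if the citing paper
        # shares at least one author with the cited paper.
        return any(not (citers & paper["authors"]) for citers in paper["citing"])

    uncited = [p for p in papers if not externally_cited(p)]
    return len(uncited) / len(papers)

papers = [
    {"authors": {"A. Byrne"}, "citing": [{"A. Byrne"}]},              # self-cited only
    {"authors": {"B. Walsh"}, "citing": [{"C. Doyle"}]},              # externally cited
    {"authors": {"D. Kelly"}, "citing": []},                          # never cited
    {"authors": {"E. Ryan"},  "citing": [{"E. Ryan"}, {"F. Nolan"}]}, # both
]

print(uncited_share(papers))  # 0.5 for this toy dataset
```

Half of these toy papers would count as "uncited other than self-citations", mirroring the kind of figure quoted above for UCD output.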
Citation analysis
Issues & Limitations
• In some fields it is not the tradition to cite extensively the work that your
scholarship and research is building upon – yet this is the whole principle
of the citation analysis system.

• Seminal research is also often taken for granted and not cited.

• Where citation is common, the data sources often do not index the
publications where research in a field is typically published – local
publications, non-English material, monographs, and conference and
working papers are poorly indexed.

• Negative citations are counted as valid.

• The system can be manipulated by such means as self-citation, multiple
authorship, splitting outputs into many articles, and journals favouring
highly cited review articles.

• Defining the field and the level of granularity at which to benchmark is
itself difficult: this choice can dramatically alter the result for an
individual or group when using normalized benchmarked scores.

• Inappropriate use of citation metrics, such as using the Impact Factor of a
journal to evaluate an individual researcher's output, or comparing the
h-index across fields while ignoring the variations in citation patterns
between them.

"….We publish in books and monographs, and in peer-reviewed journals.
However, we have a range of real requirements that include official reporting
to state agencies and authorities; public archaeology and communication in
regional and local journals and in interdisciplinary publication across several
journals, that most bibliometrics are incapable of measuring"
– [UCD academic]
ISI Journal Citation Reports
• Journal Citation Reports (JCR) forms part of the subscription-based ISI
suite of products known as Web of Knowledge, which also includes
Web of Science. JCR is the original journal ranking tool, first developed in
the 1950s, and it is the current market leader for journal rankings.
SCImago
SCImago is a freely available web resource available at
http://www.scimagojr.com/ . This uses Scopus data to provide metrics
and statistical data for journals.
The SJR is much like the JIF in principle, but it goes a step further by
mimicking the Google PageRank algorithm: it assigns higher value/weight to
citations from more prestigious journals. The SJR uses a 3-year citation
window.
Note that SCImago also gives a calculation identical to the ISI Impact
Factor, to enable comparison with JCR using a different data source.
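The PageRank-style weighting behind the SJR can be sketched as a power iteration over a journal-to-journal citation matrix. This toy version, with an invented citation matrix, shows only the core idea of prestige-weighted citations; the published SJR formula adds size normalisation and restricts to a 3-year window.

```python
# Hedged sketch of the idea behind SJR: citations are weighted by the
# prestige of the citing journal, computed iteratively PageRank-style.
# The citation matrix below is invented for illustration.

def prestige(cites, damping=0.85, iters=50):
    """Power iteration: cites[i][j] = citations from journal i to journal j."""
    n = len(cites)
    score = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for j in range(n):
            inflow = 0.0
            for i in range(n):
                out = sum(cites[i])
                if out:
                    # Journal i passes on its prestige in proportion to
                    # where its outgoing citations go.
                    inflow += score[i] * cites[i][j] / out
            new.append((1 - damping) / n + damping * inflow)
        score = new
    return score

# Toy matrix for three journals: journal 0 is heavily cited by the others.
cites = [[0, 1, 1],
         [4, 0, 1],
         [3, 1, 0]]
scores = prestige(cites)
print(max(range(3), key=lambda j: scores[j]))  # journal 0 ranks highest
```

Because journal 0 attracts most of the citations, it accumulates the highest prestige score; with real data, citations from such a journal would in turn be worth more than citations from the others.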
eigenfactor.org
• eigenfactor.org is a freely available web resource that provides metrics for
journals using data from ISI's JCR. As well as the Eigenfactor score, the
website also provides the Article Influence score, which is more directly
comparable with the JCR JIF.
• The Eigenfactor score also takes into account other variables, such as the
disciplinary relationships between citing and cited journals, and it uses a
5-year citation window. Furthermore, the Eigenfactor score is a measure of
the overall impact of a journal, not of its individual articles (unlike the
JIF and SJR, which are per-article averages). For these reasons it is
considered quite robust.
A Google Scholar-based metric such as the h-index might be the most useful
for some humanities and social science subjects, as GS, generally speaking,
covers more material in these areas. GS also has better coverage of
conference proceedings, which might benefit subjects like computer science.
The date-range flexibility of the h-index might also suit disciplines where
published research is slower to have an impact on subsequent publications.
SCOPUS
SCOPUS enhanced its Journal Analyzer product in 2009/2010, striking deals
with both SCImago and CWTS Leiden. The SJR calculation from SCImago is now
included in the product, as is SNIP (Source Normalized Impact per Paper)
from CWTS.
SNIP details:
- Measures contextual citation impact by ‘normalizing’ citation values
- Takes a research field’s citation frequency into account
- Considers immediacy - how quickly a paper is likely to have an impact in a
given field
- Accounts for how well the field is covered by the underlying database
- Calculates without use of a journal’s subject classification to avoid delimitation
- Counters any potential for editorial manipulation
More information about SNIP: http://www.journalindicators.com
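The normalisation idea listed above can be illustrated with a toy calculation. This is a hedged sketch of the principle only, with invented numbers and an assumed database-wide baseline; the real SNIP definition from CWTS adds further refinements, such as counting only references to database-covered items.

```python
# Hedged sketch of SNIP-style normalisation: a journal's raw
# citations-per-paper are divided by the citation density of its field,
# so fields that simply cite more heavily do not look more impactful.
# All figures below are invented for illustration.

def snip_like(citations_per_paper, field_mean_references):
    """Normalise raw impact by the field's average reference-list length,
    relative to an (assumed) database-wide average of 30 references."""
    baseline = 30.0  # illustrative baseline, not a published constant
    citation_potential = field_mean_references / baseline
    return citations_per_paper / citation_potential

# A maths journal and a biomedical journal with identical raw impact:
maths = snip_like(citations_per_paper=2.0, field_mean_references=15.0)
biomed = snip_like(citations_per_paper=2.0, field_mean_references=60.0)
print(maths, biomed)  # 4.0 1.0
```

Equal raw citation rates yield very different normalised scores: the maths journal's citations come from a field that cites sparsely, so each one "counts" for more.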
CONCLUSION
Using different data sources and different metric tools means that journals can
score better or worse in the different products.
For two Bioethics titles, for example, it varies with each metric as to which
of the two scores the higher. [Comparison chart not reproduced.]
• The three main bibliometric tools, Web of Science, Scopus and Google Scholar
(in combination with the Publish or Perish software or the Scholarometer
browser add-on for Firefox and Chrome), provide automatic metrics for
individual researchers, and they also contain the raw data that can be used
to manually calculate or verify metrics. There are also some specialised
tools for certain disciplines.
• The bibliometrics tools each cover a different range of data, and metrics
for the same individual vary across the 3 products. This should be kept in
mind when assessing individual metrics in any of the tools.
The Metrics
• A huge variety of metrics has been developed to help assess the output of
researchers. Some of the most popular are listed here.
• The h-index has become the most popular metric for assessing the output
of individuals since it was developed by Hirsch in 2005. The h-index of an
individual is the largest number h such that h of their papers have each been
cited at least h times, e.g. a researcher has an h-index of 25 if 25 of their
papers have been cited at least 25 times.
• The age-weighted citation rate (AWCR), which also accounts for the age of
papers.
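Both metrics above are simple enough to compute by hand from a citation report. The following is a minimal sketch; the sample citation counts and paper ages are invented, and in practice the numbers would come from WoS, Scopus or GS.

```python
# Hedged sketch: computing the h-index and an age-weighted citation rate
# (AWCR) from a list of (citations, years_since_publication) pairs.
# The sample data are invented for illustration.

def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

def awcr(papers):
    """Sum over papers of citations divided by the paper's age in years."""
    return sum(cites / max(age, 1) for cites, age in papers)

papers = [(40, 10), (18, 6), (7, 3), (7, 2), (2, 1), (0, 1)]
print(h_index([c for c, _ in papers]))  # 4: four papers have >= 4 citations
print(round(awcr(papers), 2))          # 14.83: recent citations weigh more
```

Note how the AWCR rewards the recent, still-accumulating papers, while the h-index ignores paper age entirely.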
Web of Science
• Web of Science (WoS) is part of the ISI suite of products and is the current
market leader for bibliometrics.
• WoS allows you to use the Author Finder to identify a single author and
view a list of their publications, including citations.
• For this list of publications you can also generate a Citation Report. This
provides metrics including the h-index, total number of papers and total
number of citations. Charts and year-by-year citation analysis are also
provided.
• Another product from the ISI suite, Essential Science Indicators, covers
22 fields in science and provides data for ranking scientists.
Scopus
• Scopus allows you to conduct an Author Search to identify a single
author. The search contains useful tools for author disambiguation, by
country, affiliation, etc.
• For each author you can view a list of publications including citations.
Medline - free resource indexing life science and biomedical publications. Includes
citation data. http://medline.cos.com/
• The principal citation databases used in this exercise are Web of Science
(WoS), Scopus and Google Scholar (GS); the pros and cons of using each of
the three databases to calculate the h-index are discussed below.
• The h-index can be defined as: A scientist has an index h if h of his/her Np papers
have at least h citations each, and the other (Np – h) papers have no more than h citations
each, e.g. a researcher has an h-index of 25 if 25 of their papers have been cited
at least 25 times.
• These databases are selective in their journal coverage and some disciplines are
better served than others. Also, conference proceedings and monographs,
which are key research outlets in some subjects, are not adequately covered.
These factors should be kept in mind when assessing the h-indices of
researchers in such disciplines.
Web of Science
Pros
• Can view stray and orphan records using the “cited references” search
Cons
• Facilities for finding and distinguishing between authors are not great
Google Scholar
Pros
• Covers not only journals but academic websites, grey literature, pre-prints,
theses etc
• Always has a master record or creates one from citations (so stray and
orphan records are directly visible in the results list)
Cons
• Does not automatically calculate the h-index (but can use “Publish or
Perish” software to do this)
• Covers some suspect material e.g. course reading lists, student projects etc
• Hit counts and citation counts can be suspect as they are often inflated
Scopus
Pros
• The ‘more’ feature allows you quickly view stray and orphan records
Cons
• The depth of coverage is not as impressive as the breadth; many journals
are only covered for the last 5 years.
• The citation-enhanced part of Scopus only dates back to 1996. This results
in a very skewed h-index for researchers whose careers extend further back
than this.
• Even citations to pre-1996 articles appearing in articles published after
1996 are not included in the h-index calculation.
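The deflating effect of this cut-off is easy to demonstrate. The sketch below uses an invented career (publication years and citation counts are not from the guide) to show how truncating the record at 1996 changes the h-index.

```python
# Hedged sketch of how a 1996 cut-off can deflate the h-index of a
# researcher with a long career. All figures below are invented.

def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# (year, citations) for a hypothetical career starting in the 1980s;
# the most-cited papers are the early ones.
career = [(1985, 120), (1989, 80), (1993, 60), (1997, 15), (2001, 10),
          (2005, 8), (2008, 3)]

full = h_index([c for _, c in career])
truncated = h_index([c for y, c in career if y >= 1996])
print(full, truncated)  # 6 3
```

Here the database cut-off halves the apparent h-index, because the best-cited papers predate 1996 and simply vanish from the calculation.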
Bibliometric toolkit:
the ISI product suite
• The Thomson Reuters ISI product set is the market leader for bibliometrics,
and its range of products is the most widely used.
• Journal Citation Reports is the original journal ranking tool, first developed in
the 1950s, and it is the current market leader for journal rankings
• There are serious issues with use of these tools in many fields, particularly
humanities, applied technologies and multidisciplinary areas, due to lack of
coverage of the literature and inadequate categorisation in these areas
ISI – key facts
• Citations date back to 1900, the longest set available in any product
• Highly selective citation sets as the building block of all the metrics, given
estimates of 100,000+ journal titles in existence
• Lack of standard author names means all result sets must be checked and
pruned – tools are provided to pick up all forms of an author's name.
Web of Science
Cited Reference Search for an author or group:
find the articles that cite a person's work and analyse the citing material by
geography, discipline, and document type. This includes all citations to the
author's work, not just those in the selected titles indexed by ISI.
ISI Journal Citation Reports
Key points
• JCR covers over 6,950 science and over 1,980 social science journals.
• It provides 171 subject categories for science and technology and 55 in
the social sciences. Multidisciplinary areas are categorised poorly or not at
all, so the product is of little use in such areas.
• It takes 3 years for a journal to appear in JCR, and this time lag is
problematic in fast-moving areas – and for new journals.
Get a list of top-ranked journals in your field, and sort it to highlight a
number of different aspects: count of citations; 2-year impact; 5-year impact;
the immediacy index, which measures how soon articles in a journal are cited;
and half-life, which measures whether citing continues over time for a
journal's content. JCR also provides 2 Eigenfactor metrics, which take into
account the impact of the citing journals, using whether they in turn are
heavily cited as a weighting factor.
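The 2-year impact factor and the immediacy index mentioned above are both simple ratios. This sketch uses invented counts for a hypothetical journal to show how each is calculated.

```python
# Hedged sketch of the classic 2-year impact factor and the immediacy
# index. The counts below are invented for a hypothetical journal.

def impact_factor(cites_to_prev_two_years, items_prev_two_years):
    """Citations received in year Y to items published in years Y-1 and
    Y-2, divided by the number of citable items published in those years."""
    return cites_to_prev_two_years / items_prev_two_years

def immediacy_index(cites_to_current_year, items_current_year):
    """Citations received in year Y to items published in year Y itself."""
    return cites_to_current_year / items_current_year

# Invented figures for one journal in one JCR year:
print(impact_factor(cites_to_prev_two_years=450, items_prev_two_years=180))  # 2.5
print(immediacy_index(cites_to_current_year=45, items_current_year=90))      # 0.5
```

The 5-year impact works the same way with a wider window; the immediacy index will always look low in slow-citing fields, which is one reason these numbers should not be compared across disciplines.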
Similar metrics can be obtained for an individual title; this also includes
year-by-year analysis with graphical presentation, details of citing and
cited journals, and the ranking of the journal title in all relevant
categories in JCR.
Essential Science Indicators
• Offers data for ranking scientists, institutions, countries, and journals
Science Watch
• Tracks trends and performance in basic research using select data from
Essential Science Indicators
Bibliometric toolkit:
SCOPUS
• SCOPUS has improved its Journal Analyzer package and now provides SJR
and SNIP metrics in the product as alternatives to the ISI JIF
SCOPUS – key facts
• The main calculations, such as the author h-index, are only worked out for
publications from 1996 onwards.
• There are 15,800 peer-reviewed titles in SCOPUS – a lot more than in ISI –
and this makes the product better for some fields, such as engineering.
• There is more European content, and more languages other than English are
included than in ISI: 60% of coverage is outside the USA.
• Open Access titles, proceedings, web pages, patents and book series are
all included.
Authors
Citation Tracker
Use the tools provided to select one author or a group of authors – you can
filter by affiliation, country and subject area, and pick up variant forms of
the author's name. This is a strong feature of the product, providing
assistance in isolating the correct research outputs, particularly where
names are common.
• Graphs are provided for h-index, publication count and citation count
Journals
Journal Analyzer
Use Journal Analyzer to select a journal title or build up a group of titles. The
analysis includes both graphs and chart displays of:
• Number of articles published per year in the title(s) from 1996 to date
• 2 key metrics for journal ranking are provided – the SJR and the SNIP –
which can give quite varying results
• Unlike JCR, this analyzer does not provide ranked lists of journals for your
field. However, the free SCImago website uses the Scopus data set to
provide this type of ranked listing comparable to the ISI JCR product
http://www.scimagojr.com/
Articles
Each document returned in a search shows the number of citations to it found
in Scopus. For any particular article in the set, the Citation Tracker can be
used to get a more detailed year-by-year analysis of the citations to that
item, as well as to view the citing articles.
Bibliometric toolkit:
Google Scholar with
Publish or Perish or
Scholarometer
• Google Scholar is one of the three principal tools (the others being ISI
and Scopus) used to generate bibliometrics for researchers and for published
research material.
Google Scholar
About Google Scholar
It was launched in beta version in November 2004 and is still in its beta or
test phase.
• Covers a diverse range of sources which can lead to higher citation counts
for some articles.
• Provides citation counts for each document that it returns. You can click
on the citation count to view the citing documents.
Some Disadvantages of GS for Bibliometrics
• Covers some suspect material, e.g. course reading lists, student projects,
etc.
• Results often contain duplicates of the same article (usually as pre-prints
and post-prints) or even false hits.
Publish or Perish
About Publish or Perish
PoP results can be copied into Windows applications like Excel or saved as
text files for further analysis.
Results lists should be checked for errors and false hits deselected.
Scholarometer
Scholarometer is a browser add-on for Firefox or Chrome and provides an
alternative to Publish or Perish when using Google Scholar as the underlying
data source for bibliometric analysis.
Notable features are the ability to input various name forms, to remove
unwanted forms of a name, to merge duplicate articles, and to remove from the
result set individual items that are not by the right author. This improves
on the limited facilities Publish or Perish provides, which make it extremely
difficult to use for common names.
Another feature of this product is its use of Web 2.0: when searching for an
author, searchers can allocate tags for the research area of interest; these
then appear in Twitter and are used in some metrics. This is an extremely
experimental aspect of the product.
Scholarometer is developed at Indiana University Bloomington.
A brief bibliography
General
Watson, Roger (2009). Editorial: rating research performance. Journal of
Clinical Nursing 18(20), pp. 2781-2782.
http://dx.doi.org/10.1111/j.1365-2702.2009.02926.x
A good brief two-pager on impact factors, the h- and g-indices, and some
pitfalls.
Armbruster, Chris (2009). Whose metrics? On building citation, usage and
access metrics as information services for scholars. Working Paper Series,
Research Network 1989, Berlin, Germany.
http://ssrn.com/abstract=1464706
Journal Rankings
Crisp, Michael G (2009). Eigenfactor. Collection Management 34(1), pp. 53-56.
http://dx.doi.org/10.1080/01462670802577279
Individual author ranking