Universality of citation distributions: Towards an objective measure of scientific impact

Article Info

Authors: Filippo Radicchi, Santo Fortunato, and Claudio Castellano
Publication, Year: PNAS, 2008
Link to Paper

General Findings

The probability that an article is cited $c$ times has large variations between different disciplines
That being said, all distributions collapse onto a universal curve after rescaling by a relative indicator $c_f = c/c_0$, where $c_0$ is the average number of citations per article for that discipline
- i.e. the probability of an article being cited $c$ times varies widely between disciplines, but the distribution's shape is similar and can be overlaid onto one curve across all disciplines if properly rescaled (normalized by the average number of citations)
This universality is also shown when comparing publications in different years, within the same field
A generalized h index is presented as well
- This generalized $h$ index is built on top of the previous work

Lit Review and Intro

Citation analysis has a long history and many potential problems have been identified
- Most critical: often a citation does not - nor is it intended to - reflect the scientific quality/relevance of the cited work
- Additional bias
  - Self-citations
  - Implicit citations
  - Increase in total number of citations with time
  - Correlation b/w number of authors of an article and the number of citations it receives
Field variance: the fact that papers in certain fields are cited much more (or much less) than other fields
- This is a large problem with respect to fairly evaluating scientific performance across fields
Many methods have been proposed to try and alleviate this problem
- Typically, they are based on some sort of normalization step; how exactly to do this, however, is contentious
- One option requires the use of relative indicators
  - relative indicators: ratios between the bare number of citations $c$ and some average measure of the citation frequency in the reference field
  - B/c empirical studies have shown that the number of article citations varies greatly by field, one may wonder whether the use of a simple normalization factor - like the average number of citations - is an appropriate approach.
Understanding whether this approach is appropriate - with respect to individual publications - is the purpose of this paper
The normalizing constant used in this paper is:
- $c_0 =$ average number of citations received by all articles in a specific discipline for the same year
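The rescaling itself is simple arithmetic. Below is a minimal Python sketch of it, assuming the articles are already restricted to a single publication year; the function name `relative_indicators` and the toy numbers are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def relative_indicators(articles):
    """Return (field, c, c_f) for each (field, c) pair, with c_f = c / c_0.

    c_0 is the average number of citations over all articles in the same
    field. Assumes every field has at least one cited article, otherwise
    c_0 would be zero and the division would fail.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for field, c in articles:
        totals[field] += c
        counts[field] += 1
    c0 = {field: totals[field] / counts[field] for field in totals}
    return [(field, c, c / c0[field]) for field, c in articles]


# Hypothetical toy data: two fields with very different citation levels.
toy_articles = [
    ("A", 10), ("A", 20), ("A", 30),   # c_0 = 20
    ("B", 2), ("B", 4), ("B", 6),      # c_0 = 4
]
rescaled = relative_indicators(toy_articles)
```

In this toy case the two fields differ by a factor of five in raw citations, yet their $c_f$ values are identical, which is the kind of collapse the rescaling is meant to produce.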

Variability of Citation Statistics in Different Disciplines

The chance of a publication being cited strongly depends on the field to which it belongs
- E.g. a publication with 100 citations is ~50x more common in Developmental Biology when compared with Aerospace Engineering
- Thus,
  
  ...the simple count of the number of citations is patently misleading to assess whether an article in Developmental Biology is more successful than one in Aerospace Engineering

Distribution of the Relative Indicator $c_f$

If you rescale every article's citation count by the average number of citations within its field ( $c_0$ ) and then plot the resulting $c_f$ values, the distributions collapse very nicely onto a single curve (see below)
- See the original publication if you are curious about the lognormal curve in equation form
This allows us to confidently declare a universal curve - independent of specific discipline
See table 1 below for more details on fit quality ( $\chi^2$ )
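For orientation, the generic lognormal density in $c_f$ has the form below. This is the textbook form, not a transcription from the paper; the parameter values should be checked against the original publication (from memory, the fitted variance is roughly $\sigma^2 \approx 1.3$).

$$
F(c_f) = \frac{1}{\sigma\, c_f \sqrt{2\pi}} \exp\left(-\frac{\left[\ln(c_f) - \mu\right]^2}{2\sigma^2}\right)
$$

Because $c_f = c/c_0$ and $c_0$ is the field average, the mean of $c_f$ is 1 by construction. For a lognormal, the mean is $e^{\mu + \sigma^2/2}$, so this constraint gives $\mu = -\sigma^2/2$ and leaves $\sigma^2$ as the only free parameter.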

As another test, Radicchi et al. show publication rankings when
1. ranked by the normal citation count ( $c$ ) and
2. ranked by the relative indicator ( $c_f$ )
We see that, when ranked by $c_f$, each discipline is represented roughly equally in the ranking.
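The effect of the two rankings can be illustrated with a small Python sketch. The field names, citation counts, and the helper `top_k_fields` are all made up for illustration; the paper's own test uses real data and a top-percentage cutoff, not these toy values.

```python
from collections import Counter
from statistics import mean

# Hypothetical citation counts for two fields with very different averages.
toy = {
    "bio": [100, 80, 60, 40, 20],   # c_0 = 60
    "aero": [12, 7, 6, 3, 2],       # c_0 = 6
}


def top_k_fields(citations_by_field, k, rescale):
    """Count how many of the top-k articles come from each field.

    With rescale=False articles are ranked by raw citations c;
    with rescale=True they are ranked by c_f = c / c_0.
    """
    rows = []
    for field, counts in citations_by_field.items():
        c0 = mean(counts)
        for c in counts:
            score = c / c0 if rescale else c
            rows.append((score, field))
    rows.sort(key=lambda row: row[0], reverse=True)
    return Counter(field for _, field in rows[:k])


raw_top = top_k_fields(toy, 4, rescale=False)
rescaled_top = top_k_fields(toy, 4, rescale=True)
```

Here the raw ranking fills the top four entirely with "bio" articles, while the rescaled ranking splits the top four evenly between the two fields.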

Checking the Longitudinal Viability of this Universality...

The above analysis was done only for the year of 1999. Does it hold with respect to a longitudinal analysis?

By comparing multiple disciplines across many years, we can see evidence of the relative indicator's robustness.

Towards a Generalized $h$ Index

The $h$ index of an author is $h$ if $h$ of his $N$ articles have at least $h$ citations each, and the other $N - h$ articles have, at most, $h$ citations each.
- This measure has become spectacularly popular for quantifying the success of academics and has been repurposed for many other uses.
  - See here for some more information on this measure.
However, this measure is notoriously limited across disciplines, as it is influenced by the number of articles that an author publishes - which varies between disciplines
Yet, this variability can also be rescaled away if the number $N$ of publications in a year by an author is divided by the average number of publications per author per year in that discipline, $N_0$

This allows for the definition of a generalized $h$ index --> $h_f$
- This generalized $h$ index factors out the additional bias due to different publication rates, thus allowing comparisons among scientists working in different fields
The $h_f$ index of an author is calculated by:
1. Computing the rescaled citation count $c_f = c/c_0$ for each article and sorting the articles in decreasing order of $c_f$
2. Ranking them from $1\ ...\ n$ where $n =$ the total number of publications
  - Note: Pub. with the highest $c_f$ gets number 1, second highest gets 2, and so on increasing by 1 for each publication
3. Calculating the reduced rank for each publication
  - Reduced rank $= r/N_0$
  - $r =$ rank
  - $N_0 =$ avg. # of publications per author per year in the author's own field
4. The final generalized $h$ index ( $h_f$ ) = the last reduced rank value such that the corresponding $c_f$ value is larger than the reduced rank

Generalized $h$ Index Example

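| $c_f$ | Rank ( $r$ ) | Avg. Pubs. ( $N_0$ ) | Reduced Rank ( $r/N_0$ ) | Keep Going? ( $c_f > r/N_0$ ) |
|---|---|---|---|---|
| 4.1 | 1 | 2 | 0.5 | Yes |
| 2.8 | 2 | 2 | 1 | Yes |
| 2.2 | 3 | 2 | 1.5 | Yes |
| 1.6 | 4 | 2 | 2 | No |
| 0.8 | 5 | 2 | | |
| 0.4 | 6 | 2 | | |

The last reduced rank for which $c_f > r/N_0$ holds is 1.5, so $h_f = 1.5$ for this author.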
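The procedure can be sketched in a few lines of Python, using the $c_f$ values and $N_0 = 2$ from the example table. The function name `generalized_h_index` is hypothetical, and this is a sketch of the steps as summarized in these notes, not the authors' own implementation.

```python
def generalized_h_index(c_f_values, n0):
    """Return h_f: the last reduced rank r / n0 whose c_f exceeds it.

    c_f_values: rescaled citation counts (c / c_0) of one author's articles.
    n0: average number of publications per author in the author's field.
    Returns 0.0 if even the top-ranked article fails the comparison.
    """
    ranked = sorted(c_f_values, reverse=True)   # rank 1 = highest c_f
    h_f = 0.0
    for rank, c_f in enumerate(ranked, start=1):
        reduced_rank = rank / n0
        if c_f > reduced_rank:
            h_f = reduced_rank                  # still "keep going"
        else:
            break                               # first failure ends the scan
    return h_f


# Values from the example table above.
example_h_f = generalized_h_index([4.1, 2.8, 2.2, 1.6, 0.8, 0.4], n0=2)
```

The scan stops at rank 4, where $c_f = 1.6$ is not larger than the reduced rank 2, so the function returns the previous reduced rank, 1.5.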
