# Universality of citation distributions: Towards an objective measure of scientific impact

## Article Info

• Authors: Filippo Radicchi, Santo Fortunato, and Claudio Castellano
• Publication, Year: PNAS, 2008

## General Findings

• The probability that an article is cited $c$ times varies greatly between disciplines
• That being said, all distributions can be collapsed onto a universal curve after rescaling by the relative indicator $c_f = c/c_0$, where $c_0$ is the avg. number of citations per article for that discipline
• i.e., there are large variations in the probability of an article being cited; however, the distributions' shapes are similar and can be overlaid onto one curve across all disciplines, if they are properly rescaled (normalized by the average number of citations)
• This universality also holds when comparing publications from different years within the same field
• A generalized $h$ index is presented as well
• This generalized $h$ index is built on top of the rescaling result above

## Lit Review and Intro

• Citation analysis has a long history and many potential problems have been identified
• Most critical: often a citation does not - nor is it intended to - reflect the scientific quality/relevance of the cited work
• Self-citations
• Implicit citations
• Increase in total number of citations with time
• Correlation b/w number of authors of an article and the number of citations it receives
• Field variance: the fact that papers in certain fields are cited much more (or much less) than other fields
• This is a large problem with respect to fairly evaluating scientific performance across fields
• Many methods have been proposed to try and alleviate this problem
• Typically, they are based on some sort of normalization step, however, how exactly to do this is contentious
• One option requires the use of relative indicators
• relative indicators: ratios between the bare number of citations $c$ and some average measure of the citation frequency in the reference field
• B/c empirical studies have shown that the number of article citations varies greatly by field, one may wonder whether the use of a simple normalization factor - like the average number of citations - is an appropriate approach
• Understanding whether this approach is appropriate - with respect to individual publications - is the purpose of this paper
• The normalizing constant used in this paper is:
• $c_0 =$ average number of citations received by all articles in a specific discipline for the same year
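The rescaling step can be sketched in a few lines. This is not the authors' code; the citation counts below are hypothetical and chosen so the two fields share the same shape after normalization.

```python
# Sketch: computing the relative indicator c_f = c / c_0, where c_0 is the
# average number of citations per article in a discipline-year.

def relative_indicators(citations):
    """Rescale raw citation counts by their discipline-year average c_0."""
    c0 = sum(citations) / len(citations)
    return [c / c0 for c in citations]

# Hypothetical citation counts for articles in one discipline, one year:
biology = [120, 40, 10, 30]      # c_0 = 50
engineering = [12, 4, 1, 3]      # c_0 = 5

print(relative_indicators(biology))      # [2.4, 0.8, 0.2, 0.6]
print(relative_indicators(engineering))  # [2.4, 0.8, 0.2, 0.6]
```

Both fields map to identical $c_f$ values, which is exactly the collapse the paper observes at the distribution level.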

## Variability of Citation Statistics in Different Disciplines

• The chance of a publication being cited strongly depends on the field to which it belongs
• E.g. a publication with 100 citations is ~50x more common in Developmental Biology when compared with Aerospace Engineering
• Thus,

> ...the simple count of the number of citations is patently misleading to assess whether an article in Developmental Biology is more successful than one in Aerospace Engineering

## Distribution of the Relative Indicator $c_f$

• If you scale all citation values across all fields by the average number of citations within that field (giving $c_f = c/c_0$) and then plot those values, they collapse very nicely onto a single line/shape
• See the original publication if you are curious about the lognormal curve in equation form
• This allows us to confidently declare a universal curve - independent of specific discipline
• See table 1 below for more details on fit quality ($\chi^2$)
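A toy simulation (synthetic data, not the paper's) illustrates the collapse: two fields with very different average citation counts but the same underlying lognormal shape look nearly identical once each is rescaled by its own average. The shape parameter `sigma` is assumed for illustration only.

```python
import random

random.seed(0)
sigma = 1.1  # shared lognormal shape parameter (assumed for illustration)

def field_sample(c0_target, n=50_000):
    # Lognormal citations with a field-specific scale but a shared shape;
    # mu = -sigma^2/2 makes the base distribution have mean 1.
    return [c0_target * random.lognormvariate(-sigma**2 / 2, sigma) for _ in range(n)]

bio = field_sample(30.0)  # high-citation field
eng = field_sample(3.0)   # low-citation field

def rescaled_median(sample):
    # Median of the sample after dividing every value by the sample average c_0.
    c0 = sum(sample) / len(sample)
    xs = sorted(c / c0 for c in sample)
    return xs[len(xs) // 2]

# After rescaling, summary statistics of the two fields nearly coincide:
print(round(rescaled_median(bio), 2), round(rescaled_median(eng), 2))
```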

• As another test, Radicchi et al. show publication rankings when
1. ranked by the normal citation count ($c$) and
2. ranked after normalizing by $c_f$
• We see that, when ranked after normalization (by $c_f$), each discipline has roughly equal representation in the ranking
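The ranking comparison can be sketched with hypothetical data. The citation counts and field averages `c0` below are invented for illustration; the point is only that raw counts let the high-citation field dominate while $c_f$ interleaves the fields.

```python
# Sketch: ranking articles across fields by raw citations c versus by the
# relative indicator c_f = c / c_0.

def rank_articles(articles, key):
    return sorted(articles, key=key, reverse=True)

# (field, raw citation count) pairs; field averages c_0 are assumed values.
articles = [("bio", 150), ("bio", 60), ("bio", 40),
            ("eng", 10), ("eng", 8), ("eng", 2)]
c0 = {"bio": 50.0, "eng": 5.0}

by_c = rank_articles(articles, key=lambda a: a[1])
by_cf = rank_articles(articles, key=lambda a: a[1] / c0[a[0]])

print([f for f, _ in by_c])   # raw counts: biology fills the top ranks
print([f for f, _ in by_cf])  # rescaled: the fields interleave
```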

## Checking the Longitudinal Viability of this Universality...

• The above analysis was done only for the year 1999. Does it hold with respect to a longitudinal analysis?

• By comparing multiple disciplines across many years we can see evidence of the relative indicator's robustness.

## Towards a Generalized $h$ Index

• The $h$ index of an author is $h$ if $h$ of their $N$ articles have at least $h$ citations each, and the other $N - h$ articles have at most $h$ citations each
• This measure has achieved spectacular popularity for quantifying the success of academics and has been repurposed for many other uses
• However, this measure is notoriously limited across disciplines, as it is influenced by the number of articles that an author publishes - which varies between disciplines
• Yet, this variability can also be rescaled away if the number $N$ of publications in a year by an author is divided by the average number of publications per author in that discipline, $N_0$
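For reference, the standard $h$ index described above can be sketched directly from its definition (the example citation counts are invented):

```python
# Standard h index: the largest h such that the author's h most-cited
# papers each have at least h citations.

def h_index(citations):
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:  # the rank-th most-cited paper still has >= rank citations
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # → 4
print(h_index([25, 8, 5, 3, 3]))  # → 3
```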

• This allows for the definition of a generalized $h$ index --> $h_f$
• This generalized $h$ index factors out the additional bias due to different publication rates, thus allowing comparisons among scientists working in different fields
• Calculating the $h_f$ index of an author then proceeds as follows:
1. Compute each article's relative citation count $c_f = c/c_0$
2. Rank the articles from $1\ ...\ n$ by decreasing $c_f$, where $n =$ the total number of publications
• Note: the pub. with the most citations gets rank 1, the second most gets 2, and so on, increasing by 1 for each publication
3. Calculate the reduced rank for each publication
• Reduced rank $= r/N_0$
• $r =$ rank
• $N_0 =$ avg. # of publications per author in one's own field
4. The generalized $h$ index ($h_f$) = the largest reduced rank value such that the corresponding $c_f$ value is still larger than the reduced rank

## Generalized $h$ Index Example

| $c_f$ | Rank ($r$) | Avg. Pubs. ($N_0$) | Reduced Rank ($r/N_0$) | Keep Going? ($c_f > r/N_0$) |
|-------|------------|--------------------|------------------------|-----------------------------|
| 4.1   | 1          | 2                  | 0.5                    | Yes                         |
| 2.8   | 2          | 2                  | 1                      | Yes                         |
| 2.2   | 3          | 2                  | 1.5                    | Yes                         |
| 1.6   | 4          | 2                  | 2                      | No                          |
| 0.8   | 5          | 2                  |                        |                             |
| 0.4   | 6          | 2                  |                        |                             |

The last reduced rank satisfying $c_f > r/N_0$ is 1.5, so $h_f = 1.5$ in this example.
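The steps above can be sketched in code (a minimal sketch, not the authors' implementation), using the $c_f$ values and $N_0 = 2$ from the worked example:

```python
# Generalized h_f index: rank articles by c_f, divide ranks by the field's
# average publication count N_0, and keep the largest reduced rank r/N_0
# still exceeded by the corresponding c_f.

def h_f_index(cf_values, n0):
    ranked = sorted(cf_values, reverse=True)
    hf = 0.0
    for r, cf in enumerate(ranked, start=1):
        reduced = r / n0
        if cf > reduced:   # "Keep Going?" column from the example table
            hf = reduced
        else:
            break
    return hf

# Values from the worked example (N_0 = 2):
print(h_f_index([4.1, 2.8, 2.2, 1.6, 0.8, 0.4], n0=2))  # → 1.5
```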