Using real-world social network data, the authors compare the performance of common topological measures to identify which one best locates the most influential users within a network.
They show that, for all of these networks, k-core does the best job of identifying superspreaders of information.
They also develop a "local proxy for influence", the sum of the nearest neighbors' degrees, which appears to perform similarly to k-core.
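As a rough sketch of these two measures, here is a from-scratch peeling implementation of the k-core index and the nearest-neighbor degree sum on a toy adjacency dict (an illustration, not the authors' code or data):

```python
def core_numbers(adj):
    """Return {node: k-core index} by repeatedly peeling off min-degree nodes."""
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    alive = set(adj)
    core = {}
    k = 0
    while alive:
        # Peel every remaining node whose degree has dropped to k or below.
        peel = [n for n in alive if degree[n] <= k]
        if not peel:
            k += 1  # nothing left at this level; move to the next core
            continue
        for n in peel:
            core[n] = k
            alive.discard(n)
            for nb in adj[n]:
                if nb in alive:
                    degree[nb] -= 1
    return core

def nn_degree_sum(adj):
    """Local proxy for influence: sum of the degrees of each node's neighbors."""
    return {n: sum(len(adj[nb]) for nb in adj[n]) for n in adj}

# Toy undirected graph: a triangle a-b-c with a pendant node d on a.
adj = {'a': ['b', 'c', 'd'], 'b': ['a', 'c'], 'c': ['a', 'b'], 'd': ['a']}
print(core_numbers(adj))   # the triangle forms a 2-core; d sits in the 1-core
print(nn_degree_sum(adj))
```

Note that the k-core index requires peeling the whole graph, while the neighbor-degree sum only touches each node's immediate neighborhood, which is exactly why the local proxy matters later in the paper.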
Central motivation for this research: previous work seeking to develop methods for identifying superspreaders of information did not look at real-world data and instead relied on simulations based on models of infection and/or rumor spreading, such as SIR.
recognition rate for each of the three network topology metrics, for all of the different networks.
average influence, $M(k_S, k_{in})$, is calculated for nodes with a given combination of $k_S$ (k-core) and $k_{in}$ (in-degree) as:

$$M(k_S, k_{in}) = \frac{\sum_{i \in \Upsilon(k_S, k_{in})} M_i}{N(k_S, k_{in})}$$

where $\Upsilon(k_S, k_{in})$ is the set of nodes with that combination, $M_i$ is the influence of node $i$, and $N(k_S, k_{in})$ is the number of such nodes.
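This grouping amounts to averaging influence over each (k-core, in-degree) bucket. A minimal sketch, where the record format and field names (`kcore`, `kin`, `influence`) are assumptions for illustration, not the authors' data format:

```python
from collections import defaultdict

def average_influence(nodes):
    """Mean influence of all nodes sharing a given (k-core, in-degree) pair.

    nodes: iterable of dicts with hypothetical keys 'kcore', 'kin', 'influence'.
    """
    groups = defaultdict(list)
    for n in nodes:
        groups[(n['kcore'], n['kin'])].append(n['influence'])
    # Average within each (k_S, k_in) bucket.
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

nodes = [
    {'kcore': 2, 'kin': 3, 'influence': 10.0},
    {'kcore': 2, 'kin': 3, 'influence': 20.0},
    {'kcore': 1, 'kin': 1, 'influence': 5.0},
]
print(average_influence(nodes))  # {(2, 3): 15.0, (1, 1): 5.0}
```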
recognition rate is defined as

$$r = \frac{|I_f \cap P_f|}{|I_f|}$$

where $I_f$ and $P_f$ are the sets of nodes ranking in the top $f$ fraction by influence and by the predictor, respectively, and $|I_f|$ is the number of nodes in $I_f$.
So, basically, they measure how much the predictor's top-ranked set of nodes overlaps with the set of truly most influential nodes.
Thus, if $I_f$ and $P_f$ were the exact same set of nodes, the value would equal $1$.
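A minimal sketch of this definition (the score dictionaries here are hypothetical, not the paper's data):

```python
def recognition_rate(influence, predictor, f):
    """Fraction of the true top-f influencers that the predictor also ranks in its top f."""
    n_top = max(1, int(f * len(influence)))

    def top(scores):
        # Nodes with the highest scores, ties broken by sort order.
        return set(sorted(scores, key=scores.get, reverse=True)[:n_top])

    I_f, P_f = top(influence), top(predictor)
    return len(I_f & P_f) / len(I_f)

influence = {'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1}
print(recognition_rate(influence, influence, 0.4))  # identical rankings -> 1.0
```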
They illustrate in a number of different ways that k-core does the best for all networks, by pretty much every measure. I simply paste the figures and their descriptions below to illustrate the results.
Since k-core is a global measure, its usefulness is quite limited: rarely do we have a full picture of all the connections within a social network.
Thus, the authors offer a local proxy which seeks to stand in for k-core by utilizing only partial network information.
Because k-core appears to not only take into account the degree of an individual node but also the degree of those around it, the authors create and test two simple metrics:
We can see below that the sum of the nearest neighbors' degrees performs on par with k-core, and going beyond the nearest neighbors doesn't add much additional value.
Note: While this would certainly require much less Twitter data than reconstructing the entire network, gathering data on just the nearest neighbors of even a small sample of Twitter users (say a few hundred) is still extremely time-consuming and can take days.