Searching for superspreaders of information in real-world social media
Overview and Major Findings
Utilizing real-world social network data, the authors compare the performance of common topological measures in order to identify the best way to locate the most influential users within a network.
Metrics compared:
Networks examined:
For all of these networks, they show that k-core does the best job of identifying superspreaders of information.
They also develop a "local proxy for influence", the sum of the nearest neighbors' degrees, which appears to perform similarly to k-core.
Background
The central motivation for this research: previous work on methods for identifying superspreaders of information did not look at real-world data. Instead, it relied on simulations based on models of infection and/or rumor spreading, such as SIR (susceptible-infected-recovered).
General approach taken in this study
For each of the three network topology metrics, and for all of the different networks, the authors compute two quantities: the average influence and the recognition rate.
Average influence
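The average influence, $M(k_s, k_{in})$, is calculated for nodes with a given combination of $k_s$ (k-core) and $k_{in}$ (in-degree) as:
$$M(k_s, k_{in}) = \frac{1}{N(k_s, k_{in})} \sum_{i \in \Upsilon(k_s, k_{in})} M_i$$
where $\Upsilon(k_s, k_{in})$ is the set of nodes with k-core $k_s$ and in-degree $k_{in}$, $N(k_s, k_{in}) = |\Upsilon(k_s, k_{in})|$ is the number of such nodes, and $M_i$ is the measured influence of node $i$.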
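Below is a minimal Python sketch of the average-influence calculation. It is not the authors' code. It assumes that each node's k-core, in-degree, and measured influence are already available as plain dicts keyed by node id, which is a hypothetical input format. The function groups nodes by their (k-core, in-degree) pair and averages influence within each group.

```python
from collections import defaultdict


def average_influence(k_core, in_degree, influence):
    """Average influence for every observed (k-core, in-degree) pair.

    Hypothetical input format: three dicts keyed by node id.
    Returns {(k_s, k_in): mean influence of nodes with that pair}.
    """
    totals = defaultdict(float)  # summed influence per (k_s, k_in) group
    counts = defaultdict(int)    # number of nodes per (k_s, k_in) group
    for node, m in influence.items():
        key = (k_core[node], in_degree[node])
        totals[key] += m
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Toy numbers for illustration only (not data from the paper).
k_core = {"a": 2, "b": 2, "c": 1}
in_degree = {"a": 3, "b": 3, "c": 3}
influence = {"a": 10, "b": 20, "c": 5}
print(average_influence(k_core, in_degree, influence))
# {(2, 3): 15.0, (1, 3): 5.0}
```

The tuple key keeps the two indices separate. This makes it easy to compare groups that share a k-core but differ in in-degree, or the reverse.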
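Recognition rate
The recognition rate, $\Lambda(f)$, is defined as
$$\Lambda(f) = \frac{|\Phi_f \cap \Psi_f|}{|\Phi_f|}$$
where $\Phi_f$ and $\Psi_f$ are the sets of nodes ranking in the top fraction $f$ by influence and by the predictor, respectively, and $|\Phi_f|$ is the number of nodes in $\Phi_f$.
So, basically, what they do is:
1. Rank all nodes by measured influence and take the top fraction $f$ as $\Phi_f$.
2. Rank all nodes by the predictor (e.g., k-core) and take the top fraction $f$ as $\Psi_f$.
3. Divide the number of nodes the two sets share by $|\Phi_f|$.
Thus, if $\Phi_f$ and $\Psi_f$ were exactly the same set of nodes, $\Lambda(f)$ would equal 1.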
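The ranking-and-overlap procedure for the recognition rate can be sketched in Python as follows. This sketch rests on three assumptions:

- Scores are dicts keyed by node id.
- The "top fraction f" is taken as floor(f·N) nodes, with at least one node kept.
- Ties are broken by dictionary order.

The paper may handle rounding and ties differently.

```python
def top_fraction(scores, f):
    """Set of node ids in the top fraction f of `scores` (highest first).

    Keeps floor(f * N) nodes, but at least one. Python's sort is stable,
    so tied scores keep their dict insertion order.
    """
    n = max(1, int(f * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


def recognition_rate(influence, predictor, f):
    """Share of the top-f most influential nodes that the predictor also ranks in its top f."""
    phi = top_fraction(influence, f)  # top f by measured influence
    psi = top_fraction(predictor, f)  # top f by the predictor (e.g. k-core)
    return len(phi & psi) / len(phi)


# Toy example with made-up scores.
influence = {"a": 50, "b": 40, "c": 30, "d": 5}
k_core_scores = {"a": 9, "b": 1, "c": 8, "d": 7}
print(recognition_rate(influence, k_core_scores, 0.5))  # 0.5: only "a" is in both top halves
print(recognition_rate(influence, influence, 0.5))      # 1.0: identical rankings
```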
Results
The authors show, in a number of different ways, that k-core performs best across all networks by nearly every measure. I simply paste the figures and their descriptions below to illustrate the results.
Local Proxy
Because k-core is a global measure, its practical usefulness is quite limited: we rarely have a full picture of all the connections within a social network.
Thus, the authors offer a local proxy which seeks to stand in for k-core by utilizing only partial network information.
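To see why k-core needs the whole network, here is a minimal Python sketch of the standard k-core (k-shell) decomposition. It works by repeatedly removing the node with the lowest remaining degree. The sketch assumes an undirected graph stored as a symmetric dict of neighbor sets; the paper's exact variant (for example, how edge direction is handled) is not specified here. Each removal lowers the degrees of other nodes, so a node's k-core index can depend on parts of the graph far from the node itself.

```python
def core_numbers(adj):
    """k-core index of every node in an undirected graph.

    adj: dict mapping each node to the set of its neighbours
    (assumed symmetric: v in adj[u] iff u in adj[v]).
    Repeatedly removes the remaining node of smallest current degree;
    a node's core number is the largest such degree seen up to its removal.
    This O(N^2) version favours clarity over speed.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    core = {}
    k = 0
    while len(removed) < len(adj):
        # Lowest current degree among nodes not yet removed.
        v = min((u for u in adj if u not in removed), key=degree.get)
        k = max(k, degree[v])
        core[v] = k
        removed.add(v)
        # Removing v lowers the degree of its remaining neighbours.
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
    return core


# Triangle a-b-c plus a pendant node d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(core_numbers(adj))  # {'d': 1, 'a': 2, 'b': 2, 'c': 2}
```

For real graphs, networkx offers `networkx.core_number`, which computes the same decomposition more efficiently.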
Because k-core appears to take into account not only the degree of an individual node but also the degrees of the nodes around it, the authors create and test two simple metrics:
We can see below that performs on-par with k-core and doesn't add much additional value.
Note: While this certainly requires much less Twitter data than reconstructing the entire network, collecting nearest-neighbor data for even a small sample of Twitter users (say, a few hundred) is still very time-consuming and can take days.
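Below is a minimal Python sketch of the nearest-neighbor degree sum used as a local proxy. The callables `neighbors_of` and `degree_of` are hypothetical stand-ins for whatever per-user lookups are available, for example one API request per user. The sketch uses plain degree; the paper may use in-degree for directed networks. Only the node's own neighbor list and each neighbor's degree are needed. That is why the proxy avoids reconstructing the whole network, though it still costs one degree lookup per neighbor.

```python
def neighbor_degree_sum(node, neighbors_of, degree_of):
    """Local influence proxy: sum of the degrees of `node`'s nearest neighbours.

    neighbors_of(node) -> iterable of neighbour ids   (hypothetical lookup)
    degree_of(node)    -> degree of that node          (hypothetical lookup)
    """
    return sum(degree_of(u) for u in neighbors_of(node))


# Small in-memory graph standing in for per-user lookups:
# triangle a-b-c plus a pendant node d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
proxy = {
    v: neighbor_degree_sum(v, lambda x: adj[x], lambda x: len(adj[x]))
    for v in adj
}
print(proxy)  # {'a': 5, 'b': 5, 'c': 5, 'd': 3}
```

These scores can be ranked in the same way as k-core values when computing a recognition rate.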