Searching for superspreaders of information in real-world social media

Authors: Sen Pei, Lev Muchnik, José S. Andrade Jr., Zhiming Zheng, & Hernán A. Makse
Publication, Year: Scientific Reports, 2014
Link to Paper


Overview and Major Findings

Utilizing real-world social network data, the authors compare the performance of common topological measures in order to identify the best way to locate the most influential users within a network.
Metrics compared:
1. k-core/k-shell: see the Wikipedia page for a quick crash course, or this paper for more details.
2. PageRank: see Wikipedia for details.
3. Degree: the degree of a node in a network is the number of connections it has to other nodes. See Wikipedia for details.
Networks examined:
1. LiveJournal.com connections
2. Scientific publishing in American Physical Society (APS)
3. Twitter mention networks
4. Facebook friend/post networks from a regional network corresponding to the city of New Orleans, LA, USA
For all of these networks, they show that k-core does the best job of identifying superspreaders of information.
They also develop a "local proxy for influence", the sum of the nearest neighbors' degrees, which appears to perform similarly to k-core.
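To make the k-shell idea concrete, here is a minimal sketch of the standard peeling procedure (repeatedly remove the lowest-degree nodes and record the round in which each node falls out). It assumes an undirected graph given as a list of edge pairs; the function name `k_shell` and the toy graph are illustrative and not taken from the paper.

```python
from collections import defaultdict


def k_shell(edges):
    """Return {node: k-shell index} for an undirected graph given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose current degree is at most k belong to shell k.
        peel = [node for node in remaining if degree[node] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            node = peel.pop()
            if node not in remaining:
                continue  # already removed via a duplicate entry
            remaining.discard(node)
            shell[node] = k
            # Removing a node lowers the degree of its surviving neighbors,
            # which may pull them into the same shell.
            for nbr in adj[node]:
                if nbr in remaining:
                    degree[nbr] -= 1
                    if degree[nbr] <= k:
                        peel.append(nbr)
    return shell


# Toy graph (hypothetical): a triangle a-b-c with a tail a-d-e.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("d", "e")]
print(k_shell(toy_edges))
```

In the toy graph, `d` has the same degree as `b` and `c` (two connections each) but ends up in a lower shell, which is the kind of distinction degree alone cannot make.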

Background

Central motivation for this research: previous work seeking to develop methods for identifying superspreaders of information did not look at real-world data and instead relied on simulations based on models of infection and/or rumor spreading, such as SIR.
- As a result, the authors highlight numerous times that the conclusions of that earlier work are model dependent and have led to conflicting results
- Thus, they collected large real-world network data to test the most commonly used methods (PageRank, k-core, degree)

General approach taken in this study

Essentially, they then calculate the average influence and recognition rate for each of the three network topology metrics, on each of the different networks.

`average influence`

They define the average influence $M(k_s, k_{in})$ of users with a given combination of k-shell index $k_s$ and in-degree $k_{in}$ as:

$$M(k_s,k_{in})=\sum_{i\in\Upsilon(k_s,k_{in})}\frac{M_i}{N(k_s,k_{in})}$$

Where,

$\Upsilon(k_s,k_{in})$ is the collection of all the users with k-shell index $k_s$ and in-degree $k_{in}$ participating in a diffusion of information
$M_i$ is the influence of user $i$
$N(k_s,k_{in})$ is the number of those users
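As a rough illustration, the average influence is just a group-wise mean of per-user influence, with users grouped by their $(k_s, k_{in})$ pair. The sketch below assumes each user is supplied as a `(k_s, k_in, M_i)` tuple; the function name and the sample numbers are made up for illustration and are not data from the paper.

```python
from collections import defaultdict


def average_influence(users):
    """users: iterable of (k_s, k_in, M_i) tuples.

    Returns {(k_s, k_in): mean of M_i over users sharing that pair}.
    """
    totals = defaultdict(float)  # sum of M_i per (k_s, k_in) group
    counts = defaultdict(int)    # N(k_s, k_in): number of users per group
    for k_s, k_in, m_i in users:
        totals[(k_s, k_in)] += m_i
        counts[(k_s, k_in)] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical users: two share (k_s=2, k_in=3), one has (k_s=1, k_in=3).
sample_users = [(2, 3, 10), (2, 3, 20), (1, 3, 4)]
print(average_influence(sample_users))
```

Comparing groups with the same $k_{in}$ but different $k_s$ (or the reverse) is what lets one see which of the two measures tracks influence more closely.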

`recognition rate`

The recognition rate is defined as

$$r(f)=\frac{|I_f \cap P_f|}{|I_f|}$$

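where $I_f$ and $P_f$ are the sets of users ranked in the top fraction $f$ by influence and by the predictor, respectively, and $|I_f|$ is the number of users in $I_f$.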

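So, basically what they do is:
1. Create a ranked list of all nodes from highest to lowest influence.
2. Rank all nodes by the predictor as well, and choose a fraction $f$.
3. Take the top $f$ fraction of each list to get $I_f$ and $P_f$.
4. Calculate what share of the top $f$ fraction of the most influential nodes also appears in the predictor's top $f$ fraction.
If $I_f$ and $P_f$ are identical, $r(f)$ equals $1$.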
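The recognition rate can be sketched in a few lines. This version assumes influence and predictor scores are given as non-empty dictionaries keyed by node, rounds the top-$f$ cutoff up to at least one node, and breaks ties by node name; those choices, along with the function names and toy scores, are assumptions made for illustration rather than details from the paper.

```python
import math


def top_fraction(scores, f):
    """Return the set of nodes in the top fraction f of `scores` (highest first)."""
    n = max(1, math.ceil(f * len(scores)))  # assumed rounding rule
    # Ties are broken by node name so the result is deterministic (an assumption).
    ranked = sorted(scores, key=lambda node: (-scores[node], str(node)))
    return set(ranked[:n])


def recognition_rate(influence, predictor, f):
    """r(f) = |I_f ∩ P_f| / |I_f| for two {node: score} dicts over the same nodes."""
    i_f = top_fraction(influence, f)
    p_f = top_fraction(predictor, f)
    return len(i_f & p_f) / len(i_f)


# Hypothetical scores for four nodes.
toy_influence = {"a": 50, "b": 40, "c": 5, "d": 1}
toy_predictor = {"a": 3, "b": 1, "c": 3, "d": 2}
print(recognition_rate(toy_influence, toy_predictor, 0.5))
```

A predictor that ranks nodes exactly as influence does gives a recognition rate of 1 at every $f$, so values closer to 1 indicate a better predictor.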

Results

They illustrate in a number of different ways that k-core performs best on all networks by nearly every measure. I simply paste the figures and their descriptions below to illustrate the results.

Local Proxy

Since k-core is a global measure, its usefulness is quite limited: rarely do we have a full picture of all the connections within a social network.

Thus, the authors offer a local proxy which seeks to stand in for k-core by utilizing only partial network information.

Because k-core appears to not only take into account the degree of an individual node but also the degree of those around it, the authors create and test two simple metrics:

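$k_{sum}$: for a node $i$, $k_{sum}$ is the sum of the degrees of $i$'s nearest neighbors.
$k_{2sum}$: the same as $k_{sum}$, but sums the degrees of node $i$'s nearest neighbors as well as the next nearest neighbors.

$k_{sum}$ appears to perform similarly to k-core, and $k_{2sum}$ doesn't add much additional value.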
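Both local metrics need only a node's immediate surroundings, which a short sketch makes clear. Here $k_{sum}$ sums the degrees of a node's nearest neighbors, and $k_{2sum}$ is read as the sum of degrees over every node within two steps, each counted once and excluding the node itself; that reading of $k_{2sum}$, the helper names, and the toy graph are assumptions for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build {node: set of neighbors} from undirected (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def k_sum(adj, i):
    """Sum of the degrees of node i's nearest neighbors."""
    return sum(len(adj[j]) for j in adj[i])


def k_2sum(adj, i):
    """Sum of the degrees of all nodes within two steps of i (i excluded)."""
    within_two = set(adj[i])
    for j in adj[i]:
        within_two |= adj[j]  # add next-nearest neighbors
    within_two.discard(i)
    return sum(len(adj[j]) for j in within_two)


# Toy graph (hypothetical): a triangle a-b-c with a tail a-d-e.
toy_adj = build_adjacency(
    [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("d", "e")]
)
for node in sorted(toy_adj):
    print(node, k_sum(toy_adj, node), k_2sum(toy_adj, node))
```

Note that computing `k_sum` for one node still requires the degree of each of its neighbors, so in practice it means one extra lookup per neighbor.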

Note: While this would certainly require much less Twitter data than reconstructing the entire network, gathering data on just the nearest neighbors for even a small sample of Twitter users (say, a few hundred) is still extremely time-consuming and can take days.