Identification of influential spreaders in complex networks

Authors: Maksim Kitsak, Lazaros K. Gallos, Shlomo Havlin, Fredrik Liljeros, Lev Muchnik, H. Eugene Stanley, and Hernán A. Makse
Publication, Year: Nature Physics, 2010
Link to Article


Summary

The most highly connected individuals within a network are not always the most influential within that network
The most efficient spreaders are those located within the core of the network, as identified by a k-shell decomposition analysis
When multiple spreaders are considered simultaneously, the distance between them becomes the most crucial parameter in determining the extent of spread
Infections persist in the high-k shells in the case where recovered individuals do not develop immunity

Introduction

It was often believed that the most connected people (hubs) are the key players, responsible for the largest scale of the spreading process.
In the context of social network theory, the importance of a node in the spreading process is often associated with its betweenness centrality, a measure of how many shortest paths pass through the node, which is believed to determine who has more 'interpersonal influence' on others.

Argument

"Here we argue that the topology of the network organization plays an important role such that there are plausible circumstances under which the highly connected nodes or the highest-betweenness nodes have little effect on the range of a given spreading process. For example, if a hub exists at the end of a branch at the periphery of a network, it will have a minimal impact in the spreading process through the core of the network, whereas a less connected person who is strategically placed in the core of the network will have a significant effect that leads to dissemination through a large fraction of the population."

General Approach

Compare three node-level measures, degree ($k$), k-shell index ($k_S$; see Fig. 1), and betweenness centrality ($C_B$), on a number of real-world networks. The networks utilized are:
1. Friendship network between 3.4 million members of the LiveJournal.com community
2. Network of email contacts in the Computer Science Department of University College London
3. Contact network of inpatients (CNI) collected from hospitals in Sweden
4. Network of actors who have costarred in movies labelled by imdb.com as adult
Utilize the SIR and SIS epidemic models to estimate influence of different nodes
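
As a rough illustration of the k-shell index ($k_S$), the following minimal sketch computes it by iteratively pruning the nodes of lowest remaining degree. The function names (`build_adj`, `k_shell_index`) and the toy graph (a four-node clique with one peripheral hub attached) are assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def k_shell_index(adj):
    """Return {node: k_S} by repeatedly pruning nodes of lowest remaining degree."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # The next shell index is at least the smallest remaining degree.
        k = max(k, min(degree[v] for v in remaining))
        to_remove = [v for v in remaining if degree[v] <= k]
        while to_remove:
            v = to_remove.pop()
            if v not in remaining:
                continue  # already pruned via a duplicate entry
            remaining.discard(v)
            shell[v] = k
            # Pruning v lowers its neighbors' degrees, which may prune them too.
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        to_remove.append(u)
    return shell


# Hypothetical toy graph: clique {a, b, c, d}; hub h hangs off a and has 5 leaves.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("h", "a")] + [("h", f"l{i}") for i in range(1, 6)]
adj = build_adj(edges)
shell = k_shell_index(adj)

for node in ("h", "b"):
    print(node, "degree =", len(adj[node]), "k_S =", shell[node])
```

In this toy graph the hub `h` has the largest degree but sits in the outermost shell, while the lower-degree clique nodes form the core, which is the kind of mismatch between $k$ and $k_S$ that the notes describe.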

When hubs may not be good spreaders

The size of the population infected in a spreading process is not necessarily related to the degree ($k$) of the node where the spreading started
- Spreading may be different even when starting from nodes of similar degree (Fig. 1 b-d)
The k-shell index $k_S$ predicts the size of the infected population more accurately (Fig. 1 b-d)
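
To make "the size of the population infected" concrete, here is a minimal sketch of a discrete-time SIR outbreak started from a single origin. It assumes that each infected node tries to infect every susceptible neighbor with probability `beta` and then recovers after exactly one time step; that recovery rule, the function names, and the toy graph are assumptions for illustration. The graph is far too small to reproduce the paper's findings and only shows the mechanics.

```python
import random


def sir_outbreak_size(adj, origin, beta, rng):
    """Run one SIR outbreak from `origin`; return the number of nodes ever infected."""
    infected = {origin}
    recovered = set()
    while infected:
        newly_infected = set()
        for v in infected:
            for u in adj[v]:
                susceptible = u not in infected and u not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(u)
        # Assumption: every infected node recovers after one time step.
        recovered |= infected
        infected = newly_infected
    return len(recovered)


def mean_outbreak_size(adj, origin, beta, runs, rng):
    """Average the outbreak size over several independent runs."""
    total = sum(sir_outbreak_size(adj, origin, beta, rng) for _ in range(runs))
    return total / runs


# Hypothetical toy graph: clique {a, b, c, d}; hub h hangs off a and has 5 leaves.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("h", "a")] + [("h", f"l{i}") for i in range(1, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

rng = random.Random(0)  # fixed seed so repeated runs give the same estimates
for origin in ("h", "b"):
    print(origin, mean_outbreak_size(adj, origin, beta=0.3, runs=2000, rng=rng))
```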

k-shell predicts spreading

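To quantify the influence of a given node $i$, consider the size $M_i$ of the population infected in an epidemic originating at node $i$, which has a given ($k_S$, $k$). The infected population is averaged over all origins with the same ($k_S$, $k$) values:
$M(k_S,k) = \sum_{i \in \Upsilon(k_S,k)} \frac{M_i}{N(k_S,k)}$
where $\Upsilon(k_S,k)$ is the union of all $N(k_S,k)$ nodes with $(k_S,k)$ values.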
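
The averaging in $M(k_S,k)$ amounts to grouping origins by their $(k_S, k)$ pair and taking the mean of $M_i$ within each group. A minimal sketch, in which the function name and all input values are hypothetical and chosen only to show the bookkeeping:

```python
from collections import defaultdict


def average_by_shell_and_degree(outbreak, k_shell, degree):
    """Return {(k_S, k): mean of M_i over all origins i with that (k_S, k)}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, m_i in outbreak.items():
        key = (k_shell[i], degree[i])
        sums[key] += m_i    # numerator: sum of M_i over the group
        counts[key] += 1    # denominator: N(k_S, k)
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical per-origin outbreak sizes M_i (numbers of infected nodes).
outbreak = {"a": 30, "b": 50, "c": 10}
k_shell = {"a": 3, "b": 3, "c": 1}
degree = {"a": 3, "b": 3, "c": 6}

M = average_by_shell_and_degree(outbreak, k_shell, degree)
print(M[(3, 3)])  # mean of 30 and 50 -> 40.0
print(M[(1, 6)])  # single origin -> 10.0
```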

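The values of $M(k_S,k)$ lead to three general results (illustrated in Fig. 2 (a,c,e,g)):

1. For a fixed degree, there is a wide spread of $M(k_S,k)$ values.
   - In particular, there are many hubs (large $k$) located in the periphery of the network (low $k_S$) that are poor spreaders.
2. For a fixed k-shell index, $M(k_S,k)$ is approximately independent of the degree of the nodes.
   - Origins in the same k-shell produce outbreaks of similar size $M(k_S,k)$, independently of the degree $k$ at the infection origin.
3. The most efficient spreaders are located in the inner core of the network (large $k_S$ region), fairly independently of their degree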

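These results indicate that the k-shell index ($k_S$) is a better predictor of spreading influence than the degree ($k$). When an outbreak starts in the core of the network (large $k_S$), there exist many pathways through which a virus can infect the rest of the network; this result is valid regardless of the node degree.
The importance of $k_S$ is also seen in the analogous average $M(k_S,C_B)$ (Fig. 2 (b,d,f,h)), where $C_B$ is the betweenness centrality of a node in the network: $C_B$ is not a good predictor of spreading efficiency.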

k-shell structure

See the original text for details on the "imprecision functions" covered in [Fig. 3a]; the general gist can be gathered from the figure description
There are many hubs which exist in the periphery of a network [Fig. 3b] and therefore contribute poorly to spreading
- This is a result of the rich topological structure of real networks: in a fully randomized graph, all hubs are placed in the center of the network [Fig. 3c]
If spreading starts in multiple locations, nodes with the highest degree can spread significantly more than the set of highest k-shell nodes. [Fig. 3d]
- This is because high k-shell nodes are typically connected to one another
- If we select only nodes that are not directly connected to one another, the highest-degree and highest-k-shell sets differ little in their ability to spread, and both selections lead to the most efficient spreading process
  - Whether origins are ranked by $k_S$ or by $k$, including even one origin from a different $k$ shell results in significantly increased spreading.
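
The idea of choosing several spreaders that are not directly connected can be sketched as a greedy pass over nodes ranked by a score (degree or $k_S$), skipping any node adjacent to one already chosen. The greedy rule, the tie-breaking by node name, the function name, and the toy inputs below are all assumptions for illustration, not the paper's exact procedure.

```python
def pick_unlinked_spreaders(adj, score, n):
    """Greedily pick up to n top-scoring nodes, no two of which share an edge."""
    chosen = []
    # Highest score first; ties are broken by node name for reproducibility.
    for v in sorted(adj, key=lambda v: (-score[v], str(v))):
        if all(u not in adj[v] for u in chosen):
            chosen.append(v)
            if len(chosen) == n:
                break
    return chosen


# Hypothetical toy graph: clique {a, b, c, d}; hub h hangs off a and has 5 leaves.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("h", "a")] + [("h", f"l{i}") for i in range(1, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

degree = {v: len(nbrs) for v, nbrs in adj.items()}
# k-shell indices for this toy graph, supplied by hand as input.
k_shell = {v: (3 if v in "abcd" else 1) for v in adj}

print(pick_unlinked_spreaders(adj, degree, 2))   # ['h', 'b']
print(pick_unlinked_spreaders(adj, k_shell, 2))  # ['a', 'l1']
```

Because every pair of clique nodes is linked, the k-shell ranking can contribute only one core node before the selection has to move to an outer shell.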

SIS Spreading

The persistence of node $i$, $\rho_{i}(t)$, is the probability that node $i$ is infected at time $t$.
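
One way to estimate a persistence-like quantity for each node is to simulate SIS dynamics and record the fraction of time steps in which the node is infected. The sketch below uses that time average as a stand-in for $\rho_{i}(t)$. The update rule (infect each susceptible neighbor with probability `beta`, return to susceptible with probability `recovery`), the burn-in window, the parameter names, and the toy graph are assumptions for illustration.

```python
import random


def sis_persistence(adj, beta, recovery, steps, burn_in, rng, initial_fraction=0.2):
    """Estimate, for each node, the fraction of post-burn-in steps spent infected."""
    nodes = list(adj)
    infected = {v for v in nodes if rng.random() < initial_fraction}
    time_infected = {v: 0 for v in nodes}
    for t in range(steps):
        next_infected = set()
        for v in infected:
            # v tries to infect each currently susceptible neighbor.
            for u in adj[v]:
                if u not in infected and rng.random() < beta:
                    next_infected.add(u)
            # v stays infected unless it recovers and becomes susceptible again.
            if rng.random() >= recovery:
                next_infected.add(v)
        infected = next_infected
        if t >= burn_in:
            for v in infected:
                time_infected[v] += 1
    window = steps - burn_in
    return {v: time_infected[v] / window for v in nodes}


# Hypothetical toy graph: clique {a, b, c, d}; hub h hangs off a and has 5 leaves.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("h", "a")] + [("h", f"l{i}") for i in range(1, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

rho = sis_persistence(adj, beta=0.3, recovery=0.8, steps=2000, burn_in=200,
                      rng=random.Random(0))
for node in ("b", "h", "l1"):
    print(node, round(rho[node], 3))
```

On a graph this small the infection can die out entirely, after which every later step counts as uninfected, so the printed values depend strongly on the seed and parameters.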

Previous studies have shown that the largest persistence is found in the network hubs.

In the networks studied here, however, viruses persist mainly in the high-$k_S$ layers instead, almost irrespective of the degree of the nodes in the core (Fig. 4 a, b).

Across values of $\beta$ (the spreading probability), the average virus persistence is consistently higher in the inner k-shells (SIS model; Fig. 4c)
Nodes in higher k-shells are consistently the most efficient at spreading infection, regardless of the $\beta$ value (SIR model; Fig. 4d)

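The value of $\beta$ does not change this picture: both the persistence $\rho$ (SIS) and the infected fraction $M$ (SIR) are larger for nodes in inner $k$ shells than for nodes in outer $k$ shells over the entire $\beta$ range that was studied (Fig. 4c,d). Thus, the k-shell measure is a robust indicator of the spreading efficiency of a node.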

Notes by Matthew R. DeVerna