The spread of low-credibility content by social bots

Contents:
- Overview of Paper
- More specific findings
- Intro
- Results
  - Low-Credibility Content
  - Spreading Patterns and Actors
  - Bot Strategies
  - Bot Impact
  - Retweet Networks
  - Bot Dissemination Based on Type of Low-Credibility Content
- Discussion
  - Findings
  - Potential Solutions for this Problem
  - Suggested Questions for Future Work
Using supervised machine learning tools to estimate the likelihood that a Twitter account is a bot, the authors analyze over 14 million messages spreading more than 400 thousand articles over the course of ten months, and provide evidence that social bots played a disproportionate role in spreading content from low-credibility sources.
Relatively few bot accounts are responsible for a large share of the traffic that carries misinformation
Two manipulation strategies were identified:
Bots are active in the early moments after an article is first posted, before it goes "viral"
Bots target influential users/accounts through mentions and replies in hopes of increasing the visibility of the content they are spreading
The success of low-credibility content pushed by bots appears comparable to that of fact-checking content, which spreads organically
Low-credibility sources appear to be heavily supported by social bots
A complex mix of cognitive, social, and algorithmic biases makes us vulnerable to misinformation
Public opinion can be influenced thanks to the low cost of building fraudulent websites and the availability of high volumes of software-controlled profiles, known as social bots
Social bots can target the users most likely to believe misinformation, taking advantage of our tendencies to pay attention to what's popular, to trust information in a social context, and to trust our social connections [source]
The fight against online misinformation requires a grounded assessment of the relative impact of different mechanisms by which it spreads. If the problem is mainly driven by cognitive limitations, we need to invest in news literacy education; if social media platforms are fostering the creation of echo chambers, algorithms can be tweaked to broaden exposure to diverse views; and if malicious bots are responsible for many of the falsehoods, we can focus attention on detecting this kind of abuse. Here we focus on gauging the latter effect.
They track low-credibility content.
These are stories from domains that reputable third parties have flagged as routinely publishing various types of low-credibility information
They track two streams of tweets:
All tweets that link to any of 120 low-credibility sites
All tweets that link to stories shared by credible fact-checking organizations
Time Period: mid-May 2016 to the end of March 2017
The popularity distribution of tweets sharing low-credibility articles is almost identical to that of tweets sharing fact-checking articles (Fig. 1)
This suggests that veracity has no effect on the likelihood that content spreads virally
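One way such a similarity claim could be checked quantitatively (this is my own sketch, not necessarily the paper's method) is a two-sample Kolmogorov-Smirnov test on the per-article tweet counts; the data below are invented toy numbers:

```python
# Illustrative sketch (toy data, not the paper's code): comparing the
# popularity distributions of the two article classes with a two-sample
# Kolmogorov-Smirnov test.
from scipy.stats import ks_2samp

low_cred_counts = [3, 1, 250, 12, 7, 1, 89]    # tweets per low-credibility article
fact_check_counts = [2, 1, 310, 9, 5, 1, 95]   # tweets per fact-checking article

res = ks_2samp(low_cred_counts, fact_check_counts)
print(f"KS statistic = {res.statistic:.3f}, p = {res.pvalue:.3f}")
# A large p-value would be consistent with the two popularity
# distributions being statistically indistinguishable, as Fig. 1 suggests.
```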
Even though these are similar, there are some distinctive patterns in the spread of low-credibility content.
Most tweets with low-credibility articles spread through original tweets and retweets, while few are shared in replies (Fig 2a)
This is different from tweets with fact-checking articles, which are shared mostly via retweets but also replies
In other words, the spreading patterns of low-credibility content are less "conversational."
The more a story was tweeted, the more tweets were concentrated in the hands of a few accounts. These accounts act as "super-spreaders" (Fig 2c)
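Fig. 2c quantifies this concentration; below is a minimal sketch of one standard concentration measure, the Gini coefficient, with invented per-account tweet counts (function and data are mine, not the authors'):

```python
# Minimal sketch (toy data): measuring how concentrated the posting of a
# story is among accounts, using the Gini coefficient.
def gini(counts):
    """Gini coefficient of non-negative counts: 0 means tweets are spread
    evenly across accounts; values near 1 mean a few accounts dominate."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Hypothetical tweets-per-account for one viral article
tweets_per_account = [1, 1, 1, 1, 200]
print(f"Gini = {gini(tweets_per_account):.2f}")  # ~0.78: one super-spreader dominates
```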
The authors hypothesize that the "super-spreaders" of low-credibility content are social bots that automatically post links to articles, retweet other accounts, or perform more sophisticated autonomous tasks, like following and replying to other users.
They use Botometer v3 (an earlier version than the tool currently available online) to obtain a "bot score" for each account
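For reference, this is roughly how an account can be scored with the Botometer Python client today; the credentials are placeholders, and the response field names vary across API versions (the paper itself used the v3 API):

```python
# Sketch of querying Botometer for an account's bot score. Credentials
# are placeholders; response fields differ between Botometer versions.
import botometer

twitter_app_auth = {
    "consumer_key": "...",
    "consumer_secret": "...",
    "access_token": "...",
    "access_token_secret": "...",
}
bom = botometer.Botometer(
    wait_on_ratelimit=True,
    rapidapi_key="YOUR_RAPIDAPI_KEY",  # placeholder credential
    **twitter_app_auth,
)

result = bom.check_account("@example_user")
print(result["cap"]["universal"])  # complete automation probability
```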
They take a random sample of users from their population (with at least one link to a low-credibility article), as well as a selection of the most active users ("super-spreaders")
Of the random sample, only 6% of accounts were labeled bots
Of the super-spreaders, 33% were labeled as bots, roughly five times the rate in the random sample
See Fig 2d for the comparison of the distributions of these two groups
This confirms that the super-spreaders are significantly more likely to be bots than the general population of accounts that share low-credibility content
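The distribution comparison in Fig. 2d could be tested, for example, with a one-sided Mann-Whitney U test; this is my own toy sketch with invented scores, not the paper's analysis:

```python
# Toy sketch (invented scores): are super-spreaders' bot scores
# systematically higher than those of a random sample? (cf. Fig. 2d)
from scipy.stats import mannwhitneyu

random_sample_scores = [0.10, 0.20, 0.15, 0.40, 0.05, 0.30]   # invented
super_spreader_scores = [0.70, 0.90, 0.30, 0.85, 0.60, 0.75]  # invented

res = mannwhitneyu(super_spreader_scores, random_sample_scores,
                   alternative="greater")
print(f"U = {res.statistic}, p = {res.pvalue:.4f}")
# A small p-value supports the claim that super-spreaders are more
# likely to be bots than ordinary sharers of low-credibility content.
```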
To investigate how bots may influence Twitter conversations, they examine whether bots tend to act at particular times
In fact, they find that likely bots are more prevalent in the first few seconds after an article is first published on Twitter than at later times (Fig 3a)
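A rough sketch of how such a temporal pattern could be reconstructed; the records, bin edges, and scores below are invented for illustration:

```python
# Rough sketch (invented records): average bot score of accounts tweeting
# an article, grouped by how long after the article's first appearance
# they tweet it (cf. Fig. 3a).
from collections import defaultdict

# (article_id, seconds since the article's first tweet, poster's bot score)
tweets = [
    ("a1", 2, 0.90), ("a1", 5, 0.80), ("a1", 3600, 0.20),
    ("a2", 1, 0.70), ("a2", 86400, 0.10),
]

bins = [(0, 10), (10, 300), (300, 3600), (3600, float("inf"))]
scores_by_bin = defaultdict(list)
for _, lag, score in tweets:
    for lo, hi in bins:
        if lo <= lag < hi:
            scores_by_bin[(lo, hi)].append(score)
            break

for (lo, hi), scores in sorted(scores_by_bin.items()):
    print(f"[{lo}, {hi}) s: mean bot score = {sum(scores) / len(scores):.2f}")
# Higher mean scores in the earliest bins would indicate that likely
# bots dominate the first moments of an article's spread.
```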
Another strategy found is to mention influential users while linking to low-credibility content
A possible explanation for this strategy is that bots (or rather, their operators) target influential users with content from low-credibility sources, creating the appearance that it is widely shared. The hope is that these targets will then reshare the content to their followers, thus boosting its credibility.
Fig. 4a illustrates who retweets whom
Fig 4b —> Humans do most of the retweeting
Fig 4c —> They retweet articles posted by bots almost as much as tweets from humans
What this figure tells us is that:
People don't discriminate between human accounts and bot accounts when retweeting
Humans are exposed to a high degree of low-credibility information; it is not just bots retweeting bots in a bubble
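As a toy illustration of the kind of measurement behind Fig. 4c (the data and the 0.5 bot-score threshold are invented), one can ask what fraction of retweets made by likely humans originate from likely bots:

```python
# Toy sketch (invented data and threshold): fraction of retweets by
# likely humans whose original poster is a likely bot (cf. Fig. 4c).
retweets = [  # (retweeter's bot score, original poster's bot score)
    (0.10, 0.90), (0.20, 0.10), (0.05, 0.80), (0.15, 0.20), (0.10, 0.70),
]

human_rts = [orig for rter, orig in retweets if rter < 0.5]
from_bots = sum(1 for orig in human_rts if orig >= 0.5)
print(f"{from_bots} of {len(human_rts)} human retweets originate from likely bots")
```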
The supplementary figures visualize these retweet networks, in which:
Nodes = accounts
Connections = retweets of messages with links to stories (this type of retweet network is also visualized in Fig. 1)
They then apply a network dismantling procedure (source)
Disconnect one node at a time
The more the measured quantities (the number of distinct low-credibility articles and the volume of posts linking to them) decrease, the more critical the removed nodes are to the network
They dismantle the network by prioritizing certain accounts to disconnect over others, based on either their influence or their likelihood of being bots
Fig. 5 shows that
Influential nodes are most critical, as may be expected; yet these are unlikely to be bots
Disconnecting nodes with high bot scores is the second best strategy to reduce low-credibility articles (Fig 5a)
These results show that bots are critical in the diffusion network, and that targeting them would significantly improve the quality of information in the network. The spread of links to low-credibility content can be virtually eliminated by disconnecting a small percentage of accounts that are most likely to be bots.
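A minimal sketch of the dismantling idea using networkx; the graph, edge weights, and bot scores below are invented, and the authors' actual procedure and metrics may differ:

```python
# Minimal sketch (invented data): dismantle a retweet network by removing
# accounts in order of bot score and track the remaining retweet volume.
import networkx as nx

G = nx.DiGraph()  # edge u -> v with weight w: v retweeted u's links w times
G.add_weighted_edges_from([
    ("bot1", "human1", 40), ("bot1", "human2", 25),
    ("human3", "human1", 3), ("human2", "human3", 2),
])
bot_scores = {"bot1": 0.95, "human1": 0.10, "human2": 0.20, "human3": 0.15}

def remaining_volume(g):
    """Total retweet volume (sum of edge weights) left in the network."""
    return sum(w for _, _, w in g.edges(data="weight"))

# Remove nodes in descending bot-score order; orderings by influence or
# at random could be swapped in for comparison, as in Fig. 5.
for node in sorted(bot_scores, key=bot_scores.get, reverse=True):
    G.remove_node(node)
    print(f"removed {node}: remaining volume = {remaining_volume(G)}")
```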
In Fig. 6 we can see clearly that bots appear to concentrate their support on specific sources of low-credibility content
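A sketch of how per-source "bot support" might be summarized, using placeholder domains and invented scores; the paper's exact metric in Fig. 6 may differ:

```python
# Sketch (placeholder data): median bot score of the accounts posting
# links to each source, as a rough per-source "bot support" summary.
from collections import defaultdict
from statistics import median

posts = [  # (source domain, bot score of the posting account)
    ("site-a.example", 0.80), ("site-a.example", 0.60), ("site-a.example", 0.75),
    ("site-b.example", 0.20), ("site-b.example", 0.30), ("site-b.example", 0.25),
]

scores_by_source = defaultdict(list)
for source, score in posts:
    scores_by_source[source].append(score)

for source, scores in scores_by_source.items():
    print(f"{source}: median bot score = {median(scores):.2f}")
```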
Relatively few bot accounts are responsible for a large share of the traffic that carries misinformation
Two manipulation strategies were identified:
Bots are active in the early moments after an article is first posted, before it goes "viral"
Bots target influential users/accounts through mentions and replies in hopes of increasing the visibility of the content they are spreading
The success of low-credibility content pushed by bots appears comparable to that of fact-checking content, which spreads organically
Low-credibility sources appear to be heavily supported by social bots
The present findings complement the recent work by Vosoughi et al. 2018, who argue that bots alone do not entirely explain the success of false news. Their analysis is based on a small subset of articles that are fact-checked, whereas the present work considers a much broader set of articles from low-credibility sources, most of which are not fact-checked. In addition, the analysis of Vosoughi et al. does not consider an important mechanism by which bots can amplify the spread of an article, namely, resharing links originally posted by human accounts. Because of these two methodological differences, the present analysis provides new evidence about the role played by bots.
Bot scores might be a useful signal for platforms to use in prioritizing accounts for further review
The use of CAPTCHAs could help increase the difficulty of deploying social bots
Most potential solutions lead to some sort of trade-off which must be studied carefully
Notes by Matthew R. DeVerna