The spread of true and false news online

Article Info

Contents

- Major Findings
- Lit Review and Info
- Definitions
- Quantifying and Comparing Rumor Cascades
- Characteristics of Cascades
- Static Measures
- Details on Collected Data
- Findings
- Attempting to understand why falsehoods spread more often
- Summary of Findings from figure above...
- Quantifying Novelty
- Measuring Emotional Content of Tweet Replies
- Checking the Robustness of their findings

Major Findings

Lit Review and Info

Definitions

Fake news

News / Rumors

Rumor Cascades

So, if a rumor “A” is tweeted by 10 people separately, but not retweeted, it would have 10 cascades, each of size one. Conversely, if a second rumor “B” is independently tweeted by two people and each of those two tweets is retweeted 100 times, the rumor would consist of two cascades, each of size 100.
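A minimal sketch of this bookkeeping, using hypothetical data structures rather than anything from the paper: each independent origin tweet starts its own cascade, and a cascade's size is the number of unique users involved in it.

```python
# Hypothetical illustration of the cascade definition (not the paper's code).
# A "cascade" is one origin tweet plus the unbroken chain of retweets it
# spawns; a rumor can comprise many cascades, one per independent origin tweet.

def cascade_stats(cascades):
    """Given a list of cascades (each a set of involved users, including the
    origin poster), return the number of cascades and their sizes."""
    sizes = [len(users) for users in cascades]
    return len(cascades), sizes

# Rumor tweeted independently by 3 users, never retweeted: 3 cascades of size 1.
rumor_a = [{"u1"}, {"u2"}, {"u3"}]
# Rumor tweeted by 2 users, each tweet retweeted by 4 others: 2 cascades of size 5.
rumor_b = [{"u4", "r1", "r2", "r3", "r4"}, {"u5", "r5", "r6", "r7", "r8"}]

print(cascade_stats(rumor_a))  # (3, [1, 1, 1])
print(cascade_stats(rumor_b))  # (2, [5, 5])
```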

Quantifying and Comparing Rumor Cascades

image-20200921211129804

image-20200921211148686

From S3.1: Time-Inferred Diffusion of Rumor Cascades

Fortunately, we can infer the true retweet path of a tweet by using Twitter's follower graph. Figure S5 shows how this is achieved. The left panel in the figure shows the retweet path provided by Twitter's API. The middle panel shows that the bottom user is a follower of the middle user but not of the top user (the user who tweeted the original tweet). Finally, the right panel shows that using this information, and the fact that the bottom user retweeted after the middle user, it can be inferred that the bottom user retweeted the middle user and not the top user. If the bottom user was a follower of the top user, then the original diffusion pattern shown in the left panel would stand (i.e., it would have been inferred that both the middle and bottom users were retweeting the top user).
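A rough sketch of that idea, with hypothetical data structures (this is not the paper's implementation): attribute each retweet to the most recent earlier participant whom the retweeter follows, falling back to the original poster.

```python
# Sketch of the time-inferred diffusion idea (hypothetical data structures,
# not the paper's implementation). Twitter's API attributes every retweet to
# the original tweet, so we reassign each retweeter's parent to the most
# recent earlier participant whom they actually follow.

def infer_diffusion_tree(origin, retweets, follows):
    """
    origin:   user id of the original poster
    retweets: list of (user, timestamp) sorted by timestamp ascending
    follows:  dict mapping user -> set of users they follow
    Returns a dict mapping each retweeter to their inferred parent.
    """
    parents = {}
    seen = [(origin, float("-inf"))]           # participants so far, in time order
    for user, ts in retweets:
        parent = origin                        # default: attribute to the origin
        for candidate, cand_ts in reversed(seen):
            if cand_ts < ts and candidate in follows.get(user, set()):
                parent = candidate             # most recent earlier user they follow
                break
        parents[user] = parent
        seen.append((user, ts))
    return parents

# Example mirroring Fig. S5: "bottom" follows "middle" but not "top" (the origin),
# and retweeted after "middle", so their retweet is attributed to "middle".
follows = {"middle": {"top"}, "bottom": {"middle"}}
print(infer_diffusion_tree("top", [("middle", 1), ("bottom", 2)], follows))
# {'middle': 'top', 'bottom': 'middle'}
```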

Characteristics of Cascades

Static Measures

Since the static measures of reconstructed cascades do not depend on time, the authors reorganize the diffusion networks in the fashion shown below. Figure S6a shows the "true" diffusion network and S6b shows the reorganized network.

image-20200921213145310

With this example they define:

Cascade depth: the number of retweet hops from the origin tweet over time; the depth of a cascade is $\max_i d_i$, where $d_i$ denotes the depth of node $i$.

Cascade size: the number of unique users involved in the cascade over time.

Cascade maximum breadth: The maximum number of users involved in the cascade at any depth

The maximum breadth is $\max_k b_k$, where $b_k$ denotes the breadth of the cascade at depth $k$.

Cascade structural virality: a measure that interpolates between content spread through a single, large broadcast and that which spreads through multiple generations, with any one individual directly responsible for only a fraction of the total spread

For a cascade with $n$ nodes, structural virality is $\nu(T) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij}$, where $d_{ij}$ denotes the length of the shortest path between nodes $i$ and $j$.
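A small sketch computing all four measures on a toy retweet tree; it assumes the networkx library and is not the paper's code.

```python
# Hedged sketch (using networkx; not the paper's code) computing the four
# cascade measures for a toy retweet tree rooted at the origin tweet "a".
import networkx as nx
from collections import Counter

tree = nx.DiGraph([("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"), ("d", "f")])
root = "a"

depths = nx.shortest_path_length(tree, source=root)    # node -> depth d_i
depth = max(depths.values())                            # cascade depth: max_i d_i
size = tree.number_of_nodes()                           # cascade size: unique users
max_breadth = max(Counter(depths.values()).values())    # max_k b_k

# Structural virality: average shortest-path distance over all ordered node pairs.
undirected = tree.to_undirected()
n = size
pair_dists = dict(nx.all_pairs_shortest_path_length(undirected))
virality = sum(d for src in pair_dists for d in pair_dists[src].values()) / (n * (n - 1))

print(depth, size, max_breadth, round(virality, 3))  # 3 6 2 2.067
```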

The figure below visualizes the calculation of each measure detailed above...

image-20200921224057143

 

Details on Collected Data

How rumor cascades were collected

Findings

image-20200922122317293

Main Points:

  1. A greater fraction of false rumors experienced between 1 and 1000 cascades, whereas a greater fraction of true rumors experienced more than 1000 cascades (Fig. 1B)

    • This was also true for political rumors (Fig. 1D)
  2. Total false rumors peaked in 2013, 2015, and end of 2016 (around U.S. election)

  3. Political rumors jump during the 2012 and 2016 elections

  4. Political rumors had the largest number of cascades at ~45,000

image-20200922134720596

Main Points:

 

Here is the breakdown for false political vs. other cascades

image-20200922135603082

 

Attempting to understand why falsehoods spread more often

image-20200922165022080

Summary of Findings from figure above...

We might assume that the network structure of users and/or their individual account characteristics fuel the increased spread of falsehoods - for example, that these users have more followers, are verified (and thus more trusted), are much more active, etc. However, the opposite turned out to be true: the users responsible for spreading false news had fewer followers, followed fewer people, were less active on Twitter, were verified less often, and had been on Twitter for less time.

After estimating a model for the likelihood of retweeting, they found that falsehood was about 70% more likely to be retweeted than the truth, even when controlling for these account and network characteristics.

Since none of the common-sense assumptions could explain the findings, they looked into information theory and Bayesian decision theory - specifically, the idea that novel information is more likely to be shared. The theory can be summarized as follows:

Quantifying Novelty
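One common way to operationalize novelty in this information-theoretic framing is to compare the topic distribution of an incoming tweet with the distribution of topics the user was recently exposed to, using divergence measures such as Kullback-Leibler divergence or Bhattacharyya distance. A minimal sketch with made-up distributions (not the paper's topic model or pipeline):

```python
# Minimal sketch of information-theoretic novelty measures between a tweet's
# topic distribution and the topics a user was recently exposed to.
# Hypothetical numbers; not the paper's topic model or pipeline.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q): how 'surprising' distribution p is relative to q."""
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def bhattacharyya_distance(p, q):
    """-log of the Bhattacharyya coefficient; 0 when the distributions match."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(-np.log(np.sum(np.sqrt(p * q))))

exposure = [0.40, 0.30, 0.20, 0.10]   # topics the user saw recently
tweet    = [0.05, 0.05, 0.10, 0.80]   # topic mix of the incoming tweet
print(kl_divergence(tweet, exposure))           # larger -> more novel
print(bhattacharyya_distance(tweet, exposure))  # larger -> more novel
```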

Measuring Emotional Content of Tweet Replies

To check whether people actually perceived these false news tweets as more novel, they then measured the emotional content of the replies to those tweets.

To do this, they:

"... we do find that false news is more novel and that novel information is more likely to be retweeted. "

Checking the Robustness of their findings

  1. Checking robustness...

First, as there were multiple cascades for every true and false rumor, the variance of, and error terms associated with, cascades corresponding to the same rumor will be correlated. We therefore specified cluster-robust standard errors and calculated all variance statistics clustered at the rumor level. We tested the robustness of our findings to this specification by comparing analyses with and without clustered errors and found that, although clustering reduced the precision of our estimates as expected, the directions, magnitudes, and significance of our results did not change, and chi-square ($\chi^2$) and deviance ($d$) goodness-of-fit tests indicate that the models are well specified (see supplementary materials for more detail).
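For reference, clustering standard errors at the rumor level looks roughly like the following in statsmodels; the column names and model specification here are hypothetical, not the paper's exact model.

```python
# Sketch of clustering standard errors at the rumor level with statsmodels.
# Column names and the model specification are hypothetical, not the paper's.
import pandas as pd
import statsmodels.formula.api as smf

cascades = pd.DataFrame({
    "rumor_id": ["r1", "r1", "r2", "r2", "r3", "r3"],
    "false":    [1, 1, 0, 0, 1, 1],           # 1 = cascade belongs to a false rumor
    "log_size": [2.3, 3.1, 1.2, 1.0, 2.8, 3.4],
})

# Fit the model, then request cluster-robust standard errors grouped by rumor.
model = smf.ols("log_size ~ false", data=cascades).fit(
    cov_type="cluster",
    cov_kwds={"groups": cascades["rumor_id"]},
)
print(model.summary())
```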

Checking for selection bias based on types of rumors selected by fact-checkers...

To validate the robustness of our analysis to this selection and the generalizability of our results to all true and false rumor cascades, we independently verified a second sample of rumor cascades that were not verified by any fact-checking organization. These rumors were fact checked by three undergraduate students at Massachusetts Institute of Technology (MIT) and Wellesley College. The annotators, who worked independently and were not aware of one another, agreed on the veracity of 90% of the 13,240 rumor cascades that they investigated and achieved a Fleiss' kappa of 0.88.
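Fleiss' kappa quantifies how much a fixed number of raters agree beyond what chance alone would produce. A toy sketch using statsmodels (the ratings are made up, not the study's annotations):

```python
# Toy sketch of computing Fleiss' kappa for three independent annotators.
# Ratings are made up; the real study covered 13,240 rumor cascades.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = items (rumor cascades), columns = annotators; 1 = "true", 0 = "false".
ratings = np.array([
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
])

# aggregate_raters converts per-rater labels into per-item category counts.
counts, _ = aggregate_raters(ratings)
print(fleiss_kappa(counts))
```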

Checking for bots...

We therefore used a sophisticated bot-detection algorithm (35) to identify and remove all bots before running the analysis. When we added bot traffic back into the analysis, we found that none of our main conclusions changed—false news still spread farther, faster, deeper, and more broadly than the truth in all categories of information. The results remained the same when we removed all tweet cascades started by bots, including human retweets of original bot tweets (see supplementary materials, section S8.3) and when we used a second, independent bot-detection algorithm.
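The robustness pattern itself is simple to express: run the same comparison with and without cascades attributed to bots and check that the direction of the effect holds. A hypothetical pandas sketch (the bot flag and columns are assumptions, not the paper's data):

```python
# Hypothetical sketch of the bot robustness check: compare false-vs-true
# spread with and without cascades started by accounts flagged as bots.
import pandas as pd

cascades = pd.DataFrame({
    "false":          [1, 1, 0, 0, 1, 0],
    "started_by_bot": [0, 1, 0, 0, 0, 1],
    "size":           [800, 1200, 90, 60, 500, 300],
})

def mean_size_by_veracity(df):
    """Average cascade size for true (0) vs. false (1) rumors."""
    return df.groupby("false")["size"].mean()

print(mean_size_by_veracity(cascades))                                   # bots included
print(mean_size_by_veracity(cascades[cascades["started_by_bot"] == 0]))  # bots removed
```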

 


Notes by: Matthew R. DeVerna