SlideShare ist ein Scribd-Unternehmen logo
1 von 10
Downloaden Sie, um offline zu lesen
Everyone’s an Influencer:
                                      Quantifying Influence on Twitter

                                               Eytan Bakshy∗                             Jake M. Hofman
                                       University of Michigan, USA                  Yahoo! Research, NY, USA
                                         ebakshy@umich.edu                          hofman@yahoo-inc.com
                                           Winter A. Mason                               Duncan J. Watts
                                       Yahoo! Research, NY, USA                     Yahoo! Research, NY, USA
                                           winteram@yahoo-                           djw@yahoo-inc.com
                                                inc.com

ABSTRACT                                                                            Keywords
In this paper we investigate the attributes and relative influ-                      Communication networks, Twitter, diffusion, influence, word
ence of 1.6M Twitter users by tracking 74 million diffusion                          of mouth marketing.
events that took place on the Twitter follower graph over
a two month interval in 2009. Unsurprisingly, we find that                           1.   INTRODUCTION
the largest cascades tend to be generated by users who have
been influential in the past and who have a large number                                Word-of-mouth diffusion has long been regarded as an im-
of followers. We also find that URLs that were rated more                            portant mechanism by which information can reach large
interesting and/or elicited more positive feelings by workers                       populations, possibly influencing public opinion [14], adop-
on Mechanical Turk were more likely to spread. In spite of                          tion of innovations [26], new product market share [4], or
these intuitive results, however, we find that predictions of                        brand awareness [15]. In recent years, interest among re-
which particular user or URL will generate large cascades                           searchers and marketers alike has increasingly focused on
are relatively unreliable. We conclude, therefore, that word-                       whether or not diffusion can be maximized by seeding a
of-mouth diffusion can only be harnessed reliably by tar-                            piece of information or a new product with certain spe-
geting large numbers of potential influencers, thereby cap-                          cial individuals, often called “influentials” [34, 15] or sim-
turing average effects. Finally, we consider a family of hy-                         ply “influencers,” who exhibit some combination of desirable
pothetical marketing strategies, defined by the relative cost                        attributes—whether personal attributes like credibility, ex-
of identifying versus compensating potential “influencers.”                          pertise, or enthusiasm, or network attributes such as connec-
We find that although under some circumstances, the most                             tivity or centrality—that allows them to influence a dispro-
influential users are also the most cost-effective, under a                           portionately large number of others [10], possibly indirectly
wide range of plausible assumptions the most cost-effective                          via a cascade of influence [31, 16].
performance can be realized using “ordinary influencers”—                               Although appealing, the claim that word-of-mouth diffu-
individuals who exert average or even less-than-average in-                         sion is driven disproportionately by a small number of key
fluence.                                                                             influencers necessarily makes certain assumptions about the
                                                                                    underlying influence process that are not based directly on
                                                                                    empirical evidence. Empirical studies of diffusion are there-
Categories and Subject Descriptors                                                  fore highly desirable, but historically have suffered from two
H.1.2 [Models and Principles]: User/Machine Systems;                                major difficulties. First, the network over which word-of-
J.4 [Social and Behavioral Sciences]: Sociology                                     mouth influence spreads is generally unobservable, hence
                                                                                    influence is difficult to attribute accurately, especially in in-
                                                                                    stances where diffusion propagates for multiple steps [29,
General Terms                                                                       21]. And second, observational data on diffusion are heavily
                                                                                    biased towards “successful” diffusion events, which by virtue
Human Factors
                                                                                    of being large are easily noticed and recorded; thus infer-
∗
  Part of this research was performed while the author was                          ences regarding the attributes of success may also be biased
visiting Yahoo! Research, New York.                                                 [5, 9], especially when such events are rare [8].
                                                                                       For both of these reasons, the micro-blogging service Twit-
                                                                                    ter presents a promising natural laboratory for the study
                                                                                    of diffusion processes. Unlike other user-declared networks
Permission to make digital or hard copies of all or part of this work for           (e.g. Facebook), Twitter is expressly devoted to dissem-
personal or classroom use is granted without fee provided that copies are           inating information, in that users subscribe to broadcasts
not made or distributed for profit or commercial advantage and that copies           of other users; thus the network of “who listens to whom”
bear this notice and the full citation on the first page. To copy otherwise, to      can be reconstructed by crawling the corresponding “follower
republish, to post on servers or to redistribute to lists, requires prior specific   graph”. In addition, because users frequently wish to share
permission and/or a fee.
WSDM’11, February 9–12, 2011, Hong Kong, China.                                     web-content, and because tweets are restricted to 140 char-
Copyright 2011 ACM 978-1-4503-0493-1/11/02 ...$10.00.                               acters in length, a popular strategy has been to use URL
shorteners (e.g. bit.ly, TinyURL, etc.), which effectively         insights that we believe are of general relevance to word-of-
tag distinct pieces of content with unique, easily identifi-       mouth marketing and related activities.
able tokens. Together these features allow us to track the           The remainder of the paper is organized as follows. We
diffusion patterns of all instances in which shortened URLs        review related work on modeling diffusion and quantifying
are shared on Twitter, regardless of their success, thereby       influence in Section 2. In Sections 3 and 4 we provide an
addressing both the observability and sampling difficulties         overview of the collected data, summarizing the structure of
outlined above.                                                   URL cascades on the Twitter follower graph. In Section 5,
   The Twitter ecosystem is also well suited to studying the      we present a predictive model of influence, in which cascade
role of influencers. In general, influencers are loosely defined     sizes of posted URLs are predicted using the individuals’ at-
as individuals who disproportionately impact the spread of        tributes and average size of past cascades. Section 6 explores
information or some related behavior of interest [34, 10, 15,     the relationship between content as characterized by workers
11]. Unfortunately, however, this definition is fraught with       on Amazon’s Mechanical Turk and cascade size. Finally, in
ambiguity regarding the nature of the influence in ques-           Section 7 we use our predictive model of cascade size to ex-
tion, and hence the type of individual who might be con-          amine the cost-effectiveness of targeting individuals to seed
sidered special. Ordinary individuals communicating with          content.
their friends, for example, may be considered influencers,
but so may subject matter experts, journalists, and other         2.   RELATED WORK
semi-public figures, as may highly visible public figures like
                                                                     A number of recent empirical papers have addressed the
media representatives, celebrities, and government officials.
                                                                  matter of diffusion on networks in general, and the attributes
Clearly these types of individuals are capable of influencing
                                                                  and roles of influencers specifically. In early work, Gruhl et
very different numbers of people, but may also exert quite
                                                                  al [13] attempted to infer a transmission network between
different types of influence on them, and even transmit in-
                                                                  bloggers, given time-stamped observations of posts and as-
fluence through different media. For example, a celebrity
                                                                  suming that transmission was governed by an independent
endorsing a product on television or in a magazine adver-
                                                                  cascade model. Contemporaneously, Adar and Adamic [1]
tisement presumably exerts a different sort of influence than
                                                                  used a similar approach to reconstruct diffusion trees among
a trusted friend endorsing the same product in person, who
                                                                  bloggers, and shortly afterwards Leskovec et al. [20] used re-
in turn exerts a different sort of influence than a noted ex-
                                                                  ferrals on an e-commerce site to infer how individuals are
pert writing a review.
                                                                  influenced as a function of how many of their contacts have
   In light of this definitional ambiguity, an especially useful
                                                                  recommended a product.
feature of Twitter is that it not only encompasses various
                                                                     A limitation of these early studies was the lack of “ground
types of entities, but also forces them all to communicate
                                                                  truth” data regarding the network over which the diffusion
in roughly the same way: via tweets to their followers. Al-
                                                                  was taking place. Addressing this problem, more recent
though it remains the case that even users with the same
                                                                  studies have gathered data both on the diffusion process
number of followers do not necessarily exert the same kind
                                                                  and the corresponding network. For example, Sun et al. [29]
of influence, it is at least possible to measure and compare
                                                                  studied diffusion trees of fan pages on Facebook, Bakshy et
the influence of individuals in a standard way, by the ac-
                                                                  al. [3] studied the diffusion of “gestures” between friends in
tivity that is observable on Twitter itself. In this way, we
                                                                  Second Life, and Aral et al. [2] studied adoption of a mobile
avoid the need to label individuals as either influencers or
                                                                  phone application over the Yahoo! messenger network. Most
non-influencers, simply including all individuals in our study
                                                                  closely related to the current research is a series of recent pa-
and comparing their impact directly.
                                                                  pers that examine influence and diffusion on Twitter specif-
   We note, however, that our use of the term influencer cor-
                                                                  ically. Kwak et al. [18] compared three different measures
responds to a particular and somewhat narrow definition of
                                                                  of influence—number of followers, page-rank, and number
influence, specifically the user’s ability to post URLs which
                                                                  of retweets—finding that the ranking of the most influential
diffuse through the Twitter follower graph. We restrict our
                                                                  users differed depending on the measure. Cha et al. [7] also
study to users who “seed” content, meaning they post URLs
                                                                  compared three different measures of influence—number of
that they themselves have not received through the follower
                                                                  followers, number of retweets, and number of mentions—
graph. We quantify the influence of a given post by the num-
                                                                  and also found that the most followed users did not neces-
ber of users who subsequently repost the URL, meaning that
                                                                  sarily score highest on the other measures. Finally, Weng et
they can be traced back to the originating user through the
                                                                  al. [35] compared number of followers and page rank with a
follower graph. We then fit a model that predicts influence
                                                                  modified page-rank measure that accounted for topic, again
using an individual’s attributes and past activity and ex-
                                                                  finding that ranking depended on the influence measure.
amine the utility of such a model for targeting users. Our
                                                                     The present work builds on these earlier contributions in
emphasis on prediction is particularly relevant to our mo-
                                                                  three key respects. First, whereas previous studies have
tivating question. In marketing, for example, the practical
                                                                  quantified influence either in terms of network metrics (e.g.
utility of identifying influencers depends entirely on one’s
                                                                  page rank) or the number of direct, explicit retweets, we
ability to do so in advance. Yet in practice, it is very often
                                                                  measure influence in terms of the size of the entire diffusion
the case that influencers are identified only in retrospect,
                                                                  tree associated with each event (Kwak et al [18] also compute
usually in the aftermath of some outcome of interest, such
                                                                  what they call “retweet trees” but they do not use them as a
as the unexpected success of a previously unknown author
                                                                  measure of influence). While related to other measures, the
or the sudden revival of a languishing brand [10]. By em-
                                                                  size of the diffusion tree is more directly associated with dif-
phasizing ex-ante prediction of influencers over ex-post ex-
                                                                  fusion and the dissemination of information (Goyal et al [12],
planation, our analysis highlights some simple but useable
                                                                  it should be noted, do introduce a similar metric to quantify
                                                                  influence; however, their interest is in identifying community
the same two-month period. We did this by querying the
                                                                  Twitter API to find the followers of every user who posted
                     2
                                                                  a bit.ly URL. Subsequently, we placed those followers in a
               10
                                                                  queue to be crawled, thereby identifying their followers, who
                                                                  were then also placed in the queue, and so on. In this way,
               10    4
                                                                  we obtained a large fraction of the Twitter follower graph
                                                                  comprising all active bit.ly posters and anyone connected to
                                                                  these users via one-way directed chains of followers. Specifi-
     Density




                     6
               10
                                                                  cally, the subgraph comprised approximately 56M users and
                     8
                                                                  1.7B edges.
               10
                                                                     Consistent with previous work [7, 18, 35], both the in-
                                                                  degree (‘followers”) and out-degree (“friends”) distributions
               10   10
                                                                  are highly skewed, but the former much more so—whereas
                                                                  the maximum # of followers was nearly 4M, the maximum
                                                                  # of friends was only about 760K—reflecting the passive
                         101     102     103   104                and one-way nature of the “follow” action on Twitter (i.e.
                               URLs posted                        A can follow B without any action required from B). We
                                                                  emphasize, moreover, that because the crawled graph was
                                                                  seeded exclusively with active users, it is almost certainly
Figure 1: Probability density of number of bit.ly                 not representative of the entire follower graph. In particular,
URLs posted per user                                              active users are likely to have more followers than average,
                                                                  in which case we would expect that the average in-degree
                                                                  will exceed the average out-degree for our sample—as indeed
                                                                  we observe. Table 1 presents some basic statistics of the
“leaders,” not on prediction.) Second, whereas the focus of       distributions of the number of friends, followers and number
previous studies has been largely descriptive (e.g. compar-       of URLs posted per user.
ing the most influential users), we are interested explicitly in
predicting influence; thus we consider all users, not merely
the most influential. Third, in addition to predicting diffu-       Table 1: Statistics of the Twitter follower graph and
sion as a function of the attributes of individual seeds, we      seed activity
also study the effects of content. We believe these differ-                     # Followers # Friends # Seeds Posted
ences bring the understanding of diffusion on Twitter closer          Median      85.00         82.00        11.00
to practical applications, although as we describe later, ex-         Mean      557.10        294.10        46.33
perimental studies are still required.                                Max. 3,984,000.00 759,700.00          54,890

3.     DATA
   To study diffusion on Twitter, we combined two separate
but related sources of data. First, over the two-month pe-        4.   COMPUTING INFLUENCE ON TWITTER
riod of September 13 2009 - November 15 2009 we recorded             To calculate the influence score for a given URL post,
all 1.03B public tweets broadcast on Twitter, excluding Oc-       we tracked the diffusion of the URL from its origin at a
tober 14-16 during which there were intermittent outages in       particular “seed” node through a series of reposts—by that
the Twitter API. Of these, we extracted 87M tweets that           user’s followers, those users’ followers, and so on—until the
included bit.ly URLs and which corresponded to distinct           diffusion event, or cascade, terminated. To do this, we used
diffusion “events,” where each event comprised a single ini-       the time each URL was posted: if person B is following
tiator, or “seed,” followed by some number of repostings of       person A, and person A posted the URL before B and was
the same URL by the seed’s followers, their followers, and so     the only of B’s friends to post the URL, we say person A
on1 . Finally, we identified a subset of 74M diffusion events       influenced person B to post the URL. As Figure 2 shows,
that were initiated by seed users who were active in both         if B has more than one friend who has previously posted
the first and second months of the observation period; thus        the same URL, we have three choices for how to assign the
enabling us to train our regression model on first month           corresponding influence: first, we can assign full credit to the
performance in order to predict second-month performance          friend who posted it first; second we can assign full credit to
(see Section 5). In total, we identified 1.6M seed users who       the friend who posted it most recently (i.e. last); and third,
seeded an average of 46.33 bit.ly URLs each. Figure 1 shows       we can split credit equally among all prior-posting friends.
the distribution of bit.ly URL posts by seed.                        These three assignments effectively make different assump-
   Second, we crawled the portion of the follower graph com-      tions about the influence process: “first influence” rewards
prising all users who had broadcast at least one URL over         primacy, assuming that individuals are influenced when they
                                                                  first see a new piece of information, even if they fail to im-
1
 Our decision to restrict attention to bit.ly URLs was made       mediately act on it, during which time they may see it again;
predominantly for convenience, as bit.ly was at the time          “last influence” assumes the opposite, instead attributing in-
by far the dominant URL shortener on Twitter. Given the
                                                                  fluence to the most recent exposure; and “split influence”
size of the population of users who rely on bit.ly, which is
comparable to the size of all active Twitter users, it seems      assumes either that the likelihood of noticing a new piece
unlikely to differ systematically from users who rely on other     of information, or equivalently the inclination to act on it,
shorteners.                                                       accumulates steadily as the information is posted by more
Figure 2: Three ways of assigning influence to mul-
tiple sources




friends. Having defined immediate influence, we can then
construct disjoint influence trees for every initial posting of a
URL. The number of users in these influence trees—referred
to as “cascades”—thus define the influence score for every
seed. See Figure 3 for some examples of cascades. To check         Figure 3:     Examples of information cascades on
that our results are not an artifact of any particular assump-     Twitter.
tion about how individuals are influenced to repost infor-
mation, we conducted our analysis for all three definitions.
Although particular numerical values varied slightly across        there are many reasons why individuals may choose to pass
the three definitions, the qualitative findings were identical;      along information other than the number and identity of
thus for simplicity we report results only for first influence.      the individuals from whom they received it—in particular,
   Before proceeding, we note that our use of reposting to         the nature of the content itself. Moreover, influencing an-
indicate influence is somewhat more inclusive than the con-         other individual to pass along a piece of information does not
vention of “retweeting” (e.g. using the terminology “RT            necessarily imply any other kind of influence, such as influ-
@username”) which explicitly attributes the original user.         encing their purchasing behavior, or political opinion. Our
An advantage of our approach is that we can include in our         use of the term “influencer” should therefore be interpreted
observations all instances in which a URL was reposted re-         as applying only very narrowly to the ability to consistently
gardless of whether it was acknowledged by the user, thereby       seed cascades that spread further than others. Nevertheless,
greatly increasing the coverage of our observations. (Since        differences in this ability, such as they do exist, can be con-
our study, Twitter has introduced a “retweet” feature that         sidered a certain type of influence, especially when the same
arguably increases the likelihood that reposts will be ac-         information (in this case the same original URL) is seeded
knowledged, but does not guarantee that they will be.) How-        by many different individuals. Moreover, the terms “influen-
ever, a potential disadvantage of our definition is that it         tials” and “influencers” have often been used in precisely this
may mistakenly attribute influence to what is in reality a se-      manner [3]; thus our usage is also consistent with previous
quence of independent events. In particular, it is likely that     work.
users who follow each other will have similar interests and
so are more likely to post the same URL in close succession
than random pairs of users. Thus it is possible that some          5.   PREDICTING INDIVIDUAL INFLUENCE
of what we are labeling influence is really a consequence of           We now investigate an idealized version of how a mar-
homophily [2]. From this perspective, our estimates of in-         keter might identify influencers to seed a word-of-mouth
fluence should be viewed as an upper bound.                         campaign [16], where we note that from a marketer’s per-
   On the other hand, there are reasons to think that our          spective the critical capability is to identify attributes of
measure underestimates actual influence, as re-broadcasting         individuals that consistently predict influence. Reiterating
a URL is a particularly strong signal of interest. A weaker        that by “influence” we mean a user’s ability to seed content
but still relevant measure might be to observe whether a           containing URLs that generate large cascades of reposts, we
given user views the content of a shortened URL, imply-            therefore begin by describing the cascades we are trying to
ing that they are sufficiently interested in what the poster         predict.
has to say that they will take some action to investigate             As Figure 4a shows, the distribution of cascade sizes is
it. Unfortunately click-through data on bit.ly URLs are of-        approximately power-law, implying that the vast majority
ten difficult to interpret, as one cannot distinguish between        of posted URLs do not spread at all (the average cascade
programmatic unshortening events—e.g., from crawlers or            size is 1.14 and the median is 1), while a small fraction
browser extensions—and actual user clicks. Thus we instead         are reposted thousands of times. The depth of the cascade
relied on reposting as a conservative measure of influence,         (Figure 4b) is also right skewed, but more closely resembles
acknowledging that alternative measures of influence should         an exponential distribution, where the deepest cascades can
also be studied as the platform matures.                           propagate as far as nine generations from their origin; but
   Finally, we reiterate that the type of influence we study        again the vast majority of URLs are not reposted at all,
here is of a rather narrow kind: being influenced to pass           corresponding to cascades of size 1 and depth 0 in which
along a particular piece of information. As we discuss later,      the seed is the only node in the tree. Regardless of whether
where the left (right) child is followed if the condition is sat-
                      G                                                                                        G                                       isfied (violated). Leaf nodes give the predicted influence—as
           10−1                                                                                       107
                                                                                                                                                       measured by (log) mean cascade size— for the corresponding
           10−2
                              G

                                  G
                                                                                                      106          G
                                                                                                                                                       partition. Thus, for example, the right-most leaf indicates
                                      G
                                                                                                                                                       that users with upwards of 1870 followers who had on aver-
           10−3




                                                                                          Frequency
                                                                                                      105
                                          G                                                                            G


                                                                                                                                                       age 6.2 reposts by direct followers (past local influence) are
 Density




                                               G

                −4                                 G                                                       4               G
           10                                                                                         10
                −5
                                                       G

                                                            G                                              3                   G
                                                                                                                                                       predicted to have the largest average total influence, gener-
           10                                                                                         10
                                                                G                                                                  G                   ating cascades of approximately 8.7 additional posts.
                −6                                                  G
                                                                                                           2
           10
                                                                         G
                                                                                                      10                               G                  Unsurprisingly, the model indicates that past performance
           10−7                                                                                       101
                                                                                                                                           G
                                                                             G
                                                                                  G                                                            G
                                                                                                                                                   G
                                                                                                                                                       provides the most informative set of features, although it is
                     10   0
                                      10   1
                                                       10   2
                                                                    10   3
                                                                                 10   4
                                                                                                               0       2       4       6       8
                                                                                                                                                       the local, not the total influence that is most informative;
                                               Size                                                                        Depth                       this is likely due to the fact that most non-trivial cascades
                                                                                                                                                       are of depth 1, so that past direct adoption is a reliable pre-
                     (a) Cascade Sizes                                                                     (b) Cascade Depths
                                                                                                                                                       dictor of total adoption. Also unsurprisingly, the number of
                                                                                                                                                       followers is an informative feature. Notably, however, these
Figure 4: (a). Frequency distribution of cascade                                                                                                       are the only two features present in the regression tree, en-
sizes. (b). Distribution of cascade depths.                                                                                                            abling us to visualize influence as a function of these features,
                                                                                                                                                       as shown in Figure 6. This result is somewhat surprising,
                                                                                                                                                       as one might reasonably have expected that individuals who
we study size or depth, therefore, the implication is that                                                                                             follow many others, or very few others, would be distinct
most events do not spread at all, and even moderately sized                                                                                            from the average user. Likewise, one might have expected
cascades are extremely rare.                                                                                                                           that activity level, quantified by the number of tweets, would
   To identify consistently influential individuals, we aggre-                                                                                          also be predictive.
gated all URL posts by user and computed individual-level                                                                                                 Figure 7 shows the fit of the regression tree model for all
influence as the logarithm of the average size of all cascades                                                                                          five cross-validation folds. The location of the circles indi-
for which that user was a seed. We then fit a regression                                                                                                cates the mean predicted and actual values at each leaf of
tree model [6], in which a greedy optimization process recur-                                                                                          the trees, with leaves from different cross-validation folds ap-
sively partitions the feature space, resulting in a piecewise-                                                                                         pearing close to each other; the size of the circles indicates
constant function where the value in each partition is fit to                                                                                           the number of points in each leaf, while the bars show the
the mean of the corresponding training data. An important                                                                                              standard deviation of the actual values at each leaf. The
advantage of regression trees over ordinary linear regression                                                                                          model is extremely well calibrated, in the sense that the
(OLR) in this context is that unlike OLR, which tends to                                                                                               prediction of the average value at each cut of the regres-
fit the vast majority of small cascades at the expense of                                                                                               sion tree is almost exactly the actual average (R2 = 0.98).
larger ones, the piecewise constant nature of the regression                                                                                           This appearance, however, is deceiving. In fact, the model
tree function allows cascades of different sizes to be fit in-                                                                                           fit without averaging predicted and actual values at the leaf
dependently. The result is that the regression tree model                                                                                              nodes is relatively poor (R2 = 0.34), reflecting that although
is much better calibrated than the equivalent OLR model.                                                                                               large cascades tend to be driven by previously successful in-
Moreover, we used folded cross-validation [25] to terminate                                                                                            dividuals with many followers, the extreme scarcity of such
partitioning to prevent over-fitting. Our model included the                                                                                            cascades means that most individuals with these attributes
following features as predictors:                                                                                                                      are not successful either. Thus, while large follower count
                                                                                                                                                       and past success are likely necessary features for future suc-
    1. Seed user attributes
                                                                                                                                                       cess, they are far from sufficient.
                 (a) # followers                                                                                                                          These results place the usual intuition about influencers
                                                                                                                                                       in perspective: individuals who have been influential in the
                 (b) # friends                                                                                                                         past and who have many followers are indeed more likely to
                     (c) # tweets,                                                                                                                     be influential in the future; however, this intuition is cor-
                                                                                                                                                       rect only on average. We also emphasize that these results
                 (d) date of joining                                                                                                                   are based on far more observational data than is typically
    2. Past influence of seed users                                                                                                                     available to marketers—in particular, we have an objec-
                                                                                                                                                       tive measure of influence and extensive data on past per-
                 (a) average, minimum, and maximum total influence                                                                                      formance. Our finding that individual-level predictions of
                                                                                                                                                       influence nevertheless remain relatively unreliable therefore
                 (b) average, minimum, and maximum local influence,                                                                                     strongly suggests that rather than attempting to identify ex-
                                                                                                                                                       ceptional individuals, marketers seeking to exploit word-of-
where past local influence refers to the average number of
                                                                                                                                                       mouth influence should instead adopt portfolio-style strate-
reposts by that user’s immediate followers in the first month
                                                                                                                                                       gies, which target many potential influencers at once and
of the observation period, and past total influence refers to
                                                                                                                                                       therefore rely only on average performance [33].
average total cascade size over the same period. Followers,
friends, number of tweets, and influence (actual and past)
were all log-transformed to account for their skewed distri-                                                                                           6.   THE ROLE OF CONTENT
butions. We then compared predicted influence with actual                                                                                                 An obvious objection to the above analysis is that it fails
influence computed from the second month of observations.                                                                                               to account for the nature of the content that is being shared.
   Figure 5 shows the regression tree for one of the folds.                                                                                            Clearly one might expect that some types of content (e.g.
Conditions at the nodes indicate partitions of the features,                                                                                           YouTube videos) might exhibit a greater tendency to spread
log10(pastLocalInfluence + 1)< 0.1763
                                                                                           |



                log10(pastLocalInfluence + 1)< 0.0416                                                                                         log10(followers + 1)< 3.272




log10(followers + 1)< 2.157               log10(followers + 1)< 2.562                             log10(followers + 1)< 2.519                                                       log10(pastLocalInfluence + 1)< 0.4921




                                                  log10(pastLocalInfluence + 1)< 0.09791                    log10(pastLocalInfluence + 1)< 0.3028 log10(pastLocalInfluence + 1)< 0.3027 log10(pastLocalInfluence + 1)< 0.856
0.0124             0.03631           0.05991                                                  0.1229




                                                        0.09241            0.1452                                      0.1929               0.3045                    0.275              0.4118                0.6034       0.9854



Figure 5: Regression tree fit for one of the five cross-validation folds. Leaf nodes give the predicted influence
for the corresponding partition, where the left (right) child is followed if the node condition is satisfied
(violated).



                                                                                                                                                                              britneyspears                 BarackObama
                                                                                                                                                                                                             cnnbrk


                                                                                                                      106                                                                stephenfry

                                                                                                                                                                                   TreySongz

                                                                                                                                                                                                       pigeonPOLL
                                                                                                                                                                   nprnews
                                                                                                                      105
                                                                                                                                                                                    MrEdLover           disneypolls
                                                                                                                                                                                                          iphone_dev
                                                                                                                                                             Orbitz      geohot
                                                                                                          Followers



                                                                                                                                                                               riskybusinessmb
                                                                                                                                                  mslayel                     marissamayer
                                                                                                                      10   4                                      billprady
                                                                                                                                                                               michelebachmann
                                                                                                                                                                        TreysAngels




                                                                                                                                garagemkorova
                                                                                                                      103
                                                                                                                                       wealthtv




                                                                                                                                                       OFA_TX
                                                                                                                      102


                                                                                                                                10-1                        100                   101                 102
                                                                                                                                                            Past Local Influence

                                          (a) All users                                                                                                (b) Top 25 users

Figure 6: Influence as a function of past local influence and number of followers for (a) all users and (b)
users with the top 25 actual influence. Each circle represents a single seed user, where the size of the circle
represents that user’s actual average influence.


than others (e.g. news articles of specialized interest), or                                                               First, we filtered URLs that we knew to be spam or in a lan-
that even the same type of content might vary considerably                                                                 guage other than English. Second, we binned all remaining
in interestingness or suitability for sharing. Conceivably, one                                                            URLs in logarithmic bins choosing an exponent such that
could do considerably better at predicting cascade size if in                                                              we obtained ten bins in total, and the top bin contained the
addition to knowing the attributes of the seed user, one also                                                              100 largest cascades. Third, we sampled all 100 URLs in
knew something about the content of the URL being seeded.                                                                  the top bin, and randomly sampled 100 URLs from each of
   To test this idea, we used humans to classify the content                                                               the remaining bins. In this way, we ensured that our sample
of a sample of 1000 URLs from our study. An advantage                                                                      would reflect the full distribution of cascade sizes.
of this approach over an automated classifier or a topic-                                                                      Given this sample of URLs, we then used Amazon’s Me-
model [35] is that humans can more easily rate content on                                                                  chanical Turk (AMT) to recruit human classifiers. AMT
attributes like “interestingness” or “positive feeling” which                                                              is a system that allows one to recruit workers to do small
are often quite difficult for a machine. A downside of us-                                                                   tasks for small payments, and has been used extensively for
ing humans, however, is that the number of URLs we can                                                                     survey and experimental research [17, 24, 23, 22] and to ob-
classify in this way is necessarily small relative to the total                                                            tain labels for data [28, 27]. We asked the workers to go
sample. Moreover, because the distribution of cascade sizes                                                                to the web page associated with the URL and answer ques-
is so skewed, a uniform random sample of 1000 URLs would                                                                   tions about it. Specifically, we asked them to classify the
almost certainly not contain any large cascades; thus we in-                                                               site as “Spam / Not Spam / Unsure”, as “Media Sharing /
stead obtained a stratified sample in the following manner.                                                                 Social Networking, Blog / Forum, News / Mass Media, or
1. Rated interestingness

                                                                                                        2. Perceived interestingness to an average person
                     1.2
                                                                                               G        3. Rated positive feeling
                     1.0                                                             G
                                                                                     G
                                                                                      G
                                                                                           G

                                                                                                        4. Willingness to share via Email, IM, Twitter, Facebook
  Actual Influence




                     0.8                                                                                   or Digg.
                                                                       G
                                                                           G

                                                                                                        5. Indicator variables for type of URL (see Figure 8a)
                     0.6                                          G
                                                                  G
                                                                   G




                                                            GG                                          6. Indicator variable for category of content (see Fig-
                     0.4                               G
                                                       G
                                                                                                           ure 8b)
                                                G
                                                GGG
                                              GG
                                              GG


                     0.2                G
                                        G
                                        G                                                             Figure 10 shows the model fit including content. Surpris-
                                    G
                               G
                                   G
                                   G                                                               ingly, none of the content features were informative rela-
                           G
                     0.0
                           G
                           G                                                                       tive to the seed user features (hence we omit the regression
                                                                                                   tree itself, which is essentially identical to Figure 5), nor
                                        0.2           0.4        0.6           0.8   1.0           was the model fit (R2 = 0.31) improved by the addition of
                                            Predicted Influence                                    the content features. We note that the slight decrease in
                                                                                                   fit and calibration compared to the content-free model can
                                                                                                   be attributed to two main factors: first, the training set
Figure 7: Actual vs. predicted influence for regres-                                                size is orders of magnitude smaller for the content model
sion tree. The model assigns each seed user to a leaf                                              as we have fewer hand-labeled URLs, and second, here we
in the regression tree. Points representing the av-                                                are making predictions at the single post level, which has
erage actual influence values are placed at the pre-                                                higher variance than the user-averaged influence predicted
diction made by each leaf, with vertical lines rep-                                                in the content-free model.
resenting one standard deviation above and below.                                                     These results are initially surprising, as explanations of
The size of the point is proportional to the num-                                                  success often do invoke the attributes of content to account
ber of diffusion events assigned to the leaf. Note                                                  for what is successful. However, the reason is essentially the
that all five folds are represented here, including the                                             same as above—namely that most explanations of success
fold represented in Figure 5, where proximate lines                                                tend to focus only on observed successes, which invariably
correspond to leaves from different cross-validation                                                represent a small and biased sample of the total population
folds.                                                                                             of events. When the much larger number of non-successes
                                                                                                   are also included, it becomes difficult to identify content-
                                                                                                   based attributes that are consistently able to differentiate
                                                                                                   success from failure at the level of individual events.
Other”, and then to specify one of 10 categories it fell into
(see Figure 8b). We then asked them to rate how broadly
relevant the site was, on a scale from 0 (very niche) to 100                                       7.     TARGETING STRATEGIES
(extremely broad). We also gauged their impression of the                                             Although content was not found to improve predictive per-
site, including how interesting they felt the site was (7-point                                    formance, it remains the case that individual-level attributes—
Likert scale), how interesting the average person would find                                        in particular past local influence and number of followers—
it (7-point Likert scale), and how positively it made them                                         can be used to predict average future influence. Given this
feel (7-point Likert scale). Finally, we asked them to indi-                                       observation, a natural next question is how a hypothetical
cate if they would share the URL using any of the following                                        marketer might exploit available information to optimize the
services: Email, IM, Twitter, Facebook, or Digg.                                                   diffusion of information by systematically targeting certain
   To ensure that our ratings and classifications were reliable,                                    classes of individuals. In order to answer such a question,
we had each URL rated at least 3 times—the average URL                                             however, one must make some assumptions regarding the
was rated 11 times, and the maximum was rated 20 times.                                            costs of targeting individuals and soliciting their coopera-
If more than three workers marked the URL as a bad link                                            tion.
or in a foreign language, it was excluded. In addition, we                                            To illustrate this point we now evaluate the cost-effectiveness
excluded URLs that were marked as spam by the majority of                                          of a hypothetical targeting strategy based on a simple but
workers. This resulted in 795 URLs that we could analyze.                                          plausible family of cost functions ci = ca +fi cf , where ca rep-
   As Figure 9 shows, content that is rated more interesting                                       resents a fixed “acquisition cost” ca per individual i, and cf
tends to generate larger cascades on average, as does content                                      represents a “cost per follower” that each individual charges
that elicits more positive feelings. In addition, Figure 8                                         the marketer for each “sponsored” tweet. Without loss of
shows that certain types of URLs, like those associated with                                       generality we have assumed a value of cf = $0.01, where
shareable media, tend to spread more than URLs associated                                          the choice of units is based on recent news reports of paid
with news sites, while some types of content (e.g “lifestyle”)                                     tweets (http://nyti.ms/atfmzx). For convenience we express
spread more than others.                                                                           the acquisition cost as multiplier α of the per-follower cost;
   To evaluate the additional predictive power of content, we                                      hence ca = αcf .
repeat the regression tree analysis of Section 5 for this subset                                      Because the relative cost of targeting potential “influencers”
of URLs, adding the following content-based features:                                              is an unresolved empirical question, we instead explore a
Lifestyle
                   Media Sharing /
                 Social Networking                                                                         Technology

                                                                                                                 Offbeat




                                                                  General Category
                                                                                                     Entertainment
                     Blog / Forum
   Type of URL




                                                                                                              Gaming

                                                                                                             Science

                            Other
                                                                                          Write In / Other

                                                                                                                  News

                                                                                 Business and Finance
             News / Mass Media
                                                                                                                 Sports


                                     0        50      100   150                                                                0              50       100                          150
                                           Cascade Size                                                                                 Cascade Size
                                         (a) Type of URL                                                          (b) Content Category for URL

Figure 8: (a). Average cascade size for different type os URLs (b).                                                                        Average cascade size for different
categories of content. Error bars are standard errors


wide range of possible assumptions by varying α. For ex-
ample, choosing α to be small corresponds to a cost func-                                            170

tion that is biased towards individuals with relatively few                                                                                                           160
                                                                                                     160

followers, who are cheap and numerous. Conversely when                                                                                                                150


α is sufficiently large, the acquisition cost will tilt toward                                         150
                                                                                      Cascade Size




                                                                                                                                                       Cascade Size
                                                                                                                                                                      140

targeting a small number of highly influential users, mean-                                           140



ing users with a larger number of followers and good track                                           130
                                                                                                                                                                      130




records. Regardless, one must trade off between the number                                            120
                                                                                                                                                                      120



of followers per influencer and the number of individuals who                                         110
                                                                                                                                                                      110


can be targeted, where the optimal tradeoff will depend on
                                                                                                                                                                      100

α. To explore the full range of possibilities allowed by this                                                1       2     3        4     5    6   7                        1   2         3       4     5   6   7


family of cost functions, for each value of α we binned users                                                                  Interest                                                       Feeling

according to their influence as predicted by the regression                                                        (a) Interesting                                           (b) Positive Feeling
tree model and computed the average influence-per-dollar
of the targeted subset for each bin.                                                 Figure 9: (a). Average cascade size for different in-
   As Figure 11 shows, when α = 0—corresponding to a                                 terest ratings (b). Average cascade size for different
situation in which individuals can be located costlessly—                            ratings of positive feeling. Error bars are standard
we find that by far the most cost-effective category is to                             errors.
target the least influential individuals, who exert over fif-
teen times the influence-per-dollar of the most influential
category. Although these individuals are much less influ-
ential (average influence score ≈ 0.01) than average, they
                                                                                     8.                     CONCLUSIONS
also have relatively few followers (average ≈ 14); thus are                             In light of the emphasis placed on prominent individuals
relatively inexpensive. At the other extreme, when α be-                             as optimal vehicles for disseminating information [19], the
come sufficiently large—here α          100, 000, corresponding                        possibility that “ordinary influencers”—individuals who ex-
to an acquisition cost ca = $1, 000—we recover the result                            ert average, or even less-than-average influence—are under
that highly influential individuals are also the most cost-                           many circumstances more cost-effective, is intriguing. We
effective. Although expensive, these users will be preferred                          emphasize, however, that these results are based on statisti-
simply because the acquisition cost prohibits identifying and                        cal modeling of observational data and do not imply causal-
managing large numbers of influencers.                                                ity. It is quite possible, for example, that content seeded
   Finally, Figure 11 reveals that although the most cost-                           by outside sources—e.g., marketers—may diffuse quite dif-
efficient category of influencers corresponds to increasingly                           ferently than content selected by users themselves. Like-
influential individuals as α increases, the transition is sur-                        wise, while we have considered a wide range of possible cost
prisingly slow. For example, even for values of α as high as                         functions, other assumptions about costs are certainly possi-
10,000, (i.e. equivalent to ca = $100) the most cost-efficient                         ble and may lead to different conclusions. For reasons such
influencers are still relatively ordinary users, who exhibit                          as these, our conclusions therefore ought to be viewed as
approximately average influence and connectivity.                                     hypotheses to be tested in properly designed experiments,
                                                                                     not as verified causal statements. Nevertheless, our find-
0.035                                        0
                     2.5                                                                                                     G
                                                                                                                             G
                                                                                                                             G
                                                                                                                                                                                                                             10




                                                                                                                                            Mean actual influence / total cost
                                                                                                                         G

                                                                                                                                 G
                                                                                                                                                                                 0.030                                      100
                                                                                                       G

                     2.0                                                                                                                                                                                                   1000
                                                                                                           G
                                                                                                                                                                                 0.025
                                                                                                                                                                                                                          10000
  Actual Influence




                                                    G                                                          G
                                                                                                       G       G
                     1.5                                                                                                                                                         0.020                                    1e+05
                                                                                             G
                                                                                 G       G
                                                                                                  G



                                                                            G
                                                                                     G
                                                                                                                                                                                 0.015
                     1.0                    G
                                                                           G G
                                                                                     G

                                                                 G
                                                             G                  G
                                                        GG
                                                         G
                                                          G
                                                                 G G
                                                                       G                                                                                                         0.010
                                               G
                     0.5                G
                                        G
                                              G
                                            G G
                                                G


                                            G
                                    G

                                      G
                                       G
                                       G                                                                                                                                         0.005
                                      G
                                  GG
                                  GGG
                                    G

                             G
                             G
                             G
                             G   GG G
                                 GG
                           G
                           GG
                            G
                     0.0   G
                           G                            G




                           0.0              0.5                        1.0                       1.5               2.0       2.5                                                         10−1   10−0.5    100     100.5
                                                Predicted Influence                                                                                                                        Mean predicted influence


Figure 10: Actual vs. predicted influence for regres-
sion tree including content                                                                                                          Figure 11: Return on investment where targeting
                                                                                                                                     cost is defined as ci = ca + fi cf , ca = αcf and α ∈
                                                                                                                                     {0, 10, 100, 1000, 10000, 100000}.
ing regarding the relative efficacy of ordinary influencers is
consistent with previous theoretical work [32] that has also
questioned the feasibility of word-of-mouth strategies that                                                                                 Stanford, California, 2009. Association of Computing
depend on triggering “social epidemics” by targeting special                                                                                Machinery.
individuals.                                                                                                                          [4]   F. M. Bass. A new product growth for model consumer
   We also note that although Twitter is in many respects a                                                                                 durables. Management Science, 15(5):215–227, 1969.
special case, our observation that large cascades are rare is                                                                         [5]   R. A. Berk. An introduction to sample selection bias
likely to apply in other contexts as well. Correspondingly,                                                                                 in sociological data. American Sociological Review,
our conclusion that word-of-mouth information spreads via                                                                                   48(3):386–398, 1983.
many small cascades, mostly triggered by ordinary individ-                                                                            [6]   L. Breiman, J. Friedman, R. Olshen, and C. Stone.
uals, is also likely to apply generally, as has been suggested                                                                              Classification and regression trees. Chapman &
elsewhere [33]. Marketers, planners and other change agents                                                                                 Hall/CRC, 1984.
interested in harnessing word-of-mouth influence could there-
                                                                                                                                      [7]   M. Cha, H. Haddadi, F. Benevenuto, and K. P.
fore benefit first by adopting more precise metrics of influ-
                                                                                                                                            Gummad. Measuring user influence on twitter: The
ence; second by collecting more and better data about poten-
                                                                                                                                            million follower fallacy. In 4th Int’l AAAI Conference
tial influencers over extended intervals of time; and third, by
                                                                                                                                            on Weblogs and Social Media, Washington, DC, 2010.
potentially exploiting ordinary influencers, where the opti-
mal tradeoff between the number of individuals targeted and                                                                            [8]   R. M. Dawes. Everyday irrationality: How
their average level of influence will depend on the specifics                                                                                 pseudo-scientists, lunatics, and the rest of us
of the cost function in question.                                                                                                           systematically fail to think rationally. Westview Pr,
                                                                                                                                            2002.
                                                                                                                                      [9]   J. Denrell. Vicarious learning, undersampling of
9.                   ACKNOWLEDGMENTS                                                                                                        failure, and the myths of management. Organization
     We thank Sharad Goel for helpful conversations.                                                                                        Science, 14(3):227–243, 2003.
                                                                                                                                     [10]   M. Gladwell. The Tipping Point: How Little Things
10.                   REFERENCES                                                                                                            Can Make a Big Difference. Little Brown, New York,
 [1] E. Adar and A. Adamic, Lada. Tracking information                                                                                      2000.
     epidemics in blogspace. In 2005 IEEE/WIC/ACM                                                                                    [11]   J. Goldenberg, S. Han, D. R. Lehmann, and J. W.
     International Conference on Web Intelligence,                                                                                          Hong. The role of hubs in the adoption process.
     Compiegne University of Technology, France, 2005.                                                                                      Journal of Marketing, 73(2):1–13, 2009.
 [2] S. Aral, L. Muchnik, and A. Sundararajan.                                                                                       [12]   A. Goyal, F. Bonchi, and L. V. S. Lakshmanan.
     Distinguishing influence-based contagion from                                                                                           Discovering leaders from community actions. pages
     homophily-driven diffusion in dynamic networks.                                                                                         499–508. ACM, 2008. Proceeding of the 17th ACM
     Proceedings of the National Academy of Sciences,                                                                                       conference on Information and knowledge
     106(51):21544, 2009.                                                                                                                   management.
 [3] E. Bakshy, B. Karrer, and A. Adamic, Lada. Social                                                                               [13]   D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins.
     influence and the diffusion of user-created content. In                                                                                  Information diffusion through blogspace. pages
     10th ACM Conference on Electronic Commerce,                                                                                            491–501. ACM New York, NY, USA, 2004.
[14] E. Katz and P. F. Lazarsfeld. Personal influence; the     [24] S. A. Munson and P. Resnick. Presenting diverse
     part played by people in the flow of mass                      political opinions: how and how much. pages
     communications. Free Press, Glencoe, Ill. 1955.               1457–1466. ACM, 2010.
                                               ”
[15] E. Keller and J. Berry. The Influentials: One             [25] R. R. Picard and R. D. Cook. Cross-validation of
     American in Ten Tells the Other Nine How to Vote,             regression models. Journal of the American Statistical
     Where to Eat, and What to Buy. Free Press, New                Association, 79(387):575–583, 1984.
     York, NY, 2003.                                          [26] E. M. Rogers. Diffusion of innovations. Free Press,
[16] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing             New York, 4th edition, 1995.
     the spread of influence through a social network. In      [27] V. S. Sheng, F. Provost, and P. G. Ipeirotis. Get
     9th ACM SIGKDD International Conference on                    another label? improving data quality and data
     Knowledge Discovery and Data Mining, Washington,              mining using multiple, noisy labelers. pages 614–622.
     DC, USA., 2003. Association of Computing Machinery.           ACM, 2008.
[17] A. Kittur, E. H. Chi, and B. Suh. Crowdsourcing user     [28] R. Snow, B. O’Connor, D. Jurafsky, and A. Y. Ng.
     studies with mechanical turk. Proceedings of the              Cheap and fast-but is it good? evaluating non-expert
     twenty-sixth annual SIGCHI conference on Human                annotations for natural language tasks, 2008.
     factors in computing systems, 2008.                      [29] E. S. Sun, I. Rosenn, C. A. Marlow, and T. M. Lento.
[18] H. Kwak, C. Lee, H. Park, and S. Moon. What is                Gesundheit! modeling contagion through facebook
     twitter, a social network or a news media? pages              news feed. In International Conference on Weblogs
     591–600. ACM, 2010.                                           and Social Media, San Jose, CA, 2009. AAAI.
[19] A. Leavitt, E. Burchard, D. Fisher, and S. Gilbert.      [30] B. Tomlinson and C. Cockram. Sars: Experience at
     The influentials: New approaches for analyzing                 prince of wales hospital, hong kong. The Lancet,
     influence on twitter.                                          361(9368):1486–1487, 2003.
[20] J. Leskovec, A. Adamic, Lada, and A. Huberman,           [31] D. J. Watts. A simple model of information cascades
     Bernardo. The dynamics of viral marketing. ACM                on random networks. Proceedings of the National
     Trans. Web, 1(1):5, 2007.                                     Academy of Science, U.S.A., 99:5766–5771, 2002.
[21] D. Liben-Nowell and J. Kleinberg. Tracing                [32] D. J. Watts and P. S. Dodds. Influentials, networks,
     information flow on a global scale using internet              and public opinion formation. Journal of Consumer
     chain-letter data. Proceedings of the National Academy        Research, 34:441–458, 2007.
     of Sciences, 105(12):4633, 2008.                         [33] D. J. Watts and J. Peretti. Viral marketing for the real
[22] W. Mason and S. Suri. Conducting Behavioral                   world. Harvard Business Review, May:22–23, 2007.
     Research on Amazon’s Mechanical Turk. SSRN               [34] G. Weimann. The Influentials: People Who Influence
     eLibrary, 2010.                                               People. State University of New York Press, Albany,
[23] W. Mason and D. J. Watts. Financial incentives and            NY, 1994.
     the performance of crowds. Proceedings of the ACM        [35] J. Weng, E. P. Lim, J. Jiang, and Q. He. Twitterrank:
     SIGKDD Workshop on Human Computation, pages                   finding topic-sensitive influential twitterers. pages
     77–85, 2009.                                                  261–270. ACM, 2010.

Weitere ähnliche Inhalte

Was ist angesagt?

Slideshare Agnes Jumah, Word Of Mouth Marketing
Slideshare   Agnes Jumah, Word Of Mouth MarketingSlideshare   Agnes Jumah, Word Of Mouth Marketing
Slideshare Agnes Jumah, Word Of Mouth MarketingiQ Student Accommodation
 
Acm tist-v3 n4-tist-2010-11-0317
Acm tist-v3 n4-tist-2010-11-0317Acm tist-v3 n4-tist-2010-11-0317
Acm tist-v3 n4-tist-2010-11-0317StephanieLeBadezet
 
1999 ACM SIGCHI - Counting on Community in Cyberspace
1999   ACM SIGCHI - Counting on Community in Cyberspace1999   ACM SIGCHI - Counting on Community in Cyberspace
1999 ACM SIGCHI - Counting on Community in CyberspaceMarc Smith
 
2000 - CSCW - Conversation Trees and Threaded Chats
2000  - CSCW - Conversation Trees and Threaded Chats2000  - CSCW - Conversation Trees and Threaded Chats
2000 - CSCW - Conversation Trees and Threaded ChatsMarc Smith
 
An Online Social Network for Emergency Management
An Online Social Network for Emergency ManagementAn Online Social Network for Emergency Management
An Online Social Network for Emergency ManagementConnie White
 
Temporal_Patterns_of_Misinformation_Diffusion_in_Online_Social_Networks
Temporal_Patterns_of_Misinformation_Diffusion_in_Online_Social_NetworksTemporal_Patterns_of_Misinformation_Diffusion_in_Online_Social_Networks
Temporal_Patterns_of_Misinformation_Diffusion_in_Online_Social_NetworksHarry Gogonis
 
Social Media, Crisis Communication and Emergency Management: Leveraging Web 2...
Social Media, Crisis Communication and Emergency Management: Leveraging Web 2...Social Media, Crisis Communication and Emergency Management: Leveraging Web 2...
Social Media, Crisis Communication and Emergency Management: Leveraging Web 2...Connie White
 
Visible Effort: A Social Entropy Methodology for Managing Computer-Mediated ...
Visible Effort: A Social Entropy Methodology for  Managing Computer-Mediated ...Visible Effort: A Social Entropy Methodology for  Managing Computer-Mediated ...
Visible Effort: A Social Entropy Methodology for Managing Computer-Mediated ...Sorin Adam Matei
 
2009 - Connected Action - Marc Smith - Social Media Network Analysis
2009 - Connected Action - Marc Smith - Social Media Network Analysis2009 - Connected Action - Marc Smith - Social Media Network Analysis
2009 - Connected Action - Marc Smith - Social Media Network AnalysisMarc Smith
 
Abraham
AbrahamAbraham
Abrahamanesah
 
2009-Social computing-Analyzing social media networks
2009-Social computing-Analyzing social media networks2009-Social computing-Analyzing social media networks
2009-Social computing-Analyzing social media networksMarc Smith
 
Comprehensive Social Media Security Analysis & XKeyscore Espionage Technology
Comprehensive Social Media Security Analysis & XKeyscore Espionage TechnologyComprehensive Social Media Security Analysis & XKeyscore Espionage Technology
Comprehensive Social Media Security Analysis & XKeyscore Espionage TechnologyCSCJournals
 
Van der merwe
Van der merweVan der merwe
Van der merweanesah
 
2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith
2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith
2010-November-8-NIA - Smart Society and Civic Culture - Marc SmithMarc Smith
 
Peace on Facebook? Problematising social media as spaces for intergroup conta...
Peace on Facebook? Problematising social media as spaces for intergroup conta...Peace on Facebook? Problematising social media as spaces for intergroup conta...
Peace on Facebook? Problematising social media as spaces for intergroup conta...Paul Reilly
 
Multimode network based efficient and scalable learning of collective behavior
Multimode network based efficient and scalable learning of collective behaviorMultimode network based efficient and scalable learning of collective behavior
Multimode network based efficient and scalable learning of collective behaviorIAEME Publication
 

Was ist angesagt? (19)

Slideshare Agnes Jumah, Word Of Mouth Marketing
Slideshare   Agnes Jumah, Word Of Mouth MarketingSlideshare   Agnes Jumah, Word Of Mouth Marketing
Slideshare Agnes Jumah, Word Of Mouth Marketing
 
Acm tist-v3 n4-tist-2010-11-0317
Acm tist-v3 n4-tist-2010-11-0317Acm tist-v3 n4-tist-2010-11-0317
Acm tist-v3 n4-tist-2010-11-0317
 
1999 ACM SIGCHI - Counting on Community in Cyberspace
1999   ACM SIGCHI - Counting on Community in Cyberspace1999   ACM SIGCHI - Counting on Community in Cyberspace
1999 ACM SIGCHI - Counting on Community in Cyberspace
 
Jf2516311637
Jf2516311637Jf2516311637
Jf2516311637
 
2000 - CSCW - Conversation Trees and Threaded Chats
2000  - CSCW - Conversation Trees and Threaded Chats2000  - CSCW - Conversation Trees and Threaded Chats
2000 - CSCW - Conversation Trees and Threaded Chats
 
An Online Social Network for Emergency Management
An Online Social Network for Emergency ManagementAn Online Social Network for Emergency Management
An Online Social Network for Emergency Management
 
Temporal_Patterns_of_Misinformation_Diffusion_in_Online_Social_Networks
Temporal_Patterns_of_Misinformation_Diffusion_in_Online_Social_NetworksTemporal_Patterns_of_Misinformation_Diffusion_in_Online_Social_Networks
Temporal_Patterns_of_Misinformation_Diffusion_in_Online_Social_Networks
 
Social Media, Crisis Communication and Emergency Management: Leveraging Web 2...
Social Media, Crisis Communication and Emergency Management: Leveraging Web 2...Social Media, Crisis Communication and Emergency Management: Leveraging Web 2...
Social Media, Crisis Communication and Emergency Management: Leveraging Web 2...
 
Prevalence of Deception in Online Customer Reviews
Prevalence of Deception in Online Customer ReviewsPrevalence of Deception in Online Customer Reviews
Prevalence of Deception in Online Customer Reviews
 
Visible Effort: A Social Entropy Methodology for Managing Computer-Mediated ...
Visible Effort: A Social Entropy Methodology for  Managing Computer-Mediated ...Visible Effort: A Social Entropy Methodology for  Managing Computer-Mediated ...
Visible Effort: A Social Entropy Methodology for Managing Computer-Mediated ...
 
Thesis Shaw
Thesis ShawThesis Shaw
Thesis Shaw
 
2009 - Connected Action - Marc Smith - Social Media Network Analysis
2009 - Connected Action - Marc Smith - Social Media Network Analysis2009 - Connected Action - Marc Smith - Social Media Network Analysis
2009 - Connected Action - Marc Smith - Social Media Network Analysis
 
Abraham
AbrahamAbraham
Abraham
 
2009-Social computing-Analyzing social media networks
2009-Social computing-Analyzing social media networks2009-Social computing-Analyzing social media networks
2009-Social computing-Analyzing social media networks
 
Comprehensive Social Media Security Analysis & XKeyscore Espionage Technology
Comprehensive Social Media Security Analysis & XKeyscore Espionage TechnologyComprehensive Social Media Security Analysis & XKeyscore Espionage Technology
Comprehensive Social Media Security Analysis & XKeyscore Espionage Technology
 
Van der merwe
Van der merweVan der merwe
Van der merwe
 
2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith
2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith
2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith
 
Peace on Facebook? Problematising social media as spaces for intergroup conta...
Peace on Facebook? Problematising social media as spaces for intergroup conta...Peace on Facebook? Problematising social media as spaces for intergroup conta...
Peace on Facebook? Problematising social media as spaces for intergroup conta...
 
Multimode network based efficient and scalable learning of collective behavior
Multimode network based efficient and scalable learning of collective behaviorMultimode network based efficient and scalable learning of collective behavior
Multimode network based efficient and scalable learning of collective behavior
 

Andere mochten auch

2012.10 - Workshop on Semantic Statistics - 1
2012.10 - Workshop on Semantic Statistics - 12012.10 - Workshop on Semantic Statistics - 1
2012.10 - Workshop on Semantic Statistics - 1Dr.-Ing. Thomas Hartmann
 
さあ、はじめよう。Application Partner
さあ、はじめよう。Application Partnerさあ、はじめよう。Application Partner
さあ、はじめよう。Application PartnerKazuki Nakajima
 
การส่งข้อมูล 5 ภาคเรียน ด้วยโปรแกรม ecitizen
การส่งข้อมูล 5 ภาคเรียน ด้วยโปรแกรม ecitizenการส่งข้อมูล 5 ภาคเรียน ด้วยโปรแกรม ecitizen
การส่งข้อมูล 5 ภาคเรียน ด้วยโปรแกรม ecitizenMamazaki Kasookie
 
Getting to Value: Eleven Chronic Disease Technologies to Watch
Getting to Value: Eleven Chronic Disease Technologies to WatchGetting to Value: Eleven Chronic Disease Technologies to Watch
Getting to Value: Eleven Chronic Disease Technologies to WatchPath of the Blue Eye Project
 
2012.10 - DDI Lifecycle - Moving Forward - 3
2012.10 - DDI Lifecycle - Moving Forward - 32012.10 - DDI Lifecycle - Moving Forward - 3
2012.10 - DDI Lifecycle - Moving Forward - 3Dr.-Ing. Thomas Hartmann
 

Andere mochten auch (8)

Understanding The Participatory News Consumer
Understanding The Participatory News ConsumerUnderstanding The Participatory News Consumer
Understanding The Participatory News Consumer
 
2012.10 - Workshop on Semantic Statistics - 1
2012.10 - Workshop on Semantic Statistics - 12012.10 - Workshop on Semantic Statistics - 1
2012.10 - Workshop on Semantic Statistics - 1
 
さあ、はじめよう。Application Partner
さあ、はじめよう。Application Partnerさあ、はじめよう。Application Partner
さあ、はじめよう。Application Partner
 
WEGO Health: Health Activists Speak Up
WEGO Health: Health Activists Speak UpWEGO Health: Health Activists Speak Up
WEGO Health: Health Activists Speak Up
 
J U A N K D U Q U E
J U A N K  D U Q U EJ U A N K  D U Q U E
J U A N K D U Q U E
 
การส่งข้อมูล 5 ภาคเรียน ด้วยโปรแกรม ecitizen
การส่งข้อมูล 5 ภาคเรียน ด้วยโปรแกรม ecitizenการส่งข้อมูล 5 ภาคเรียน ด้วยโปรแกรม ecitizen
การส่งข้อมูล 5 ภาคเรียน ด้วยโปรแกรม ecitizen
 
Getting to Value: Eleven Chronic Disease Technologies to Watch
Getting to Value: Eleven Chronic Disease Technologies to WatchGetting to Value: Eleven Chronic Disease Technologies to Watch
Getting to Value: Eleven Chronic Disease Technologies to Watch
 
2012.10 - DDI Lifecycle - Moving Forward - 3
2012.10 - DDI Lifecycle - Moving Forward - 32012.10 - DDI Lifecycle - Moving Forward - 3
2012.10 - DDI Lifecycle - Moving Forward - 3
 

Ähnlich wie Everyone's an Influencer: Quantifying Influence on Twitter

Estudio Influencia Y Pasividad En Social Media
Estudio Influencia Y Pasividad En Social MediaEstudio Influencia Y Pasividad En Social Media
Estudio Influencia Y Pasividad En Social Mediaeliasvillagran
 
Selas Turkiye Influence And Passivity In Social Media Excerpted
Selas Turkiye Influence And Passivity In Social Media ExcerptedSelas Turkiye Influence And Passivity In Social Media Excerpted
Selas Turkiye Influence And Passivity In Social Media ExcerptedZiya NISANOGLU
 
Who says what to whom on Twitter? - Twitter flow
Who says what to whom on Twitter? - Twitter flowWho says what to whom on Twitter? - Twitter flow
Who says what to whom on Twitter? - Twitter flowJuan Sarasua
 
Towards Decision Support and Goal AchievementIdentifying Ac.docx
Towards Decision Support and Goal AchievementIdentifying Ac.docxTowards Decision Support and Goal AchievementIdentifying Ac.docx
Towards Decision Support and Goal AchievementIdentifying Ac.docxturveycharlyn
 
Detecting fake news_with_weak_social_supervision
Detecting fake news_with_weak_social_supervisionDetecting fake news_with_weak_social_supervision
Detecting fake news_with_weak_social_supervisionSuresh S
 
Who Says What to Whom on Twitter
Who Says What to Whom on TwitterWho Says What to Whom on Twitter
Who Says What to Whom on TwitterAlcance Media Group
 
¿Quién dice qué a quién en Twitter?
¿Quién dice qué a quién en Twitter?¿Quién dice qué a quién en Twitter?
¿Quién dice qué a quién en Twitter?Alcance Media Group
 
Current trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networksCurrent trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networkseSAT Publishing House
 
Recommender systems in the scope of opinion formation: a model
Recommender systems in the scope of opinion formation: a modelRecommender systems in the scope of opinion formation: a model
Recommender systems in the scope of opinion formation: a modelMarcel Blattner, PhD
 
SECUREWALL-A FRAMEWORK FOR FINEGRAINED PRIVACY CONTROL IN ONLINE SOCIAL NETWORKS
SECUREWALL-A FRAMEWORK FOR FINEGRAINED PRIVACY CONTROL IN ONLINE SOCIAL NETWORKSSECUREWALL-A FRAMEWORK FOR FINEGRAINED PRIVACY CONTROL IN ONLINE SOCIAL NETWORKS
SECUREWALL-A FRAMEWORK FOR FINEGRAINED PRIVACY CONTROL IN ONLINE SOCIAL NETWORKSZac Darcy
 
Tfsc disc 2014 si proposal (30 june2014)
Tfsc disc 2014 si proposal (30 june2014)Tfsc disc 2014 si proposal (30 june2014)
Tfsc disc 2014 si proposal (30 june2014)Han Woo PARK
 
APPLYING THE TECHNOLOGY ACCEPTANCE MODEL TO UNDERSTAND SOCIAL NETWORKING
APPLYING THE TECHNOLOGY ACCEPTANCE MODEL TO UNDERSTAND SOCIAL NETWORKING APPLYING THE TECHNOLOGY ACCEPTANCE MODEL TO UNDERSTAND SOCIAL NETWORKING
APPLYING THE TECHNOLOGY ACCEPTANCE MODEL TO UNDERSTAND SOCIAL NETWORKING ijcsit
 
{White Paper} Measuring Global Attention | Appinions
{White Paper} Measuring Global Attention | Appinions{White Paper} Measuring Global Attention | Appinions
{White Paper} Measuring Global Attention | AppinionsAppinions
 
Information Contagion through Social Media: Towards a Realistic Model of the ...
Information Contagion through Social Media: Towards a Realistic Model of the ...Information Contagion through Social Media: Towards a Realistic Model of the ...
Information Contagion through Social Media: Towards a Realistic Model of the ...Axel Bruns
 
Studying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & BiasStudying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & Biasgloriakt
 

Ähnlich wie Everyone's an Influencer: Quantifying Influence on Twitter (20)

Estudio Influencia Y Pasividad En Social Media
Estudio Influencia Y Pasividad En Social MediaEstudio Influencia Y Pasividad En Social Media
Estudio Influencia Y Pasividad En Social Media
 
Selas Turkiye Influence And Passivity In Social Media Excerpted
Selas Turkiye Influence And Passivity In Social Media ExcerptedSelas Turkiye Influence And Passivity In Social Media Excerpted
Selas Turkiye Influence And Passivity In Social Media Excerpted
 
Who says what to whom on Twitter? - Twitter flow
Who says what to whom on Twitter? - Twitter flowWho says what to whom on Twitter? - Twitter flow
Who says what to whom on Twitter? - Twitter flow
 
Towards Decision Support and Goal AchievementIdentifying Ac.docx
Towards Decision Support and Goal AchievementIdentifying Ac.docxTowards Decision Support and Goal AchievementIdentifying Ac.docx
Towards Decision Support and Goal AchievementIdentifying Ac.docx
 
Laurence Favier, University Charles De Gaulle – Lille 3: Social Influence and...
Laurence Favier, University Charles De Gaulle – Lille 3: Social Influence and...Laurence Favier, University Charles De Gaulle – Lille 3: Social Influence and...
Laurence Favier, University Charles De Gaulle – Lille 3: Social Influence and...
 
Detecting fake news_with_weak_social_supervision
Detecting fake news_with_weak_social_supervisionDetecting fake news_with_weak_social_supervision
Detecting fake news_with_weak_social_supervision
 
Twitter flow
Twitter flowTwitter flow
Twitter flow
 
Who Says What to Whom on Twitter
Who Says What to Whom on TwitterWho Says What to Whom on Twitter
Who Says What to Whom on Twitter
 
¿Quién dice qué a quién en Twitter?
¿Quién dice qué a quién en Twitter?¿Quién dice qué a quién en Twitter?
¿Quién dice qué a quién en Twitter?
 
Current trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networksCurrent trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networks
 
ALTMETRICS
ALTMETRICSALTMETRICS
ALTMETRICS
 
Recommender systems in the scope of opinion formation: a model
Recommender systems in the scope of opinion formation: a modelRecommender systems in the scope of opinion formation: a model
Recommender systems in the scope of opinion formation: a model
 
SECUREWALL-A FRAMEWORK FOR FINEGRAINED PRIVACY CONTROL IN ONLINE SOCIAL NETWORKS
SECUREWALL-A FRAMEWORK FOR FINEGRAINED PRIVACY CONTROL IN ONLINE SOCIAL NETWORKSSECUREWALL-A FRAMEWORK FOR FINEGRAINED PRIVACY CONTROL IN ONLINE SOCIAL NETWORKS
SECUREWALL-A FRAMEWORK FOR FINEGRAINED PRIVACY CONTROL IN ONLINE SOCIAL NETWORKS
 
Tfsc disc 2014 si proposal (30 june2014)
Tfsc disc 2014 si proposal (30 june2014)Tfsc disc 2014 si proposal (30 june2014)
Tfsc disc 2014 si proposal (30 june2014)
 
APPLYING THE TECHNOLOGY ACCEPTANCE MODEL TO UNDERSTAND SOCIAL NETWORKING
APPLYING THE TECHNOLOGY ACCEPTANCE MODEL TO UNDERSTAND SOCIAL NETWORKING APPLYING THE TECHNOLOGY ACCEPTANCE MODEL TO UNDERSTAND SOCIAL NETWORKING
APPLYING THE TECHNOLOGY ACCEPTANCE MODEL TO UNDERSTAND SOCIAL NETWORKING
 
204
204204
204
 
Research Proposal Writing
Research Proposal Writing Research Proposal Writing
Research Proposal Writing
 
{White Paper} Measuring Global Attention | Appinions
{White Paper} Measuring Global Attention | Appinions{White Paper} Measuring Global Attention | Appinions
{White Paper} Measuring Global Attention | Appinions
 
Information Contagion through Social Media: Towards a Realistic Model of the ...
Information Contagion through Social Media: Towards a Realistic Model of the ...Information Contagion through Social Media: Towards a Realistic Model of the ...
Information Contagion through Social Media: Towards a Realistic Model of the ...
 
Studying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & BiasStudying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & Bias
 

Mehr von Path of the Blue Eye Project

Just-in-time Information through Mobile Connections
Just-in-time Information through Mobile Connections Just-in-time Information through Mobile Connections
Just-in-time Information through Mobile Connections Path of the Blue Eye Project
 
McKinsey Global Institute - Big data: The next frontier for innovation, compe...
McKinsey Global Institute - Big data: The next frontier for innovation, compe...McKinsey Global Institute - Big data: The next frontier for innovation, compe...
McKinsey Global Institute - Big data: The next frontier for innovation, compe...Path of the Blue Eye Project
 
Young Physicians and their Views Regarding the Future of the U.S. Healthcare ...
Young Physicians and their Views Regarding the Future of the U.S. Healthcare ...Young Physicians and their Views Regarding the Future of the U.S. Healthcare ...
Young Physicians and their Views Regarding the Future of the U.S. Healthcare ...Path of the Blue Eye Project
 
IBM and Cincom: Guiding Smarter Interactions in Healthcare Reform
IBM and Cincom: Guiding Smarter Interactions in Healthcare ReformIBM and Cincom: Guiding Smarter Interactions in Healthcare Reform
IBM and Cincom: Guiding Smarter Interactions in Healthcare ReformPath of the Blue Eye Project
 
MARS Study Outlines Online Habits of Health Consumers
MARS Study Outlines Online Habits of Health Consumers MARS Study Outlines Online Habits of Health Consumers
MARS Study Outlines Online Habits of Health Consumers Path of the Blue Eye Project
 
Most Marketers Unaware of What Digital ROI Means, Fail to Measure it Appropri...
Most Marketers Unaware of What Digital ROI Means, Fail to Measure it Appropri...Most Marketers Unaware of What Digital ROI Means, Fail to Measure it Appropri...
Most Marketers Unaware of What Digital ROI Means, Fail to Measure it Appropri...Path of the Blue Eye Project
 
Overview of FDA Guidance Document Developed by Dose of Digital
Overview of FDA Guidance Document Developed by Dose of DigitalOverview of FDA Guidance Document Developed by Dose of Digital
Overview of FDA Guidance Document Developed by Dose of DigitalPath of the Blue Eye Project
 
FDA Draft Guidance on Responding to Unsolicited Requests for Off-Label Drug I...
FDA Draft Guidance on Responding to Unsolicited Requests for Off-Label Drug I...FDA Draft Guidance on Responding to Unsolicited Requests for Off-Label Drug I...
FDA Draft Guidance on Responding to Unsolicited Requests for Off-Label Drug I...Path of the Blue Eye Project
 

Mehr von Path of the Blue Eye Project (20)

CDC Facebook guidelines
CDC Facebook guidelinesCDC Facebook guidelines
CDC Facebook guidelines
 
Pew Research Center: Cell Internet Use 2012
Pew Research Center: Cell Internet Use 2012Pew Research Center: Cell Internet Use 2012
Pew Research Center: Cell Internet Use 2012
 
Just-in-time Information through Mobile Connections
Just-in-time Information through Mobile Connections Just-in-time Information through Mobile Connections
Just-in-time Information through Mobile Connections
 
CDC's Health Communicator's Toolkit
CDC's Health Communicator's ToolkitCDC's Health Communicator's Toolkit
CDC's Health Communicator's Toolkit
 
CDC's Guide to writing for social media
CDC's Guide to writing for social mediaCDC's Guide to writing for social media
CDC's Guide to writing for social media
 
Productive Social-Marketing
Productive Social-MarketingProductive Social-Marketing
Productive Social-Marketing
 
McKinsey Global Institute - Big data: The next frontier for innovation, compe...
McKinsey Global Institute - Big data: The next frontier for innovation, compe...McKinsey Global Institute - Big data: The next frontier for innovation, compe...
McKinsey Global Institute - Big data: The next frontier for innovation, compe...
 
Young Physicians and their Views Regarding the Future of the U.S. Healthcare ...
Young Physicians and their Views Regarding the Future of the U.S. Healthcare ...Young Physicians and their Views Regarding the Future of the U.S. Healthcare ...
Young Physicians and their Views Regarding the Future of the U.S. Healthcare ...
 
IBM and Cincom: Guiding Smarter Interactions in Healthcare Reform
IBM and Cincom: Guiding Smarter Interactions in Healthcare ReformIBM and Cincom: Guiding Smarter Interactions in Healthcare Reform
IBM and Cincom: Guiding Smarter Interactions in Healthcare Reform
 
Pew Research Center: Digital Differences
Pew Research Center: Digital DifferencesPew Research Center: Digital Differences
Pew Research Center: Digital Differences
 
Recall Bias and Digital Health Market Research
Recall Bias and Digital Health Market ResearchRecall Bias and Digital Health Market Research
Recall Bias and Digital Health Market Research
 
MARS Study Outlines Online Habits of Health Consumers
MARS Study Outlines Online Habits of Health Consumers MARS Study Outlines Online Habits of Health Consumers
MARS Study Outlines Online Habits of Health Consumers
 
Most Marketers Unaware of What Digital ROI Means, Fail to Measure it Appropri...
Most Marketers Unaware of What Digital ROI Means, Fail to Measure it Appropri...Most Marketers Unaware of What Digital ROI Means, Fail to Measure it Appropri...
Most Marketers Unaware of What Digital ROI Means, Fail to Measure it Appropri...
 
Search Engine Use 2012
Search Engine Use 2012Search Engine Use 2012
Search Engine Use 2012
 
Nielsen Digital Consumer Report Q3-Q4 2011
Nielsen Digital Consumer Report Q3-Q4 2011Nielsen Digital Consumer Report Q3-Q4 2011
Nielsen Digital Consumer Report Q3-Q4 2011
 
Overview of FDA Guidance Document Developed by Dose of Digital
Overview of FDA Guidance Document Developed by Dose of DigitalOverview of FDA Guidance Document Developed by Dose of Digital
Overview of FDA Guidance Document Developed by Dose of Digital
 
FDA Draft Guidance on Responding to Unsolicited Requests for Off-Label Drug I...
FDA Draft Guidance on Responding to Unsolicited Requests for Off-Label Drug I...FDA Draft Guidance on Responding to Unsolicited Requests for Off-Label Drug I...
FDA Draft Guidance on Responding to Unsolicited Requests for Off-Label Drug I...
 
Unniched mo tech
Unniched mo techUnniched mo tech
Unniched mo tech
 
Poster instrat final_rev2
Poster instrat final_rev2Poster instrat final_rev2
Poster instrat final_rev2
 
Unniched instrat
Unniched instratUnniched instrat
Unniched instrat
 

Kürzlich hochgeladen

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Kürzlich hochgeladen (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

Everyone's an Influencer: Quantifying Influence on Twitter

  • 1. Everyone’s an Influencer: Quantifying Influence on Twitter Eytan Bakshy∗ Jake M. Hofman University of Michigan, USA Yahoo! Research, NY, USA ebakshy@umich.edu hofman@yahoo-inc.com Winter A. Mason Duncan J. Watts Yahoo! Research, NY, USA Yahoo! Research, NY, USA winteram@yahoo- djw@yahoo-inc.com inc.com ABSTRACT Keywords In this paper we investigate the attributes and relative influ- Communication networks, Twitter, diffusion, influence, word ence of 1.6M Twitter users by tracking 74 million diffusion of mouth marketing. events that took place on the Twitter follower graph over a two month interval in 2009. Unsurprisingly, we find that 1. INTRODUCTION the largest cascades tend to be generated by users who have been influential in the past and who have a large number Word-of-mouth diffusion has long been regarded as an im- of followers. We also find that URLs that were rated more portant mechanism by which information can reach large interesting and/or elicited more positive feelings by workers populations, possibly influencing public opinion [14], adop- on Mechanical Turk were more likely to spread. In spite of tion of innovations [26], new product market share [4], or these intuitive results, however, we find that predictions of brand awareness [15]. In recent years, interest among re- which particular user or URL will generate large cascades searchers and marketers alike has increasingly focused on are relatively unreliable. We conclude, therefore, that word- whether or not diffusion can be maximized by seeding a of-mouth diffusion can only be harnessed reliably by tar- piece of information or a new product with certain spe- geting large numbers of potential influencers, thereby cap- cial individuals, often called “influentials” [34, 15] or sim- turing average effects. Finally, we consider a family of hy- ply “influencers,” who exhibit some combination of desirable pothetical marketing strategies, defined by the relative cost attributes—whether personal attributes like credibility, ex- of identifying versus compensating potential “influencers.” pertise, or enthusiasm, or network attributes such as connec- We find that although under some circumstances, the most tivity or centrality—that allows them to influence a dispro- influential users are also the most cost-effective, under a portionately large number of others [10], possibly indirectly wide range of plausible assumptions the most cost-effective via a cascade of influence [31, 16]. performance can be realized using “ordinary influencers”— Although appealing, the claim that word-of-mouth diffu- individuals who exert average or even less-than-average in- sion is driven disproportionately by a small number of key fluence. influencers necessarily makes certain assumptions about the underlying influence process that are not based directly on empirical evidence. Empirical studies of diffusion are there- Categories and Subject Descriptors fore highly desirable, but historically have suffered from two H.1.2 [Models and Principles]: User/Machine Systems; major difficulties. First, the network over which word-of- J.4 [Social and Behavioral Sciences]: Sociology mouth influence spreads is generally unobservable, hence influence is difficult to attribute accurately, especially in in- stances where diffusion propagates for multiple steps [29, General Terms 21]. And second, observational data on diffusion are heavily biased towards “successful” diffusion events, which by virtue Human Factors of being large are easily noticed and recorded; thus infer- ∗ Part of this research was performed while the author was ences regarding the attributes of success may also be biased visiting Yahoo! Research, New York. [5, 9], especially when such events are rare [8]. For both of these reasons, the micro-blogging service Twit- ter presents a promising natural laboratory for the study of diffusion processes. Unlike other user-declared networks Permission to make digital or hard copies of all or part of this work for (e.g. Facebook), Twitter is expressly devoted to dissem- personal or classroom use is granted without fee provided that copies are inating information, in that users subscribe to broadcasts not made or distributed for profit or commercial advantage and that copies of other users; thus the network of “who listens to whom” bear this notice and the full citation on the first page. To copy otherwise, to can be reconstructed by crawling the corresponding “follower republish, to post on servers or to redistribute to lists, requires prior specific graph”. In addition, because users frequently wish to share permission and/or a fee. WSDM’11, February 9–12, 2011, Hong Kong, China. web-content, and because tweets are restricted to 140 char- Copyright 2011 ACM 978-1-4503-0493-1/11/02 ...$10.00. acters in length, a popular strategy has been to use URL
  • 2. shorteners (e.g. bit.ly, TinyURL, etc.), which effectively insights that we believe are of general relevance to word-of- tag distinct pieces of content with unique, easily identifi- mouth marketing and related activities. able tokens. Together these features allow us to track the The remainder of the paper is organized as follows. We diffusion patterns of all instances in which shortened URLs review related work on modeling diffusion and quantifying are shared on Twitter, regardless of their success, thereby influence in Section 2. In Sections 3 and 4 we provide an addressing both the observability and sampling difficulties overview of the collected data, summarizing the structure of outlined above. URL cascades on the Twitter follower graph. In Section 5, The Twitter ecosystem is also well suited to studying the we present a predictive model of influence, in which cascade role of influencers. In general, influencers are loosely defined sizes of posted URLs are predicted using the individuals’ at- as individuals who disproportionately impact the spread of tributes and average size of past cascades. Section 6 explores information or some related behavior of interest [34, 10, 15, the relationship between content as characterized by workers 11]. Unfortunately, however, this definition is fraught with on Amazon’s Mechanical Turk and cascade size. Finally, in ambiguity regarding the nature of the influence in ques- Section 7 we use our predictive model of cascade size to ex- tion, and hence the type of individual who might be con- amine the cost-effectiveness of targeting individuals to seed sidered special. Ordinary individuals communicating with content. their friends, for example, may be considered influencers, but so may subject matter experts, journalists, and other 2. RELATED WORK semi-public figures, as may highly visible public figures like A number of recent empirical papers have addressed the media representatives, celebrities, and government officials. matter of diffusion on networks in general, and the attributes Clearly these types of individuals are capable of influencing and roles of influencers specifically. In early work, Gruhl et very different numbers of people, but may also exert quite al [13] attempted to infer a transmission network between different types of influence on them, and even transmit in- bloggers, given time-stamped observations of posts and as- fluence through different media. For example, a celebrity suming that transmission was governed by an independent endorsing a product on television or in a magazine adver- cascade model. Contemporaneously, Adar and Adamic [1] tisement presumably exerts a different sort of influence than used a similar approach to reconstruct diffusion trees among a trusted friend endorsing the same product in person, who bloggers, and shortly afterwards Leskovec et al. [20] used re- in turn exerts a different sort of influence than a noted ex- ferrals on an e-commerce site to infer how individuals are pert writing a review. influenced as a function of how many of their contacts have In light of this definitional ambiguity, an especially useful recommended a product. feature of Twitter is that it not only encompasses various A limitation of these early studies was the lack of “ground types of entities, but also forces them all to communicate truth” data regarding the network over which the diffusion in roughly the same way: via tweets to their followers. Al- was taking place. Addressing this problem, more recent though it remains the case that even users with the same studies have gathered data both on the diffusion process number of followers do not necessarily exert the same kind and the corresponding network. For example, Sun et al. [29] of influence, it is at least possible to measure and compare studied diffusion trees of fan pages on Facebook, Bakshy et the influence of individuals in a standard way, by the ac- al. [3] studied the diffusion of “gestures” between friends in tivity that is observable on Twitter itself. In this way, we Second Life, and Aral et al. [2] studied adoption of a mobile avoid the need to label individuals as either influencers or phone application over the Yahoo! messenger network. Most non-influencers, simply including all individuals in our study closely related to the current research is a series of recent pa- and comparing their impact directly. pers that examine influence and diffusion on Twitter specif- We note, however, that our use of the term influencer cor- ically. Kwak et al. [18] compared three different measures responds to a particular and somewhat narrow definition of of influence—number of followers, page-rank, and number influence, specifically the user’s ability to post URLs which of retweets—finding that the ranking of the most influential diffuse through the Twitter follower graph. We restrict our users differed depending on the measure. Cha et al. [7] also study to users who “seed” content, meaning they post URLs compared three different measures of influence—number of that they themselves have not received through the follower followers, number of retweets, and number of mentions— graph. We quantify the influence of a given post by the num- and also found that the most followed users did not neces- ber of users who subsequently repost the URL, meaning that sarily score highest on the other measures. Finally, Weng et they can be traced back to the originating user through the al. [35] compared number of followers and page rank with a follower graph. We then fit a model that predicts influence modified page-rank measure that accounted for topic, again using an individual’s attributes and past activity and ex- finding that ranking depended on the influence measure. amine the utility of such a model for targeting users. Our The present work builds on these earlier contributions in emphasis on prediction is particularly relevant to our mo- three key respects. First, whereas previous studies have tivating question. In marketing, for example, the practical quantified influence either in terms of network metrics (e.g. utility of identifying influencers depends entirely on one’s page rank) or the number of direct, explicit retweets, we ability to do so in advance. Yet in practice, it is very often measure influence in terms of the size of the entire diffusion the case that influencers are identified only in retrospect, tree associated with each event (Kwak et al [18] also compute usually in the aftermath of some outcome of interest, such what they call “retweet trees” but they do not use them as a as the unexpected success of a previously unknown author measure of influence). While related to other measures, the or the sudden revival of a languishing brand [10]. By em- size of the diffusion tree is more directly associated with dif- phasizing ex-ante prediction of influencers over ex-post ex- fusion and the dissemination of information (Goyal et al [12], planation, our analysis highlights some simple but useable it should be noted, do introduce a similar metric to quantify influence; however, their interest is in identifying community
  • 3. the same two-month period. We did this by querying the Twitter API to find the followers of every user who posted 2 a bit.ly URL. Subsequently, we placed those followers in a 10 queue to be crawled, thereby identifying their followers, who were then also placed in the queue, and so on. In this way, 10 4 we obtained a large fraction of the Twitter follower graph comprising all active bit.ly posters and anyone connected to these users via one-way directed chains of followers. Specifi- Density 6 10 cally, the subgraph comprised approximately 56M users and 8 1.7B edges. 10 Consistent with previous work [7, 18, 35], both the in- degree (‘followers”) and out-degree (“friends”) distributions 10 10 are highly skewed, but the former much more so—whereas the maximum # of followers was nearly 4M, the maximum # of friends was only about 760K—reflecting the passive 101 102 103 104 and one-way nature of the “follow” action on Twitter (i.e. URLs posted A can follow B without any action required from B). We emphasize, moreover, that because the crawled graph was seeded exclusively with active users, it is almost certainly Figure 1: Probability density of number of bit.ly not representative of the entire follower graph. In particular, URLs posted per user active users are likely to have more followers than average, in which case we would expect that the average in-degree will exceed the average out-degree for our sample—as indeed we observe. Table 1 presents some basic statistics of the “leaders,” not on prediction.) Second, whereas the focus of distributions of the number of friends, followers and number previous studies has been largely descriptive (e.g. compar- of URLs posted per user. ing the most influential users), we are interested explicitly in predicting influence; thus we consider all users, not merely the most influential. Third, in addition to predicting diffu- Table 1: Statistics of the Twitter follower graph and sion as a function of the attributes of individual seeds, we seed activity also study the effects of content. We believe these differ- # Followers # Friends # Seeds Posted ences bring the understanding of diffusion on Twitter closer Median 85.00 82.00 11.00 to practical applications, although as we describe later, ex- Mean 557.10 294.10 46.33 perimental studies are still required. Max. 3,984,000.00 759,700.00 54,890 3. DATA To study diffusion on Twitter, we combined two separate but related sources of data. First, over the two-month pe- 4. COMPUTING INFLUENCE ON TWITTER riod of September 13 2009 - November 15 2009 we recorded To calculate the influence score for a given URL post, all 1.03B public tweets broadcast on Twitter, excluding Oc- we tracked the diffusion of the URL from its origin at a tober 14-16 during which there were intermittent outages in particular “seed” node through a series of reposts—by that the Twitter API. Of these, we extracted 87M tweets that user’s followers, those users’ followers, and so on—until the included bit.ly URLs and which corresponded to distinct diffusion event, or cascade, terminated. To do this, we used diffusion “events,” where each event comprised a single ini- the time each URL was posted: if person B is following tiator, or “seed,” followed by some number of repostings of person A, and person A posted the URL before B and was the same URL by the seed’s followers, their followers, and so the only of B’s friends to post the URL, we say person A on1 . Finally, we identified a subset of 74M diffusion events influenced person B to post the URL. As Figure 2 shows, that were initiated by seed users who were active in both if B has more than one friend who has previously posted the first and second months of the observation period; thus the same URL, we have three choices for how to assign the enabling us to train our regression model on first month corresponding influence: first, we can assign full credit to the performance in order to predict second-month performance friend who posted it first; second we can assign full credit to (see Section 5). In total, we identified 1.6M seed users who the friend who posted it most recently (i.e. last); and third, seeded an average of 46.33 bit.ly URLs each. Figure 1 shows we can split credit equally among all prior-posting friends. the distribution of bit.ly URL posts by seed. These three assignments effectively make different assump- Second, we crawled the portion of the follower graph com- tions about the influence process: “first influence” rewards prising all users who had broadcast at least one URL over primacy, assuming that individuals are influenced when they first see a new piece of information, even if they fail to im- 1 Our decision to restrict attention to bit.ly URLs was made mediately act on it, during which time they may see it again; predominantly for convenience, as bit.ly was at the time “last influence” assumes the opposite, instead attributing in- by far the dominant URL shortener on Twitter. Given the fluence to the most recent exposure; and “split influence” size of the population of users who rely on bit.ly, which is comparable to the size of all active Twitter users, it seems assumes either that the likelihood of noticing a new piece unlikely to differ systematically from users who rely on other of information, or equivalently the inclination to act on it, shorteners. accumulates steadily as the information is posted by more
  • 4. Figure 2: Three ways of assigning influence to mul- tiple sources friends. Having defined immediate influence, we can then construct disjoint influence trees for every initial posting of a URL. The number of users in these influence trees—referred to as “cascades”—thus define the influence score for every seed. See Figure 3 for some examples of cascades. To check Figure 3: Examples of information cascades on that our results are not an artifact of any particular assump- Twitter. tion about how individuals are influenced to repost infor- mation, we conducted our analysis for all three definitions. Although particular numerical values varied slightly across there are many reasons why individuals may choose to pass the three definitions, the qualitative findings were identical; along information other than the number and identity of thus for simplicity we report results only for first influence. the individuals from whom they received it—in particular, Before proceeding, we note that our use of reposting to the nature of the content itself. Moreover, influencing an- indicate influence is somewhat more inclusive than the con- other individual to pass along a piece of information does not vention of “retweeting” (e.g. using the terminology “RT necessarily imply any other kind of influence, such as influ- @username”) which explicitly attributes the original user. encing their purchasing behavior, or political opinion. Our An advantage of our approach is that we can include in our use of the term “influencer” should therefore be interpreted observations all instances in which a URL was reposted re- as applying only very narrowly to the ability to consistently gardless of whether it was acknowledged by the user, thereby seed cascades that spread further than others. Nevertheless, greatly increasing the coverage of our observations. (Since differences in this ability, such as they do exist, can be con- our study, Twitter has introduced a “retweet” feature that sidered a certain type of influence, especially when the same arguably increases the likelihood that reposts will be ac- information (in this case the same original URL) is seeded knowledged, but does not guarantee that they will be.) How- by many different individuals. Moreover, the terms “influen- ever, a potential disadvantage of our definition is that it tials” and “influencers” have often been used in precisely this may mistakenly attribute influence to what is in reality a se- manner [3]; thus our usage is also consistent with previous quence of independent events. In particular, it is likely that work. users who follow each other will have similar interests and so are more likely to post the same URL in close succession than random pairs of users. Thus it is possible that some 5. PREDICTING INDIVIDUAL INFLUENCE of what we are labeling influence is really a consequence of We now investigate an idealized version of how a mar- homophily [2]. From this perspective, our estimates of in- keter might identify influencers to seed a word-of-mouth fluence should be viewed as an upper bound. campaign [16], where we note that from a marketer’s per- On the other hand, there are reasons to think that our spective the critical capability is to identify attributes of measure underestimates actual influence, as re-broadcasting individuals that consistently predict influence. Reiterating a URL is a particularly strong signal of interest. A weaker that by “influence” we mean a user’s ability to seed content but still relevant measure might be to observe whether a containing URLs that generate large cascades of reposts, we given user views the content of a shortened URL, imply- therefore begin by describing the cascades we are trying to ing that they are sufficiently interested in what the poster predict. has to say that they will take some action to investigate As Figure 4a shows, the distribution of cascade sizes is it. Unfortunately click-through data on bit.ly URLs are of- approximately power-law, implying that the vast majority ten difficult to interpret, as one cannot distinguish between of posted URLs do not spread at all (the average cascade programmatic unshortening events—e.g., from crawlers or size is 1.14 and the median is 1), while a small fraction browser extensions—and actual user clicks. Thus we instead are reposted thousands of times. The depth of the cascade relied on reposting as a conservative measure of influence, (Figure 4b) is also right skewed, but more closely resembles acknowledging that alternative measures of influence should an exponential distribution, where the deepest cascades can also be studied as the platform matures. propagate as far as nine generations from their origin; but Finally, we reiterate that the type of influence we study again the vast majority of URLs are not reposted at all, here is of a rather narrow kind: being influenced to pass corresponding to cascades of size 1 and depth 0 in which along a particular piece of information. As we discuss later, the seed is the only node in the tree. Regardless of whether
  • 5. where the left (right) child is followed if the condition is sat- G G isfied (violated). Leaf nodes give the predicted influence—as 10−1 107 measured by (log) mean cascade size— for the corresponding 10−2 G G 106 G partition. Thus, for example, the right-most leaf indicates G that users with upwards of 1870 followers who had on aver- 10−3 Frequency 105 G G age 6.2 reposts by direct followers (past local influence) are Density G −4 G 4 G 10 10 −5 G G 3 G predicted to have the largest average total influence, gener- 10 10 G G ating cascades of approximately 8.7 additional posts. −6 G 2 10 G 10 G Unsurprisingly, the model indicates that past performance 10−7 101 G G G G G provides the most informative set of features, although it is 10 0 10 1 10 2 10 3 10 4 0 2 4 6 8 the local, not the total influence that is most informative; Size Depth this is likely due to the fact that most non-trivial cascades are of depth 1, so that past direct adoption is a reliable pre- (a) Cascade Sizes (b) Cascade Depths dictor of total adoption. Also unsurprisingly, the number of followers is an informative feature. Notably, however, these Figure 4: (a). Frequency distribution of cascade are the only two features present in the regression tree, en- sizes. (b). Distribution of cascade depths. abling us to visualize influence as a function of these features, as shown in Figure 6. This result is somewhat surprising, as one might reasonably have expected that individuals who we study size or depth, therefore, the implication is that follow many others, or very few others, would be distinct most events do not spread at all, and even moderately sized from the average user. Likewise, one might have expected cascades are extremely rare. that activity level, quantified by the number of tweets, would To identify consistently influential individuals, we aggre- also be predictive. gated all URL posts by user and computed individual-level Figure 7 shows the fit of the regression tree model for all influence as the logarithm of the average size of all cascades five cross-validation folds. The location of the circles indi- for which that user was a seed. We then fit a regression cates the mean predicted and actual values at each leaf of tree model [6], in which a greedy optimization process recur- the trees, with leaves from different cross-validation folds ap- sively partitions the feature space, resulting in a piecewise- pearing close to each other; the size of the circles indicates constant function where the value in each partition is fit to the number of points in each leaf, while the bars show the the mean of the corresponding training data. An important standard deviation of the actual values at each leaf. The advantage of regression trees over ordinary linear regression model is extremely well calibrated, in the sense that the (OLR) in this context is that unlike OLR, which tends to prediction of the average value at each cut of the regres- fit the vast majority of small cascades at the expense of sion tree is almost exactly the actual average (R2 = 0.98). larger ones, the piecewise constant nature of the regression This appearance, however, is deceiving. In fact, the model tree function allows cascades of different sizes to be fit in- fit without averaging predicted and actual values at the leaf dependently. The result is that the regression tree model nodes is relatively poor (R2 = 0.34), reflecting that although is much better calibrated than the equivalent OLR model. large cascades tend to be driven by previously successful in- Moreover, we used folded cross-validation [25] to terminate dividuals with many followers, the extreme scarcity of such partitioning to prevent over-fitting. Our model included the cascades means that most individuals with these attributes following features as predictors: are not successful either. Thus, while large follower count and past success are likely necessary features for future suc- 1. Seed user attributes cess, they are far from sufficient. (a) # followers These results place the usual intuition about influencers in perspective: individuals who have been influential in the (b) # friends past and who have many followers are indeed more likely to (c) # tweets, be influential in the future; however, this intuition is cor- rect only on average. We also emphasize that these results (d) date of joining are based on far more observational data than is typically 2. Past influence of seed users available to marketers—in particular, we have an objec- tive measure of influence and extensive data on past per- (a) average, minimum, and maximum total influence formance. Our finding that individual-level predictions of influence nevertheless remain relatively unreliable therefore (b) average, minimum, and maximum local influence, strongly suggests that rather than attempting to identify ex- ceptional individuals, marketers seeking to exploit word-of- where past local influence refers to the average number of mouth influence should instead adopt portfolio-style strate- reposts by that user’s immediate followers in the first month gies, which target many potential influencers at once and of the observation period, and past total influence refers to therefore rely only on average performance [33]. average total cascade size over the same period. Followers, friends, number of tweets, and influence (actual and past) were all log-transformed to account for their skewed distri- 6. THE ROLE OF CONTENT butions. We then compared predicted influence with actual An obvious objection to the above analysis is that it fails influence computed from the second month of observations. to account for the nature of the content that is being shared. Figure 5 shows the regression tree for one of the folds. Clearly one might expect that some types of content (e.g. Conditions at the nodes indicate partitions of the features, YouTube videos) might exhibit a greater tendency to spread
  • 6. log10(pastLocalInfluence + 1)< 0.1763 | log10(pastLocalInfluence + 1)< 0.0416 log10(followers + 1)< 3.272 log10(followers + 1)< 2.157 log10(followers + 1)< 2.562 log10(followers + 1)< 2.519 log10(pastLocalInfluence + 1)< 0.4921 log10(pastLocalInfluence + 1)< 0.09791 log10(pastLocalInfluence + 1)< 0.3028 log10(pastLocalInfluence + 1)< 0.3027 log10(pastLocalInfluence + 1)< 0.856 0.0124 0.03631 0.05991 0.1229 0.09241 0.1452 0.1929 0.3045 0.275 0.4118 0.6034 0.9854 Figure 5: Regression tree fit for one of the five cross-validation folds. Leaf nodes give the predicted influence for the corresponding partition, where the left (right) child is followed if the node condition is satisfied (violated). britneyspears BarackObama cnnbrk 106 stephenfry TreySongz pigeonPOLL nprnews 105 MrEdLover disneypolls iphone_dev Orbitz geohot Followers riskybusinessmb mslayel marissamayer 10 4 billprady michelebachmann TreysAngels garagemkorova 103 wealthtv OFA_TX 102 10-1 100 101 102 Past Local Influence (a) All users (b) Top 25 users Figure 6: Influence as a function of past local influence and number of followers for (a) all users and (b) users with the top 25 actual influence. Each circle represents a single seed user, where the size of the circle represents that user’s actual average influence. than others (e.g. news articles of specialized interest), or First, we filtered URLs that we knew to be spam or in a lan- that even the same type of content might vary considerably guage other than English. Second, we binned all remaining in interestingness or suitability for sharing. Conceivably, one URLs in logarithmic bins choosing an exponent such that could do considerably better at predicting cascade size if in we obtained ten bins in total, and the top bin contained the addition to knowing the attributes of the seed user, one also 100 largest cascades. Third, we sampled all 100 URLs in knew something about the content of the URL being seeded. the top bin, and randomly sampled 100 URLs from each of To test this idea, we used humans to classify the content the remaining bins. In this way, we ensured that our sample of a sample of 1000 URLs from our study. An advantage would reflect the full distribution of cascade sizes. of this approach over an automated classifier or a topic- Given this sample of URLs, we then used Amazon’s Me- model [35] is that humans can more easily rate content on chanical Turk (AMT) to recruit human classifiers. AMT attributes like “interestingness” or “positive feeling” which is a system that allows one to recruit workers to do small are often quite difficult for a machine. A downside of us- tasks for small payments, and has been used extensively for ing humans, however, is that the number of URLs we can survey and experimental research [17, 24, 23, 22] and to ob- classify in this way is necessarily small relative to the total tain labels for data [28, 27]. We asked the workers to go sample. Moreover, because the distribution of cascade sizes to the web page associated with the URL and answer ques- is so skewed, a uniform random sample of 1000 URLs would tions about it. Specifically, we asked them to classify the almost certainly not contain any large cascades; thus we in- site as “Spam / Not Spam / Unsure”, as “Media Sharing / stead obtained a stratified sample in the following manner. Social Networking, Blog / Forum, News / Mass Media, or
  • 7. 1. Rated interestingness 2. Perceived interestingness to an average person 1.2 G 3. Rated positive feeling 1.0 G G G G 4. Willingness to share via Email, IM, Twitter, Facebook Actual Influence 0.8 or Digg. G G 5. Indicator variables for type of URL (see Figure 8a) 0.6 G G G GG 6. Indicator variable for category of content (see Fig- 0.4 G G ure 8b) G GGG GG GG 0.2 G G G Figure 10 shows the model fit including content. Surpris- G G G G ingly, none of the content features were informative rela- G 0.0 G G tive to the seed user features (hence we omit the regression tree itself, which is essentially identical to Figure 5), nor 0.2 0.4 0.6 0.8 1.0 was the model fit (R2 = 0.31) improved by the addition of Predicted Influence the content features. We note that the slight decrease in fit and calibration compared to the content-free model can be attributed to two main factors: first, the training set Figure 7: Actual vs. predicted influence for regres- size is orders of magnitude smaller for the content model sion tree. The model assigns each seed user to a leaf as we have fewer hand-labeled URLs, and second, here we in the regression tree. Points representing the av- are making predictions at the single post level, which has erage actual influence values are placed at the pre- higher variance than the user-averaged influence predicted diction made by each leaf, with vertical lines rep- in the content-free model. resenting one standard deviation above and below. These results are initially surprising, as explanations of The size of the point is proportional to the num- success often do invoke the attributes of content to account ber of diffusion events assigned to the leaf. Note for what is successful. However, the reason is essentially the that all five folds are represented here, including the same as above—namely that most explanations of success fold represented in Figure 5, where proximate lines tend to focus only on observed successes, which invariably correspond to leaves from different cross-validation represent a small and biased sample of the total population folds. of events. When the much larger number of non-successes are also included, it becomes difficult to identify content- based attributes that are consistently able to differentiate success from failure at the level of individual events. Other”, and then to specify one of 10 categories it fell into (see Figure 8b). We then asked them to rate how broadly relevant the site was, on a scale from 0 (very niche) to 100 7. TARGETING STRATEGIES (extremely broad). We also gauged their impression of the Although content was not found to improve predictive per- site, including how interesting they felt the site was (7-point formance, it remains the case that individual-level attributes— Likert scale), how interesting the average person would find in particular past local influence and number of followers— it (7-point Likert scale), and how positively it made them can be used to predict average future influence. Given this feel (7-point Likert scale). Finally, we asked them to indi- observation, a natural next question is how a hypothetical cate if they would share the URL using any of the following marketer might exploit available information to optimize the services: Email, IM, Twitter, Facebook, or Digg. diffusion of information by systematically targeting certain To ensure that our ratings and classifications were reliable, classes of individuals. In order to answer such a question, we had each URL rated at least 3 times—the average URL however, one must make some assumptions regarding the was rated 11 times, and the maximum was rated 20 times. costs of targeting individuals and soliciting their coopera- If more than three workers marked the URL as a bad link tion. or in a foreign language, it was excluded. In addition, we To illustrate this point we now evaluate the cost-effectiveness excluded URLs that were marked as spam by the majority of of a hypothetical targeting strategy based on a simple but workers. This resulted in 795 URLs that we could analyze. plausible family of cost functions ci = ca +fi cf , where ca rep- As Figure 9 shows, content that is rated more interesting resents a fixed “acquisition cost” ca per individual i, and cf tends to generate larger cascades on average, as does content represents a “cost per follower” that each individual charges that elicits more positive feelings. In addition, Figure 8 the marketer for each “sponsored” tweet. Without loss of shows that certain types of URLs, like those associated with generality we have assumed a value of cf = $0.01, where shareable media, tend to spread more than URLs associated the choice of units is based on recent news reports of paid with news sites, while some types of content (e.g “lifestyle”) tweets (http://nyti.ms/atfmzx). For convenience we express spread more than others. the acquisition cost as multiplier α of the per-follower cost; To evaluate the additional predictive power of content, we hence ca = αcf . repeat the regression tree analysis of Section 5 for this subset Because the relative cost of targeting potential “influencers” of URLs, adding the following content-based features: is an unresolved empirical question, we instead explore a
  • 8. Lifestyle Media Sharing / Social Networking Technology Offbeat General Category Entertainment Blog / Forum Type of URL Gaming Science Other Write In / Other News Business and Finance News / Mass Media Sports 0 50 100 150 0 50 100 150 Cascade Size Cascade Size (a) Type of URL (b) Content Category for URL Figure 8: (a). Average cascade size for different type os URLs (b). Average cascade size for different categories of content. Error bars are standard errors wide range of possible assumptions by varying α. For ex- ample, choosing α to be small corresponds to a cost func- 170 tion that is biased towards individuals with relatively few 160 160 followers, who are cheap and numerous. Conversely when 150 α is sufficiently large, the acquisition cost will tilt toward 150 Cascade Size Cascade Size 140 targeting a small number of highly influential users, mean- 140 ing users with a larger number of followers and good track 130 130 records. Regardless, one must trade off between the number 120 120 of followers per influencer and the number of individuals who 110 110 can be targeted, where the optimal tradeoff will depend on 100 α. To explore the full range of possibilities allowed by this 1 2 3 4 5 6 7 1 2 3 4 5 6 7 family of cost functions, for each value of α we binned users Interest Feeling according to their influence as predicted by the regression (a) Interesting (b) Positive Feeling tree model and computed the average influence-per-dollar of the targeted subset for each bin. Figure 9: (a). Average cascade size for different in- As Figure 11 shows, when α = 0—corresponding to a terest ratings (b). Average cascade size for different situation in which individuals can be located costlessly— ratings of positive feeling. Error bars are standard we find that by far the most cost-effective category is to errors. target the least influential individuals, who exert over fif- teen times the influence-per-dollar of the most influential category. Although these individuals are much less influ- ential (average influence score ≈ 0.01) than average, they 8. CONCLUSIONS also have relatively few followers (average ≈ 14); thus are In light of the emphasis placed on prominent individuals relatively inexpensive. At the other extreme, when α be- as optimal vehicles for disseminating information [19], the come sufficiently large—here α 100, 000, corresponding possibility that “ordinary influencers”—individuals who ex- to an acquisition cost ca = $1, 000—we recover the result ert average, or even less-than-average influence—are under that highly influential individuals are also the most cost- many circumstances more cost-effective, is intriguing. We effective. Although expensive, these users will be preferred emphasize, however, that these results are based on statisti- simply because the acquisition cost prohibits identifying and cal modeling of observational data and do not imply causal- managing large numbers of influencers. ity. It is quite possible, for example, that content seeded Finally, Figure 11 reveals that although the most cost- by outside sources—e.g., marketers—may diffuse quite dif- efficient category of influencers corresponds to increasingly ferently than content selected by users themselves. Like- influential individuals as α increases, the transition is sur- wise, while we have considered a wide range of possible cost prisingly slow. For example, even for values of α as high as functions, other assumptions about costs are certainly possi- 10,000, (i.e. equivalent to ca = $100) the most cost-efficient ble and may lead to different conclusions. For reasons such influencers are still relatively ordinary users, who exhibit as these, our conclusions therefore ought to be viewed as approximately average influence and connectivity. hypotheses to be tested in properly designed experiments, not as verified causal statements. Nevertheless, our find-
  • 9. 0.035 0 2.5 G G G 10 Mean actual influence / total cost G G 0.030 100 G 2.0 1000 G 0.025 10000 Actual Influence G G G G 1.5 0.020 1e+05 G G G G G G 0.015 1.0 G G G G G G G GG G G G G G 0.010 G 0.5 G G G G G G G G G G G 0.005 G GG GGG G G G G G GG G GG G GG G 0.0 G G G 0.0 0.5 1.0 1.5 2.0 2.5 10−1 10−0.5 100 100.5 Predicted Influence Mean predicted influence Figure 10: Actual vs. predicted influence for regres- sion tree including content Figure 11: Return on investment where targeting cost is defined as ci = ca + fi cf , ca = αcf and α ∈ {0, 10, 100, 1000, 10000, 100000}. ing regarding the relative efficacy of ordinary influencers is consistent with previous theoretical work [32] that has also questioned the feasibility of word-of-mouth strategies that Stanford, California, 2009. Association of Computing depend on triggering “social epidemics” by targeting special Machinery. individuals. [4] F. M. Bass. A new product growth for model consumer We also note that although Twitter is in many respects a durables. Management Science, 15(5):215–227, 1969. special case, our observation that large cascades are rare is [5] R. A. Berk. An introduction to sample selection bias likely to apply in other contexts as well. Correspondingly, in sociological data. American Sociological Review, our conclusion that word-of-mouth information spreads via 48(3):386–398, 1983. many small cascades, mostly triggered by ordinary individ- [6] L. Breiman, J. Friedman, R. Olshen, and C. Stone. uals, is also likely to apply generally, as has been suggested Classification and regression trees. Chapman & elsewhere [33]. Marketers, planners and other change agents Hall/CRC, 1984. interested in harnessing word-of-mouth influence could there- [7] M. Cha, H. Haddadi, F. Benevenuto, and K. P. fore benefit first by adopting more precise metrics of influ- Gummad. Measuring user influence on twitter: The ence; second by collecting more and better data about poten- million follower fallacy. In 4th Int’l AAAI Conference tial influencers over extended intervals of time; and third, by on Weblogs and Social Media, Washington, DC, 2010. potentially exploiting ordinary influencers, where the opti- mal tradeoff between the number of individuals targeted and [8] R. M. Dawes. Everyday irrationality: How their average level of influence will depend on the specifics pseudo-scientists, lunatics, and the rest of us of the cost function in question. systematically fail to think rationally. Westview Pr, 2002. [9] J. Denrell. Vicarious learning, undersampling of 9. ACKNOWLEDGMENTS failure, and the myths of management. Organization We thank Sharad Goel for helpful conversations. Science, 14(3):227–243, 2003. [10] M. Gladwell. The Tipping Point: How Little Things 10. REFERENCES Can Make a Big Difference. Little Brown, New York, [1] E. Adar and A. Adamic, Lada. Tracking information 2000. epidemics in blogspace. In 2005 IEEE/WIC/ACM [11] J. Goldenberg, S. Han, D. R. Lehmann, and J. W. International Conference on Web Intelligence, Hong. The role of hubs in the adoption process. Compiegne University of Technology, France, 2005. Journal of Marketing, 73(2):1–13, 2009. [2] S. Aral, L. Muchnik, and A. Sundararajan. [12] A. Goyal, F. Bonchi, and L. V. S. Lakshmanan. Distinguishing influence-based contagion from Discovering leaders from community actions. pages homophily-driven diffusion in dynamic networks. 499–508. ACM, 2008. Proceeding of the 17th ACM Proceedings of the National Academy of Sciences, conference on Information and knowledge 106(51):21544, 2009. management. [3] E. Bakshy, B. Karrer, and A. Adamic, Lada. Social [13] D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins. influence and the diffusion of user-created content. In Information diffusion through blogspace. pages 10th ACM Conference on Electronic Commerce, 491–501. ACM New York, NY, USA, 2004.
  • 10. [14] E. Katz and P. F. Lazarsfeld. Personal influence; the [24] S. A. Munson and P. Resnick. Presenting diverse part played by people in the flow of mass political opinions: how and how much. pages communications. Free Press, Glencoe, Ill. 1955. 1457–1466. ACM, 2010. ” [15] E. Keller and J. Berry. The Influentials: One [25] R. R. Picard and R. D. Cook. Cross-validation of American in Ten Tells the Other Nine How to Vote, regression models. Journal of the American Statistical Where to Eat, and What to Buy. Free Press, New Association, 79(387):575–583, 1984. York, NY, 2003. [26] E. M. Rogers. Diffusion of innovations. Free Press, [16] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing New York, 4th edition, 1995. the spread of influence through a social network. In [27] V. S. Sheng, F. Provost, and P. G. Ipeirotis. Get 9th ACM SIGKDD International Conference on another label? improving data quality and data Knowledge Discovery and Data Mining, Washington, mining using multiple, noisy labelers. pages 614–622. DC, USA., 2003. Association of Computing Machinery. ACM, 2008. [17] A. Kittur, E. H. Chi, and B. Suh. Crowdsourcing user [28] R. Snow, B. O’Connor, D. Jurafsky, and A. Y. Ng. studies with mechanical turk. Proceedings of the Cheap and fast-but is it good? evaluating non-expert twenty-sixth annual SIGCHI conference on Human annotations for natural language tasks, 2008. factors in computing systems, 2008. [29] E. S. Sun, I. Rosenn, C. A. Marlow, and T. M. Lento. [18] H. Kwak, C. Lee, H. Park, and S. Moon. What is Gesundheit! modeling contagion through facebook twitter, a social network or a news media? pages news feed. In International Conference on Weblogs 591–600. ACM, 2010. and Social Media, San Jose, CA, 2009. AAAI. [19] A. Leavitt, E. Burchard, D. Fisher, and S. Gilbert. [30] B. Tomlinson and C. Cockram. Sars: Experience at The influentials: New approaches for analyzing prince of wales hospital, hong kong. The Lancet, influence on twitter. 361(9368):1486–1487, 2003. [20] J. Leskovec, A. Adamic, Lada, and A. Huberman, [31] D. J. Watts. A simple model of information cascades Bernardo. The dynamics of viral marketing. ACM on random networks. Proceedings of the National Trans. Web, 1(1):5, 2007. Academy of Science, U.S.A., 99:5766–5771, 2002. [21] D. Liben-Nowell and J. Kleinberg. Tracing [32] D. J. Watts and P. S. Dodds. Influentials, networks, information flow on a global scale using internet and public opinion formation. Journal of Consumer chain-letter data. Proceedings of the National Academy Research, 34:441–458, 2007. of Sciences, 105(12):4633, 2008. [33] D. J. Watts and J. Peretti. Viral marketing for the real [22] W. Mason and S. Suri. Conducting Behavioral world. Harvard Business Review, May:22–23, 2007. Research on Amazon’s Mechanical Turk. SSRN [34] G. Weimann. The Influentials: People Who Influence eLibrary, 2010. People. State University of New York Press, Albany, [23] W. Mason and D. J. Watts. Financial incentives and NY, 1994. the performance of crowds. Proceedings of the ACM [35] J. Weng, E. P. Lim, J. Jiang, and Q. He. Twitterrank: SIGKDD Workshop on Human Computation, pages finding topic-sensitive influential twitterers. pages 77–85, 2009. 261–270. ACM, 2010.