Pagerank Increase under Different Collusion Topologies

Ricardo Baeza-Yates, Carlos Castillo and Vicente López
ICREA Professor / Dept. of Technology / Cátedra Telefónica
Universitat Pompeu Fabra – Barcelona, Spain

May 10th, 2005

Outline
    1 Introduction
    2 Collusion and Pagerank
    3 Experiments in a synthetic graph
    4 Experiments in a real Web graph
    5 Conclusions

Goal
    Study collusion
    Nepotistic linking in a Web graph
        This can be done by bad sites (spam) but also good sites
        Colluding groups could use different topologies
        Colluding groups could have different original rankings
        How much would their ranking increase if ... ?

Framework
    We use Pagerank as the ranking function [Page et al., 1998]
    Pagerank
        Let $L_{N \times N}$ be the row-wise normalized link matrix
        Let $U$ be a matrix such that $U_{i,j} = 1/N$
        Let $P = (1 - \varepsilon) L + \varepsilon U$
        Pagerank scores are given by $v$ such that $P^T v = v$
    Pagerank scores are the probabilities of visiting a page using a
    process of random browsing, with a “reset” probability of
    $\varepsilon \approx 0.15$.

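As a minimal sketch of the definition above, the following pure-Python power iteration approximates the vector v for a small link graph. The function name `pagerank`, the adjacency-list input, the fixed iteration count, and the handling of pages without out-links (they jump uniformly at random) are assumptions made for illustration; the slides do not specify them.

```python
def pagerank(out_links, eps=0.15, iters=100):
    """Power iteration for P^T v = v with P = (1 - eps) L + eps U.

    out_links[i] is the list of pages that page i links to.
    Assumption: a page with no out-links jumps uniformly at random.
    """
    n = len(out_links)
    v = [1.0 / n] * n
    for _ in range(iters):
        new = [eps / n] * n  # contribution of the eps * U term
        for i, targets in enumerate(out_links):
            if targets:
                share = (1.0 - eps) * v[i] / len(targets)
                for j in targets:
                    new[j] += share
            else:
                # dangling page: spread its score over all pages
                share = (1.0 - eps) * v[i] / n
                for j in range(n):
                    new[j] += share
        v = new
    return v


# Tiny 4-page example: pages 0, 1 and 2 link to page 3, which links back to 0.
scores = pagerank([[3], [3], [3], [0]])
```

Pages 1 and 2 receive no links, so their score is only the random-jump share eps/N, while page 3 collects most of the probability mass.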
Gain from collusion
    Maximum gain [Zhang et al., 2004]:
    $$\frac{\text{New Pagerank}}{\text{Old Pagerank}} \le \frac{1}{\varepsilon}$$
    As $\varepsilon \approx 0.15$, maximum gain $\approx 7$.
    First task: improve this bound.

Impact of collusion in Pagerank
    [Figure: the Web as a graph G' with N pages; the colluding set G holds M pages and the remaining N-M pages lie outside it]

Grouping nodes for Pagerank calculation
    Links for Pagerank can be “lumped” together [Clausen, 2004]:
    [Figure: the M colluding pages and the other N-M pages lumped into two nodes; random jumps]

Links for Pagerank calculation
    $$\text{Pagerank}_{\text{colluding nodes}} = P_{jump} + P_{in} + P_{self}$$
    [Figure: two lumped nodes, the M colluding nodes with Pagerank = x and the other N-M nodes with Pagerank = 1-x; arrows labeled P_in, P_jump and P_self]

Pagerank calculation: random jumps
    There are N nodes in total, M in the colluding set:
    $$P_{jump} = \varepsilon \, (M/N)$$

Pagerank calculation: incoming links
    $$P_{in} = (1 - \varepsilon) \sum_{(a,b):(a,b) \in E_{in}} \frac{\text{Pagerank}(a)}{\deg(a)} = (1 - \varepsilon) \sum_{a: a \in G' - G} \text{Pagerank}(a)\, p(a)$$
    Where p(a) is the fraction of links from node a pointing to
    the colluding set, possibly 0 for some nodes.
    $$p = \frac{\sum_{a: a \in G' - G} \text{Pagerank}(a)\, p(a)}{\sum_{a: a \in G' - G} \text{Pagerank}(a)} = \frac{\sum_{a: a \in G' - G} \text{Pagerank}(a)\, p(a)}{1 - x}$$
    p is a weighted average of p(a); it reflects how
    “important” the pages in the colluding set are

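The weighted average p can be computed directly from per-page scores and out-links. The sketch below assumes an adjacency-list graph and a set of colluding page ids; the function name `weighted_inlink_fraction` and the four example scores are hypothetical (the scores are made-up numbers, not a stationary Pagerank vector).

```python
def weighted_inlink_fraction(scores, out_links, colluding):
    """Return p, the score-weighted average of p(a) over pages a outside the colluding set.

    p(a) is the fraction of a's out-links that point into the colluding set.
    """
    num = 0.0
    den = 0.0
    for a, targets in enumerate(out_links):
        if a in colluding:
            continue
        if targets:
            p_a = sum(1 for b in targets if b in colluding) / len(targets)
        else:
            p_a = 0.0  # a page without out-links sends nothing to the set
        num += scores[a] * p_a
        den += scores[a]  # den accumulates 1 - x
    return num / den


# Hypothetical 4-page example in which pages 2 and 3 collude.
scores = [0.4, 0.2, 0.3, 0.1]
out_links = [[1, 2], [2, 3], [0], [2]]
p = weighted_inlink_fraction(scores, out_links, {2, 3})
```

Here page 0 sends half of its links into the set and page 1 sends all of them, so p = (0.4 · 0.5 + 0.2 · 1) / 0.6 = 2/3.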
Pagerank calculation: incoming and self links
    P_in can be rewritten as:
    $$P_{in} = (1 - \varepsilon)(1 - x)\, p$$
    Using the same trick for P_self, we can take s as the weighted
    average of the fraction of self-links of each page in the
    colluding set, and write:
    $$P_{self} = (1 - \varepsilon)\, x\, s$$

Pagerank calculation summary
    [Figure: the M colluding nodes (Pagerank = x) and the other N-M nodes (Pagerank = 1-x), with flows $P_{in} = (1-\varepsilon)(1-x)p$, $P_{jump} = \varepsilon(M/N)$ and $P_{self} = (1-\varepsilon)xs$]

Solving
    Solving the stationary state $P_{in} + P_{jump} + P_{self} = x$ yields:
    $$x_{normal} = \frac{\varepsilon \frac{M}{N} + (1 - \varepsilon)\, p}{(p - s)(1 - \varepsilon) + 1}$$
    What happens when colluding?
    Colluding means pointing more links to the inside
    This means $s \to s'$, with $s' > s$, yielding $x_{colluding}$

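A quick numerical check of the closed form: the sketch below evaluates x for one set of illustrative parameter values and plugs it back into the three flows to confirm that they balance. The function name `colluding_mass` and the chosen numbers are assumptions for the example only.

```python
def colluding_mass(eps, m, n, p, s):
    """Closed-form x that solves P_in + P_jump + P_self = x."""
    return (eps * m / n + (1.0 - eps) * p) / ((p - s) * (1.0 - eps) + 1.0)


# Illustrative values only: 100 colluding pages out of 100,000.
eps, m, n, p, s = 0.15, 100, 100_000, 0.001, 0.2
x = colluding_mass(eps, m, n, p, s)

# Plug x back into the three flows and check the balance.
p_jump = eps * m / n
p_in = (1.0 - eps) * (1.0 - x) * p
p_self = (1.0 - eps) * x * s
residual = p_in + p_jump + p_self - x

# Raising s (more links kept inside the set) raises x.
x_more_inside = colluding_mass(eps, m, n, p, 0.9)
```

The residual should vanish up to floating-point error, and `x_more_inside` exceeds `x` because a larger s shrinks the denominator.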
Pagerank increase due to collusion
    Making s' = 1, all links from the colluding set go inside now:
    $$\frac{x_{colluding}}{x_{normal}} = 1 + \frac{1 - s}{p + \frac{\varepsilon}{1 - \varepsilon}}$$
    Making s = p, originally the set was not colluding:
    $$\frac{x_{colluding}}{x_{normal}} = \frac{1}{p(1 - \varepsilon) + \varepsilon}$$
    This is inversely correlated to p, the original weighted
    fraction of links going to the colluding set

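The intermediate algebra is not shown on the slide. A sketch of it, assuming p is unchanged by the collusion: the numerator of the closed form for x does not depend on s, so it cancels in the ratio and only the denominators remain.

```latex
\begin{align*}
\frac{x_{colluding}}{x_{normal}}
  &= \frac{(p - s)(1-\varepsilon) + 1}{(p - s')(1-\varepsilon) + 1} \\
\text{with } s' = 1:\quad
  &= \frac{(p - s)(1-\varepsilon) + 1}{p(1-\varepsilon) + \varepsilon}
   = 1 + \frac{(1 - s)(1-\varepsilon)}{p(1-\varepsilon) + \varepsilon}
   = 1 + \frac{1 - s}{p + \frac{\varepsilon}{1-\varepsilon}} \\
\text{and with } s = p:\quad
  &= \frac{1}{p(1-\varepsilon) + \varepsilon}
\end{align*}
```

As p tends to 0 the last expression tends to 1/ε, which recovers the earlier maximum gain.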
Expected Pagerank change
    x_colluding / x_normal as a function of p
    [Figure: maximum Pagerank change (y-axis, from 1 up to 1/ε ≈ 7) against the weighted average of fraction of links to colluding nodes (x-axis, log scale from 10^-3 to 10^0)]

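The curve can be tabulated from the s = p, s' = 1 formula. This is a minimal sketch; the function name `max_gain` and the sampled values of p (the tick marks of the x-axis) are choices made for the example.

```python
def max_gain(p, eps=0.15):
    """x_colluding / x_normal when s = p before colluding and s' = 1 after."""
    return 1.0 / (p * (1.0 - eps) + eps)


# Evaluate the gain at the tick marks of the log-scale x-axis.
for p in (1e-3, 1e-2, 1e-1, 1.0):
    print(f"p = {p:g}  gain = {max_gain(p):.2f}")
```

The gain approaches 1/ε ≈ 6.67 as p tends to 0 and drops to 1 at p = 1, where the set already receives every outside link.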
Experiments in a synthetic graph
    Created using the generative model [Kumar et al., 2000]
    Power-law distribution with parameter −2.1 for in-degree
    and Pagerank, and −2.7 for out-degree, using parameters
    from [Pandurangan et al., 2002]
    100,000-node scale-free graph
    Sampling by Pagerank
    Divided in deciles, each decile has 1/10th of the Pagerank
    Picked a group of 100 nodes at random from each decile
    Group 1 are low-ranked nodes, group 10 are high-ranked nodes

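One way to read the sampling step is sketched below: nodes are sorted by score and cut into ten buckets that each hold about one tenth of the total Pagerank mass, then up to 100 nodes are drawn from each bucket. The function names, the tie to cumulative mass, the seeds, and the Pareto-distributed stand-in scores are all assumptions; the slides do not give the exact procedure.

```python
import random


def pagerank_deciles(scores):
    """Split node ids into 10 buckets, each holding about 1/10 of the total score.

    Nodes are sorted from lowest to highest score, so bucket 0 (group 1)
    holds many low-ranked nodes and bucket 9 (group 10) few high-ranked ones.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    total = sum(scores)
    buckets = [[] for _ in range(10)]
    running = 0.0
    for node in order:
        # bucket index from the cumulative score seen before this node
        k = min(int(10 * running / total), 9)
        buckets[k].append(node)
        running += scores[node]
    return buckets


def sample_groups(scores, group_size=100, seed=0):
    """Draw up to group_size random nodes from each decile."""
    rng = random.Random(seed)
    return [rng.sample(b, min(group_size, len(b))) for b in pagerank_deciles(scores)]


# Hypothetical heavy-tailed scores standing in for real Pagerank values.
rng = random.Random(1)
scores = [rng.paretovariate(1.1) for _ in range(100_000)]
groups = sample_groups(scores)
```

Because the scores are heavy-tailed, the top bucket may contain fewer than 100 nodes, which is why the sample size is capped by the bucket length.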
Original Pagerank of the nodes
    These are the original Pagerank values for each group
    [Figure: Pagerank values (y-axis, log scale from 10^-6 to 10^-2) per group 1 to 10 (x-axis); legend: Average; annotations: "Originally very bad" (low groups) and "Originally very good" (high groups)]

Modified Pagerank of the nodes
    These are the modified Pagerank values when colluding.
    [Figure: Pagerank values (y-axis, log scale from 10^-6 to 10^-2) per group 1 to 10 (x-axis); series: Original, Clique]

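The original-versus-clique comparison can be imitated on a toy graph. The sketch below is not the authors' experiment: it assumes a small uniform random graph (200 pages with 5 out-links each) instead of the 100,000-node scale-free graph, a randomly chosen group of 10 pages, and hypothetical helper names `pagerank` and `make_clique`.

```python
import random


def pagerank(out_links, eps=0.15, iters=100):
    """Power iteration; assumes every page has at least one out-link."""
    n = len(out_links)
    v = [1.0 / n] * n
    for _ in range(iters):
        new = [eps / n] * n  # random-jump share
        for i, targets in enumerate(out_links):
            share = (1.0 - eps) * v[i] / len(targets)
            for j in targets:
                new[j] += share
        v = new
    return v


def make_clique(out_links, group):
    """Return a copy of the graph where each group member links only to the other members."""
    rewired = [list(t) for t in out_links]
    for a in group:
        rewired[a] = [b for b in group if b != a]
    return rewired


rng = random.Random(0)
n, degree, eps = 200, 5, 0.15
# Hypothetical random graph: each page links to `degree` distinct other pages.
graph = [rng.sample([j for j in range(n) if j != i], degree) for i in range(n)]
group = rng.sample(range(n), 10)

# Total Pagerank of the group before and after it rewires into a clique.
before = sum(pagerank(graph, eps)[a] for a in group)
after = sum(pagerank(make_clique(graph, group), eps)[a] for a in group)
gain = after / before
```

With these assumptions the group's total Pagerank should grow by a factor between 1 and 1/ε, in line with the bound quoted from [Zhang et al., 2004].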
Distribution of Pagerank
    But Pagerank values follow a power law distribution ...
    [Figure: frequency (y-axis, log scale from 10^-5 to 10^0) against Pagerank value (x-axis, log scale from 10^-6 to 10^-2), with reference line x^-2.1]

Modified Pagerank position of the nodes
    These are the modified Pagerank positions (rankings) when
    colluding.
    [Figure: Pagerank ranking (y-axis, 0.0 to 1.0) per group 1 to 10 (x-axis); series: Original, Clique]

Variation of Pagerank when colluding
                   This is the ratio x_colluding / x_original.

                   [Figure: New value / original value (0 to 7, with 1/ε marked) by Group (1 to 10); series: Change in Pagerank value, Change in ranking]
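The ratio can be reproduced on a toy graph. The sketch below is an illustration under stated assumptions, not the authors' experiment: the 200-node random graph, the out-degree of 3, the group of 10 nodes, and the value ε = 0.15 (read here as the random-jump probability) are all hypothetical choices. It computes Pagerank by power iteration before and after the group forms a clique and reports the ratio of the group's total Pagerank.

```python
import random

EPS = 0.15  # random-jump probability; an assumed value, not taken from the slides


def pagerank(out_links, eps=EPS, iterations=100):
    """Power iteration; out_links[u] is the set of nodes that u links to."""
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [eps / n] * n
        for u, targets in enumerate(out_links):
            # Every node in this sketch has at least one out-link.
            share = (1.0 - eps) * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        rank = new_rank
    return rank


def random_graph(n, out_degree, seed=0):
    """Hypothetical synthetic graph: each node links to out_degree random others."""
    rng = random.Random(seed)
    return [set(rng.sample([v for v in range(n) if v != u], out_degree))
            for u in range(n)]


def add_clique(out_links, group):
    """Return a copy of the graph in which every group member links to all others."""
    colluding = [set(targets) for targets in out_links]
    for u in group:
        colluding[u].update(v for v in group if v != u)
    return colluding


graph = random_graph(n=200, out_degree=3)
group = list(range(10))
original = pagerank(graph)
colluding = pagerank(add_clique(graph, group))
ratio = sum(colluding[u] for u in group) / sum(original[u] for u in group)
print(f"x_colluding / x_original = {ratio:.2f} (1/eps = {1 / EPS:.2f})")
```

In this sketch the group's total Pagerank grows but stays below 1/ε, in line with the 1/ε mark on the plot's vertical axis.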
It is not necessary to create a clique
                   Spammers can use a fraction of the links to try to avoid
                   detection.

                   [Figure: New Pagerank / original Pagerank (1 to 7, with 1/ε marked) by Group (1 to 10); one series per fraction of clique links, from 5% to 95% in steps of 5%, plus Full clique]

                   The paper also covers other topologies: star and ring.
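A partial clique of this kind could be generated as follows. The slides do not say how the fraction of links was selected, so uniform random sampling is an assumption here, and `partial_clique_links` is a hypothetical helper name.

```python
import random


def partial_clique_links(group, fraction, seed=0):
    """Pick a random fraction of the directed links of a complete subgraph."""
    all_links = [(u, v) for u in group for v in group if u != v]
    count = int(fraction * len(all_links) + 0.5)  # round half up
    return random.Random(seed).sample(all_links, count)


group = list(range(10))                    # a hypothetical colluding group
half = partial_clique_links(group, 0.50)   # 45 of the 90 possible links
full = partial_clique_links(group, 1.00)   # the full clique
print(len(half), len(full))
```

Because the links are chosen at random, the resulting subgraph has no regular pattern, which is the point of using only a fraction of the clique.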
Experiments in a real Web graph
                   Hostgraph of 310,486 Web sites from Spain

                   [Figure: log-log plot of Frequency versus Pagerank value, with the reference line x^-2.1]
Experiments in a real Web graph
                   Some of the nodes are already colluding [Fetterly et al., 2004]
We study a set of Web sites
                   We pick a set of 242 reputable sites with rank ≈ 0.75 and
                   modify their links:
                       Disconnect the group
                       Create a ring
                       Add a central page linking to all of them
                       Add a central page linking to and from all of them (star)
                       Fully connect the group (clique)

                   Now we measure the new ranking.
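The modifications listed above can be sketched as sets of directed links. Everything named here is hypothetical: the site names, the `hub` node, and the helper functions. Two readings are assumptions as well: "disconnect the group" is taken to mean removing links between group members, and the ranking is taken to be a normalized position in [0, 1], so that a rank of about 0.75 means the site scores higher than about 75% of the others.

```python
def ring_links(group):
    """Each site links to the next one; the last links back to the first."""
    return {(group[i], group[(i + 1) % len(group)]) for i in range(len(group))}


def central_links(group, center):
    """A new central page links to every site in the group."""
    return {(center, site) for site in group}


def star_links(group, center):
    """The central page links to every site and every site links back."""
    return central_links(group, center) | {(site, center) for site in group}


def clique_links(group):
    """Every site links to every other site."""
    return {(u, v) for u in group for v in group if u != v}


def disconnect(links, group):
    """Drop links whose two endpoints are both in the group (assumed meaning)."""
    members = set(group)
    return {(u, v) for (u, v) in links if not (u in members and v in members)}


def normalized_rank(scores, node):
    """Fraction of the other nodes with a strictly lower score (assumed definition)."""
    lower = sum(1 for other, s in scores.items()
                if other != node and s < scores[node])
    return lower / (len(scores) - 1)


group = [f"site{i}" for i in range(242)]   # hypothetical names for the 242 sites
print(len(ring_links(group)), len(central_links(group, "hub")),
      len(star_links(group, "hub")), len(clique_links(group)))
```

The link counts show the cost of each strategy: a ring of 242 sites needs 242 links, a star needs 484, and a full clique needs 242 × 241 = 58,322.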
New rankings under graph modifications
                   [Figure: Rankings (0 to 1) by Strategy: Normal, Disconnected, Ring, Central, Star, Inv. Ring, Clique]
Adding 5%-50% of a complete subgraph
                   [Figure: Average ranking (0.980 to 1.000) by Percent of links of a complete subgraph (0% to 55%)]

                   The best sites also increase their ranking.
Conclusions
                     ✓ Any group of nodes can increase its Pagerank
                     ✓ Nodes with high Pagerank gain less by colluding

                   Ideas for link spam detection
                     ✗ Only detecting regularities can fail to detect randomized
                       structures
                     ✗ Only detecting nepotistic links can give false positives
                     ✓ Use evidence from multiple sources
                   Thank you
                   Clausen, A. (2004).
                   The cost of attack of PageRank.
                   In Proceedings of the international conference on agents, Web
                   technologies and Internet commerce (IAWTIC), Gold Coast, Australia.

                   Fetterly, D., Manasse, M., and Najork, M. (2004).
                   Spam, damn spam, and statistics: Using statistical analysis to locate spam
                   Web pages.
                   In Proceedings of the seventh workshop on the Web and databases
                   (WebDB), Paris, France.

                   Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A.,
                   and Upfal, E. (2000).
                   Stochastic models for the web graph.
                   In Proceedings of the 41st Annual Symposium on Foundations of
                   Computer Science (FOCS), pages 57–65, Redondo Beach, CA, USA. IEEE
                   CS Press.

                   Page, L., Brin, S., Motwani, R., and Winograd, T. (1998).
                   The Pagerank citation ranking: bringing order to the Web.
                   Technical report, Stanford Digital Library Technologies Project.

                   Pandurangan, G., Raghavan, P., and Upfal, E. (2002).
                   Using Pagerank to characterize Web structure.
                   In Proceedings of the 8th Annual International Computing and
                   Combinatorics Conference (COCOON), volume 2387 of Lecture Notes in
                   Computer Science, pages 330–339, Singapore. Springer.

                   Zhang, H., Goel, A., Govindan, R., Mason, K., and Van Roy, B. (2004).
                   Making eigenvector-based reputation systems robust to collusion.
                   In Proceedings of the third Workshop on Web Graphs (WAW), volume
                   3243 of Lecture Notes in Computer Science, pages 92–104, Rome, Italy.
                   Springer.

Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Pagerank Increase under Collusion Topologies

  • 1. Pagerank Increase under Different Collusion Topologies. Ricardo Baeza-Yates, Carlos Castillo and Vicente López. ICREA Professor / Dept. of Technology / Cátedra Telefónica, Universitat Pompeu Fabra – Barcelona, Spain. May 10th, 2005
  • 2. Outline: 1. Introduction; 2. Collusion and Pagerank; 3. Experiments in a synthetic graph; 4. Experiments in a real Web graph; 5. Conclusions
  • 3. Goal: study collusion, that is, nepotistic linking in a Web graph.
      • This can be done by bad sites (spam) but also by good sites
      • Colluding groups could use different topologies
      • Colluding groups could have different original rankings
      • How much would their ranking increase if ... ?
  • 8. Framework: we use Pagerank as the ranking function [Page et al., 1998].
      • Let $L$ be the $N \times N$ row-wise normalized link matrix
      • Let $U$ be a matrix such that $U_{i,j} = 1/N$
      • Let $P = (1 - \epsilon) L + \epsilon U$
      • Pagerank scores are given by $v$ such that $P^T v = v$
      • Pagerank scores are the probabilities of visiting a page using a process of random browsing, with a "reset" probability of $\epsilon \approx 0.15$.
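The fixed point $P^T v = v$ from the framework slide can be approximated by power iteration. The following is a minimal sketch, not the authors' implementation: the function name `pagerank`, the dictionary-based graph and the toy graph `toy_web` are assumptions made for illustration, and every node is assumed to have at least one out-link so that $L$ is properly row-normalized.

```python
def pagerank(links, eps=0.15, tol=1e-12, max_iter=1000):
    """Power iteration for v = P^T v with P = (1 - eps) * L + eps * U.

    links: dict mapping each node to the list of nodes it links to.
    Assumes every node has at least one out-link (no dangling nodes).
    """
    nodes = sorted(links)
    n = len(nodes)
    v = {a: 1.0 / n for a in nodes}
    for _ in range(max_iter):
        # Reset term eps * U^T v: every node receives eps / N.
        new_v = {a: eps / n for a in nodes}
        # Link term (1 - eps) * L^T v: each node splits its score evenly
        # among its out-links.
        for a in nodes:
            share = (1.0 - eps) * v[a] / len(links[a])
            for b in links[a]:
                new_v[b] += share
        delta = sum(abs(new_v[a] - v[a]) for a in nodes)
        v = new_v
        if delta < tol:
            break
    return v


# Hypothetical four-page Web used only to exercise the function.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
```

In this toy graph page `d` has no in-links, so its score is exactly the reset share $\epsilon / N$, while page `c`, which collects links from all the others, ends up with the highest score.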
  • 11. Gain from collusion. Maximum gain [Zhang et al., 2004]: $\frac{\text{New Pagerank}}{\text{Old Pagerank}} \leq \frac{1}{\epsilon}$. As $\epsilon \approx 0.15$, maximum gain $\approx 7$. First task: improve this bound.
  • 13. Impact of collusion in Pagerank. [Figure: the Web graph $G$ with $N$ pages, containing the colluding subgraph $G'$ with $M$ pages; the remaining $N - M$ pages are outside it.]
  • 14. Grouping nodes for Pagerank calculation. Links for Pagerank can be "lumped" together [Clausen, 2004]. [Figure: the $N - M$ pages and the $M$ pages drawn as two lumped nodes, plus random jumps.]
  • 15. Links for Pagerank calculation: $\text{Pagerank}_{\text{colluding nodes}} = P_{jump} + P_{in} + P_{self}$. [Figure: the $M$ colluding nodes have Pagerank $x$ and the other $N - M$ nodes have Pagerank $1 - x$; arrows labeled $P_{in}$, $P_{jump}$ and $P_{self}$ enter the colluding group.]
  • 16. Pagerank calculation: random jumps. There are $N$ nodes in total, $M$ in the colluding set: $P_{jump} = \epsilon (M/N)$
  • 17. Pagerank calculation: incoming links.
      • $P_{in} = (1 - \epsilon) \sum_{(a,b):(a,b) \in E_{in}} \frac{\text{Pagerank}(a)}{\deg(a)} = (1 - \epsilon) \sum_{a: a \in G - G'} \text{Pagerank}(a)\, p(a)$
      • Where $E_{in}$ is the set of links entering the colluding set and $p(a)$ is the fraction of links from node $a$ pointing to the colluding set, possibly 0 for some nodes.
      • $p = \frac{\sum_{a: a \in G - G'} \text{Pagerank}(a)\, p(a)}{\sum_{a: a \in G - G'} \text{Pagerank}(a)} = \frac{\sum_{a: a \in G - G'} \text{Pagerank}(a)\, p(a)}{1 - x}$
      • $p$ is a weighted average of $p(a)$; it reflects how "important" the pages in the colluding set are
  • 22. Pagerank calculation: incoming and self links. $P_{in}$ can be rewritten as: $P_{in} = (1 - \epsilon)(1 - x) p$. Using the same trick for $P_{self}$, we can take $s$ as the weighted average of the fraction of self-links of each page in the colluding set, and write: $P_{self} = (1 - \epsilon) x s$
  • 24. Pagerank calculation summary. [Figure: the colluding group of $M$ nodes (Pagerank $= x$) receives $P_{in} = (1 - \epsilon)(1 - x) p$ from the other $N - M$ nodes (Pagerank $= 1 - x$), $P_{jump} = \epsilon (M/N)$ from random jumps, and $P_{self} = (1 - \epsilon) x s$ from itself.]
  • 25. Solving. Solving the stationary state $P_{in} + P_{jump} + P_{self} = x$ yields: $x_{normal} = \frac{\epsilon \frac{M}{N} + (1 - \epsilon) p}{(p - s)(1 - \epsilon) + 1}$. What happens when colluding?
      • Colluding means pointing more links to the inside
      • This means $s \to s'$, with $s' > s$, yielding $x_{colluding}$
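The stationary-state solution can be checked numerically: compute Pagerank on a small graph, measure the weighted averages $p$ and $s$ from it, and compare the closed form with the Pagerank actually held by the group. This is a sketch under stated assumptions: the graph, the node names and the helper names (`x_closed_form`, `inside_fraction`) are hypothetical, the reset probability is written `eps`, and every node is assumed to have at least one out-link.

```python
def pagerank(links, eps=0.15, iterations=200):
    """Plain power iteration; assumes every node has at least one out-link."""
    n = len(links)
    v = {a: 1.0 / n for a in links}
    for _ in range(iterations):
        new_v = {a: eps / n for a in links}
        for a, targets in links.items():
            for b in targets:
                new_v[b] += (1.0 - eps) * v[a] / len(targets)
        v = new_v
    return v


def x_closed_form(m, n, p, s, eps=0.15):
    """x = (eps * M/N + (1 - eps) * p) / ((p - s) * (1 - eps) + 1)."""
    return (eps * m / n + (1.0 - eps) * p) / ((p - s) * (1.0 - eps) + 1.0)


def inside_fraction(targets, group):
    """Fraction of a node's out-links that point into the group."""
    return sum(1 for b in targets if b in group) / len(targets)


# Hypothetical toy graph; nodes "c1" and "c2" play the colluding set.
links = {
    "a": ["b", "c1"],
    "b": ["a", "c2", "c1"],
    "c1": ["c2", "a"],
    "c2": ["b"],
    "d": ["a", "b"],
}
group = {"c1", "c2"}
eps = 0.15
v = pagerank(links, eps)

x_measured = sum(v[a] for a in group)
# p: Pagerank-weighted average, over outside nodes, of links into the group.
p = sum(v[a] * inside_fraction(links[a], group)
        for a in links if a not in group) / (1.0 - x_measured)
# s: Pagerank-weighted average, over group nodes, of links staying inside.
s = sum(v[a] * inside_fraction(links[a], group) for a in group) / x_measured

x_predicted = x_closed_form(len(group), len(links), p, s, eps)
```

Because $p$ and $s$ are themselves weighted by the Pagerank scores, the closed form is best read as a consistency relation at the fixed point: `x_predicted` and `x_measured` agree up to the numerical error of the iteration.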
  • 27. Pagerank increase due to collusion.
      • Making $s' = 1$, all links from the colluding set go inside now: $\frac{x_{colluding}}{x_{normal}} = 1 + \frac{1 - s}{p + \frac{\epsilon}{1 - \epsilon}}$
      • Making $s = p$, originally the set was not colluding: $\frac{x_{colluding}}{x_{normal}} = \frac{1}{p(1 - \epsilon) + \epsilon}$
      • This is inversely correlated to $p$, the original weighted fraction of links going to the colluding set
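The two gain expressions can be evaluated directly to see how the bound tightens as $p$ grows. The sketch below assumes, as the derivation does, that $p$ is unchanged when the group starts colluding; the function names `gain` and `gain_no_prior_collusion` are illustrative, not taken from the paper.

```python
def gain(p, s, eps=0.15):
    """x_colluding / x_normal when the group moves from s to s' = 1."""
    return ((p - s) * (1.0 - eps) + 1.0) / ((p - 1.0) * (1.0 - eps) + 1.0)


def gain_no_prior_collusion(p, eps=0.15):
    """Special case s = p: 1 / (p * (1 - eps) + eps)."""
    return 1.0 / (p * (1.0 - eps) + eps)


for p in (0.001, 0.01, 0.1, 1.0):
    print(f"p = {p:<5}  gain = {gain_no_prior_collusion(p):.2f}")
```

With `eps = 0.15` the gain is about 6.63 at $p = 0.001$, 6.31 at $p = 0.01$, 4.26 at $p = 0.1$ and exactly 1 at $p = 1$, so it approaches the $1/\epsilon \approx 6.67$ bound only for groups that originally receive almost no weighted links.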
  • 30. Expected Pagerank change: $x_{colluding}/x_{normal}$ as a function of $p$. [Figure: maximum Pagerank change (vertical axis, from 1 to 7, with a reference line at $1/\epsilon$) against the weighted average of the fraction of links to colluding nodes (horizontal axis, logarithmic, from $10^{-3}$ to $10^{0}$).]
  • 31. Experiments in a synthetic graph.
      • Created using the generative model [Kumar et al., 2000]
      • Power-law distribution with parameter −2.1 for in-degree and Pagerank, and −2.7 for out-degree, using parameters from [Pandurangan et al., 2002]
      • 100,000-node scale-free graph
      • Sampling by Pagerank: divided in deciles, each decile has 1/10th of the Pagerank
      • Picked a group of 100 nodes at random from each decile
      • Group 1 are low-ranked nodes, group 10 are high-ranked nodes
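The sampling step can be sketched as follows. This is one possible reading of "each decile has 1/10th of the Pagerank", namely that nodes are sorted by score and cut wherever the cumulative score crosses a multiple of one tenth of the total; the slides do not spell out the procedure, so the function names, the midpoint rule and the cap on the sample size are all assumptions. Groups are indexed 0 to 9 here, corresponding to groups 1 to 10 on the slides.

```python
import random


def pagerank_deciles(scores):
    """Split nodes into 10 groups that each hold about 1/10 of the total score.

    scores: dict mapping node -> Pagerank. Group 0 holds the lowest-ranked
    nodes and group 9 the highest-ranked ones.
    """
    total = sum(scores.values())
    groups = [[] for _ in range(10)]
    cumulative = 0.0
    for node in sorted(scores, key=scores.get):
        # Place the node by the midpoint of its slice of the cumulative mass.
        midpoint = cumulative + scores[node] / 2.0
        index = min(9, int(10.0 * midpoint / total))
        groups[index].append(node)
        cumulative += scores[node]
    return groups


def sample_groups(scores, size=100, seed=0):
    """Pick up to `size` random nodes from each decile."""
    rng = random.Random(seed)
    return [rng.sample(group, min(size, len(group)))
            for group in pagerank_deciles(scores)]


# Hypothetical input: 1,000 nodes with equal scores, so deciles hold 100 each.
uniform_scores = {i: 1.0 for i in range(1000)}
groups = pagerank_deciles(uniform_scores)
samples = sample_groups(uniform_scores, size=5)
```

With a power-law score distribution the high-ranked groups contain far fewer nodes than the low-ranked ones, which is why the sketch caps the sample at the size of the group instead of assuming 100 nodes are always available.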
  • 36. Original Pagerank of the nodes. These are the original Pagerank values for each group. [Figure: average Pagerank value (logarithmic axis, $10^{-6}$ to $10^{-2}$) for groups 1 to 10, annotated from "originally very bad" to "originally very good".]
  • 37. Modified Pagerank of the nodes. These are the modified Pagerank values when colluding. [Figure: Pagerank values (logarithmic axis, $10^{-6}$ to $10^{-2}$) for groups 1 to 10; series: Original, Clique.]
  • 38. Distribution of Pagerank. But Pagerank values follow a power law distribution ... [Figure: frequency against Pagerank value on logarithmic axes, with a reference line $x^{-2.1}$.]
  • 39. Modified Pagerank position of the nodes. These are the modified Pagerank positions (rankings) when colluding. [Figure: Pagerank ranking (0.0 to 1.0) for groups 1 to 10; series: Original, Clique.]
  • 40. Variation of Pagerank when colluding. These are the ratios $x_{colluding}/x_{original}$. [Figure: new value / original value (0 to 7, with a reference line at $1/\epsilon$) for groups 1 to 10; series: Change in Pagerank value, Change in ranking.]
  • 41. It is not necessary to create a clique. Spammers can use a fraction of the links to try to avoid detection. [Figure: new Pagerank / original Pagerank (1 to 7, with a reference line at $1/\epsilon$) for groups 1 to 10; series: full clique and 95% down to 5% of the clique links, in steps of 5%.] In the paper, other topologies: star and ring.
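Using only a fraction of the clique links can be sketched as drawing a random subset of the ordered pairs of group members. The slides do not say how the subset was chosen, so uniform random sampling, the function name `partial_clique` and the site names are assumptions for illustration.

```python
import random
from itertools import permutations


def partial_clique(nodes, fraction, seed=0):
    """Return a random `fraction` of the directed links of a full clique."""
    all_links = list(permutations(nodes, 2))  # n * (n - 1) ordered pairs
    keep = round(fraction * len(all_links))
    return random.Random(seed).sample(all_links, keep)


# Hypothetical group of ten sites.
group = [f"site{i}" for i in range(10)]
full = partial_clique(group, 1.0)   # all 90 directed links
half = partial_clique(group, 0.5)   # 45 of them, chosen at random
```

A randomized subset like this has no regular structure, which connects to the later point that detecting regularities alone can miss randomized collusion.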
  • 43. Experiments in a real Web graph. Hostgraph of 310,486 Websites from Spain. [Figure: frequency against Pagerank value on logarithmic axes, with a reference line $x^{-2.1}$.]
  • 44. Experiments in a real Web graph. Some of the nodes are already colluding [Fetterly et al., 2004].
  • 45. We study a set of Web sites. We picked a set of 242 reputable sites with rank ≈ 0.75, and we modify their links.
      • Disconnect the group
      • Create a ring
      • Add a central page linking to all of them
      • Add a central page linking to and from all of them (star)
      • Fully connect the group (clique)
      Now we measure the new ranking.
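The link modifications listed above can be expressed as small edge-list builders. This is a sketch only: the function names and the example sites are hypothetical, "disconnect" is read here as removing the links between members of the group (the slides do not define it further), and the sketch does not decide whether the new links replace or supplement the sites' existing ones.

```python
def ring(nodes):
    """Each node links to the next one; the last links back to the first."""
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]


def central(nodes, hub):
    """A new central page links to every node in the group."""
    return [(hub, a) for a in nodes]


def star(nodes, hub):
    """A new central page links to and from every node in the group."""
    return central(nodes, hub) + [(a, hub) for a in nodes]


def clique(nodes):
    """Every node links to every other node."""
    return [(a, b) for a in nodes for b in nodes if a != b]


def disconnect(links, nodes):
    """Drop every existing link between two members of the group."""
    members = set(nodes)
    return [(a, b) for a, b in links if not (a in members and b in members)]


# Hypothetical group of four sites and a few pre-existing links.
sites = ["s1", "s2", "s3", "s4"]
existing = [("s1", "s2"), ("s2", "x"), ("x", "s3"), ("s4", "s1")]
```

For a group of $n$ sites the ring adds $n$ links, the central page $n$, the star $2n$ and the clique $n(n-1)$, so the strategies differ widely in how many new links a detector would have to notice.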
  • 53. New rankings under graph modifications. [Figure: rankings (0 to 1) for each strategy: Normal, Ring, Star, Disconnected, Central, Inv. Ring, Clique.]
  • 54. Adding 5%-50% of complete subgraph. [Figure: average ranking (0.980 to 1.000) against the percent of links of a complete subgraph (0% to 55%).] The best sites also increase their ranking.
  • 56. Conclusions.
      • ✓ Any group of nodes can increase its Pagerank
      • ✓ Nodes with high Pagerank gain less by colluding
      • Ideas for link spam detection:
      • ✗ Only detecting regularities can fail to detect randomized structures
      • ✗ Only detecting nepotistic links can give false positives
      • ✓ Use evidence from multiple sources
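The first conclusion can be tried out on a toy example. The following is a minimal sketch, not the authors' experimental setup: it assumes a small uniform random graph rather than the synthetic and real Web graphs used in the talk, a damping factor of 0.85, and a colluding group made of the first ten nodes. It also measures the group's summed Pagerank rather than the normalized ranking plotted in the slides. The names `pagerank`, `random_graph` and `collude` are hypothetical helpers introduced only for this illustration.

```python
import random


def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration Pagerank; out_links maps node -> set of successors."""
    n = len(out_links)
    rank = {v: 1.0 / n for v in out_links}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v, succ in out_links.items() if not succ)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in out_links}
        for v, succ in out_links.items():
            for w in succ:
                new_rank[w] += damping * rank[v] / len(succ)
        rank = new_rank
    return rank


def random_graph(n, out_degree, seed):
    """Toy graph (assumption): every node links to out_degree random others."""
    rng = random.Random(seed)
    return {
        v: set(rng.sample([w for w in range(n) if w != v], out_degree))
        for v in range(n)
    }


def collude(out_links, group, fraction=1.0, seed=0):
    """Return a copy with a random `fraction` of the group's clique links added."""
    rng = random.Random(seed)
    pairs = [(v, w) for v in group for w in group if v != w]
    chosen = rng.sample(pairs, round(fraction * len(pairs)))
    colluded = {v: set(succ) for v, succ in out_links.items()}
    for v, w in chosen:
        colluded[v].add(w)
    return colluded


graph = random_graph(n=200, out_degree=5, seed=42)
group = list(range(10))  # hypothetical colluding group

before = sum(pagerank(graph)[v] for v in group)
after_half = sum(pagerank(collude(graph, group, fraction=0.5))[v] for v in group)
after_full = sum(pagerank(collude(graph, group))[v] for v in group)

print(f"group Pagerank before collusion:  {before:.4f}")
print(f"with 50% of the clique links:     {after_half:.4f}")
print(f"with the complete subgraph:       {after_full:.4f}")
```

The `fraction` argument mirrors the "5%-50% of complete subgraph" experiments: instead of a regular clique, only a random subset of its links is added, which is the kind of randomized structure that regularity-based detection could miss.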
  • 62. Thank you
  • 63. Clausen, A. (2004). The cost of attack of PageRank. In Proceedings of the international conference on agents, Web technologies and Internet commerce (IAWTIC), Gold Coast, Australia. Fetterly, D., Manasse, M., and Najork, M. (2004). Spam, damn spam, and statistics: Using statistical analysis to locate spam Web pages. In Proceedings of the seventh workshop on the Web and databases (WebDB), Paris, France. Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. (2000). Stochastic models for the web graph. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 57–65, Redondo Beach, CA, USA. IEEE CS Press. Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The Pagerank citation algorithm: bringing order to the web. Technical report, Stanford Digital Library Technologies Project.
  • 64. Pandurangan, G., Raghavan, P., and Upfal, E. (2002). Using Pagerank to characterize Web structure. In Proceedings of the 8th Annual International Computing and Combinatorics Conference (COCOON), volume 2387 of Lecture Notes in Computer Science, pages 330–339, Singapore. Springer. Zhang, H., Goel, A., Govindan, R., Mason, K., and Van Roy, B. (2004). Making eigenvector-based reputation systems robust to collusion. In Proceedings of the third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, pages 92–104, Rome, Italy. Springer.