Link Analysis on the Web
The big picture, the small picture and the medium-sized picture

Ricardo Baeza-Yates (3,4)
Joint work with: L. Becchetti (1), P. Boldi (2), C. Castillo (1,3), D. Donato (1,3), S. Leonardi (1), B. Poblete (5)

1. Università di Roma "La Sapienza", Rome, Italy
2. Università degli Studi di Milano, Milan, Italy
3. Yahoo! Research Barcelona, Catalunya, Spain
4. Yahoo! Research Latin America, Santiago, Chile
5. Universitat Pompeu Fabra, Catalunya, Spain

1. Levels of Link Analysis
2. Generalizing PageRank
3. Other Functional Rankings
4. Web Spam
5. Web Spam Detection
6. Topological Web Spam
7. Direct Counting of Supporters
8. Spam Detection Results

How to find meaningful patterns?

Several levels of analysis:
    Macroscopic view: overall structure
    Microscopic view: nodes
    Mesoscopic view: regions

Macroscopic view, e.g. Bow-tie

[Figure: bow-tie structure of the Web graph] [Broder et al., 2000]

Macroscopic view, e.g. Bow-tie, migration

[Figure: bow-tie, migration] [Baeza-Yates and Poblete, 2006]

Macroscopic view, e.g. Jellyfish

[Figure: jellyfish] [Tauro et al., 2001] - Internet Autonomous Systems (AS) Topology

Microscopic view, e.g. Degree

[Barabási, 2002] and others

[Figure: degree distribution, four panels: Greece, Chile, Spain, Korea]

[Baeza-Yates et al., 2006b] compares this distribution in 8 countries ... guess what the result is?

Mesoscopic view, e.g. Hop-plot

[Figure: hop-plots, frequency (0.0 to 0.3) against distance (up to 30), four panels: .it (40M pages), .uk (18M pages), .eu.int (800K pages), synthetic graph (100K pages)] [Baeza-Yates et al., 2006a]

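A hop-plot of this kind can be sketched in a few lines. The sketch below assumes the plot shows the distribution of shortest-path distances over all reachable ordered pairs of pages, which is what the axes (frequency against distance) suggest. The graph `toy_links` and the function name `hop_plot` are hypothetical and only serve as an illustration.

```python
from collections import Counter, deque


def hop_plot(links):
    """Return {distance: fraction of reachable ordered pairs at that distance}."""
    counts = Counter()
    for source in links:
        # Breadth-first search gives shortest-path distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in links[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Count every reached node except the source itself.
        counts.update(d for d in dist.values() if d > 0)
    total = sum(counts.values())
    return {d: c / total for d, c in sorted(counts.items())}


# Hypothetical toy graph: node -> list of nodes it links to (no sinks).
toy_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
plot = hop_plot(toy_links)
```

On a Web-scale graph an exact all-pairs search like this is too expensive, so the sketch is only meant to make the quantity on the axes concrete.
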
Notation

Let $P_{N \times N}$ be the normalized link matrix of a graph
    Row-normalized
    No "sinks"

Definition (PageRank)
Stationary state of:

$$ \alpha P + \frac{(1 - \alpha)}{N} \mathbf{1}_{N \times N} $$

    Follow links with probability α
    Random jump with probability 1 − α

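The stationary state can be approximated by power iteration. The following is a minimal sketch, assuming the graph is given as adjacency lists with no sinks; the graph `toy_links`, the function name `pagerank`, and the fixed iteration count are illustrative choices, not part of the slides.

```python
def pagerank(links, alpha=0.8, iterations=200):
    """Iterate r <- alpha * r * P + (1 - alpha) / N on a graph without sinks."""
    n = len(links)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Random jump with probability 1 - alpha, spread uniformly over all nodes.
        nxt = [(1.0 - alpha) / n] * n
        for i, targets in links.items():
            # Follow a link with probability alpha; P is row-normalized.
            share = alpha * rank[i] / len(targets)
            for j in targets:
                nxt[j] += share
        rank = nxt
    return rank


# Hypothetical toy graph: node -> list of nodes it links to (no sinks).
toy_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
scores = pagerank(toy_links)
```

Node 3 has no incoming links, so its score comes from the random jump alone, (1 − α)/N.
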
Explicit Formulas

Formulas for PageRank [Newman et al., 2001, Boldi et al., 2005]

$$ r(\alpha) = \frac{(1 - \alpha)}{N} \sum_{t=0}^{\infty} (\alpha P)^t $$

$$ r_i(\alpha) = \sum_{p \in \mathrm{Path}(-, i)} \frac{(1 - \alpha)\, \alpha^{|p|}}{N}\, \mathrm{branching}(p) $$

$\mathrm{Path}(-, i)$ is the set of incoming paths of node $i$

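The series form can be checked numerically. The sketch below sums the terms of the series, starting from the uniform row vector (1 − α)/N (an assumption about how the matrix series is applied), and then verifies that the result is stationary. The names `times_p`, `pagerank_series` and `toy_links` are hypothetical.

```python
def times_p(vec, links):
    """Return vec * P, where P is the row-normalized link matrix of `links`."""
    out = [0.0] * len(vec)
    for i, targets in links.items():
        for j in targets:
            out[j] += vec[i] / len(targets)
    return out


def pagerank_series(links, alpha=0.8, terms=200):
    """Sum (1 - alpha)/N * (alpha * P)^t for t = 0 .. terms, applied to a row of ones."""
    n = len(links)
    term = [(1.0 - alpha) / n] * n  # the t = 0 term
    total = list(term)
    for _ in range(terms):
        term = [alpha * x for x in times_p(term, links)]
        total = [a + b for a, b in zip(total, term)]
    return total


# Hypothetical toy graph: node -> list of nodes it links to (no sinks).
toy_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
r = pagerank_series(toy_links)
```

Because each term is α times the previous one pushed through P, the truncated sum differs from the stationary state by a term of order α^t, which is negligible after a couple of hundred steps for α = 0.8.
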
Branching contribution

Definition (Branching contribution of a path)
Given a path $p = x_1, x_2, \ldots, x_t$ of length $t = |p|$

$$ \mathrm{branching}(p) = \frac{1}{d_1 d_2 \cdots d_{t-1}} $$

where $d_i$ are the out-degrees of the members of the path

For every node $i$ and every length $t$

$$ \sum_{p \in \mathrm{Path}(i, -),\, |p| = t} \mathrm{branching}(p) = 1. $$

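The identity can be verified by brute force on a small graph. In this sketch the length passed to `paths_from` is the number of links followed, which is an assumption about the slide's convention; the helper names and `toy_links` are hypothetical. Exact fractions avoid rounding issues.

```python
from fractions import Fraction


def paths_from(links, start, length):
    """Yield every path that starts at `start` and follows `length` links."""
    if length == 0:
        yield [start]
        return
    for nxt in links[start]:
        for rest in paths_from(links, nxt, length - 1):
            yield [start] + rest


def branching(links, path):
    """1 / (product of the out-degrees of every node on the path except the last)."""
    result = Fraction(1)
    for node in path[:-1]:
        result /= len(links[node])
    return result


# Hypothetical toy graph: node -> list of nodes it links to (no sinks).
toy_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
totals = {
    (i, t): sum(branching(toy_links, p) for p in paths_from(toy_links, i, t))
    for i in toy_links
    for t in range(5)
}
```

Every entry of `totals` equals 1, which is another way of saying that a random surfer leaving node i is somewhere after t steps, provided there are no sinks.
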
Functional ranking

General functional ranking [Baeza-Yates et al., 2006a]

$$ r_i(\alpha) = \sum_{p \in \mathrm{Path}(-, i)} \frac{\mathrm{damping}(|p|)}{N}\, \mathrm{branching}(p) $$

PageRank is a particular case of path-based ranking

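To make the path-based definition concrete, the sketch below evaluates it directly by walking incoming paths backwards, up to a maximum length. It assumes |p| counts links, so the single-node path has length 0 and branching 1. The names `incoming_branching`, `functional_rank`, `exponential` and `toy_links` are hypothetical, and the recursion is only practical on tiny graphs.

```python
def incoming_branching(links, preds, node, length):
    """Sum of branching(p) over all paths that follow `length` links and end in `node`."""
    if length == 0:
        return 1.0
    # The last link of such a path goes from some predecessor j to `node`.
    return sum(
        incoming_branching(links, preds, j, length - 1) / len(links[j])
        for j in preds[node]
    )


def functional_rank(links, damping, max_length):
    """r_i = sum over incoming paths p with |p| <= max_length of damping(|p|) / N * branching(p)."""
    n = len(links)
    preds = {i: [] for i in links}
    for i, targets in links.items():
        for j in targets:
            preds[j].append(i)
    return [
        sum(damping(t) * incoming_branching(links, preds, i, t)
            for t in range(max_length + 1)) / n
        for i in links
    ]


# Hypothetical toy graph: node -> list of nodes it links to (no sinks).
toy_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
alpha = 0.8


def exponential(t):
    """Damping that should recover PageRank, per the explicit formula above."""
    return (1 - alpha) * alpha ** t


approx_pagerank = functional_rank(toy_links, exponential, max_length=25)
```

With the exponential damping the result approaches PageRank as `max_length` grows; swapping in another damping function changes the ranking without changing the code.
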
Exponential damping = PageRank

[Figure: damping(t) with α = 0.8 and with α = 0.7; weight (0.00 to 0.30) against length of the path (t), from 1 to 10]

$$ \mathrm{damping}(t) = (1 - \alpha)\, \alpha^t $$

Most of the contribution is on the first few levels.

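A short worked calculation makes "the first few levels" quantitative. It takes the damping in the form $(1 - \alpha)\alpha^t$ used in the explicit PageRank formula earlier, and the cut-off of five levels is an arbitrary choice for illustration.

```latex
\sum_{t=0}^{\infty} (1-\alpha)\,\alpha^{t} = 1,
\qquad
\sum_{t=0}^{k-1} (1-\alpha)\,\alpha^{t} = 1 - \alpha^{k}.
% alpha = 0.8, k = 5: 1 - 0.8^5 = 1 - 0.32768 = 0.67232
% alpha = 0.7, k = 5: 1 - 0.7^5 = 1 - 0.16807 = 0.83193
```

So paths with fewer than five links already carry about 67% of the total weight for α = 0.8 and about 83% for α = 0.7.
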
Linear damping

[Figure: damping(t) with L = 15 and with L = 10; weight (0.00 to 0.30) against length of the path (t), from 1 to 10]

$$ \mathrm{damping}(t) = \begin{cases} \dfrac{2(L - t)}{L(L + 1)} & t < L \\ 0 & t \geq L \end{cases} $$

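The linear damping can be written down directly. This is a small sketch using exact fractions, so that the normalization (the weights add up to 1) can be checked without rounding; the function name `linear_damping` is hypothetical and t is assumed to be a non-negative integer.

```python
from fractions import Fraction


def linear_damping(t, L):
    """Weight of paths of length t: decreases linearly and is zero from length L on."""
    if t < L:
        return Fraction(2 * (L - t), L * (L + 1))
    return Fraction(0)


# Weights for L = 10, evaluated slightly past the cut-off.
weights = [linear_damping(t, 10) for t in range(12)]
```

Unlike the exponential case, the sum is finite: only the first L path lengths contribute anything.
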
Example: Calculating LinearRank

For calculating LinearRank we use:

$$ \mathrm{LinearRank} = \frac{1}{N} \sum_{t=0}^{\infty} \mathrm{damping}(t)\, P^t = \frac{1}{N} \sum_{t=0}^{L-1} \frac{2(L - t)}{L(L + 1)}\, P^t $$

However, we cannot hold the temporary $P^t$ in memory!

Re-write the damping as a recursion

We have to rewrite the sum so that it can be calculated:

$$ R^{(0)} = \frac{2}{L + 1} $$

$$ R^{(k+1)} = \frac{(L - k - 1)}{(L - k)}\, R^{(k)} P $$

$$ \mathrm{LinearRank} = \sum_{k=0}^{L-1} R^{(k)} $$

The constant factor 1/N is left out here; it rescales all scores equally and does not change the ranking.

Now we can give the algorithm ...

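It is easy to confirm that the recursion reproduces the linear damping weights. The sketch below tracks only the scalar coefficient in front of the power of P at each step and compares it with 2(L − t) / (L(L + 1)); the function name `recursion_coefficients` is hypothetical.

```python
from fractions import Fraction


def recursion_coefficients(L):
    """Scalar weights produced by the recursion for k = 0 .. L-1."""
    coeffs = [Fraction(2, L + 1)]  # coefficient of R^(0)
    for k in range(L - 1):
        # R^(k+1) carries the coefficient of R^(k) times (L - k - 1) / (L - k).
        coeffs.append(coeffs[-1] * Fraction(L - k - 1, L - k))
    return coeffs


coeffs = recursion_coefficients(15)
expected = [Fraction(2 * (15 - t), 15 * 16) for t in range(15)]
```

Here `coeffs` and `expected` agree term by term, so each step only needs the previous vector and one pass over the links, never an explicit power of P.
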
Algorithm

for i : 1 ... N do {Initialization}
    Score[i] ← R[i] ← 2/(L+1)
end for
for k : 0 ... L − 2 do {Iteration step; computes R^(k+1)}
    Aux ← 0
    for i : 1 ... N do {Follow links in the graph}
        for all j such that there is a link from i to j do
            Aux[j] ← Aux[j] + R[i]/outdegree(i)
        end for
    end for
    for i : 1 ... N do {Add to ranking value}
        R[i] ← Aux[i] × (L−k−1)/(L−k)
        Score[i] ← Score[i] + R[i]
    end for
end for
return Score

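A possible Python rendering of the LinearRank algorithm is sketched below. It indexes the iterations as k = 0 ... L − 2, so that each pass applies the factor (L − k − 1)/(L − k) from the recursion for R^(k+1), and, like the pseudocode, it leaves out the constant factor 1/N. The graph `toy_links` and the function name `linear_rank` are hypothetical.

```python
def linear_rank(links, L):
    """LinearRank scores (without the 1/N factor) for a graph without sinks."""
    n = len(links)
    r = [2.0 / (L + 1)] * n  # R^(0)
    score = list(r)
    for k in range(L - 1):  # k = 0 .. L-2, each pass computes R^(k+1)
        aux = [0.0] * n
        for i, targets in links.items():  # follow links in the graph
            for j in targets:
                aux[j] += r[i] / len(targets)
        factor = (L - k - 1) / (L - k)
        r = [a * factor for a in aux]
        score = [s + x for s, x in zip(score, r)]  # add to ranking value
    return score


# Hypothetical toy graph: node -> list of nodes it links to (no sinks).
toy_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
scores = linear_rank(toy_links, L=5)
```

Only two vectors of length N are alive at any time, which is the point of the rewrite. Dividing the result by N gives scores that add up to 1.
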
Algorithm (general)

for i : 1 ... N do {Initialization}
    Score[i] ← R[i] ← INIT
end for
for k : 1 ... STOP do {Iteration step}
    Aux ← 0
    for i : 1 ... N do {Follow links in the graph}
        for all j such that there is a link from i to j do
            Aux[j] ← Aux[j] + R[i]/outdegree(i)
        end for
    end for
    for i : 1 ... N do {Add to ranking value}
        R[i] ← Aux[i] × FACTOR
        Score[i] ← Score[i] + R[i]
    end for
end for
return Score

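The general scheme can be sketched with INIT, FACTOR and STOP as parameters. The two parameter choices shown are derived here from the exponential and linear damping functions and are not stated on the slide, so they should be read as assumptions; the fixed STOP of 200 for PageRank stands in for a convergence test. All names are hypothetical.

```python
def functional_rank(links, init, factor, stop):
    """Generic iteration; `factor(k)` is the FACTOR applied in iteration k = 1 .. stop."""
    n = len(links)
    r = [init] * n
    score = list(r)
    for k in range(1, stop + 1):
        aux = [0.0] * n
        for i, targets in links.items():  # follow links in the graph
            for j in targets:
                aux[j] += r[i] / len(targets)
        f = factor(k)
        r = [a * f for a in aux]
        score = [s + x for s, x in zip(score, r)]  # add to ranking value
    return score


# Hypothetical toy graph: node -> list of nodes it links to (no sinks).
toy_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n, alpha, L = len(toy_links), 0.8, 5

# Exponential damping (PageRank): constant FACTOR, truncated after 200 steps.
pagerank = functional_rank(toy_links, init=(1 - alpha) / n,
                           factor=lambda k: alpha, stop=200)

# Linear damping: FACTOR is damping(k) / damping(k - 1); 1/N is folded into INIT here.
linear = functional_rank(toy_links, init=2 / (n * (L + 1)),
                         factor=lambda k: (L - k) / (L - k + 1), stop=L - 1)
```

In both cases FACTOR is the ratio between consecutive damping weights, which is what lets any damping function with a cheap ratio reuse the same loop.
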
Other damping functions

Empirical damping:

[Figure: average text similarity (0.2 to 0.7) against link distance (1 to 5)]

Using LinearRank to approximate PageRank

Experimental comparison: 18-million nodes in the U.K. Web Graph
    Calculated PageRank with α = 0.1, 0.2, ..., 0.9
    Calculated LinearRank with L = 5, 10, ..., 25

For certain combinations of parameters, the rankings are almost equal!

Experimental comparison

[Figure: Experimental Comparison in the U.K. Web Graph; τ (0.80 to 1.00) as a function of L (5 to 25) and α (0.5 to 0.9), with the region τ ≥ 0.95 marked]

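The agreement between two rankings is measured here with τ. As an illustration, the sketch below computes the simplest variant of Kendall's τ (pairs tied in either ranking count as neither concordant nor discordant). The slides do not say which variant was used, and a quadratic loop like this would not scale to 18 million nodes, so this is only meant to show what the statistic measures. The function name `kendall_tau` is hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total pairs, for two score lists over the same items."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both rankings order this pair the same way
        elif sign < 0:
            discordant += 1  # the rankings disagree on this pair
    pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / pairs


same = kendall_tau([1, 2, 3, 4], [1, 2, 3, 4])
reverse = kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])
one_swap = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
```

Identical orderings give 1, reversed orderings give −1, and a single adjacent swap among four items gives 4/6, so values of 0.95 and above mean the two rankings disagree on very few pairs.
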
Prediction of best parameter combination

[Figure: Prediction of Best Parameter Combinations (Analysis); L that maximizes Kendall's τ (5 to 25) against exponent α (0.5 to 0.9), with two series: actual optimum, and predicted optimum with length = 5]

What is on the Web?

Information + Porn + On-line casinos + Free movies + Cheap software + Buy a MBA diploma + Prescription-free drugs + V!-4-gra + Get rich now now now!!!

Graphic: www.milliondollarhomepage.com
Opportunities for Web spam

    Spamdexing
        Keyword stuffing
        Link farms
        Scraper, “Made for Advertising” sites
        Spam blogs (splogs)
        Cloaking
    Click spam

Adversarial relationship
Every undeserved gain in ranking for a spammer is a loss of precision for the search engine.
Typical Web Spam (1)

Typical Web Spam (2)

Hidden text

Made for Advertising (1)

Made for Advertising (2)

Made for Advertising (3)

Search engine?

Fake search engine

Problem: “normal” pages that are spam
5  Web Spam Detection
Machine Learning

Machine Learning (cont.)

Feature Extraction
Challenges: Machine Learning

Machine Learning Challenges:
    Learning with interdependent variables (graph)
    Learning with few examples
    Scalability
Challenges: Information Retrieval

Information Retrieval Challenges:
    Feature extraction: which features?
    Feature aggregation: page/host/domain
    Feature propagation (graph)
    Recall/precision tradeoffs
    Scalability
6  Topological Web Spam
Topological spam: link farms

Single-level farms can be detected by searching for groups of nodes sharing their out-links [Gibson et al., 2005]
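As a rough illustration of "groups of nodes sharing their out-links", the sketch below groups nodes whose out-link sets are exactly identical. This is a deliberate simplification and not the method of [Gibson et al., 2005]; the function name, the thresholds `min_group_size` and `min_links`, and the toy graph are hypothetical.

```python
from collections import defaultdict


def groups_sharing_outlinks(out_links, min_group_size=3, min_links=2):
    """Return groups of nodes that point to exactly the same set of targets.

    out_links maps each node to a list of the nodes it links to.
    Both thresholds are illustrative assumptions, not values from the talk.
    """
    groups = defaultdict(list)
    for node, targets in out_links.items():
        key = frozenset(targets)  # order and repetition of links are ignored
        if len(key) >= min_links:
            groups[key].append(node)
    return [sorted(nodes) for nodes in groups.values()
            if len(nodes) >= min_group_size]


# Hypothetical toy graph: a, b and c all link to the same two targets.
graph = {
    "a": ["t1", "t2"],
    "b": ["t2", "t1"],
    "c": ["t1", "t2"],
    "d": ["x"],
    "e": ["t1", "y"],
}
suspicious = groups_sharing_outlinks(graph)
```

Exact matching misses farms whose members vary their links slightly; a similarity measure over out-link sets would be more tolerant, at a higher cost.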
Motivation

[Fetterly et al., 2004] hypothesized that studying the distribution of statistics about pages could be a good way of detecting spam pages:

“in a number of these distributions, outlier values are associated with web spam”
Test collection

U.K. collection
    18.5 million pages downloaded from the .UK domain
    5,344 hosts manually classified (6% of the hosts)

Classified entire hosts:
    V A few hosts are mixed: spam and non-spam pages
    X More coverage: sample covers 32% of the pages
In-degree

[Figure: "In-degree", δ = 0.35. x-axis: number of in-links (log scale, 1 to 10000); y-axis: 0 to 0.4. Series: Normal, Spam.]
(δ = max. difference in C.D.F. plot)
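The δ reported on these plots is described as the maximum difference between the two cumulative distribution functions. A minimal sketch of that computation on two samples follows; the function name and the sample values are hypothetical, and the exact procedure used for the slides is not given in the source.

```python
from bisect import bisect_right


def max_cdf_difference(normal, spam):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a = sorted(normal)
    b = sorted(spam)
    delta = 0.0
    # The empirical CDFs only change at observed values, so checking
    # those points is enough to find the maximum gap.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        delta = max(delta, abs(cdf_a - cdf_b))
    return delta


# Made-up feature values for two small groups of hosts.
delta = max_cdf_difference([1, 2, 3, 4], [3, 4, 5, 6])
```

A δ near 0 means the feature is distributed almost identically in both classes; a δ near 1 means the two classes barely overlap.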
Out-degree

[Figure: "Out-degree", δ = 0.28. x-axis: number of out-links (1 to 100); y-axis: 0 to 0.3. Series: Normal, Spam.]
Edge reciprocity

[Figure: "Reciprocity of max. PR page", δ = 0.35. x-axis: fraction of reciprocal links (0 to 1); y-axis: 0 to 0.5. Series: Normal, Spam.]
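One plausible reading of "fraction of reciprocal links" is the share of a page's out-links that are answered by a link back. The sketch below assumes that definition; the function name and the toy graph are hypothetical.

```python
def reciprocity(out_links, node):
    """Fraction of node's out-links whose target links back to node."""
    targets = set(out_links.get(node, []))
    if not targets:
        return 0.0
    reciprocal = sum(1 for v in targets if node in out_links.get(v, []))
    return reciprocal / len(targets)


# Toy graph: p links to four pages, two of which link back.
graph = {
    "p": ["a", "b", "c", "d"],
    "a": ["p"],
    "b": ["p"],
    "c": [],
    "d": ["x"],
}
fraction = reciprocity(graph, "p")
```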
Assortativity

[Figure: "Degree / Degree of neighbors", δ = 0.31. x-axis: degree/degree ratio of home page (log scale, 0.001 to 1000); y-axis: 0 to 0.4. Series: Normal, Spam.]
Variance of PageRank

Suggested in [Benczúr et al., 2005]

[Figure: two panels, each labeled "PageRank".]
Variance of PageRank of in-neighbors

[Figure: "Stdev. of PR of Neighbors (Home)", δ = 0.41. x-axis: σ² of the logarithm of PageRank (0 to 1); y-axis: 0 to 0.3. Series: Normal, Spam.]
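The axis label suggests the feature is the variance of the logarithm of PageRank over a page's in-neighbors. A small sketch under that assumption follows; the function name and all PageRank values are made up for illustration.

```python
import math
from statistics import pvariance


def log_pagerank_variance(in_neighbors, pagerank, node):
    """Population variance of log(PageRank) over the in-neighbors of node."""
    values = [math.log(pagerank[u]) for u in in_neighbors.get(node, [])]
    if len(values) < 2:
        return 0.0  # variance is not informative for fewer than two neighbors
    return pvariance(values)


# Hypothetical data: "home" is linked from pages of very different rank,
# "farm" only from pages with identical rank.
in_neighbors = {"home": ["a", "b", "c"], "farm": ["s1", "s2", "s3"]}
pagerank = {"a": 1e-3, "b": 1e-5, "c": 1e-7,
            "s1": 2e-6, "s2": 2e-6, "s3": 2e-6}
var_home = log_pagerank_variance(in_neighbors, pagerank, "home")
var_farm = log_pagerank_variance(in_neighbors, pagerank, "farm")
```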
TrustRank

TrustRank [Gyöngyi et al., 2004]

A node with high PageRank, but far away from a core set of “trusted nodes” is suspicious

Start from a set of trusted nodes, then do a random walk, returning to the set of trusted nodes with probability 1 − α at each step

i Trusted nodes: data from http://www.dmoz.org/
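The random walk described above can be sketched as a power iteration in which the walker jumps back to the trusted set with probability 1 − α. This is only an illustrative sketch, not the implementation of [Gyöngyi et al., 2004]; the value α = 0.85, the iteration count, the handling of nodes without out-links, and the toy graph are all assumptions.

```python
def trustrank(out_links, trusted, alpha=0.85, iterations=50):
    """Random walk with restart to a trusted set (power iteration)."""
    trusted = set(trusted)
    nodes = set(out_links) | {v for ts in out_links.values() for v in ts}
    restart = {n: (1.0 / len(trusted) if n in trusted else 0.0) for n in nodes}
    score = dict(restart)  # the walk starts inside the trusted set
    for _ in range(iterations):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = alpha * score[u] / len(targets)
                for v in targets:
                    nxt[v] += share
            else:
                # Assumption: a walker stuck on a page without out-links
                # also returns to the trusted set.
                for n in trusted:
                    nxt[n] += alpha * score[u] / len(trusted)
        score = nxt
    return score


# Toy graph: "t" is trusted; s1 and s2 only link to each other.
graph = {"t": ["a"], "a": ["t"], "s1": ["s2"], "s2": ["s1"]}
scores = trustrank(graph, trusted={"t"})
```

In this toy graph the pair s1, s2 is unreachable from the trusted node, so it receives no trust at all, however many links the two pages exchange.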
TrustRank Idea
TrustRank score

[Figure: "TrustRank score of home page", δ = 0.59. x-axis: TrustRank (log scale, ticks at 1e−06 and 0.001); y-axis: 0 to 0.4. Series: Normal, Spam.]
TrustRank / PageRank

[Figure: "Estimated relative non-spam mass", δ = 0.59. x-axis: TrustRank score/PageRank (log scale, 0.3 to 100); y-axis: 0 to 0.8. Series: Normal, Spam.]
Truncated PageRank

Proposed in [Becchetti et al., 2006b]. Idea: reduce the direct contribution of the first levels of links:

\[
\mathrm{damping}(t) =
\begin{cases}
0 & t \le T \\
C\,\alpha^{t} & t > T
\end{cases}
\]

V No extra reading of the graph after PageRank
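To make the damping function concrete, the sketch below accumulates the contribution of paths of length t only when t > T, weighting each by C αᵗ. It is an illustration under stated assumptions rather than the code of [Becchetti et al., 2006b]: the constant C is chosen here so that the damping values sum to 1, the walk starts from a uniform distribution, pages without out-links simply lose their mass, and the function name and toy graph are hypothetical.

```python
def truncated_pagerank(out_links, T=2, alpha=0.85, max_t=100):
    """Sum over t > T of C * alpha**t times the t-step walk distribution.

    Assumes T >= 0. C is an assumed normalization making the damping
    weights sum to 1 over t = T+1, T+2, ...
    """
    nodes = sorted(set(out_links) | {v for ts in out_links.values() for v in ts})
    n = len(nodes)
    C = (1 - alpha) / alpha ** (T + 1)
    walk = {u: 1.0 / n for u in nodes}  # distribution after 0 steps
    rank = {u: 0.0 for u in nodes}
    for t in range(1, max_t + 1):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            for v in targets:
                nxt[v] += walk[u] / len(targets)
        walk = nxt
        if t > T:  # damping(t) = 0 for t <= T
            weight = C * alpha ** t
            for u in nodes:
                rank[u] += weight * walk[u]
    return rank


# Toy graph: a directed 3-cycle, where every walk stays uniform.
cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
ranks = truncated_pagerank(cycle, T=2)
```

Because the weights are applied while the walk is being iterated, the truncated scores come out of the same pass over the graph that a PageRank computation already performs.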
Truncated PageRank(T=2) / PageRank

[Figure: "TruncatedPageRank T=2 / PageRank", δ = 0.30. x-axis: TruncatedPageRank(T=2) / PageRank (0.2 to 1.6); y-axis: 0 to 0.3. Series: Normal, Spam.]
Max. change of Truncated PageRank

[Figure: "Maximum change of Truncated PageRank", δ = 0.29. x-axis: max(TrPR_{i+1}/TrPR_i) (0.85 to 1.1); y-axis: 0 to 0.2. Series: Normal, Spam.]
7  Direct Counting of Supporters
High and low-ranked pages are different

[Figure: number of nodes (×10^4, 0 to 12) at each distance (1 to 20). Series: Top 0%-10%, Top 40%-50%, Top 60%-70%.]

Areas below the curves are equal if we are in the same strongly-connected component
Probabilistic counting

[Figure: each page holds a vector of bits. Propagation of bits using the “OR” operation towards the target page. Count bits set to estimate supporters.]
                   [Becchetti et al., 2006b] present an improvement of the ANF
                   algorithm [Palmer et al., 2002] based on probabilistic
                   counting [Flajolet and Martin, 1985].

                   General algorithm

                   Require: N: number of nodes, d: distance, k: bits
                   for node : 1 . . . N, bit : 1 . . . k do
                       INIT(node, bit)
                   end for
                   for distance : 1 . . . d do {Iteration step}
                       Aux ← 0^k
                       for src : 1 . . . N do {Follow links in the graph}
                           for all links from src to dest do
                               Aux[dest] ← Aux[dest] OR V[src,·]
                           end for
                       end for
                       V ← Aux
                   end for
                   for node : 1 . . . N do {Estimate supporters}
                       Supporters[node] ← ESTIMATE( V[node,·] )
                   end for
                   return Supporters

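The loop structure of the general algorithm can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `propagate_bits`, the adjacency-dictionary graph format and the toy graph are assumptions made here, and INIT and ESTIMATE are passed in as callables because the pseudocode leaves them abstract. Each k-bit vector is stored as a Python integer, so the OR step is a single `|`.

```python
from typing import Callable, Dict, List


def propagate_bits(
    num_nodes: int,
    links: Dict[int, List[int]],
    distance: int,
    init: Callable[[int], int],
    estimate: Callable[[int], float],
) -> List[float]:
    """Run the bit-propagation loop and return one estimate per node.

    `init(node)` returns the node's initial bit vector as an int and
    `estimate(bits)` turns a final bit vector into a supporter count.
    """
    # Initialisation: one bit vector per node (nodes are 0-based here).
    v = [init(node) for node in range(num_nodes)]

    for _ in range(distance):  # iteration step
        aux = [0] * num_nodes  # all-zero vectors
        for src in range(num_nodes):  # follow links in the graph
            for dest in links.get(src, []):
                aux[dest] |= v[src]
        v = aux  # as in the pseudocode, V is replaced, not OR-ed

    return [estimate(bits) for bits in v]


def popcount(bits: int) -> int:
    """Number of bits set to one."""
    return bin(bits).count("1")


# Hypothetical toy graph: 0 -> 1 -> 2 -> 3 and 4 -> 2.
toy_links = {0: [1], 1: [2], 2: [3], 4: [2]}

# With one private bit per node, counting bits is exact rather than
# probabilistic, which makes the propagation easy to check by hand.
one_hop = propagate_bits(5, toy_links, 1, lambda n: 1 << n, popcount)
two_hops = propagate_bits(5, toy_links, 2, lambda n: 1 << n, popcount)
# one_hop == [0, 1, 2, 1, 0]; two_hops == [0, 0, 1, 2, 0]
```

Giving every node its own bit needs N bits per vector, which is what the probabilistic version avoids: it keeps k fixed and accepts an approximate count. Because the sketch follows the pseudocode literally (V ← Aux), after d iterations a node's vector holds the bits of the nodes that reach it through exactly d links.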
                   Our estimator

                   Initialize each bit to one with probability ε

                   Estimator: $\mathrm{neighbors}(\mathrm{node}) = \log_{1-\epsilon}\left(1 - \frac{\mathrm{ones}(\mathrm{node})}{k}\right)$

                   Adaptive estimation

                   Repeat the above process for ε = 1/2, 1/4, 1/8, . . . , and look
                   for the transition from more than (1 − 1/e)k ones to fewer
                   than (1 − 1/e)k ones.

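The estimator and the adaptive search can be sketched as below, with ε the per-bit initialization probability. If the n supporters' bits are set independently, a bit of the OR-ed vector is one with probability 1 − (1 − ε)^n, and the estimator inverts that relation. The function names, the `ones_for_epsilon` callback and the rule of applying the estimator at the first ε that falls below the (1 − 1/e)k threshold are assumptions of this sketch; the slides do not spell out how the transition is turned into a final estimate.

```python
import math
from typing import Callable, Optional, Tuple


def estimate_neighbors(ones: int, k: int, epsilon: float) -> float:
    """Invert ones/k = 1 - (1 - epsilon)^n to recover n."""
    if ones >= k:
        raise ValueError("bit vector saturated; use a smaller epsilon")
    return math.log(1.0 - ones / k) / math.log(1.0 - epsilon)


def adaptive_estimate(
    ones_for_epsilon: Callable[[float], int],
    k: int,
    max_halvings: int = 30,
) -> Optional[Tuple[float, float]]:
    """Halve epsilon from 1/2 until fewer than (1 - 1/e)k bits are set.

    `ones_for_epsilon(eps)` is a hypothetical callback that reruns the
    propagation with that epsilon and returns the number of ones at the
    node of interest. Returns (epsilon, estimate), or None if no
    transition is found within `max_halvings` steps.
    """
    threshold = (1.0 - 1.0 / math.e) * k
    epsilon = 0.5
    for _ in range(max_halvings):
        ones = ones_for_epsilon(epsilon)
        if ones < threshold:
            return epsilon, estimate_neighbors(ones, k, epsilon)
        epsilon /= 2.0
    return None


# Deterministic stand-in for a real run: the expected number of ones
# for a node with 100 supporters and k = 64 bits (made-up values).
def expected_ones(eps: float) -> int:
    return round(64 * (1.0 - (1.0 - eps) ** 100))


result = adaptive_estimate(expected_ones, 64)
```

The threshold has a simple reading: when ε is close to 1/n, the expected fraction of ones is about 1 − 1/e, so the ε at which the count crosses that level already indicates the order of magnitude of n.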
                   Convergence

                   [Figure: fraction of nodes with estimates (0% to 100%)
                   versus iteration (5 to 20), one curve per distance
                   d = 1, . . . , 8.]

                   Error rate

                   [Figure: average relative error (0 to 0.5) versus distance
                   (1 to 8). Series: ours 64 bits, epsilon-only estimator;
                   ours 64 bits, combined estimator; ANF 24 bits × 24
                   iterations (576 b×i); ANF 24 bits × 48 iterations
                   (1152 b×i). Points on our curves are labelled with their
                   cost in bits × iterations: 512, 768, 832, 960, 1152, 1216,
                   1344 and 1408 b×i.]

                   Hosts at distance 4

                   [Figure: histogram of hosts at distance exactly 4,
                   S4 − S3 (axis from 1 to 1000), for normal and spam hosts;
                   δ = 0.39.]

                   Minimum change of supporters

                   [Figure: histogram of the minimum change of supporters,
                   min(S2/S1, S3/S2, S4/S3) (axis from 1 to 10), for normal
                   and spam hosts; δ = 0.39.]

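The two attributes plotted in these histograms, S4 − S3 and min(S2/S1, S3/S2, S4/S3), can be computed from per-distance supporter counts as in the sketch below. It assumes that S_d is the (estimated) number of supporters within distance d of a host, which is what the label "hosts at distance exactly 4" for S4 − S3 suggests; the function names and the sample counts are made up for illustration.

```python
from typing import Sequence


def hosts_at_exact_distance(s: Sequence[float], d: int) -> float:
    """S_d - S_{d-1} for d >= 2, where s[i] holds S_{i+1}."""
    return s[d - 1] - s[d - 2]


def min_change_of_supporters(s: Sequence[float]) -> float:
    """min(S2/S1, S3/S2, ...) over consecutive distances.

    Assumes every count is positive, so no ratio divides by zero.
    """
    return min(later / earlier for earlier, later in zip(s, s[1:]))


supporters = [10, 40, 100, 150]  # hypothetical S1..S4 for one host
exact_4 = hosts_at_exact_distance(supporters, 4)   # 150 - 100
min_change = min_change_of_supporters(supporters)  # min(4.0, 2.5, 1.5)
```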
                   Detection rates

                   Detection rate of 60% (UK-2006) to 80% (UK-2002), with an
                   error rate of 4% to 2%, by combining different
                   attributes [Becchetti et al., 2006a].

                     ✗   No magic bullet in link analysis
                     ✗   Precision still low compared to e-mail spam filters
                     ✓   Measure both home page and max. PageRank page
                     ✓   Host-based counts of neighbors are important

                   Next step: combine link analysis and content analysis

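The slide does not define its two rates. A small sketch under one common reading, assumed here, is that the detection rate is the fraction of spam hosts that are flagged and the error rate is the fraction of normal hosts that are wrongly flagged. The function name and the counts are invented for illustration and are not results from the talk.

```python
from typing import Tuple


def detection_and_error_rate(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Return (detection rate, error rate) from confusion-matrix counts.

    tp/fn: spam hosts flagged / missed; fp/tn: normal hosts flagged / passed.
    """
    detection_rate = tp / (tp + fn)  # share of spam hosts caught
    error_rate = fp / (fp + tn)      # share of normal hosts misjudged
    return detection_rate, error_rate


# Made-up counts: 100 spam hosts and 900 normal hosts.
detection, error = detection_and_error_rate(tp=80, fn=20, fp=18, tn=882)
```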
                   Upcoming Web Spam Challenge on UK-2006

                      We asked 20+ volunteers to classify entire hosts
                      We provided several examples
                      We asked them to classify each host as normal / borderline / spam
                      Do they agree? Mostly . . .

                   Agreement between humans

                   Result: first public Web Spam collection

                       Public spam collection
                           Web graph with ∼80 million pages
                           ∼11,000 hosts
                           Labels for ∼4,000 hosts by at least 2 humans each

                       Upcoming Web Spam challenge
                           Machine learning
                           Information retrieval

                       webspam-announces-subscribe@yahoogroups.com

                   Thank you!

                   Baeza-Yates, R., Boldi, P., and Castillo, C. (2006a).
                   Generalizing PageRank: Damping functions for link-based
                   ranking algorithms.
                   In Proceedings of ACM SIGIR, pages 308–315, Seattle,
                   Washington, USA. ACM Press.

                   Baeza-Yates, R., Castillo, C., and Efthimiadis, E. (2006b).
                   Characterization of national web domains.
                   To appear in ACM TOIT.

                   Baeza-Yates, R. and Poblete, B. (2006).
                   Dynamics of the Chilean web structure.
                   Comput. Networks, 50(10):1464–1473.

                   Barabási, A.-L. (2002).
                   Linked: The New Science of Networks.
                   Perseus Books Group.

                   Becchetti, L., Castillo, C., Donato, D., Leonardi, S., and
                   Baeza-Yates, R. (2006a).
                   Link-based characterization and detection of Web Spam.
                   In Second International Workshop on Adversarial Information
                   Retrieval on the Web (AIRWeb), Seattle, USA.

                   Becchetti, L., Castillo, C., Donato, D., Leonardi, S., and
                   Baeza-Yates, R. (2006b).
                   Using rank propagation and probabilistic counting for
                   link-based spam detection.
                   In Proceedings of the Workshop on Web Mining and Web
                   Usage Analysis (WebKDD), Pennsylvania, USA. ACM Press.

                   Benczúr, A. A., Csalogány, K., Sarlós, T., and Uher, M.
                   (2005).
                   SpamRank: fully automatic link spam detection.
                   In Proceedings of the First International Workshop on
                   Adversarial Information Retrieval on the Web, Chiba, Japan.

                   Boldi, P., Santini, M., and Vigna, S. (2005).
                   PageRank as a function of the damping factor.
                   In Proceedings of the 14th International Conference on World
                   Wide Web, pages 557–566, Chiba, Japan. ACM Press.

                   Broder, A., Kumar, R., Maghoul, F., Raghavan, P.,
                   Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J.
                   (2000).
                   Graph structure in the web: Experiments and models.
                   In Proceedings of the Ninth Conference on World Wide Web,
                   pages 309–320, Amsterdam, Netherlands. ACM Press.

                   Fetterly, D., Manasse, M., and Najork, M. (2004).
                   Spam, damn spam, and statistics: Using statistical analysis to
                   locate spam web pages.
                   In Proceedings of the Seventh Workshop on the Web and
                   Databases (WebDB), pages 1–6, Paris, France.

                   Flajolet, P. and Martin, N. G. (1985).
                   Probabilistic counting algorithms for data base applications.
                   Journal of Computer and System Sciences, 31(2):182–209.

                   Gibson, D., Kumar, R., and Tomkins, A. (2005).
                   Discovering large dense subgraphs in massive graphs.
                   In VLDB ’05: Proceedings of the 31st International Conference
                   on Very Large Data Bases, pages 721–732. VLDB Endowment.

                   Gyöngyi, Z., Garcia-Molina, H., and Pedersen, J. (2004).
                   Combating web spam with TrustRank.
                   In Proceedings of the Thirtieth International Conference on
                   Very Large Data Bases (VLDB), pages 576–587, Toronto,
                   Canada. Morgan Kaufmann.

                   Newman, M. E., Strogatz, S. H., and Watts, D. J. (2001).
                   Random graphs with arbitrary degree distributions and their
                   applications.
                   Phys Rev E Stat Nonlin Soft Matter Phys, 64(2 Pt 2).

                   Palmer, C. R., Gibbons, P. B., and Faloutsos, C. (2002).
                   ANF: a fast and scalable tool for data mining in massive
                   graphs.
                   In Proceedings of the Eighth ACM SIGKDD International
                   Conference on Knowledge Discovery and Data Mining, pages
                   81–90, New York, NY, USA. ACM Press.

                   Tauro, L., Palmer, C., Siganos, G., and Faloutsos, M. (2001).
                   A simple conceptual model for the Internet topology.
                   In Global Internet, San Antonio, Texas, USA. IEEE CS Press.

MAC-MES-MIC Views of Link Analysis

  • 1. Link Analysis on the Web: The big picture, the small picture and the medium-sized picture. Ricardo Baeza-Yates (3, 4). Joint work with: L. Becchetti (1), P. Boldi (2), C. Castillo (1, 3), D. Donato (1, 3), S. Leonardi (1), B. Poblete (5). Affiliations: 1. Università di Roma “La Sapienza”, Rome, Italy; 2. Università degli Studi di Milano, Milan, Italy; 3. Yahoo! Research Barcelona, Catalunya, Spain; 4. Yahoo! Research Latin America, Santiago, Chile; 5. Universitat Pompeu Fabra, Catalunya, Spain.
  • 2. Outline: 1. Levels of Link Analysis; 2. Generalizing PageRank; 3. Other Functional Rankings; 4. Web Spam; 5. Web Spam Detection; 6. Topological Web Spam; 7. Direct Counting of Supporters; 8. Spam Detection Results.
  • 7. How to find meaningful patterns? Several levels of analysis: Macroscopic view: overall structure. Microscopic view: nodes. Mesoscopic view: regions.
  • 8. Macroscopic view, e.g. Bow-tie [Broder et al., 2000]
  • 9. Macroscopic view, e.g. Bow-tie, migration [Baeza-Yates and Poblete, 2006]
  • 10. Macroscopic view, e.g. Jellyfish [Tauro et al., 2001] - Internet Autonomous Systems (AS) Topology
  • 11. Macroscopic view, e.g. Jellyfish
  • 12. Microscopic view, e.g. Degree [Barabási, 2002] and others
  • 13. Microscopic view, e.g. Degree (panels: Greece, Chile, Spain, Korea). [Baeza-Yates et al., 2006b] compares this distribution in 8 countries . . . guess what is the result?
  • 14. Mesoscopic view, e.g. Hop-plot
  • 15. Mesoscopic view, e.g. Hop-plot
  • 16. Mesoscopic view, e.g. Hop-plot [Baeza-Yates et al., 2006a]. Figure: frequency (0.0 to 0.3) versus distance (5 to 30) for four graphs: .it (40M pages), .uk (18M pages), .eu.int (800K pages), synthetic graph (100K pages).
  • 20. Notation: Let $P_{N \times N}$ be the normalized link matrix of a graph: row-normalized, no “sinks”. Definition (PageRank): stationary state of $\alpha P + \frac{(1 - \alpha)}{N} \mathbf{1}_{N \times N}$. Follow links with probability $\alpha$; random jump with probability $1 - \alpha$.
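As a rough illustration of the PageRank definition, the stationary state can be approximated by power iteration. This is a minimal sketch, not the authors' implementation: the function name `pagerank`, the adjacency-list input `out_links`, the default α = 0.85 and the 3-node graph are all assumptions made for the example.

```python
def pagerank(out_links, alpha=0.85, iterations=100):
    """Approximate the stationary state of alpha*P + (1-alpha)/N * 1.

    out_links[i] lists the successors of node i. Every node is assumed to
    have at least one out-link (no sinks), as required on the slide.
    """
    n = len(out_links)
    r = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [(1.0 - alpha) / n] * n         # random jump with probability 1 - alpha
        for i, succ in enumerate(out_links):
            share = alpha * r[i] / len(succ)  # follow a link with probability alpha
            for j in succ:
                nxt[j] += share
        r = nxt
    return r


# Hypothetical 3-node graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
graph = [[1, 2], [2], [0]]
scores = pagerank(graph)
```

Because the matrix is row-normalized and has no sinks, each iteration keeps the scores summing to 1.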
  • 22. Explicit Formulas: Formulas for PageRank [Newman et al., 2001, Boldi et al., 2005]: $r(\alpha) = \frac{(1 - \alpha)}{N} \sum_{t=0}^{\infty} (\alpha P)^t$ and $r_i(\alpha) = \sum_{p \in \mathrm{Path}(-, i)} \frac{(1 - \alpha) \alpha^{|p|}}{N} \, \mathrm{branching}(p)$. $\mathrm{Path}(-, i)$ are incoming paths in node $i$.
  • 23. Branching contribution: Definition (Branching contribution of a path): Given a path $p = \langle x_1, x_2, \ldots, x_t \rangle$ of length $t = |p|$, $\mathrm{branching}(p) = \frac{1}{d_1 d_2 \cdots d_{t-1}}$, where $d_i$ are the out-degrees of the members of the path. For every node $i$ and every length $t$: $\sum_{p \in \mathrm{Path}(i, -), |p| = t} \mathrm{branching}(p) = 1$.
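The claim that branching contributions over all outgoing paths of a given length sum to 1 can be checked on a small graph. The sketch below is illustrative only: it assumes a path is given as a list of t nodes, that the last node's out-degree is not part of the product, and that the graph has no sinks. The helper names `branching` and `paths_from` and the example graph are hypothetical.

```python
from fractions import Fraction


def branching(path, out_links):
    """Return 1 / (d_1 * d_2 * ... * d_{t-1}) for a path given as a list of nodes."""
    product = 1
    for node in path[:-1]:  # the out-degree of the last node is not used
        product *= len(out_links[node])
    return Fraction(1, product)


def paths_from(start, t, out_links):
    """Enumerate all paths x_1, ..., x_t that start at `start` and follow links."""
    paths = [[start]]
    for _ in range(t - 1):
        paths = [p + [j] for p in paths for j in out_links[p[-1]]]
    return paths


# Hypothetical 3-node graph without sinks: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
graph = [[1, 2], [2], [0]]
totals = [
    sum(branching(p, graph) for p in paths_from(0, t, graph))
    for t in range(1, 6)
]
```

Exact fractions are used so the totals can be compared with 1 without rounding error.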
  • 24. Functional ranking: General functional ranking [Baeza-Yates et al., 2006a]: $r_i(\alpha) = \sum_{p \in \mathrm{Path}(-, i)} \frac{\mathrm{damping}(|p|)}{N} \, \mathrm{branching}(p)$. PageRank is a particular case of path-based ranking.
  • 26. Exponential damping = PageRank: $\mathrm{damping}(t) = (1 - \alpha) \alpha^t$. Most of the contribution is on the first few levels. Figure: weight (0.00 to 0.30) versus length of the path $t$ (1 to 10), for damping(t) with α=0.8 and with α=0.7.
  • 27. Linear damping: $\mathrm{damping}(t) = \frac{2(L - t)}{L(L + 1)}$ if $t < L$, and $\mathrm{damping}(t) = 0$ if $t \geq L$. Figure: weight (0.00 to 0.30) versus length of the path $t$ (1 to 10), for damping(t) with L=15 and with L=10.
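To make the two damping functions concrete, here is a small sketch that evaluates them and checks that each set of weights adds up to 1. The exponential form is assumed to be the PageRank weight $(1 - \alpha)\alpha^t$ from the explicit formula; the function names and parameter values are chosen only for illustration.

```python
def exponential_damping(t, alpha):
    """PageRank-style weight for paths of length t (assumed form (1 - alpha) * alpha**t)."""
    return (1.0 - alpha) * alpha ** t


def linear_damping(t, L):
    """Linear weight 2(L - t) / (L(L + 1)) for t < L, and 0 for t >= L."""
    return 2.0 * (L - t) / (L * (L + 1)) if t < L else 0.0


# Both weight sequences sum to 1 (the exponential one in the limit).
linear_total = sum(linear_damping(t, 10) for t in range(10))
exponential_total = sum(exponential_damping(t, 0.8) for t in range(200))

# Share of the exponential weight carried by path lengths 0..4.
first_levels = sum(exponential_damping(t, 0.8) for t in range(5))
```

With α = 0.8, path lengths 0 to 4 carry $1 - 0.8^5 \approx 0.67$ of the total weight, which illustrates the remark that most of the contribution comes from the first few levels.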
  • 29. Example: Calculating LinearRank: For calculating LinearRank we use: $\mathrm{LinearRank} = \frac{1}{N} \sum_{t=0}^{\infty} \mathrm{damping}(t) P^t = \frac{1}{N} \sum_{t=0}^{L-1} \frac{2(L - t)}{L(L + 1)} P^t$. However, we cannot hold the temporary $P^t$ in memory!
  • 32. Re-write the damping as a recursion: We have to rewrite to be able to calculate: $R^{(0)} = \frac{2}{L + 1}$, $R^{(k+1)} = \frac{(L - k - 1)}{(L - k)} R^{(k)} P$, $\mathrm{LinearRank} = \sum_{k=0}^{L-1} R^{(k)}$. Now we can give the algorithm . . .
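The factor in the recursion can be checked against the linear damping function. The short derivation below assumes that $R^{(k)}$ stands for the $k$-th term of the LinearRank sum (up to the $1/N$ factor), so that consecutive terms differ by one multiplication by $P$ and by the ratio of consecutive damping weights:

```latex
\frac{\mathrm{damping}(k+1)}{\mathrm{damping}(k)}
  = \frac{2(L-k-1)}{L(L+1)} \cdot \frac{L(L+1)}{2(L-k)}
  = \frac{L-k-1}{L-k}
  \quad (0 \le k \le L-1),
\qquad
\mathrm{damping}(0) = \frac{2L}{L(L+1)} = \frac{2}{L+1}.
```

Only the current vector $R^{(k)}$ and the running sum need to be kept, so no power $P^t$ is ever materialized.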
  • 36. Algorithm
      for i : 1 . . . N do {Initialization}
        Score[i] ← R[i] ← 2/(L+1)
      end for
      for k : 0 . . . L − 2 do {Iteration step}
        Aux ← 0
        for i : 1 . . . N do {Follow links in the graph}
          for all j such that there is a link from i to j do
            Aux[j] ← Aux[j] + R[i]/outdegree(i)
          end for
        end for
        for i : 1 . . . N do {Add to ranking value}
          R[i] ← Aux[i] × (L−k−1)/(L−k)
          Score[i] ← Score[i] + R[i]
        end for
      end for
      return Score
  • 37. Algorithm (general)
      for i : 1 . . . N do {Initialization}
        Score[i] ← R[i] ← INIT
      end for
      for k : 1 . . . STOP do {Iteration step}
        Aux ← 0
        for i : 1 . . . N do {Follow links in the graph}
          for all j such that there is a link from i to j do
            Aux[j] ← Aux[j] + R[i]/outdegree(i)
          end for
        end for
        for i : 1 . . . N do {Add to ranking value}
          R[i] ← Aux[i] × FACTOR
          Score[i] ← Score[i] + R[i]
        end for
      end for
      return Score
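A minimal Python sketch of the general algorithm, under a few assumptions that are not in the slides: the graph is a list of out-link lists with node ids 0..N-1, FACTOR is passed as a function of the iteration index, and nodes without out-links simply pass nothing on. LinearRank is then one instance, with the iteration indexed from k = 0 so that the factor matches the recursion $R^{(k+1)} = \frac{L-k-1}{L-k} R^{(k)} P$. The names `functional_rank` and `linear_rank` are illustrative.

```python
def functional_rank(out_links, init, steps, factor):
    """Generic link-based ranking: INIT = init, STOP = steps, FACTOR = factor(k)."""
    n = len(out_links)
    r = [init] * n
    score = [init] * n
    for k in range(steps):
        aux = [0.0] * n
        # Follow links in the graph: each node splits R[i] among its out-links.
        for i, targets in enumerate(out_links):
            for j in targets:
                aux[j] += r[i] / len(targets)
        # Apply the damping factor for this step and add to the ranking value.
        f = factor(k)
        r = [a * f for a in aux]
        score = [s + x for s, x in zip(score, r)]
    return score


def linear_rank(out_links, L):
    """LinearRank as an instance of the generic algorithm (k = 0 .. L-2)."""
    return functional_rank(
        out_links,
        init=2.0 / (L + 1),
        steps=L - 1,
        factor=lambda k: (L - k - 1) / (L - k),
    )


# Toy graph: a directed 3-cycle 0 -> 1 -> 2 -> 0.
cycle = [[1], [2], [0]]
scores = linear_rank(cycle, L=3)
```

On the cycle every node receives the full weight 1/2 + 1/3 + 1/6, so all scores come out equal. Only one pass over the links is needed per iteration, and only the vectors `r`, `aux` and `score` are held in memory.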
  • 38. Other damping functions. Empirical damping. [Figure: average text similarity vs. link distance (1 to 5)]
  • 42. Using LinearRank to approximate PageRank. Experimental comparison on 18 million nodes of the U.K. Web graph: calculated PageRank with α = 0.1, 0.2, . . . , 0.9; calculated LinearRank with L = 5, 10, . . . , 25. For certain combinations of parameters, the rankings are almost equal!
  • 43. Experimental comparison in the U.K. Web graph. [Figure: τ as a function of α (0.5 to 0.9) and L (5 to 25); annotation: τ ≥ 0.95]
  • 44. Prediction of best parameter combinations (analysis). [Figure: L that maximizes Kendall's τ vs. exponent α; series: actual optimum, predicted optimum with length=5]
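The comparison above measures how close two rankings are with Kendall's τ. The sketch below computes the simplest variant (often called τ-a), in which tied pairs count as neither concordant nor discordant. Which variant was used in the experiments is not stated in the slides, so this is an assumption, and the function name `kendall_tau` is illustrative.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same items."""
    n = len(scores_a)
    if n != len(scores_b) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when both rankings order it the same way.
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


same = kendall_tau([1, 2, 3, 4], [10, 20, 30, 40])
reversed_order = kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])
one_swap = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
```

This version examines every pair of items, so it is only meant for small examples; a graph with millions of nodes would need a sort-based O(n log n) implementation.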
  • 45. Outline. Next section: 4. Web Spam
  • 48. What is on the Web? Information + Porn + On-line casinos + Free movies + Cheap software + Buy a MBA diploma + Prescription-free drugs + V!-4-gra + Get rich now now now!!! (Graphic: www.milliondollarhomepage.com)
  • 57. Opportunities for Web spam. Spamdexing; keyword stuffing; link farms; scraper and "Made for Advertising" sites; spam blogs (splogs); cloaking; click spam. Adversarial relationship: every undeserved gain in ranking for a spammer is a loss of precision for the search engine.
  • 58. Typical Web Spam (1)
  • 59. Typical Web Spam (2)
  • 60. Hidden text
  • 61. Made for Advertising (1)
  • 62. Made for Advertising (2)
  • 63. Made for Advertising (3)
  • 64. Search engine?
  • 65. Fake search engine
  • 66. Problem: "normal" pages that are spam
  • 67. Outline. Next section: 5. Web Spam Detection
  • 68. Machine Learning
  • 69. Machine Learning (cont.)
  • 70. Feature Extraction
  • 73. Challenges: Machine Learning. Learning with interdependent variables (graph); learning with few examples; scalability.
  • 78. Challenges: Information Retrieval. Feature extraction: which features?; feature aggregation: page/host/domain; feature propagation (graph); recall/precision tradeoffs; scalability.
  • 79. Outline. Next section: 6. Topological Web Spam
  • 81. Topological spam: link farms. Single-level farms can be detected by searching for groups of nodes sharing their out-links [Gibson et al., 2005].
  • 82. Motivation. [Fetterly et al., 2004] hypothesized that studying the distribution of statistics about pages could be a good way of detecting spam pages: "in a number of these distributions, outlier values are associated with web spam".
  • 84. Test collection. U.K. collection: 18.5 million pages downloaded from the .UK domain; 5,344 hosts manually classified (6% of the hosts). Classified entire hosts: a few hosts are mixed (spam and non-spam pages); more coverage (sample covers 32% of the pages).
  • 85. In-degree. [Figure: distribution of the number of in-links, normal vs. spam; δ = 0.35, where δ = max. difference in C.D.F. plot]
  • 86. Out-degree. [Figure: distribution of the number of out-links, normal vs. spam; δ = 0.28]
  • 87. Edge reciprocity. [Figure: reciprocity of max. PR page, normal vs. spam; x-axis: fraction of reciprocal links; δ = 0.35]
  • 88. Assortativity. [Figure: degree / degree of neighbors, normal vs. spam; x-axis: degree/degree ratio of home page; δ = 0.31]
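The δ reported on these plots is described as the maximum difference in the C.D.F. plot between the normal and the spam class. A small sketch of that statistic follows, assuming it is the largest vertical gap between the two empirical cumulative distributions (the Kolmogorov-Smirnov distance). The function name and the toy samples are illustrative and are not data from the talk.

```python
from bisect import bisect_right


def max_cdf_difference(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    delta = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        delta = max(delta, abs(cdf_a - cdf_b))
    return delta


identical = max_cdf_difference([1, 2, 3], [1, 2, 3])
disjoint = max_cdf_difference([1, 2, 3], [10, 20, 30])
overlapping = max_cdf_difference([1, 2, 3, 4], [3, 4, 5, 6])
```

A value near 0 means the feature is distributed the same way in both classes, while a value near 1 means the feature almost separates them on its own.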
  • 89. Variance of PageRank. Suggested in [Benczúr et al., 2005]. [Figure]
  • 90. Variance of PageRank of in-neighbors. [Figure: stdev. of PR of neighbors (home), normal vs. spam; x-axis: σ² of the logarithm of PageRank; δ = 0.41]
  • 92. TrustRank [Gyöngyi et al., 2004]. A node with high PageRank, but far away from a core set of "trusted nodes", is suspicious. Start from a set of trusted nodes, then do a random walk, returning to the set of trusted nodes with probability 1 − α at each step. Trusted nodes: data from http://www.dmoz.org/
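The random walk described above can be sketched as a power iteration with restarts into the trusted set. This is a simplified illustration rather than the published TrustRank procedure: the value α = 0.85, the fixed iteration count, the uniform restart distribution and the treatment of nodes without out-links (the walker returns to the trusted set) are all assumptions, and `trust_rank` is a hypothetical name.

```python
def trust_rank(out_links, trusted, alpha=0.85, iterations=50):
    """Random walk that returns to the trusted set with probability 1 - alpha."""
    n = len(out_links)
    trusted = sorted(set(trusted))
    if not trusted:
        raise ValueError("need at least one trusted node")
    restart = [0.0] * n
    for s in trusted:
        restart[s] = 1.0 / len(trusted)
    score = restart[:]
    for _ in range(iterations):
        # With probability 1 - alpha the walker jumps back to a trusted node.
        nxt = [(1 - alpha) * v for v in restart]
        for i, targets in enumerate(out_links):
            if targets:
                share = alpha * score[i] / len(targets)
                for j in targets:
                    nxt[j] += share
            else:
                # Assumption: a walker stuck on a dangling node also restarts.
                for s in trusted:
                    nxt[s] += alpha * score[i] / len(trusted)
        score = nxt
    return score


# Nodes 0, 1, 2 form a cycle; node 3 links into it but nothing links to node 3.
graph = [[1], [2], [0], [0]]
trust = trust_rank(graph, trusted=[0])
```

Node 3 cannot be reached from the trusted node, so it receives no trust at all, while trust decays along the cycle as the distance from node 0 grows.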
  • 93. TrustRank Idea
  • 94. TrustRank score. [Figure: TrustRank score of home page, normal vs. spam; x-axis: TrustRank; δ = 0.59]
  • 95. TrustRank / PageRank. [Figure: estimated relative non-spam mass, normal vs. spam; x-axis: TrustRank score/PageRank; δ = 0.59]
  • 97. Truncated PageRank. Proposed in [Becchetti et al., 2006b]. Idea: reduce the direct contribution of the first levels of links: $\mathrm{damping}(t) = 0$ for $t \le T$, and $\mathrm{damping}(t) = C\alpha^t$ for $t > T$. ✓ No extra reading of the graph after PageRank.
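One way to read the remark about not needing extra passes over the graph: the same iteration that accumulates PageRank can also accumulate the truncated score, simply by skipping the first T steps. The sketch below illustrates this under stated assumptions: $C = (1-\alpha)/\alpha^{T+1}$ is chosen so that the damping weights sum to 1 (the slides do not give C), PageRank is taken as the damping $(1-\alpha)\alpha^t$, nodes without out-links pass nothing on, and the function name is illustrative.

```python
def pagerank_and_truncated(out_links, T, alpha=0.85, iterations=200):
    """Accumulate PageRank and Truncated PageRank in the same passes."""
    if T < 0:
        raise ValueError("T must be >= 0")
    n = len(out_links)
    c = (1 - alpha) / alpha ** (T + 1)  # assumed normalization constant
    r = [1.0 / n] * n                   # alpha^t * (uniform vector) * P^t at t = 0
    pagerank = [(1 - alpha) * x for x in r]
    truncated = [0.0] * n               # paths of length t <= T contribute nothing
    for t in range(1, iterations + 1):
        aux = [0.0] * n
        for i, targets in enumerate(out_links):
            for j in targets:
                aux[j] += r[i] / len(targets)
        r = [alpha * a for a in aux]
        pagerank = [p + (1 - alpha) * x for p, x in zip(pagerank, r)]
        if t > T:
            truncated = [s + c * x for s, x in zip(truncated, r)]
    return pagerank, truncated


cycle = [[1], [2], [0]]
pr, tpr = pagerank_and_truncated(cycle, T=2)
ratios = [t / p for t, p in zip(tpr, pr)]
```

On a cycle every node is supported equally at every distance, so the ratio between the truncated score and PageRank is 1; the spam feature on the following slides looks at pages where this ratio departs from what normal pages show.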
  • 98. Truncated PageRank(T=2) / PageRank. [Figure: normal vs. spam; x-axis: TruncatedPageRank(T=2) / PageRank; δ = 0.30]
  • 99. Max. change of Truncated PageRank. [Figure: maximum change of Truncated PageRank, normal vs. spam; x-axis: max(TrPR_{i+1}/TrPR_i); δ = 0.29]
  • 100. Outline. Next section: 7. Direct Counting of Supporters
  • 102. High and low-ranked pages are different. [Figure: number of nodes (×10^4) vs. distance (1 to 20), for pages in the top 0%−10%, top 40%−50% and top 60%−70%] Areas below the curves are equal if we are in the same strongly-connected component.
  • 104. Probabilistic counting. [Figure: bit vectors flowing towards a target page; propagation of bits using the "OR" operation; count bits set to estimate supporters] [Becchetti et al., 2006b] shows an improvement of the ANF algorithm [Palmer et al., 2002] based on probabilistic counting [Flajolet and Martin, 1985].
  • 107. General algorithm
      Require: N: number of nodes, d: distance, k: bits
      for node : 1 . . . N, bit : 1 . . . k do
        INIT(node, bit)
      end for
      for distance : 1 . . . d do {Iteration step}
        Aux ← 0^k
        for src : 1 . . . N do {Follow links in the graph}
          for all links from src to dest do
            Aux[dest] ← Aux[dest] OR V[src,·]
          end for
        end for
        V ← Aux
      end for
      for node : 1 . . . N do {Estimate supporters}
        Supporters[node] ← ESTIMATE(V[node,·])
      end for
      return Supporters
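The propagation step can be sketched in Python by storing each node's k bits in one integer, so that the OR over a link is a single `|=`. This follows the pseudocode literally, including the reset of Aux to zero in every iteration, so after d iterations a node holds the bits of the nodes that reach it through walks of exactly d links. The edge-list representation and all function names are assumptions made for the example.

```python
import random


def random_bit_vectors(n, k, epsilon, rng):
    """INIT: set each of the k bits of every node with probability epsilon."""
    return [
        sum(1 << b for b in range(k) if rng.random() < epsilon)
        for _ in range(n)
    ]


def propagate_bits(links, vectors, d):
    """Run d rounds of OR-propagation along (src, dest) links."""
    v = list(vectors)
    for _ in range(d):
        aux = [0] * len(v)
        for src, dest in links:
            aux[dest] |= v[src]  # Aux[dest] <- Aux[dest] OR V[src, .]
        v = aux
    return v


def count_ones(vector):
    """Number of bits set, the input to the ESTIMATE step."""
    return bin(vector).count("1")


# Chain 0 -> 1 -> 2 with one distinct bit per node, to make the flow visible.
links = [(0, 1), (1, 2)]
start = [0b001, 0b010, 0b100]
after_one = propagate_bits(links, start, 1)
after_two = propagate_bits(links, start, 2)
random_start = random_bit_vectors(3, 8, 0.5, random.Random(0))
```

With hand-picked bits the movement is easy to follow: after one round node 1 holds the bit of node 0, and after two rounds that bit has reached node 2. In the real algorithm the bits are random, and the number of ones is turned into an estimate of the number of supporters.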
  • 110. Our estimator. Initialize all bits to one with probability $\epsilon$. Estimator: $\mathrm{neighbors}(node) = \log_{(1-\epsilon)}\left(1 - \frac{\mathrm{ones}(node)}{k}\right)$. Adaptive estimation: repeat the above process for $\epsilon = 1/2, 1/4, 1/8, \ldots$, and look for the transitions from more than $(1 - 1/e)k$ ones to less than $(1 - 1/e)k$ ones.
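The estimator inverts a simple expectation: a bit stays at zero only if all n supporters had it at zero, which happens with probability $(1-\epsilon)^n$, so the expected fraction of ones is $1 - (1-\epsilon)^n$. Solving for n gives the logarithm above. When n is close to $1/\epsilon$ that fraction is close to $1 - 1/e$, which is where the threshold in the adaptive scheme comes from. The sketch below is a hedged illustration with hypothetical function names; returning infinity for a saturated vector is an assumption.

```python
import math


def estimate_supporters(ones, k, epsilon):
    """Invert ones/k = 1 - (1 - epsilon)**n to recover n."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must be strictly between 0 and 1")
    if ones >= k:
        return math.inf  # all bits set: this epsilon is too large to tell
    return math.log(1.0 - ones / k) / math.log(1.0 - epsilon)


def transition_threshold(k):
    """Number of ones, (1 - 1/e) * k, watched by the adaptive scheme."""
    return (1.0 - 1.0 / math.e) * k


# With epsilon = 1/2 and two supporters, 75% of the bits are expected to be one.
two_supporters = estimate_supporters(ones=48, k=64, epsilon=0.5)
no_supporters = estimate_supporters(ones=0, k=64, epsilon=0.5)
saturated = estimate_supporters(ones=64, k=64, epsilon=0.5)
threshold = transition_threshold(64)
```

For k = 64 the threshold is roughly 40 ones. A large ε saturates the vector for well-supported pages, which is why the process is repeated with progressively smaller ε.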
  • 111. Convergence. [Figure: fraction of nodes with estimates vs. iteration (5 to 20), for d = 1 to d = 8]
  • 112. Error rate. [Figure: average relative error vs. distance (1 to 8); series: ours 64 bits, epsilon-only estimator; ours 64 bits, combined estimator; ANF 24 bits × 24 iterations (576 b×i); ANF 24 bits × 48 iterations (1152 b×i)]
  • 113. Hosts at distance 4. [Figure: hosts at distance exactly 4, normal vs. spam; x-axis: S4 − S3; δ = 0.39]
  • 114. Minimum change of supporters. [Figure: normal vs. spam; x-axis: min(S2/S1, S3/S2, S4/S3); δ = 0.39]
  • 115. Outline. Next section: 8. Spam Detection Results
  • 121. Detection rates: 60% (UK-2006) to 80% (UK-2002) detection rate, with a 4% to 2% error rate, by combining different attributes [Becchetti et al., 2006a]. ✗ No magic bullet in link analysis. ✗ Precision still low compared to e-mail spam filters. ✓ Measure both home page and max. PageRank page. ✓ Host-based counts of neighbors are important. Next step: combine link analysis and content analysis.
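As a rough illustration of the two figures quoted on this slide, the sketch below computes a detection rate and an error rate from labeled hosts. It assumes that "detection rate" means the fraction of spam hosts that are flagged and that "error rate" means the fraction of normal hosts wrongly flagged; the slides do not spell out these definitions. The function name and the toy labels are hypothetical and are not taken from UK-2002 or UK-2006.

```python
def detection_and_error_rate(true_labels, predicted_labels):
    """Return (detection_rate, error_rate) for 'spam' / 'normal' labels.

    Assumed definitions (not confirmed by the slides):
      detection_rate = spam hosts flagged as spam / all spam hosts
      error_rate     = normal hosts flagged as spam / all normal hosts
    """
    pairs = list(zip(true_labels, predicted_labels))
    spam_total = sum(1 for t, _ in pairs if t == "spam")
    normal_total = sum(1 for t, _ in pairs if t == "normal")
    spam_caught = sum(1 for t, p in pairs if t == "spam" and p == "spam")
    normal_flagged = sum(1 for t, p in pairs if t == "normal" and p == "spam")
    detection_rate = spam_caught / spam_total if spam_total else 0.0
    error_rate = normal_flagged / normal_total if normal_total else 0.0
    return detection_rate, error_rate


# Toy labels, invented for illustration only.
truth = ["spam"] * 5 + ["normal"] * 5
guess = ["spam", "spam", "spam", "spam", "normal",
         "normal", "normal", "normal", "normal", "spam"]
rates = detection_and_error_rate(truth, guess)
print(rates)
```

Reporting both numbers together matters because a classifier can trivially reach a high detection rate by flagging everything, at the cost of a high error rate.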
  • 125. Upcoming Web Spam Challenge on UK-2006: We asked 20+ volunteers to classify entire hosts. We provided several examples. Volunteers were asked to classify each host as normal / borderline / spam. Do they agree? Mostly ...
  • 126. Agreement between humans (figure; not preserved in this transcript)
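One common way to quantify agreement between two human labelers is raw percent agreement together with Cohen's kappa, which corrects for agreement expected by chance. The slides do not say which statistic was used, so the sketch below is only an assumption about how such a comparison could be made. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of hosts on which both labelers chose the same class."""
    matches = sum(1 for x, y in zip(labels_a, labels_b) if x == y)
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two labelers over the same hosts."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    classes = set(labels_a) | set(labels_b)
    # Chance agreement: probability that both pick the same class
    # if each labeler drew independently from their own label distribution.
    expected = sum(count_a[c] * count_b[c] for c in classes) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used a single identical class
    return (observed - expected) / (1 - expected)


# Toy labels for 8 hosts, invented for illustration only.
judge_1 = ["normal", "normal", "normal", "normal",
           "spam", "spam", "borderline", "borderline"]
judge_2 = ["normal", "normal", "normal", "spam",
           "spam", "spam", "borderline", "normal"]
agreement = percent_agreement(judge_1, judge_2)
kappa = cohen_kappa(judge_1, judge_2)
print(agreement, kappa)
```

With three classes, a "borderline" label tends to lower agreement, which is one reason to look at a chance-corrected measure rather than raw agreement alone.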
  • 134. Result: first public Web Spam collection. Public spam collection: Web graph with ∼80 million pages; ∼11,000 hosts; labels for ∼4,000 hosts by at least 2 humans each. Upcoming Web Spam challenge: machine learning; information retrieval. webspam-announces-subscribe@yahoogroups.com
  • 135. Thank you!
  • 137. Baeza-Yates, R., Boldi, P., and Castillo, C. (2006a). Generalizing PageRank: Damping functions for link-based ranking algorithms. In Proceedings of ACM SIGIR, pages 308–315, Seattle, Washington, USA. ACM Press.
    Baeza-Yates, R., Castillo, C., and Efthimiadis, E. (2006b). Characterization of national web domains. To appear in ACM TOIT.
    Baeza-Yates, R. and Poblete, B. (2006). Dynamics of the Chilean web structure. Comput. Networks, 50(10):1464–1473.
    Barabási, A.-L. (2002). Linked: The New Science of Networks. Perseus Books Group.
  • 138. Becchetti, L., Castillo, C., Donato, D., Leonardi, S., and Baeza-Yates, R. (2006a). Link-based characterization and detection of Web Spam. In Second International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), Seattle, USA.
    Becchetti, L., Castillo, C., Donato, D., Leonardi, S., and Baeza-Yates, R. (2006b). Using rank propagation and probabilistic counting for link-based spam detection. In Proceedings of the Workshop on Web Mining and Web Usage Analysis (WebKDD), Pennsylvania, USA. ACM Press.
    Benczúr, A. A., Csalogány, K., Sarlós, T., and Uher, M. (2005). SpamRank: Fully automatic link spam detection. In Proceedings of the First International Workshop on Adversarial Information Retrieval on the Web, Chiba, Japan.
  • 139. Boldi, P., Santini, M., and Vigna, S. (2005). PageRank as a function of the damping factor. In Proceedings of the 14th International Conference on World Wide Web, pages 557–566, Chiba, Japan. ACM Press.
    Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J. (2000). Graph structure in the web: Experiments and models. In Proceedings of the Ninth Conference on World Wide Web, pages 309–320, Amsterdam, Netherlands. ACM Press.
    Fetterly, D., Manasse, M., and Najork, M. (2004). Spam, damn spam, and statistics: Using statistical analysis to locate spam web pages. In Proceedings of the Seventh Workshop on the Web and Databases (WebDB), pages 1–6, Paris, France.
  • 140. Flajolet, P. and Martin, N. G. (1985). Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2):182–209.
    Gibson, D., Kumar, R., and Tomkins, A. (2005). Discovering large dense subgraphs in massive graphs. In VLDB '05: Proceedings of the 31st International Conference on Very Large Data Bases, pages 721–732. VLDB Endowment.
    Gyöngyi, Z., Garcia-Molina, H., and Pedersen, J. (2004). Combating web spam with TrustRank. In Proceedings of the Thirtieth International Conference on Very Large Data Bases (VLDB), pages 576–587, Toronto, Canada. Morgan Kaufmann.
    Newman, M. E., Strogatz, S. H., and Watts, D. J. (2001). Random graphs with arbitrary degree distributions and their applications. Phys Rev E Stat Nonlin Soft Matter Phys, 64(2 Pt 2).
  • 141. Palmer, C. R., Gibbons, P. B., and Faloutsos, C. (2002). ANF: A fast and scalable tool for data mining in massive graphs. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 81–90, New York, NY, USA. ACM Press.
    Tauro, L., Palmer, C., Siganos, G., and Faloutsos, M. (2001). A simple conceptual model for the Internet topology. In Global Internet, San Antonio, Texas, USA. IEEE CS Press.