There is a vast body of empirical research on the behaviour of ranking algorithms, e.g. Google PageRank, in scale-free networks. In this talk, we address this problem by analytical probabilistic methods. In particular, it is well known that PageRank in scale-free networks follows a power law with the same exponent as the in-degree. Recent probabilistic analysis has provided an explanation for this phenomenon by obtaining a natural approximation for PageRank based on stochastic fixed-point equations. For these equations, explicit solutions can be constructed on weighted branching trees, and their tail behaviour can be described in great detail.
We present a model for generating directed random graphs with prescribed degree distributions for which we can prove that the PageRank of a randomly chosen node does indeed converge to the solution of the corresponding fixed-point equation as the number of nodes in the graph grows to infinity. The proof of this result is based on classical random graph coupling techniques combined with the now extensive literature on the behaviour of branching recursions on trees.
Nelly Litvak – Asymptotic behaviour of ranking algorithms in directed random networks
1. Asymptotic behaviour of ranking algorithms in directed random networks
Nelly Litvak
University of Twente, The Netherlands
joint work with
Mariana Olvera-Cravioto and Ningyuan Chen
Workshop on Extremal Graph Theory
Moscow, 06-06-2014
2. Power law of PageRank
Pandurangan, Raghavan, Upfal, 2002.
[ Nelly Litvak, SOR group ] 2/25
6. Power laws in complex networks
Power laws: Internet, WWW, social networks, biological networks, etc.
degree of a node = # (in-/out-) links
[fraction of nodes with degree at least k] = $p_k$
Power law: $p_k \approx \text{const} \cdot k^{-\alpha}$, $\alpha > 0$.
A power law is a model for high variability: some nodes (hubs) have extremely many connections.
$\log p_k = \log(\text{const}) - \alpha \log k$
Straight line on the log-log scale
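The straight-line relation on the log-log scale can be checked numerically. The sketch below is illustrative only: the function names `tail_fractions` and `loglog_slope` and the synthetic Pareto data are assumptions, not part of the talk, and a naive least-squares fit is used purely to show the straight line, not as a recommended estimator of $\alpha$.

```python
import numpy as np

def tail_fractions(degrees):
    """Return (k, p_k), where p_k is the fraction of nodes with degree >= k."""
    degrees = np.asarray(degrees)
    ks = np.unique(degrees[degrees > 0])
    sorted_deg = np.sort(degrees)
    # Number of entries >= k, found with one binary search per k.
    counts = len(sorted_deg) - np.searchsorted(sorted_deg, ks, side="left")
    return ks, counts / len(sorted_deg)

def loglog_slope(ks, pk):
    """Least-squares slope of log p_k against log k (an estimate of -alpha)."""
    slope, _intercept = np.polyfit(np.log(ks), np.log(pk), 1)
    return slope

# Hypothetical data: Pareto samples rounded down, so that
# P(degree >= k) = k^(-alpha) for every integer k >= 1.
rng = np.random.default_rng(0)
alpha = 2.0
degrees = np.floor(rng.pareto(alpha, size=200_000) + 1.0).astype(int)

ks, pk = tail_fractions(degrees)
mask = ks <= 50  # the far tail has too few samples for a naive fit
print("estimated alpha:", -loglog_slope(ks[mask], pk[mask]))
```

The fit is restricted to moderate values of k because the empirical tail becomes very noisy where only a handful of nodes remain.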
7. Regular variation
$X$ is a regularly varying random variable with index $\alpha$ if
$$P(X > x) = L(x)\, x^{-\alpha}, \quad x > 0,$$
where $L(x)$ is slowly varying: for every $t > 0$, $L(tx)/L(x) \to 1$ as $x \to \infty$.
10. Google PageRank
S. Brin, L. Page, The anatomy of a large-scale hypertextual
Web search engine (1998)
PageRank $R_i$ of page $i = 1, \ldots, n$ is defined as the stationary distribution of a random walk with jumps:
$$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \ldots, n$$
$d_j$ = # out-links of page $j$
$c \in (0, 1)$, originally 0.85: damping factor; the walk follows a link with probability $c$ and makes a random jump with probability $1 - c$
$b_i$: probability to jump to page $i$; originally $b_i = 1/n$
personalized PageRank: $b_i \neq 1/n$
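The defining equation can be solved by simple fixed-point iteration. The following is a minimal sketch, not the implementation used in the talk; the function name `pagerank`, the adjacency-list input, and the assumption that every page has at least one out-link are choices made here for illustration.

```python
import numpy as np

def pagerank(out_links, c=0.85, b=None, tol=1e-12, max_iter=1000):
    """Iterate R_i = sum_{j -> i} (c / d_j) R_j + (1 - c) b_i to a fixed point.

    out_links[j] lists the pages that page j links to. Assumption: every
    page has at least one out-link (no dangling nodes).
    """
    n = len(out_links)
    b = np.full(n, 1.0 / n) if b is None else np.asarray(b, dtype=float)
    R = b.copy()
    for _ in range(max_iter):
        new_R = (1.0 - c) * b              # random-jump part
        for j, targets in enumerate(out_links):
            for i in targets:              # page j passes c * R_j / d_j to each target
                new_R[i] += c * R[j] / len(targets)
        if np.abs(new_R - R).sum() < tol:
            return new_R
        R = new_R
    return R

# Hypothetical 3-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
graph = [[1, 2], [2], [0]]
ranks = pagerank(graph)
print(ranks, ranks.sum())
```

Passing a non-uniform vector `b` gives the personalized variant of the same iteration.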
11. Examples of applications
$$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \ldots, n$$
Topic-sensitive search (Haveliwala, 2002);
Spam detection (Gyöngyi et al., 2004);
Finding related entities (Chakrabarti, 2007);
Link prediction (Liben-Nowell and Kleinberg, 2003; Voevodski, Teng, Xia, 2009);
Finding local cuts (Andersen, Chung, Lang, 2006);
Graph clustering (Tsiatas, Chung, 2010);
Person name disambiguation (Smirnova, Avrachenkov, Trousse, 2010);
Finding most influential people in Wikipedia (Shepelyansky et al., 2010, 2013).
13. Stochastic model for PageRank
Rescale: $R_i \to n R_i$, $b_i \to n b_i$
$$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \ldots, n$$
Stochastic equation:
$$R \stackrel{d}{=} c \sum_{j=1}^{N} \frac{1}{D_j} R_j + c\, p_0 + (1 - c) B$$
$N$: in-degree of the randomly chosen page
$D$: out-degree of a page that links to the randomly chosen page
$p_0$: fraction of pages with out-degree zero
$R_j$ is distributed as $R$; $N$, $D$, $R_j$ are independent; $N$ and $B$ can be dependent
We can denote $Q = c\, p_0 + (1 - c) B$, $C_j = c / D_j$.
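One way to get a feel for the stochastic equation is to iterate it numerically on a pool of samples. The sketch below is an approximation under stated assumptions, not a method from the talk: the function `sample_fixed_point`, the sampler arguments, and the Poisson degree distributions are hypothetical, and drawing the $R_j$ with replacement from a finite pool only approximates independent copies of $R$.

```python
import numpy as np

def sample_fixed_point(sample_N, sample_D, sample_B, c=0.85, p0=0.0,
                       pool_size=2000, depth=10, rng=None):
    """Approximate samples of R = c * sum_{j<=N} R_j / D_j + c*p0 + (1-c)*B.

    A pool of samples is pushed through the recursion `depth` times; the
    children R_j are drawn with replacement from the previous pool.
    """
    rng = np.random.default_rng() if rng is None else rng
    pool = c * p0 + (1.0 - c) * sample_B(rng, pool_size)  # start from R = Q
    for _ in range(depth):
        N = sample_N(rng, pool_size)
        new_pool = c * p0 + (1.0 - c) * sample_B(rng, pool_size)  # fresh Q
        for idx in range(pool_size):
            if N[idx] > 0:
                children = rng.choice(pool, size=N[idx])  # copies of R
                D = sample_D(rng, N[idx])                 # out-degrees D_j
                new_pool[idx] += c * np.sum(children / D)
        pool = new_pool
    return pool

# Hypothetical inputs: Poisson in-degrees, out-degrees 1 + Poisson, B = 1.
rng = np.random.default_rng(0)
samples = sample_fixed_point(
    sample_N=lambda r, size: r.poisson(3.0, size),
    sample_D=lambda r, size: 1.0 + r.poisson(3.0, size),
    sample_B=lambda r, size: np.ones(size),
    rng=rng,
)
print("sample mean of R:", samples.mean())
```

In this sketch `N` and `B` are sampled independently for simplicity, although the model allows them to be dependent.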
14. Results for stochastic recursion
$$R \stackrel{d}{=} \sum_{j=1}^{N} C_j R_j + Q$$
Theorem (Volkovich & L., 2010)
If $P(B > x) = o(P(N > x))$, then the following are equivalent:
$P(N > x) \sim x^{-\alpha_N} L_N(x)$ as $x \to \infty$,
$P(R > x) \sim c_N\, x^{-\alpha_N} L_N(x)$ as $x \to \infty$,
where $c_N = (E(c/D))^{\alpha_N} [1 - E(N)\, E(C^{\alpha_N})]^{-1}$.
15. Power Law behaviour of PageRank
Data for Web, Wikipedia and Preferential Attachment graph
16. Results for stochastic recursion
$$R \stackrel{d}{=} \sum_{j=1}^{N} C_j R_j + Q$$
A series of papers (Olvera-Cravioto & Jelenkovic 2010, 2012; Olvera-Cravioto 2012) analyzed the recursion in detail using sample-path large deviations and implicit renewal theory.
The tail behaviour of $R$ is obtained under very general assumptions on the $C_j$'s.
$R$ can be heavy-tailed even when $N$ is light-tailed.
17. Recursion on a graph
So far we have, in fact, considered the recursion on a tree.
Will similar results hold on a particular graph structure?
Some graphs are tree-like (Thorny Branching Process, TBP).
18. Directed configuration model
Directed graph on $n$ nodes $V = \{v_1, \ldots, v_n\}$.
In-degree and out-degree:
$m_i$ = in-degree of node $v_i$ = number of edges pointing to $v_i$.
$d_i$ = out-degree of node $v_i$ = number of edges pointing from $v_i$.
$(\mathbf{m}, \mathbf{d}) = (\{m_i\}, \{d_i\})$ is called a bi-degree-sequence.
Target distributions:
In-degree: $F = (f_k : k = 0, 1, 2, \ldots)$, and
Out-degree: $G = (g_k : k = 0, 1, 2, \ldots)$.
19. Assumptions on the target distributions
Suppose further that for some $\alpha, \beta > 2$,
$$\overline{F}(x) = \sum_{k > x} f_k \le x^{-\alpha} L_F(x) \quad \text{and} \quad \overline{G}(x) = \sum_{k > x} g_k \le x^{-\beta} L_G(x),$$
for all $x \ge 0$, where $L_F(\cdot)$ and $L_G(\cdot)$ are slowly varying.
Assume both $F$ and $G$ have finite variance.
20. The bi-degree sequence (Chen&Olvera-Cravioto, 2012)
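Step 1: Fix $0 < \delta_0 < 1 - \theta$, $\theta = \max\{\alpha^{-1}, \beta^{-1}, 1/2\}$.
Step 2: Sample $\{\gamma_1, \ldots, \gamma_n\}$ i.i.d. from $F$; let $\Gamma_n = \sum_{i=1}^{n} \gamma_i$.
Step 3: Sample $\{\xi_1, \ldots, \xi_n\}$ i.i.d. from $G$; let $\Xi_n = \sum_{i=1}^{n} \xi_i$.
Step 4: Let $\Delta_n = \Gamma_n - \Xi_n$. If $|\Delta_n| \le n^{\theta + \delta_0}$ go to step 5; otherwise go to step 2.
Step 5: Choose randomly $|\Delta_n|$ nodes $S = \{i_1, i_2, \ldots, i_{|\Delta_n|}\}$ without replacement and let
$$N_i = \gamma_i + \tau_i, \quad D_i = \xi_i + \chi_i, \quad i = 1, 2, \ldots, n,$$
where
$$\chi_i = \begin{cases} 1 & \text{if } \Delta_n \ge 0 \text{ and } i \in S, \\ 0 & \text{otherwise,} \end{cases} \qquad \tau_i = \begin{cases} 1 & \text{if } \Delta_n < 0 \text{ and } i \in S, \\ 0 & \text{otherwise.} \end{cases}$$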
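The five steps above can be sketched in a few lines. This is an illustrative reading of the algorithm, not the authors' code: the function name `bi_degree_sequence`, the sampler arguments, the `max_tries` guard, and the use of Zipf samples as stand-ins for $F$ and $G$ are all assumptions made here.

```python
import numpy as np

def bi_degree_sequence(n, sample_in, sample_out, theta, delta0,
                       rng=None, max_tries=10_000):
    """Return in-degrees N and out-degrees D with equal sums (steps 2 to 5)."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(max_tries):
        gamma = sample_in(rng, n)              # step 2: i.i.d. from F
        xi = sample_out(rng, n)                # step 3: i.i.d. from G
        delta = int(gamma.sum() - xi.sum())    # step 4: Delta_n
        if abs(delta) <= n ** (theta + delta0):
            break
    else:
        raise RuntimeError("no acceptable sample within max_tries")
    # Step 5: pick |Delta_n| distinct nodes and add one stub to each.
    S = rng.choice(n, size=abs(delta), replace=False)
    N, D = gamma.copy(), xi.copy()
    if delta >= 0:
        D[S] += 1    # chi_i = 1: too few outbound stubs
    else:
        N[S] += 1    # tau_i = 1: too few inbound stubs
    return N, D

# Hypothetical targets: Zipf(3.5) for both, i.e. tail index 2.5, so
# theta = max(1/2.5, 1/2.5, 1/2) = 1/2 and delta0 must lie in (0, 1/2).
zipf = lambda r, size: r.zipf(3.5, size)
N, D = bi_degree_sequence(1000, zipf, zipf, theta=0.5, delta0=0.25,
                          rng=np.random.default_rng(0))
print(N.sum(), D.sum())
```

Because $\theta + \delta_0 < 1$, the correction touches fewer than $n$ nodes, which is what makes sampling without replacement possible.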
21. Constructing the graph
Using the bi-degree-sequence $(\mathbf{N}, \mathbf{D})$ for the in- and out-degrees (that is, $m_i = N_i$ and $d_i = D_i$):
assign to each node $v_i$ a number $m_i$ of inbound stubs and a number $d_i$ of outbound stubs;
pair outbound stubs to inbound stubs to form directed edges by matching to each inbound stub an outbound stub chosen uniformly at random from the set of unpaired outbound stubs;
proceed in the same way for all remaining unpaired inbound stubs, i.e., choose uniformly from the set of unpaired outbound stubs and draw the corresponding directed edge.
The result is a multigraph (e.g., with self-loops and multiple edges in the same direction) on nodes $\{v_1, \ldots, v_n\}$.
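The pairing procedure can be sketched as follows. The function name `pair_stubs` is hypothetical, and the sketch uses one uniform shuffle of the outbound stubs, which yields the same uniform random matching as choosing an unpaired outbound stub for each inbound stub in turn.

```python
import numpy as np

def pair_stubs(in_deg, out_deg, rng=None):
    """Return a list of directed edges (source, target) of the multigraph."""
    in_deg = np.asarray(in_deg)
    out_deg = np.asarray(out_deg)
    if in_deg.sum() != out_deg.sum():
        raise ValueError("total in-degree must equal total out-degree")
    rng = np.random.default_rng() if rng is None else rng
    # One entry per stub, labelled by the node that owns it.
    in_stubs = np.repeat(np.arange(len(in_deg)), in_deg)
    out_stubs = np.repeat(np.arange(len(out_deg)), out_deg)
    rng.shuffle(out_stubs)  # uniform random matching of the stubs
    return list(zip(out_stubs.tolist(), in_stubs.tolist()))

# Hypothetical bi-degree sequence on 3 nodes.
edges = pair_stubs([2, 0, 1], [1, 1, 1], rng=np.random.default_rng(0))
print(edges)
```

Self-loops and repeated edges are kept on purpose, since the construction produces a multigraph.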
22. PageRank in directed configuration model
$C_i = \zeta_i / D_i$, where $\{\zeta_i\}$ is a sequence of i.i.d. random variables independent of $(\mathbf{N}, \mathbf{D})$ ($\zeta_i = c$ in the classical case)
$M = M(n) \in \mathbb{R}^{n \times n}$ is related to the adjacency matrix of the graph:
$$M_{i,j} = \begin{cases} s_{ij} C_i, & \text{if there are } s_{ij} \text{ edges from } i \text{ to } j, \\ 0, & \text{otherwise.} \end{cases}$$
$Q \in \mathbb{R}^n$ is a personalization vector
We are interested in one coordinate, $R_1$, of the vector $R \in \mathbb{R}^n$ defined by
$$R = RM + Q$$
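For a small graph the vector $R$ can be computed directly from this definition. The sketch below covers only the classical case $\zeta_i = c$; the function names `weight_matrix` and `solve_rank` and the edge-list input are assumptions made for illustration.

```python
import numpy as np

def weight_matrix(edges, n, c=0.85):
    """M[i, j] = s_ij * C_i with C_i = c / D_i (classical case zeta_i = c)."""
    S = np.zeros((n, n))
    for i, j in edges:
        S[i, j] += 1.0                      # s_ij counts parallel edges i -> j
    D = S.sum(axis=1)                       # out-degrees
    C = np.divide(c, D, out=np.zeros(n), where=D > 0)
    return S * C[:, None]

def solve_rank(M, Q):
    """Solve R = R M + Q, i.e. R (I - M) = Q, for the row vector R."""
    n = M.shape[0]
    return np.linalg.solve((np.eye(n) - M).T, Q)

# Hypothetical multigraph with a double edge 0 -> 1.
edges = [(0, 1), (0, 1), (1, 2), (2, 0)]
M = weight_matrix(edges, 3)
Q = np.full(3, 0.15)                        # (1 - c) * b_i after rescaling
R = solve_rank(M, Q)
print(R)
```

A direct solve is only practical for small $n$; for large graphs an iterative scheme is the usual choice.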
23. Matrix iterations
$$R^{(n,0)} = B,$$
$$R^{(n,1)} = R^{(n,0)} M + Q = BM + Q,$$
$$R^{(n,2)} = R^{(n,1)} M + Q = BM^2 + QM + Q,$$
$$R^{(n,3)} = R^{(n,2)} M + Q = BM^3 + QM^2 + QM + Q,$$
$$\vdots$$
$$R^{(n,k)} = \sum_{i=0}^{k-1} Q M^i + B M^k, \quad k \ge 1.$$
We are interested in analyzing $P(R_1^{(n,\infty)} > x)$, $x \to \infty$.
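The closed form for $R^{(n,k)}$ can be checked numerically against the step-by-step iteration. This is a small sanity check on a hypothetical random matrix, not part of the talk; the function names `iterate` and `closed_form` are chosen here.

```python
import numpy as np

def iterate(B, M, Q, k):
    """Apply R <- R M + Q exactly k times, starting from R = B."""
    R = np.array(B, dtype=float)
    for _ in range(k):
        R = R @ M + Q
    return R

def closed_form(B, M, Q, k):
    """Evaluate sum_{i=0}^{k-1} Q M^i + B M^k."""
    total = np.zeros(len(Q))
    power = np.eye(M.shape[0])              # M^0
    for _ in range(k):
        total += Q @ power
        power = power @ M
    return total + B @ power

# Hypothetical example: M is 0.5 times a row-stochastic matrix.
rng = np.random.default_rng(0)
n = 5
P = rng.random((n, n))
M = 0.5 * P / P.sum(axis=1, keepdims=True)
B = np.ones(n)
Q = np.full(n, 0.5)

print(np.allclose(iterate(B, M, Q, 4), closed_form(B, M, Q, 4)))
```

Since the rows of this `M` sum to 0.5, the term $BM^k$ decays geometrically and the iterates approach the solution of $R = RM + Q$.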
24. Idea of the analysis
$\hat{R}_1^{(n,k)}$: PageRank on a perfect branching tree
$R$: solution of the equation
$$R \stackrel{d}{=} \sum_{j=1}^{\gamma} C_j R_j + Q$$
We will try to prove the following: for any fixed $t \in \mathbb{R}$ and a randomly chosen node $v$,
$$P(R_1^{(n,\infty)} \le t) \approx P(R_1^{(n,k)} \le t) \approx P(\hat{R}_1^{(n,k)} \le t) \approx P(R \le t)$$
for large enough $n$, $k$.
25. Idea of the analysis
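If we prove that for some $k = k(n) \to \infty$ and any $\varepsilon > 0$,
(1) Matrix iterations: $P\left(\left|R_1^{(n,\infty)} - R_1^{(n,k)}\right| > \varepsilon\right) \to 0$,
(2) Coupling with branching tree: $P\left(\left|R_1^{(n,k)} - \hat{R}_1^{(n,k)}\right| > \varepsilon\right) \to 0$,
(3) Limiting solution: $P\left(\left|\hat{R}_1^{(n,k)} - R\right| > \varepsilon\right) \to 0$,
as $n \to \infty$, then it will follow, by Slutsky's lemma, that
$$R_1^{(n,\infty)} \Rightarrow R^{(\infty)}$$
as $n \to \infty$, where $\Rightarrow$ denotes convergence in distribution.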
26. Coupling with branching tree
We start with a random node (node 1) and explore its neighbours, labeling the stubs that we have already seen.
$\tau$: the number of generations of the WBP completed before the coupling breaks
27. Coupling with branching tree
Lemma
Let $\tau$ be the number of generations of the TBP that we are able to complete before we draw the first stub that has already been observed before. Then, for any $0 < \varepsilon < 1/2$, and $a = (1/2 - \varepsilon) / \log m$, where $m = E[N]$,
$$P(\tau \le a \log n) = O\left(n^{-\varepsilon/2}\right)$$
as $n \to \infty$.
28. Combining with matrix iteration
$$P\left(\left|R_1^{(n,\infty)} - R_1^{(n,k)}\right| > c^k K n\right) = o(1)$$
We need $c^k n = o(1)$ for some $k < \tau$.
Combining this with Lemma 2, we get the main result.
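The requirement $c^k n = o(1)$ can be connected to the coupling lemma by a short calculation. The following is a heuristic sketch, not taken from the slides: it assumes that the threshold in the bound has the form $c^k K n$ with $K$ a constant, that $m = E[N] > 1$, and the hypothetical choice $k = \lfloor a \log n \rfloor$ with $a = (1/2 - \varepsilon)/\log m$, which stays below $\tau$ with high probability by the lemma.

```latex
\begin{align*}
c^{k} n &\le c^{-1}\, c^{a \log n}\, n = c^{-1}\, n^{\,1 - a \log(1/c)}, \\
n^{\,1 - a \log(1/c)} \to 0
  &\iff a \log(1/c) > 1
  \iff \log(1/c) > \frac{\log m}{1/2 - \varepsilon}
  \iff c < m^{-2/(1 - 2\varepsilon)} .
\end{align*}
```

Letting $\varepsilon \downarrow 0$ gives $c < m^{-2} = 1/(E[N])^2$, which is consistent with the condition on $c$ in the main result.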
29. Main result
Let $n$ be the number of nodes in the random graph, and let $N$ and $D$ be r.v.s having the in-degree and effective out-degree distributions, resp.
Let $R(n)$ be the rank vector computed on the graph with $n$ nodes.
Theorem (Chen, L., Olvera-Cravioto, 2014): Suppose $0 < c < 1/(E[N])^2$; then
$$R_1(n) \Rightarrow R, \quad n \to \infty,$$
where $R$ is the solution to the fixed-point equation
$$R \stackrel{d}{=} q + c \sum_{i=1}^{N} \frac{R_i}{D_i}.$$
30. Work in progress
Relaxing the conditions on $c$: better bounds for $\tau$ and the matrix iterations
So far, finite variance assumption
The result probably will not hold for all $c \in (0, 1)$.
PageRank must converge for all $c < 1$. Will we obtain the same power law but with a different factor?