Nelly Litvak – Asymptotic behaviour of ranking algorithms in directed random networks

Asymptotic behaviour of ranking algorithms in directed random networks
Nelly Litvak
University of Twente, The Netherlands
joint work with
Mariana Olvera-Cravioto and Ningyuan Chen
Workshop on Extremal Graph Theory
Moscow, 06-06-2014

Power law of PageRank
Pandurangan, Raghavan, Upfal, 2002.

Power laws in complex networks
Power laws: Internet, WWW, social networks, biological networks, etc.
Degree of a node = number of (in-/out-) links
p_k = fraction of nodes with degree at least k
Power law: p_k ≈ const · k^{−α}, α > 0.
A power law is a model for high variability: some nodes (hubs) have extremely many connections.
log p_k = log(const) − α log k
Straight line on the log-log scale
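The straight-line relation on the log-log scale suggests a simple, rough way to read off α from data: fit a line to (log k, log p_k) and take minus its slope. The sketch below assumes NumPy; the function name `tail_exponent`, the cutoff `min_count`, and the synthetic sample are illustrative choices, not part of the talk, and a least-squares fit is only a crude estimator of a tail exponent.

```python
import numpy as np

def tail_exponent(degrees, min_count=50):
    """Estimate alpha from the slope of log p_k versus log k,
    where p_k is the fraction of nodes with degree at least k.
    Only values of k with at least `min_count` nodes of degree >= k
    are used, to avoid the very noisy end of the tail."""
    degrees = np.asarray(degrees)
    n = degrees.size
    ks = np.unique(degrees[degrees >= 1])
    counts = np.array([(degrees >= k).sum() for k in ks])
    keep = counts >= min_count
    slope, _intercept = np.polyfit(np.log(ks[keep]), np.log(counts[keep] / n), 1)
    return -slope

rng = np.random.default_rng(0)
alpha = 2.0
# Synthetic "degrees" with P(X >= k) = k^(-alpha) for integer k >= 1
# (floor of a Pareto variable); this stands in for real network data.
sample = np.floor(rng.uniform(size=100_000) ** (-1.0 / alpha)).astype(int)
alpha_hat = tail_exponent(sample)
print(f"true alpha = {alpha}, fitted alpha = {alpha_hat:.2f}")
```

On real degree data the fitted value depends strongly on which range of k is used, so the result should be read as indicative only.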

Regular variation
X is a regularly varying random variable with index α if
P(X > x) = L(x) x^{−α}, x > 0,
where L(x) is slowly varying: for every t > 0, L(tx)/L(x) → 1 as x → ∞.

Google PageRank
S. Brin, L. Page, The anatomy of a large-scale hypertextual
Web search engine (1998)
PageRank R_i of page i = 1, ..., n is defined as the stationary distribution of a random walk with jumps:
$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n$
d_j = number of out-links of page j
c ∈ (0, 1) is the damping factor, originally 0.85; a random jump occurs with probability 1 − c
b_i = probability of jumping to page i; originally b_i = 1/n
personalized PageRank: b_i ≠ 1/n (non-uniform jump distribution)
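As an illustration of the definition, the PageRank equation can be solved by simple fixed-point iteration. This is a minimal sketch assuming NumPy and assuming every page has at least one out-link (dangling pages are not handled); the function name `pagerank` and the three-page example graph are made up for the illustration.

```python
import numpy as np

def pagerank(out_links, c=0.85, b=None, tol=1e-12, max_iter=1000):
    """Iterate R_i = sum_{j -> i} (c / d_j) R_j + (1 - c) b_i.

    out_links[j] lists the pages that page j links to.
    Assumption: every page has at least one out-link."""
    n = len(out_links)
    b = np.full(n, 1.0 / n) if b is None else np.asarray(b, dtype=float)
    r = b.copy()
    for _ in range(max_iter):
        new = (1.0 - c) * b                      # random-jump term
        for j, targets in enumerate(out_links):
            for i in targets:                    # page j passes c * R_j / d_j along each link
                new[i] += c * r[j] / len(targets)
        if np.abs(new - r).sum() < tol:
            return new
        r = new
    return r

# Hypothetical 3-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
ranks = pagerank([[1, 2], [2], [0]])
print(ranks, ranks.sum())
```

Passing a non-uniform vector `b` gives the personalized variant.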

Examples of applications
$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n$
Topic-sensitive search (Haveliwala, 2002);
Spam detection (Gyöngyi et al., 2004);
Finding related entities (Chakrabarti, 2007);
Link prediction (Liben-Nowell and Kleinberg, 2003; Voevodski, Teng, Xia, 2009);
Finding local cuts (Andersen, Chung, Lang, 2006);
Graph clustering (Tsiatas, Chung, 2010);
Person name disambiguation (Smirnova, Avrachenkov, Trousse, 2010);
Finding the most influential people in Wikipedia (Shepelyansky et al., 2010, 2013).

Stochastic model for PageRank
Rescale: R_i → nR_i, b_i → nb_i
$R_i = \sum_{j \to i} \frac{c}{d_j} R_j + (1 - c)\, b_i, \quad i = 1, \dots, n$
Stochastic equation:
$R \stackrel{d}{=} c \sum_{j=1}^{N} \frac{1}{D_j} R_j + c\, p_0 + (1 - c) B$
N: in-degree of the randomly chosen page
D: out-degree of a page that links to the randomly chosen page
p_0: fraction of pages with out-degree zero
R_j is distributed as R; N, D, R_j are independent; N and B can be dependent.
We can denote Q = c p_0 + (1 − c)B, C_j = c/D_j.
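A distributional equation of this kind can be explored numerically by iterating it on a large pool of samples (often called population dynamics). The sketch below is only an approximation under stated assumptions: B is taken constant, N is sampled independently of B, and the copies R_j are drawn with replacement from the current pool rather than being truly independent. The names `sample_R`, `sample_N`, `sample_D` and the Poisson/constant degree laws are illustrative, not from the talk.

```python
import numpy as np

def sample_R(sample_N, sample_D, c=0.85, p0=0.0, B=1.0,
             depth=10, pool=20_000, rng=None):
    """Approximate samples of R = c * sum_{j=1}^N R_j / D_j + c*p0 + (1-c)*B
    by iterating the recursion `depth` times on a pool of values."""
    rng = np.random.default_rng() if rng is None else rng
    R = np.full(pool, 1.0)                      # arbitrary starting values
    for _ in range(depth):
        N = sample_N(pool, rng)                 # in-degree for each pool member
        total = int(N.sum())
        Rj = rng.choice(R, size=total)          # copies of R from the previous generation
        Dj = sample_D(total, rng)               # out-degrees of the linking pages
        owner = np.repeat(np.arange(pool), N)   # which pool member each term belongs to
        sums = np.bincount(owner, weights=c * Rj / Dj, minlength=pool)
        R = sums + c * p0 + (1.0 - c) * B       # add Q = c*p0 + (1-c)*B
    return R

rng = np.random.default_rng(1)
# Illustrative choice: N ~ Poisson(3), D = 3, so E[N] * E[1/D] = 1 and E[R] = 1.
R = sample_R(lambda size, g: g.poisson(3.0, size),
             lambda size, g: np.full(size, 3),
             rng=rng)
print(R.mean(), np.quantile(R, 0.99))
```

With heavy-tailed `sample_N`, the empirical tail of the returned pool can be compared against the tail of N on a log-log plot.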

Results for stochastic recursion
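$R \stackrel{d}{=} \sum_{j=1}^{N} C_j R_j + Q$
Theorem (Volkovich & Litvak, 2010)
If P(B > x) = o(P(N > x)), then the following are equivalent:
$P(N > x) \sim x^{-\alpha_N} L_N(x)$ as x → ∞,
$P(R > x) \sim c_N\, x^{-\alpha_N} L_N(x)$ as x → ∞,
where $c_N = \left(E(c/D)\right)^{\alpha_N} \left[1 - E(N)\, E\!\left(C^{\alpha_N}\right)\right]^{-1}$.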

Power Law behaviour of PageRank
Data for Web, Wikipedia and Preferential Attachment graph

Results for stochastic recursion
$R \stackrel{d}{=} \sum_{j=1}^{N} C_j R_j + Q$
A series of papers (Olvera-Cravioto & Jelenkovic 2010, 2012; Olvera-Cravioto 2012) analyzed the recursion in detail using sample-path large deviations and implicit renewal theory.
The tail behaviour of R is obtained under very general assumptions on the C_j's.
R can be heavy-tailed even when N is light-tailed.

Recursion on a graph
So far we have, in fact, considered the recursion on a tree.
Will similar results hold on a particular graph structure?
Some graphs are tree-like (Thorny Branching Process, TBP).

Directed configuration model
Directed graph on n nodes V = {v_1, ..., v_n}.
In-degree and out-degree:
m_i = in-degree of node v_i = number of edges pointing to v_i.
d_i = out-degree of node v_i = number of edges pointing from v_i.
(m, d) = ({m_i}, {d_i}) is called a bi-degree-sequence.
Target distributions:
In-degree: F = (f_k : k = 0, 1, 2, ...), and
Out-degree: G = (g_k : k = 0, 1, 2, ...).

Assumptions on the target distributions
Suppose further that for some α, β ≥ 2,
$\overline{F}(x) = \sum_{k > x} f_k \le x^{-\alpha} L_F(x)$
and
$\overline{G}(x) = \sum_{k > x} g_k \le x^{-\beta} L_G(x)$,
for all x ≥ 0, where L_F(·) and L_G(·) are slowly varying.
Assume both F and G have finite variance.

The bi-degree sequence (Chen&Olvera-Cravioto, 2012)
1. Fix 0 < δ_0 < 1 − θ, θ = max{α^{−1}, β^{−1}, 1/2}.
2. Sample {γ_1, ..., γ_n} i.i.d. from F; let $\Gamma_n = \sum_{i=1}^{n} \gamma_i$.
3. Sample {ξ_1, ..., ξ_n} i.i.d. from G; let $\Xi_n = \sum_{i=1}^{n} \xi_i$.
4. Let Δ_n = Γ_n − Ξ_n. If |Δ_n| ≤ n^{θ+δ_0} go to step 5; otherwise go to step 2.
5. Choose randomly |Δ_n| nodes S = {i_1, i_2, ..., i_{|Δ_n|}} without replacement and let
N_i = γ_i + τ_i, D_i = ξ_i + χ_i, i = 1, 2, ..., n,
where
χ_i = 1 if Δ_n ≥ 0 and i ∈ S, and χ_i = 0 otherwise,
and
τ_i = 1 if Δ_n < 0 and i ∈ S, and τ_i = 0 otherwise.
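The five steps above can be sketched in code as follows. This is an illustration assuming NumPy, not the authors' implementation: the function name `bi_degree_sequence`, the `max_tries` guard, and the use of a Zipf law for both F and G in the example are assumptions made here.

```python
import numpy as np

def bi_degree_sequence(n, sample_F, sample_G, alpha, beta, delta0,
                       rng=None, max_tries=10_000):
    """Sample in-degrees N and out-degrees D whose sums are equal."""
    rng = np.random.default_rng() if rng is None else rng
    theta = max(1.0 / alpha, 1.0 / beta, 0.5)              # step 1
    if not 0.0 < delta0 < 1.0 - theta:
        raise ValueError("need 0 < delta0 < 1 - theta")
    for _ in range(max_tries):
        gamma = sample_F(n, rng)                           # step 2
        xi = sample_G(n, rng)                              # step 3
        delta = int(gamma.sum() - xi.sum())                # step 4
        if abs(delta) <= n ** (theta + delta0):
            break
    else:
        raise RuntimeError("no acceptable sample found")
    S = rng.choice(n, size=abs(delta), replace=False)      # step 5
    N, D = gamma.copy(), xi.copy()
    if delta >= 0:
        D[S] += 1        # chi_i = 1: too few outbound stubs, add one per chosen node
    else:
        N[S] += 1        # tau_i = 1: too few inbound stubs, add one per chosen node
    return N, D

rng = np.random.default_rng(2)
# Illustrative target law for both F and G: Zipf with P(X = k) proportional to k^(-3.5).
zipf = lambda size, g: g.zipf(3.5, size)
N, D = bi_degree_sequence(1000, zipf, zipf, alpha=2.5, beta=2.5, delta0=0.2, rng=rng)
print(N.sum(), D.sum())
```

The acceptance step only succeeds with reasonable probability when F and G have the same mean, which is why the example uses the same law for both.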

Constructing the graph
Using the bi-degree-sequence (N, D) for the in- and out-degrees:
assign to each node v_i a number m_i of inbound stubs and a number d_i of outbound stubs;
pair outbound stubs with inbound stubs to form directed edges by matching each inbound stub with an outbound stub chosen uniformly at random from the set of unpaired outbound stubs;
proceed in the same way for all remaining unpaired inbound stubs, i.e., choose uniformly from the set of unpaired outbound stubs and draw the corresponding directed edge.
The result is a multigraph (possibly with self-loops and multiple edges in the same direction) on nodes {v_1, ..., v_n}.
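The pairing procedure can be sketched compactly: list the inbound stubs in a fixed order and match them with a uniformly random permutation of the outbound stubs, which has the same distribution as choosing an unpaired outbound stub uniformly at random for each inbound stub in turn. The function name `configuration_model` and the tiny degree sequence are illustrative assumptions; NumPy is assumed.

```python
import numpy as np

def configuration_model(in_deg, out_deg, rng=None):
    """Return a list of directed edges (source, target) of a multigraph
    with the given in- and out-degrees."""
    rng = np.random.default_rng() if rng is None else rng
    in_deg = np.asarray(in_deg)
    out_deg = np.asarray(out_deg)
    if in_deg.sum() != out_deg.sum():
        raise ValueError("in- and out-degrees must have the same sum")
    in_stubs = np.repeat(np.arange(len(in_deg)), in_deg)     # one entry per inbound stub
    out_stubs = np.repeat(np.arange(len(out_deg)), out_deg)  # one entry per outbound stub
    rng.shuffle(out_stubs)                                   # uniform random matching
    return list(zip(out_stubs.tolist(), in_stubs.tolist()))

in_deg, out_deg = [1, 2, 0], [2, 0, 1]
edges = configuration_model(in_deg, out_deg, rng=np.random.default_rng(3))
print(edges)   # self-loops and repeated edges are possible
```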

PageRank in directed configuration model
C_i = ζ_i / D_i, where {ζ_i} is a sequence of i.i.d. random variables independent of (N, D) (ζ_i = c in the classical case)
M = M(n) ∈ R^{n×n} is related to the adjacency matrix of the graph:
$M_{i,j} = s_{ij} C_i$ if there are $s_{ij}$ edges from i to j, and $M_{i,j} = 0$ otherwise.
Q ∈ R^n is a personalization vector
We are interested in one coordinate, R_1, of the vector R ∈ R^n defined by
R = RM + Q

Matrix iterations
$R^{(n,0)} = B,$
$R^{(n,1)} = R^{(n,0)} M + Q = BM + Q,$
$R^{(n,2)} = R^{(n,1)} M + Q = BM^2 + QM + Q,$
$R^{(n,3)} = R^{(n,2)} M + Q = BM^3 + QM^2 + QM + Q,$
...
$R^{(n,k)} = \sum_{i=0}^{k-1} Q M^i + B M^k, \quad k \ge 1.$
We are interested in analyzing $P(R_1^{(n,\infty)} > x)$, x → ∞.
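The iteration and its closed form can be checked against each other on a small example. The sketch below assumes NumPy, the classical weights C_i = c/D_i, and a hypothetical three-node graph; the helper names `matrix_iterations` and `closed_form` are illustrative.

```python
import numpy as np

def matrix_iterations(M, Q, B, k):
    """R^(n,k) computed by k steps of R <- R M + Q, starting from R^(n,0) = B."""
    R = np.asarray(B, dtype=float)
    for _ in range(k):
        R = R @ M + Q
    return R

def closed_form(M, Q, B, k):
    """R^(n,k) = sum_{i=0}^{k-1} Q M^i + B M^k."""
    total = np.zeros(len(Q))
    power = np.eye(M.shape[0])          # M^0
    for _ in range(k):
        total += Q @ power
        power = power @ M
    return total + B @ power

c, n = 0.85, 3
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]    # hypothetical graph, edges i -> j
D = np.bincount([i for i, _ in edges], minlength=n)
M = np.zeros((n, n))
for i, j in edges:
    M[i, j] += c / D[i]                     # M_ij = s_ij * C_i with C_i = c / D_i
Q = np.full(n, 1.0 - c)                     # rescaled uniform personalization
B = np.ones(n)

R5 = matrix_iterations(M, Q, B, 5)
R_limit = np.linalg.solve((np.eye(n) - M).T, Q)   # solves R = R M + Q
print(R5, R_limit)
```

Because every row of M sums to c < 1 here, the term B M^k vanishes geometrically and the iterates approach the solution of R = RM + Q.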

Idea of the analysis
$\hat{R}_1^{(n,k)}$ – PageRank on a perfect branching tree
R – solution of the equation
$R \stackrel{d}{=} \sum_{j=1}^{\gamma} C_j R_j + Q$
We will try to prove the following: for any fixed t ∈ R and a randomly chosen node v,
$P(R_1^{(n,\infty)} \le t) \approx P(R_1^{(n,k)} \le t) \approx P(\hat{R}_1^{(n,k)} \le t) \approx P(R \le t)$
for large enough n, k.

Idea of the analysis
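If we prove that for some k = k(n) → ∞ and any ε > 0,
(Matrix iterations) $P\left( |R_1^{(n,\infty)} - R_1^{(n,k)}| > \varepsilon \right) \to 0,$ (1)
(Coupling with branching tree) $P\left( |R_1^{(n,k)} - \hat{R}_1^{(n,k)}| > \varepsilon \right) \to 0,$ (2)
(Limiting solution) $P\left( |\hat{R}_1^{(n,k)} - R| > \varepsilon \right) \to 0,$ (3)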
as n → ∞, then it will follow, by Slutsky's lemma, that
$R_1^{(n,\infty)} \Rightarrow R$
as n → ∞, where ⇒ denotes convergence in distribution.

Coupling with branching tree
We start with a random node (node 1) and explore its neighbours, labeling the stubs that we have already seen.
τ – the number of generations of the WBP completed before the coupling breaks

Coupling with branching tree
Lemma
Let τ be the number of generations of the TBP that we are able to complete before we draw the first stub that has already been observed before. Then, for any 0 < ε < 1/2 and a = (1/2 − ε)/log m, where m = E[N],
$P(\tau \le a \log n) = O\left(n^{-\varepsilon/2}\right)$
as n → ∞.

Combining with matrix iteration
$P\left( |R_1^{(n,\infty)} - R_1^{(n,k)}| > c^k K n \right) = o(1)$
We need c^k n = o(1) for some k < τ.
Combining this with Lemma 2, we get the main result.
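One way to read the requirement c^k n = o(1) is as a restriction on c. The short calculation below is a sketch, not the authors' proof; it assumes natural logarithms, m = E[N] > 1, and a number of iterations of order k = a log n with a = (1/2 − ε)/log m, the largest order for which the coupling with the tree is expected to hold.

```latex
% Assumptions: natural logarithm, m = E[N] > 1, k = a \log n, a = (1/2 - \varepsilon)/\log m.
\begin{align*}
c^{k} n &= e^{(a \log n)\log c}\, n = n^{\,1 + a \log c}, \\
c^{k} n = o(1)
  &\iff 1 + a \log c < 0
   \iff \left(\tfrac{1}{2} - \varepsilon\right) \log\tfrac{1}{c} > \log m
   \iff c < m^{-2/(1 - 2\varepsilon)} .
\end{align*}
% For every c < m^{-2} = 1/(E[N])^2 there is an \varepsilon > 0 small enough for the last bound to hold.
```

Under these assumptions the calculation leads to a bound of the form c < 1/(E[N])^2.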

Main result
Let n be the number of nodes in the random graph, and let N and D be r.v.s having the in-degree and effective out-degree distributions, respectively.
Let R(n) be the rank vector computed on the graph with n nodes.
Theorem (Chen, Litvak, Olvera-Cravioto, 2014): Suppose 0 < c < 1/(E[N])^2. Then
$R_1(n) \Rightarrow R, \quad n \to \infty,$
where R is the solution to the fixed point equation
$R \stackrel{d}{=} q + c \sum_{i=1}^{N} \frac{R_i}{D_i}.$
Work in progress
Relaxing conditions on c: better bounds for τ and the matrix iterations.
So far, we assume finite variance.
The result will probably not hold for all c ∈ (0, 1).
PageRank must converge for all c < 1. Will we obtain the same power law but with a different factor?

Thank you!
