Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme
Linear Algebra behind Google Search
Dr. V.N. Krishnachandran
Department of Computer Applications
Vidya Academy of Science and Technology
Thrissur - 680501, Kerala.
August 2011
Outline
1 Web: An example
2 Importance score
3 First unsuccessful approach
4 Second unsuccessful approach
5 Third unsuccessful approach
6 Dangling nodes
7 Disconnected webs
8 Google approach
9 Computational scheme
Web world
The web world consists of a number of pages and links from some
of the pages to some other pages.
In a diagrammatic representation of a web world, pages are denoted
by small squares or circles and links are indicated by arrows.
See a simplified web world in next slide.
Web world
Example 1: A web with four pages numbered 1,2,3,4.
Links
An arrow drawn from Page p to Page q denotes:
an incoming link (also called a backlink) to Page q;
an outgoing link from Page p.
Links
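Outgoing links in Example 1: Page 1 links to Pages 2, 3 and 4; Page 2 links to Pages 3 and 4; Page 3 links to Page 1; Page 4 links to Pages 1 and 3.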
Links
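Incoming links in Example 1: Page 1 is linked from Pages 3 and 4; Page 2 from Page 1; Page 3 from Pages 1, 2 and 4; Page 4 from Pages 1 and 2.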
Importance score
In Google’s search algorithm, the most important concept is that
of the importance score of a page.
This we explain in the next few slides...
Importance score
The importance score, or simply the score, of a page is a
number which is a measure of the relative importance of a
page.
The importance score is a nonnegative real number.
The importance score of a page is derived from the backlinks
for that page.
Importance score vector
We denote the importance score of Page k by $x_k$.
Let there be n pages in the web. The column vector
\[ x = [x_1 \; x_2 \; \cdots \; x_n]^T \]
is called the importance score vector.
The importance score vector x is said to be normalised if
\[ x_1 + x_2 + \cdots + x_n = 1. \]
Unsuccessful attempts to define importance score
Before considering Google’s approach, we consider
three unsuccessful attempts to define the concept of the
importance score of a page.
A study of these unsuccessful attempts helps one appreciate the
significance of Google’s approach.
Importance score:
First unsuccessful approach
Definition (First unsuccessful approach)
The importance score of Page k is the number of backlinks for Page k.
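Importance scores in Example 1 (the number of backlinks of each page):
\[ x_1 = 2, \quad x_2 = 1, \quad x_3 = 3, \quad x_4 = 2. \]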
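As a minimal sketch of the first approach, the snippet below counts backlinks for Example 1. The dictionary `outgoing` is an illustrative encoding of the Example 1 links, inferred from the backlink sets L1 = {3, 4}, L2 = {1}, L3 = {1, 2, 4}, L4 = {1, 2} given in the notation slides; the function and variable names are assumptions made for this example.

```python
from collections import Counter

# Outgoing links of Example 1 (hypothetical encoding; inferred from the
# backlink sets L1 = {3, 4}, L2 = {1}, L3 = {1, 2, 4}, L4 = {1, 2}).
outgoing = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}

def backlink_counts(outgoing):
    """Return {page: number of backlinks}, the first-approach score."""
    counts = Counter({page: 0 for page in outgoing})
    for targets in outgoing.values():
        counts.update(targets)  # each target gains one backlink
    return dict(counts)

scores = backlink_counts(outgoing)
print(scores)
```

Pages 1 and 4 receive the same count here, which is the weakness of this definition: it ignores which pages the links come from.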
Importance score
Importance score: A desirable property
“A link to Page k from an important page must increase Page k’s
score more than a link from an unimportant page.”
The first unsuccessful approach does not have this property.
(see next slide)
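Pages 1 and 4 both have two backlinks, so the first approach gives them equal scores. But Page 1 is linked from Page 3, which has the most backlinks, while Page 4 is linked from Page 2, which has only one. The importance score of Page 1 must therefore be higher than that of Page 4.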
Importance score:
Second unsuccessful approach
Definition (Second unsuccessful approach)
The importance score of a page is the sum of the scores of all
pages linking to the page.
Importance scores in Example 1
The importance scores in Example 1 (second approach) are
solutions of the following system of equations:
\[
\begin{aligned}
x_1 &= x_3 + x_4 \\
x_2 &= x_1 \\
x_3 &= x_1 + x_2 + x_4 \\
x_4 &= x_1 + x_2
\end{aligned}
\]
Importance scores in Example 1 : Matrix formulation
\[
H =
\begin{bmatrix}
0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 \\
1 & 1 & 0 & 0
\end{bmatrix},
\qquad x = [x_1 \; x_2 \; x_3 \; x_4]^T
\]
\[ Hx = x \]
A solution x must be an eigenvector of the matrix H with eigenvalue 1.
But 1 is not an eigenvalue of H.
Hence there is no eigenvector with eigenvalue 1 for the matrix H.
The second approach therefore does not assign importance scores to the pages in Example 1.
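One way to check the claim that 1 is not an eigenvalue of H is to compute det(H − I): a nonzero determinant means (H − I)x = 0 has only the zero solution. The sketch below does this with exact fractions; the helper `determinant` is written for this example and is not part of the slides.

```python
from fractions import Fraction

def determinant(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

H = [[0, 0, 1, 1],
     [1, 0, 0, 0],
     [1, 1, 0, 1],
     [1, 1, 0, 0]]

# H - I: subtract 1 from each diagonal entry.
H_minus_I = [[H[i][j] - (1 if i == j else 0) for j in range(4)]
             for i in range(4)]
det = determinant(H_minus_I)
print(det)  # prints -5: nonzero, so Hx = x has only the zero solution
```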
Importance score: An undesirable property
“A page with many outgoing links has a bigger influence on the
scores of other pages than a page with fewer outgoing links.”
This is undesirable.
A recommendation letter from a Professor who is choosy in giving
such letters carries more weight than one from a Professor who is
very liberal in issuing them.
Importance score:
Third unsuccessful approach
Notations
n = number of pages in the web
Pages are indexed by k = 1, 2, ..., n.
$n_j$ = number of outgoing links from Page j
$L_k$ = set of indices of the pages that link to Page k (the backlinks of Page k)
Definition (Third unsuccessful approach)
Let the web contain n pages, indexed by an integer k,
$1 \le k \le n$. Let $L_k \subseteq \{1, 2, \ldots, n\}$ be the set of backlinks for Page
k, and $n_j$ the number of outgoing links from Page j. Then
\[ x_k = \sum_{j \in L_k} \frac{x_j}{n_j}, \qquad k = 1, 2, \ldots, n. \]
Importance scores in Example 1 : Notations
n = 4, k = 1, 2, 3, 4.
$n_1 = 3$, $n_2 = 2$, $n_3 = 1$, $n_4 = 2$
$L_1 = \{3, 4\}$, $L_2 = \{1\}$, $L_3 = \{1, 2, 4\}$, $L_4 = \{1, 2\}$
Importance scores in Example 1 : Equations
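Expression to compute $x_1$:
\[
x_1 = \sum_{j \in L_1} \frac{x_j}{n_j}
    = \sum_{j \in \{3,4\}} \frac{x_j}{n_j}
    = \frac{x_3}{n_3} + \frac{x_4}{n_4}
    = \frac{x_3}{1} + \frac{x_4}{2}
\]
Similar expressions hold for $x_2$, $x_3$ and $x_4$. (See next slide ...)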
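Linear system of equations to compute the importance scores:
\[
\begin{aligned}
x_1 &= \frac{x_3}{1} + \frac{x_4}{2} \\
x_2 &= \frac{x_1}{3} \\
x_3 &= \frac{x_1}{3} + \frac{x_2}{2} + \frac{x_4}{2} \\
x_4 &= \frac{x_1}{3} + \frac{x_2}{2}
\end{aligned}
\]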
Importance scores in Example 1 : Matrix formulation
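The link matrix of the web in Example 1:
\[
A =
\begin{bmatrix}
0 & 0 & 1 & \frac{1}{2} \\
\frac{1}{3} & 0 & 0 & 0 \\
\frac{1}{3} & \frac{1}{2} & 0 & \frac{1}{2} \\
\frac{1}{3} & \frac{1}{2} & 0 & 0
\end{bmatrix},
\qquad x = [x_1 \; x_2 \; x_3 \; x_4]^T
\]
\[ Ax = x \]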
x is an eigenvector with eigenvalue 1 for the link matrix A.
1 is indeed an eigenvalue of A.
All nonzero multiples of the vector $[12 \; 4 \; 9 \; 6]^T$ are eigenvectors of
A corresponding to the eigenvalue 1.
The normalised importance score vector for the web in
Example 1 is
\[
x = \left[ \tfrac{12}{31} \;\; \tfrac{4}{31} \;\; \tfrac{9}{31} \;\; \tfrac{6}{31} \right]^T
  \approx [0.387 \;\; 0.129 \;\; 0.290 \;\; 0.194]^T.
\]
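The third approach can be sketched in a few lines: build the link matrix from the outgoing links and check that [12, 4, 9, 6] is a fixed point. The `outgoing` dictionary is an assumed encoding of Example 1 (consistent with n1 = 3, n2 = 2, n3 = 1, n4 = 2 and the backlink sets), and the function names are illustrative only.

```python
from fractions import Fraction

# Outgoing links of Example 1 (hypothetical encoding of the figure,
# consistent with n1 = 3, n2 = 2, n3 = 1, n4 = 2 and the backlink sets).
outgoing = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}

def link_matrix(outgoing):
    """Entry (k, j) is 1/n_j if Page j links to Page k, else 0."""
    pages = sorted(outgoing)
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)
    A = [[Fraction(0)] * n for _ in range(n)]
    for j, targets in outgoing.items():
        for k in targets:
            A[index[k]][index[j]] = Fraction(1, len(targets))
    return A

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = link_matrix(outgoing)
v = [Fraction(12), Fraction(4), Fraction(9), Fraction(6)]
assert matvec(A, v) == v            # Av = v, so v has eigenvalue 1
x = [vi / sum(v) for vi in v]       # normalise so the entries sum to 1
print([round(float(xi), 3) for xi in x])
```

Exact fractions are used so that the check Av = v is an equality rather than a floating-point approximation.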
Limitations of the third unsuccessful approach
The third unsuccessful approach has two severe limitations:
Problem of dangling nodes: If there are dangling nodes in the web, the defining equations may have no nonzero solution, so that no page can be assigned an importance score.
Problem of disconnected web: If the web is disconnected, one cannot assign unique importance scores to all the pages in the web.
Dangling nodes
Definition
A dangling node is a page with no outgoing links.
Example 2 : Web with dangling node
(Page 4 is a dangling node)
Importance scores in Example 2 : Equations
\[
\begin{aligned}
x_1 &= x_3 \\
x_2 &= \frac{x_1}{3} \\
x_3 &= \frac{x_1}{3} + \frac{x_2}{2} \\
x_4 &= \frac{x_1}{3} + \frac{x_2}{2}
\end{aligned}
\]
Importance scores in Example 2 : Matrix formulation
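Link matrix for the web in Example 2:
\[
A =
\begin{bmatrix}
0 & 0 & 1 & 0 \\
\frac{1}{3} & 0 & 0 & 0 \\
\frac{1}{3} & \frac{1}{2} & 0 & 0 \\
\frac{1}{3} & \frac{1}{2} & 0 & 0
\end{bmatrix},
\qquad x = [x_1 \; x_2 \; x_3 \; x_4]^T
\]
\[ Ax = x \]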
Importance scores in Example 2 : Values
A solution x must be an eigenvector with eigenvalue 1 for the matrix A.
But 1 is not an eigenvalue of A.
Hence there is no eigenvector with eigenvalue 1 for the matrix A.
The definition (third approach) therefore does not assign importance
scores to the pages in Example 2.
Mathematics
Definition
A square matrix is called a column-stochastic matrix if all its
entries are nonnegative and the entries in each column sum to 1.
Theorem
Every column-stochastic matrix has 1 as an eigenvalue.
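A short justification of this theorem, added here as a sketch (the slides state it without proof; the symbol $e$ for the all-ones vector is notation introduced for this note):

```latex
Let $A$ be an $n \times n$ column-stochastic matrix and let
$e = [1 \; 1 \; \cdots \; 1]^T$.
Each column of $A$ sums to 1, so each row of $A^T$ sums to 1, which gives
\[ A^T e = e . \]
Thus 1 is an eigenvalue of $A^T$. Since
\[ \det(A - \lambda I) = \det\big((A - \lambda I)^T\big) = \det(A^T - \lambda I), \]
$A$ and $A^T$ have the same eigenvalues, and therefore 1 is an eigenvalue of $A$.
```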
Theorem
The link matrix for a web with no dangling nodes is
column-stochastic.
Theorem
The link matrix for a web with no dangling nodes has 1 as an
eigenvalue.
Disconnected webs
Definition
A web W is disconnected if W can be partitioned into two
nonempty subwebs W1 and W2 such that there is no outgoing link
from any page in W1 to any page in W2 and vice versa.
Example 3 : A web with two disconnected subwebs
W1 (Pages 1, 2) and W2 (Pages 3, 4, 5)
Importance scores in Example 3 : Equations
\[
\begin{aligned}
x_1 &= x_2 \\
x_2 &= x_1 \\
x_3 &= x_4 + \frac{x_5}{2} \\
x_4 &= x_3 + \frac{x_5}{2} \\
x_5 &= 0
\end{aligned}
\]
Importance scores in Example 3 : Matrix formulation
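\[
A =
\begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & \frac{1}{2} \\
0 & 0 & 1 & 0 & \frac{1}{2} \\
0 & 0 & 0 & 0 & 0
\end{bmatrix},
\qquad x = [x_1 \; x_2 \; x_3 \; x_4 \; x_5]^T
\]
\[ Ax = x \]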
Importance scores in Example 3 : Values
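Two linearly independent eigenvectors with eigenvalue 1:
\[
x = \left[ \tfrac{1}{2} \;\; \tfrac{1}{2} \;\; 0 \;\; 0 \;\; 0 \right]^T,
\qquad
x = \left[ 0 \;\; 0 \;\; \tfrac{1}{2} \;\; \tfrac{1}{2} \;\; 0 \right]^T
\]
These are linearly independent, normalised importance score
vectors for Example 3.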
The third approach does not produce a unique importance score
for every page in a disconnected web.
In the third approach:
Web is disconnected $\Longrightarrow$ importance scores are not unique
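To make the non-uniqueness concrete, the sketch below checks that the two eigenvectors of Example 3, and also weighted averages of them, all satisfy Ax = x and sum to 1. The helper `mix` and the chosen weights are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction as F

# Link matrix of Example 3 (two disconnected subwebs).
A = [[0, 1, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 1, F(1, 2)],
     [0, 0, 1, 0, F(1, 2)],
     [0, 0, 0, 0, 0]]

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

v1 = [F(1, 2), F(1, 2), 0, 0, 0]
v2 = [0, 0, F(1, 2), F(1, 2), 0]

def mix(t):
    """Convex combination t*v1 + (1 - t)*v2; its entries still sum to 1."""
    return [t * a + (1 - t) * b for a, b in zip(v1, v2)]

candidates = [v1, v2, mix(F(1, 4)), mix(F(3, 4))]
for x in candidates:
    assert matvec(A, x) == x and sum(x) == 1
print(len(candidates), "distinct normalised score vectors satisfy Ax = x")
```

Any weight t between 0 and 1 gives another valid normalised score vector, so the relative ranking of the two subwebs is arbitrary under the third approach.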
Google’s approach
Google matrix: Definition
Consider a web with n pages.
Let A be the link matrix of the web.
Let S be an n × n matrix with all entries equal to $\frac{1}{n}$.
Let m be such that $0 \le m \le 1$.
Definition
The Google matrix of the web is
\[ M = (1 - m)A + mS. \]
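The definition translates directly into code. The following is a minimal sketch, assuming the matrix is stored as a list of rows; the function name `google_matrix` is made up for this example, and the matrix A and the value m = 0.15 are those used for Example 1 in this deck.

```python
def google_matrix(A, m):
    """Return M = (1 - m) * A + m * S, where S has every entry 1/n."""
    n = len(A)
    return [[(1 - m) * A[i][j] + m / n for j in range(n)] for i in range(n)]

# Link matrix of Example 1 and the value m = 0.15 used in the examples.
A = [[0,   0,   1, 1/2],
     [1/3, 0,   0, 0],
     [1/3, 1/2, 0, 1/2],
     [1/3, 1/2, 0, 0]]
M = google_matrix(A, 0.15)

for row in M:
    print(["%.5f" % entry for entry in row])

# Each column of M still sums to 1 (up to floating-point rounding).
column_sums = [sum(M[i][j] for i in range(4)) for j in range(4)]
```

With m > 0 every entry of M is at least m/n, so M has no zero entries even where A does.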
Google matrix: Damping factor
Definition
The constant 1 − m in the definition of the Google matrix is called
the damping factor of the Google matrix. (The creators of
Google’s search algorithm chose 0.85 as the damping factor.)
Google’s approach: Importance score
Definition
Let M be the Google matrix of a web having n pages. Let $x_k$ be
the importance score of Page k in the web and let
$x = [x_1 \; x_2 \; \cdots \; x_n]^T$. Then a solution of the matrix equation
\[ Mx = x \]
is called the importance score vector of the web.
Definition (alternate)
Let M be the Google matrix of a web having n pages. Let $x_k$ be
the importance score of Page k in the web and let
$x = [x_1 \; x_2 \; \cdots \; x_n]^T$. Then an eigenvector of the matrix M
having eigenvalue 1 is called the importance score vector of the
web.
Google’s approach: Example 1
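Google matrix: Example 1.
m = 0.15
\[
\begin{aligned}
M &= (1 - m)A + mS \\
  &= (1 - 0.15)
\begin{bmatrix}
0 & 0 & 1 & \frac{1}{2} \\
\frac{1}{3} & 0 & 0 & 0 \\
\frac{1}{3} & \frac{1}{2} & 0 & \frac{1}{2} \\
\frac{1}{3} & \frac{1}{2} & 0 & 0
\end{bmatrix}
+ 0.15
\begin{bmatrix}
\frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\
\frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\
\frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\
\frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4}
\end{bmatrix} \\
  &=
\begin{bmatrix}
0.03750 & 0.03750 & 0.88750 & 0.46250 \\
0.3208\overline{3} & 0.03750 & 0.03750 & 0.03750 \\
0.3208\overline{3} & 0.46250 & 0.03750 & 0.46250 \\
0.3208\overline{3} & 0.46250 & 0.03750 & 0.03750
\end{bmatrix}
\end{aligned}
\]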
The importance scores are solutions of the matrix equation
Mx = x,
which are the eigenvectors of M having the eigenvalue 1.
M is column stochastic.
M has 1 as an eigenvalue.
M has an eigenvector having eigenvalue 1.
The web in Example 1 has an importance score vector as per
Google’s approach.
Is the importance score vector unique?
An eigenvector of M (in Example 1) having eigenvalue 1 is
\[
x = \left[ \tfrac{106613}{58520} \;\; \tfrac{40}{57} \;\; \tfrac{57}{40} \;\; 1 \right]^T.
\]
The normalised importance score vector is (approximately)
\[ x = [0.368 \;\; 0.142 \;\; 0.288 \;\; 0.202]^T. \]
The importance scores of the web pages are
\[ x_1 = 0.368, \quad x_2 = 0.142, \quad x_3 = 0.288, \quad x_4 = 0.202. \]
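The stated eigenvector can be verified exactly with rational arithmetic. This is only a checking sketch: it rebuilds M for Example 1 with m = 15/100 and confirms Mx = x entry by entry; the variable names are chosen for this example.

```python
from fractions import Fraction as F

m = F(15, 100)
A = [[0, 0, 1, F(1, 2)],
     [F(1, 3), 0, 0, 0],
     [F(1, 3), F(1, 2), 0, F(1, 2)],
     [F(1, 3), F(1, 2), 0, 0]]
n = len(A)
# Google matrix M = (1 - m)A + mS with every entry of S equal to 1/n.
M = [[(1 - m) * A[i][j] + m / n for j in range(n)] for i in range(n)]

x = [F(106613, 58520), F(40, 57), F(57, 40), F(1)]
Mx = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
assert Mx == x  # exact equality: x is an eigenvector with eigenvalue 1

q = [xi / sum(x) for xi in x]  # normalised scores
print([round(float(qi), 3) for qi in q])  # [0.368, 0.142, 0.288, 0.202]
```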
Google’s approach: Example 2
Example 2
Google’s approach: Example 3
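Google matrix of the web in Example 3.
\[
\begin{aligned}
M &= (1 - 0.15)
\begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & \frac{1}{2} \\
0 & 0 & 1 & 0 & \frac{1}{2} \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
+ 0.15
\begin{bmatrix}
\frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \\
\frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \\
\frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \\
\frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \\
\frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5}
\end{bmatrix} \\
  &=
\begin{bmatrix}
0.030 & 0.880 & 0.030 & 0.030 & 0.030 \\
0.880 & 0.030 & 0.030 & 0.030 & 0.030 \\
0.030 & 0.030 & 0.030 & 0.880 & 0.455 \\
0.030 & 0.030 & 0.880 & 0.030 & 0.455 \\
0.030 & 0.030 & 0.030 & 0.030 & 0.030
\end{bmatrix}
\end{aligned}
\]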
M (in Example 3) is column stochastic.
M (in Example 3) has 1 as an eigenvalue.
The normalised eigenvector of M (in Example 3) having eigenvalue 1 is
\[ x = [0.200 \;\; 0.200 \;\; 0.285 \;\; 0.285 \;\; 0.030]^T. \]
The importance scores of the web pages (in Example 3) are
\[ x_1 = 0.200, \quad x_2 = 0.200, \quad x_3 = 0.285, \quad x_4 = 0.285, \quad x_5 = 0.030. \]
The scores are all positive.
The scores are unique even though the web has disconnected
subwebs.
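The uniqueness can be illustrated numerically: iterating x ← (1 − m)Ax + mu from two different normalised starting vectors leads to the same scores. This is a sketch under assumptions; the starting vectors and the fixed number of steps are arbitrary choices made for the example, and the update uses mu in place of mSx, which is valid only because the starting vectors sum to 1.

```python
def google_step(A, x, m):
    """One step x -> (1 - m) A x + m u, with u = [1/n, ..., 1/n]."""
    n = len(A)
    return [(1 - m) * sum(A[i][j] * x[j] for j in range(n)) + m / n
            for i in range(n)]

# Link matrix of Example 3.
A = [[0, 1, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 1, 0.5],
     [0, 0, 1, 0, 0.5],
     [0, 0, 0, 0, 0]]

def scores(x, m=0.15, steps=300):
    """Apply the update a fixed number of times (illustrative stopping rule)."""
    for _ in range(steps):
        x = google_step(A, x, m)
    return x

# Two different normalised starting vectors (arbitrary choices).
from_uniform = scores([0.2] * 5)
from_skewed = scores([0.6, 0.1, 0.1, 0.1, 0.1])
print([round(v, 3) for v in from_uniform])
print([round(v, 3) for v in from_skewed])
```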
Google’s approach: Mathematics
Definition
A matrix P is said to be positive if all elements of P are positive.
Theorem
If a square matrix P is positive and column-stochastic, then any
eigenvector of P with eigenvalue 1 has either all positive or all negative
components.
Theorem
If a square matrix P is positive and column-stochastic, then the
eigenspace of P corresponding to the eigenvalue 1 has dimension 1.
Properties of Google matrix
Let M be the Google matrix of a web without dangling nodes, with $0 < m \le 1$.
M is positive.
M is column stochastic.
1 is an eigenvalue of M.
The eigenspace of M corresponding to the eigenvalue 1 has
dimension 1.
M has an eigenvector corresponding to the eigenvalue 1 with
all positive components.
M has a unique eigenvector $x = [x_1 \; x_2 \; \ldots \; x_n]^T$
corresponding to the eigenvalue 1 such that
$x_i > 0$ for $i = 1, 2, \ldots, n$, and
$x_1 + x_2 + \cdots + x_n = 1$.
Computational scheme in
Google’s approach
Computational scheme
Notations:
Let W be a web with n pages and no dangling nodes.
Let A be the link matrix of the web W.
Let 1 − m be the damping factor.
Let u be the n-component column vector with all entries equal to $\frac{1}{n}$.
Let $x^{(0)}$ be some n-component column vector with positive
components and $\|x^{(0)}\|_1 = 1$.
Let q be the normalised importance score vector of the web W.
The scheme:
Generate the sequence $x^{(1)}, x^{(2)}, \ldots$ of column vectors using the
following iteration scheme:
\[ x^{(r+1)} = (1 - m)Ax^{(r)} + mu. \]
Then
\[ q = \lim_{r \to \infty} x^{(r)}. \]
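The scheme can be sketched as a short loop. The version below is an illustration under assumptions: the function name, the uniform starting vector, and the stopping rule (stop when successive iterates differ by less than `tol` in the 1-norm) are choices made for this example, since the slides only state the limit.

```python
def importance_scores(A, m=0.15, tol=1e-10, max_iter=1000):
    """Iterate x <- (1 - m) A x + m u until successive iterates agree.

    A is a column-stochastic link matrix given as a list of rows.
    The stopping rule (1-norm change below tol) is an illustrative choice.
    """
    n = len(A)
    x = [1.0 / n] * n  # x^(0): positive entries summing to 1
    for _ in range(max_iter):
        nxt = [(1 - m) * sum(A[i][j] * x[j] for j in range(n)) + m / n
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x

# Link matrix of Example 1.
A = [[0,   0,   1, 1/2],
     [1/3, 0,   0, 0],
     [1/3, 1/2, 0, 1/2],
     [1/3, 1/2, 0, 0]]
q = importance_scores(A)
print([round(v, 3) for v in q])  # [0.368, 0.142, 0.288, 0.202]
```

The update never forms the dense matrix M; it needs only a product with A plus a constant vector, which is what makes the scheme practical when A is large and sparse.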
Computational scheme: Example
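Compute the importance score vector of the web in Example 1.
Notations:
n = 4
\[
A =
\begin{bmatrix}
0 & 0 & 1 & \frac{1}{2} \\
\frac{1}{3} & 0 & 0 & 0 \\
\frac{1}{3} & \frac{1}{2} & 0 & \frac{1}{2} \\
\frac{1}{3} & \frac{1}{2} & 0 & 0
\end{bmatrix}
\]
m = 0.15
\[ u = \left[ \tfrac{1}{4} \;\; \tfrac{1}{4} \;\; \tfrac{1}{4} \;\; \tfrac{1}{4} \right]^T. \]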
We choose $x^{(0)} = \left[ \tfrac{1}{4} \;\; \tfrac{1}{4} \;\; \tfrac{1}{4} \;\; \tfrac{1}{4} \right]^T$.
In the next two slides we show the computations of $x^{(1)}$ and
$x^{(2)}$.
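\[
\begin{aligned}
x^{(1)} &= (1 - m)Ax^{(0)} + mu \\
  &= (1 - 0.15)
\begin{bmatrix}
0 & 0 & 1 & \frac{1}{2} \\
\frac{1}{3} & 0 & 0 & 0 \\
\frac{1}{3} & \frac{1}{2} & 0 & \frac{1}{2} \\
\frac{1}{3} & \frac{1}{2} & 0 & 0
\end{bmatrix}
\begin{bmatrix} \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \end{bmatrix}
+ 0.15
\begin{bmatrix} \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \end{bmatrix} \\
  &=
\begin{bmatrix} 0.3562 \\ 0.1083 \\ 0.3208 \\ 0.2146 \end{bmatrix}
\end{aligned}
\]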
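\[
\begin{aligned}
x^{(2)} &= (1 - m)Ax^{(1)} + mu \\
  &= (1 - 0.15)
\begin{bmatrix}
0 & 0 & 1 & \frac{1}{2} \\
\frac{1}{3} & 0 & 0 & 0 \\
\frac{1}{3} & \frac{1}{2} & 0 & \frac{1}{2} \\
\frac{1}{3} & \frac{1}{2} & 0 & 0
\end{bmatrix}
\begin{bmatrix} 0.3562 \\ 0.1083 \\ 0.3208 \\ 0.2146 \end{bmatrix}
+ 0.15
\begin{bmatrix} \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \end{bmatrix} \\
  &=
\begin{bmatrix} 0.4014 \\ 0.1384 \\ 0.2757 \\ 0.1845 \end{bmatrix}
\end{aligned}
\]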
The values of $x^{(3)}$, $x^{(4)}$, etc. are tabulated in the next slide. Note
that $x^{(11)}$ and $x^{(12)}$ agree to four decimal places, so further
iterations do not change the scores at this precision.
r     x_1^{(r)}   x_2^{(r)}   x_3^{(r)}   x_4^{(r)}
0     0.2500      0.2500      0.2500      0.2500
1     0.3562      0.1083      0.3208      0.2146
2     0.4014      0.1384      0.2757      0.1845
3     0.3502      0.1512      0.2884      0.2101
4     0.3720      0.1367      0.2903      0.2010
5     0.3698      0.1429      0.2864      0.2010
6     0.3664      0.1422      0.2884      0.2030
7     0.3689      0.1413      0.2880      0.2018
8     0.3681      0.1420      0.2878      0.2021
9     0.3680      0.1418      0.2880      0.2021
10    0.3682      0.1418      0.2879      0.2020
11    0.3681      0.1418      0.2880      0.2021
12    0.3681      0.1418      0.2880      0.2021
The importance scores of various pages in Example 1 are as given
below:
x1 = 0.3681, x2 = 0.1418, x3 = 0.2880, x4 = 0.2021.
Computational scheme: Mathematics
Power method to find an eigenvector of a matrix G.
Start with an initial guess (initial approximation) $x^{(0)}$.
Generate successive approximations $x^{(r)}$ by the iteration
scheme
\[ x^{(r)} = Gx^{(r-1)}, \]
or equivalently,
\[ x^{(r)} = G^r x^{(0)}. \]
For large r, the vector $x^{(r)}$ is a good approximation to an
eigenvector of G.
The power method produces successive approximations to the
eigenvector corresponding to the eigenvalue of G that is largest in absolute value.
Modified power method to find an eigenvector of a matrix G.
Let $x^{(r)} = G^r x^{(0)}$, for r = 1, 2, ....
$x^{(r)}$ may diverge to infinity or may decay to the zero vector.
A better iteration scheme is
\[ x^{(r)} = \frac{Gx^{(r-1)}}{\|Gx^{(r-1)}\|}, \]
where $\|\cdot\|$ is some vector norm.
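A minimal sketch of the normalised iteration follows, with the 1-norm picked as the vector norm. The 2 × 2 matrix is a toy example that is not from the slides; it was chosen so that the dominant eigenvector, [1, 0], is known in advance.

```python
def power_method(G, x0, steps=50):
    """Normalised power iteration x <- Gx / ||Gx||_1 (1-norm chosen here)."""
    x = list(x0)
    for _ in range(steps):
        y = [sum(g * xi for g, xi in zip(row, x)) for row in G]
        norm = sum(abs(v) for v in y)
        x = [v / norm for v in y]
    return x

# Toy matrix (not from the slides) with eigenvalues 2 and 1; the dominant
# eigenvector is [1, 0].
G = [[2.0, 0.0],
     [0.0, 1.0]]
x = power_method(G, [0.5, 0.5])
print(x)
```

Without the division by the norm, the first component would grow like 2^r; with it, the iterates stay on the set of vectors with 1-norm equal to 1.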
Power method applied to Google matrix
We apply the power method to compute the importance score
vector of a web.
Power method can be applied to compute the importance
score eigenvector only if 1 is the largest eigenvalue of the
Google matrix.
However, we can prove that the power method can be applied
to compute the importance score eigenvector without showing
that 1 is the greatest eigenvalue of the Google matrix.
See next few slides ...
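Let M be the Google matrix of a web with no dangling nodes. We have
\[ M = (1 - m)A + mS. \]
Let $x^{(0)}$ be a normalised column vector with positive components.
Since M is column-stochastic, the components of every iterate $x^{(r)}$ also sum to 1, so $Sx^{(r)} = u$. Hence
\[
\begin{aligned}
x^{(r+1)} &= Mx^{(r)} \\
  &= ((1 - m)A + mS)x^{(r)} \\
  &= (1 - m)Ax^{(r)} + mSx^{(r)} \\
  &= (1 - m)Ax^{(r)} + mu.
\end{aligned}
\]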
Definition
The 1-norm of a vector v is
\[ \|v\|_1 = |v_1| + |v_2| + \cdots + |v_n|. \]
Theorem
Let P be a positive column-stochastic n × n real matrix and let V
be the subspace of $\mathbb{R}^n$ consisting of vectors v such that $\sum_j v_j = 0$.
Then:
1 $Pv \in V$ for any $v \in V$.
2 $\|Pv\|_1 \le c\|v\|_1$ for any $v \in V$, where
\[ c = \max_{1 \le j \le n} \left| 1 - 2 \min_{1 \le i \le n} P_{ij} \right| < 1. \]
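The constant c can be computed directly for a concrete matrix. The sketch below uses the Google matrix of Example 1 (m = 0.15) as P and checks both parts of the theorem on a few zero-sum vectors; the sample vectors and helper names are arbitrary choices made for illustration.

```python
# Google matrix of Example 1 with m = 0.15.
A = [[0,   0,   1, 1/2],
     [1/3, 0,   0, 0],
     [1/3, 1/2, 0, 1/2],
     [1/3, 1/2, 0, 0]]
n, m = 4, 0.15
P = [[(1 - m) * A[i][j] + m / n for j in range(n)] for i in range(n)]

def norm1(v):
    """1-norm: sum of absolute values of the entries."""
    return sum(abs(t) for t in v)

def matvec(P, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(p * t for p, t in zip(row, v)) for row in P]

# c = max over columns j of |1 - 2 * (smallest entry in column j)|.
c = max(abs(1 - 2 * min(P[i][j] for i in range(n))) for j in range(n))

# Sample vectors whose entries sum to zero (arbitrary test cases).
samples = [[1, -1, 0, 0], [0.5, 0.5, -1, 0], [3, -1, -1, -1]]
for v in samples:
    Pv = matvec(P, v)
    print(round(norm1(Pv), 4), "<=", round(c * norm1(v), 4))
```

Here every column of A contains a zero, so the smallest entry in each column of P is m/n = 0.0375 and c = 0.925. The bound in the theorem is what guarantees that the error between an iterate and q shrinks by at least the factor c at every step.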
Theorem
Every positive column-stochastic matrix P has a unique vector q
with positive components such that Pq = q with $\|q\|_1 = 1$. The
vector q can be computed as
\[ q = \lim_{r \to \infty} P^r x_0 \]
for any initial guess $x_0$ with positive components such that
$\|x_0\|_1 = 1$.
Dr. V.N. Krishnachandran
Linear Algebra behind Google Search

Linear Algebra Approach to Calculating Web Page Importance Scores

  • 1. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Linear Algebra behind Google Search Dr. V.N. Krishnachandran Department of Computer Applications Vidya Academy of Science and Technology Thrissur - 680501, Kerala. August 2011 Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 2. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Outline 1 Web: An example 2 Importance score 3 First unsuccessful approach 4 Second unsuccessful approach 5 Third unsuccessful approach 6 Dangling nodes 7 Disconnected webs 8 Google approach 9 Computational scheme Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 3. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Web world The web world consists of a number of pages and links from some of the pages to some other pages. In a diagrammatic representation of a web world, pages are denoted by small squares or circles and links are indicated by arrows. See a simplified web world in next slide. Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 4. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Web world Example 1: A web with four pages numbered 1,2,3,4. Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 5. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Links In the figure above, arrow denotes: an incoming link (also called a backlink) to Page q. an outgoing link from Page p. Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 6. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Links Outgoing links in Example 1 Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 7. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Links Incoming links in Example 1 Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 8. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Importance score In Google’s search algorithm, the most important concept is that of the importance score of a page. This we explain in the next few slides... Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 9. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Importance score The importance score, or simply the score, of a page is a number which is a measure of the relative importance of a page. The importance score is a nonnegative real number. The importance score of a page is derived from the backlinks for that page. Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 10. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Importance score vector We denote the importance score of Page k by xk. Let there be n pages in the web. The column vector x = [x1 x2 · · · xn]T is called the importance score vector. The importance score vector x is said to be normalised if x1 + x2 + · · · xn = 1. Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 11. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Unsuccessful attempts to define importance score Before considering Google’s approach, we consider three unsuccessful attempts to define the concept of the importance score of a page. A study of these unsuccessful attempts helps one appreciate the significance of Google’s approach. Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 12. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Importance score: First unsuccessful approach Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 13. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Importance score: First unsuccessful approach Definition (First unsuccessful approach) Importance score of Page k is the number of backlinks for Page k. Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 14. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Importance score: First unsuccessful approach Importance scores in Example 1 Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 15. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Importance score Importance score: A desirable property “A link to Page k from an important page must increase Page k’s score more than a link from an unimportant page.” First unsuccessful approach does not have this property. (see next slide) Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 16. Web Scores Approach 1 Approach 2 Approach 3 Dangling... Disconnected... Google’s approach Computational scheme Importance score: First unsuccessful approach Importance score of Page 1 must be higher than that of Page 4. Dr. V.N. Krishnachandran Linear Algebra behind Google Search
  • 17. Importance score: Second unsuccessful approach
  • 18. Importance score: Second unsuccessful approach. Definition (Second unsuccessful approach): The importance score of a page is the sum of the scores of all pages linking to the page.
  • 19. Importance score: Second unsuccessful approach. Importance scores in Example 1. The importance scores in Example 1 (second approach) are solutions of the following system of equations: $x_1 = x_3 + x_4$, $x_2 = x_1$, $x_3 = x_1 + x_2 + x_4$, $x_4 = x_1 + x_2$.
  • 20. Importance score: Second unsuccessful approach. Importance scores in Example 1: Matrix formulation. $H = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 0 & 0 \end{pmatrix}$, $x = [x_1 \; x_2 \; x_3 \; x_4]^T$, $Hx = x$.
  • 21. Importance score: Second unsuccessful approach. Importance scores in Example 1: Matrix formulation. A nonzero solution x of $Hx = x$ would be an eigenvector of H with eigenvalue 1. However, 1 is not an eigenvalue of H, so H has no eigenvector with eigenvalue 1. The second approach therefore does not assign importance scores to the pages in Example 1.
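The claim that 1 is not an eigenvalue of H can be checked by confirming that det(H − I) is nonzero. The following is a minimal sketch using exact rational arithmetic; the helper name `det` is a hypothetical choice for illustration and is not part of the slides.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination with exact fractions."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

# Matrix H of the second approach for Example 1.
H = [[0, 0, 1, 1],
     [1, 0, 0, 0],
     [1, 1, 0, 1],
     [1, 1, 0, 0]]
H_minus_I = [[H[i][j] - (1 if i == j else 0) for j in range(4)] for i in range(4)]
d = det(H_minus_I)
print(d)  # nonzero, so Hx = x has only the zero solution
```

Because the determinant is nonzero, H − I is invertible and the only solution of Hx = x is the zero vector.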
  • 22. Importance score: Second unsuccessful approach. Importance score: An undesirable property. "A page with many outgoing links has a bigger influence on the scores of other pages than a page with fewer outgoing links." This is undesirable. A recommendation letter from a professor who is choosy in giving such letters carries more weight than one from a professor who issues them liberally.
  • 23. Importance score: Third unsuccessful approach
  • 24. Importance score: Third unsuccessful approach. Notations: $n$ = number of pages in the web; pages are indexed by $k = 1, 2, \ldots, n$; $n_j$ = number of outgoing links from Page j; $L_k$ = set of indices of the pages with a link to Page k (the backlinks of Page k).
  • 25. Importance score: Third unsuccessful approach. Definition (Third unsuccessful approach): Let the web contain n pages, indexed by an integer k, $1 \le k \le n$. Let $L_k \subseteq \{1, 2, \ldots, n\}$ be the set of backlinks for Page k, and $n_j$ the number of outgoing links from Page j. Then $x_k = \sum_{j \in L_k} \frac{x_j}{n_j}$, $k = 1, 2, \ldots, n$.
  • 26. Importance score: Third unsuccessful approach. Importance scores in Example 1: Notations. $n = 4$, $k = 1, 2, 3, 4$.
  • 27. Importance score: Third unsuccessful approach. Importance scores in Example 1: Notations. $n_1 = 3$, $n_2 = 2$, $n_3 = 1$, $n_4 = 2$.
  • 28. Importance score: Third unsuccessful approach. Importance scores in Example 1: Notations. $L_1 = \{3, 4\}$, $L_2 = \{1\}$, $L_3 = \{1, 2, 4\}$, $L_4 = \{1, 2\}$.
  • 29. Importance score: Third unsuccessful approach. Importance scores in Example 1: Equations. Expression to compute $x_1$: $x_1 = \sum_{j \in L_1} \frac{x_j}{n_j} = \sum_{j \in \{3,4\}} \frac{x_j}{n_j} = \frac{x_3}{n_3} + \frac{x_4}{n_4} = \frac{x_3}{1} + \frac{x_4}{2}$. Similar expressions hold for $x_2$, $x_3$ and $x_4$ (see next slide).
  • 30. Importance score: Third unsuccessful approach. Importance scores in Example 1: Equations. Linear system of equations to compute the importance scores: $x_1 = \frac{x_3}{1} + \frac{x_4}{2}$, $x_2 = \frac{x_1}{3}$, $x_3 = \frac{x_1}{3} + \frac{x_2}{2} + \frac{x_4}{2}$, $x_4 = \frac{x_1}{3} + \frac{x_2}{2}$.
  • 31. Importance score: Third unsuccessful approach. Importance scores in Example 1: Matrix formulation. The link matrix of the web in Example 1: $A = \begin{pmatrix} 0 & 0 & 1 & \frac{1}{2} \\ \frac{1}{3} & 0 & 0 & 0 \\ \frac{1}{3} & \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{3} & \frac{1}{2} & 0 & 0 \end{pmatrix}$, $x = [x_1 \; x_2 \; x_3 \; x_4]^T$, $Ax = x$.
  • 32. Importance score: Third unsuccessful approach. Importance scores in Example 1: Matrix formulation. A solution x is an eigenvector with eigenvalue 1 for the link matrix A. Here 1 is indeed an eigenvalue of A. All nonzero multiples of the vector $[12 \; 4 \; 9 \; 6]^T$ are eigenvectors of A corresponding to the eigenvalue 1. The normalised importance score vector for the web in Example 1 is $x = \left[\frac{12}{31} \; \frac{4}{31} \; \frac{9}{31} \; \frac{6}{31}\right]^T \approx [0.387 \; 0.129 \; 0.290 \; 0.194]^T$.
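The third approach can be reproduced in a few lines for Example 1. The sketch below builds the link matrix from the outgoing links (read off from the backlink sets L1 to L4) and checks that [12, 4, 9, 6] is fixed by A. The names `out_links` and `mat_vec` are illustrative assumptions, not notation from the slides.

```python
from fractions import Fraction

# Outgoing links of Example 1, read off from the backlink sets L1..L4.
out_links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}
n = len(out_links)

# A[k][j] = 1/n_j if page j+1 links to page k+1, else 0.
A = [[Fraction(0)] * n for _ in range(n)]
for j, targets in out_links.items():
    for k in targets:
        A[k - 1][j - 1] = Fraction(1, len(targets))

def mat_vec(M, v):
    """Matrix-vector product with exact fractions."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

x = [Fraction(12), Fraction(4), Fraction(9), Fraction(6)]
is_fixed = mat_vec(A, x) == x      # True means Ax = x

total = sum(x)
scores = [xi / total for xi in x]  # normalise so the scores sum to 1
print(is_fixed, [float(s) for s in scores])
```

Exact fractions are used so that the check Ax = x is an equality test rather than a floating-point comparison.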
  • 33. Limitations of the third unsuccessful approach. The third unsuccessful approach has two severe limitations. Problem of dangling nodes: if there are dangling nodes in the web, the link matrix need not have 1 as an eigenvalue, and then no importance scores can be assigned to any page. Problem of disconnected web: if the web is disconnected, one cannot assign unique importance scores to all the pages in the web.
  • 34. Dangling nodes. Definition: A dangling node is a page with no outgoing links.
  • 35. Dangling nodes. Example 2: Web with a dangling node (Page 4 is a dangling node).
  • 36. Dangling nodes. Importance scores in Example 2: Equations. $x_1 = x_3$, $x_2 = \frac{x_1}{3}$, $x_3 = \frac{x_1}{3} + \frac{x_2}{2}$, $x_4 = \frac{x_1}{3} + \frac{x_2}{2}$.
  • 37. Dangling nodes. Importance scores in Example 2: Matrix formulation. Link matrix for the web in Example 2: $A = \begin{pmatrix} 0 & 0 & 1 & 0 \\ \frac{1}{3} & 0 & 0 & 0 \\ \frac{1}{3} & \frac{1}{2} & 0 & 0 \\ \frac{1}{3} & \frac{1}{2} & 0 & 0 \end{pmatrix}$, $x = [x_1 \; x_2 \; x_3 \; x_4]^T$, $Ax = x$.
  • 38. Dangling nodes. Importance scores in Example 2: Values. A nonzero solution x of $Ax = x$ would be an eigenvector of A with eigenvalue 1. However, 1 is not an eigenvalue of this A, so there is no eigenvector with eigenvalue 1. The definition (third approach) does not assign importance scores to the pages in Example 2.
  • 39. Dangling nodes: Mathematics. Definition: A square matrix is called a column-stochastic matrix if all its entries are nonnegative and the entries in each column sum to 1. Theorem: Every column-stochastic matrix has 1 as an eigenvalue.
  • 40. Dangling nodes: Mathematics. Theorem: The link matrix for a web with no dangling nodes is column-stochastic. Theorem: The link matrix for a web with no dangling nodes has 1 as an eigenvalue.
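The definition of a column-stochastic matrix translates directly into a check. The sketch below applies it to the link matrices of Example 1 (no dangling nodes) and Example 2 (Page 4 dangling); the function name `is_column_stochastic` is a hypothetical helper chosen for this illustration.

```python
from fractions import Fraction as F

def is_column_stochastic(M):
    """True if all entries are nonnegative and every column sums to 1."""
    n = len(M)
    nonnegative = all(v >= 0 for row in M for v in row)
    columns_sum_to_one = all(sum(M[i][j] for i in range(n)) == 1 for j in range(n))
    return nonnegative and columns_sum_to_one

# Link matrix of Example 1 (every page has at least one outgoing link).
A1 = [[0,       0,       1, F(1, 2)],
      [F(1, 3), 0,       0, 0],
      [F(1, 3), F(1, 2), 0, F(1, 2)],
      [F(1, 3), F(1, 2), 0, 0]]

# Link matrix of Example 2 (Page 4 is dangling, so column 4 is all zeros).
A2 = [[0,       0,       1, 0],
      [F(1, 3), 0,       0, 0],
      [F(1, 3), F(1, 2), 0, 0],
      [F(1, 3), F(1, 2), 0, 0]]

print(is_column_stochastic(A1), is_column_stochastic(A2))
```

The zero column produced by the dangling node is exactly what breaks the column-sum condition in Example 2.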
  • 41. Disconnected webs. Definition: A web W is disconnected if W can be partitioned into two nonempty subwebs $W_1$ and $W_2$ such that there is no link from any page in $W_1$ to any page in $W_2$, and no link from any page in $W_2$ to any page in $W_1$.
  • 42. Disconnected webs. Example 3: A web with two disconnected subwebs $W_1$ (Pages 1, 2) and $W_2$ (Pages 3, 4, 5).
  • 43. Disconnected webs. Importance scores in Example 3: Equations. $x_1 = x_2$, $x_2 = x_1$, $x_3 = x_4 + \frac{x_5}{2}$, $x_4 = x_3 + \frac{x_5}{2}$, $x_5 = 0$.
  • 44. Disconnected webs. Importance scores in Example 3: Matrix formulation. $A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & \frac{1}{2} \\ 0 & 0 & 1 & 0 & \frac{1}{2} \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$, $x = [x_1 \; x_2 \; x_3 \; x_4 \; x_5]^T$, $Ax = x$.
  • 45. Disconnected webs. Importance scores in Example 3: Values. Two linearly independent eigenvectors with eigenvalue 1: $x = \left[\frac{1}{2} \; \frac{1}{2} \; 0 \; 0 \; 0\right]^T$ and $x = \left[0 \; 0 \; \frac{1}{2} \; \frac{1}{2} \; 0\right]^T$. These are linearly independent, normalised importance score vectors in Example 3.
  • 46. Disconnected webs. The third approach does not produce a unique importance score for every page in a disconnected web. In the third approach: web is disconnected $\Longrightarrow$ importance scores are not unique.
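The non-uniqueness in Example 3 can be made concrete: both eigenvectors are fixed by A, and so is every weighted average of them, each giving a different normalised score vector. The sketch below checks this with exact fractions; `mat_vec` and `blend` are hypothetical helper names.

```python
from fractions import Fraction as F

# Link matrix of Example 3.
A = [[0, 1, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 1, F(1, 2)],
     [0, 0, 1, 0, F(1, 2)],
     [0, 0, 0, 0, 0]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

v1 = [F(1, 2), F(1, 2), F(0), F(0), F(0)]
v2 = [F(0), F(0), F(1, 2), F(1, 2), F(0)]

def blend(t):
    """Weighted average t*v1 + (1 - t)*v2 of the two eigenvectors."""
    return [t * a + (1 - t) * b for a, b in zip(v1, v2)]

# Every blend with 0 <= t <= 1 is again a normalised solution of Ax = x.
results = []
for t in (F(0), F(1, 4), F(1, 2), F(1)):
    x = blend(t)
    results.append(mat_vec(A, x) == x and sum(x) == 1)
print(results)
```

Since infinitely many normalised vectors satisfy Ax = x, the third approach cannot rank the pages of the two subwebs against each other.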
  • 47. Google's approach
  • 48. Google matrix: Definition. Consider a web with n pages. Let A be the link matrix of the web. Let S be an $n \times n$ matrix with all entries equal to $\frac{1}{n}$. Let m be such that $0 \le m \le 1$. Definition: The Google matrix of the web is $M = (1 - m)A + mS$.
  • 49. Google matrix: Damping factor. Definition: The constant $1 - m$ in the definition of the Google matrix is called the damping factor of the Google matrix. (The creators of Google's search algorithm chose 0.85 as the damping factor.)
  • 50. Google's approach: Importance score. Definition: Let M be the Google matrix of a web having n pages. Let $x_k$ be the importance score of Page k in the web and let $x = [x_1 \; x_2 \; \cdots \; x_n]^T$. Then a solution of the matrix equation $Mx = x$ is called the importance score vector of the web.
  • 51. Google's approach: Importance score. Definition (alternate): Let M be the Google matrix of a web having n pages. Let $x_k$ be the importance score of Page k in the web and let $x = [x_1 \; x_2 \; \cdots \; x_n]^T$. Then an eigenvector of the matrix M having eigenvalue 1 is called the importance score vector of the web.
  • 52. Google's approach: Example 1. Google matrix for Example 1 with $m = 0.15$: $M = (1 - m)A + mS = (1 - 0.15)\begin{pmatrix} 0 & 0 & 1 & \frac{1}{2} \\ \frac{1}{3} & 0 & 0 & 0 \\ \frac{1}{3} & \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{3} & \frac{1}{2} & 0 & 0 \end{pmatrix} + 0.15\begin{pmatrix} \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \end{pmatrix} = \begin{pmatrix} 0.03750 & 0.03750 & 0.88750 & 0.46250 \\ 0.3208\overline{3} & 0.03750 & 0.03750 & 0.03750 \\ 0.3208\overline{3} & 0.46250 & 0.03750 & 0.46250 \\ 0.3208\overline{3} & 0.46250 & 0.03750 & 0.03750 \end{pmatrix}$.
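The definition M = (1 − m)A + mS can be sketched as a small function and applied to Example 1 with m = 0.15. The function name `google_matrix` is an assumption made for this illustration; exact fractions are used so the entries can be compared without rounding.

```python
from fractions import Fraction as F

def google_matrix(A, m):
    """Return M = (1 - m) * A + m * S, where every entry of S is 1/n."""
    n = len(A)
    return [[(1 - m) * A[i][j] + m * F(1, n) for j in range(n)] for i in range(n)]

# Link matrix of Example 1.
A = [[0,       0,       1, F(1, 2)],
     [F(1, 3), 0,       0, 0],
     [F(1, 3), F(1, 2), 0, F(1, 2)],
     [F(1, 3), F(1, 2), 0, 0]]

M = google_matrix(A, F(15, 100))
column_sums = [sum(M[i][j] for i in range(4)) for j in range(4)]

for row in M:
    print(["%.5f" % float(v) for v in row])
```

Each column of M still sums to 1, and for m > 0 every entry is strictly positive even where A has a zero.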
  • 53. Google's approach: Example 1. The importance scores are solutions of the matrix equation $Mx = x$, which are the eigenvectors of M having the eigenvalue 1. M is column-stochastic. M has 1 as an eigenvalue. M has an eigenvector having eigenvalue 1. The web in Example 1 has an importance score vector as per Google's approach. Is the importance score vector unique?
  • 54. Google's approach: Example 1. The eigenvector of M (in Example 1) having eigenvalue 1 is $x = \left[\frac{106613}{58520} \; \frac{40}{57} \; \frac{57}{40} \; 1\right]^T$. The normalised importance score vector is (approximately) $x = [0.368 \; 0.142 \; 0.288 \; 0.202]^T$. The importance scores of the web pages are $x_1 = 0.368$, $x_2 = 0.142$, $x_3 = 0.288$, $x_4 = 0.202$.
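The eigenvector quoted for Example 1 can be verified exactly. The sketch below reads the slide's vector as (106613/58520, 40/57, 57/40, 1), which is an interpretation of the flattened slide text, and checks Mx = x without forming M explicitly. The helper name `apply_google` is hypothetical.

```python
from fractions import Fraction as F

# Link matrix of Example 1.
A = [[0,       0,       1, F(1, 2)],
     [F(1, 3), 0,       0, 0],
     [F(1, 3), F(1, 2), 0, F(1, 2)],
     [F(1, 3), F(1, 2), 0, 0]]
m = F(3, 20)   # m = 0.15, so the damping factor 1 - m is 0.85
n = len(A)

def apply_google(x):
    """Return Mx for M = (1 - m)A + mS; every entry of Sx equals sum(x)/n."""
    mean = sum(x) / n
    return [(1 - m) * sum(A[i][j] * x[j] for j in range(n)) + m * mean
            for i in range(n)]

x = [F(106613, 58520), F(40, 57), F(57, 40), F(1)]
is_fixed = apply_google(x) == x

total = sum(x)
q = [v / total for v in x]   # normalised so the scores sum to 1
print(is_fixed, [round(float(v), 3) for v in q])
```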
  • 55. Google's approach: Example 2. [Figure: the web of Example 2]
  • 56. Google's approach: Example 3. Google matrix of the web in Example 3: $M = (1 - 0.15)\begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & \frac{1}{2} \\ 0 & 0 & 1 & 0 & \frac{1}{2} \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} + 0.15\,S$, where S is the $5 \times 5$ matrix with all entries equal to $\frac{1}{5}$, so that $M = \begin{pmatrix} 0.030 & 0.880 & 0.030 & 0.030 & 0.030 \\ 0.880 & 0.030 & 0.030 & 0.030 & 0.030 \\ 0.030 & 0.030 & 0.030 & 0.880 & 0.455 \\ 0.030 & 0.030 & 0.880 & 0.030 & 0.455 \\ 0.030 & 0.030 & 0.030 & 0.030 & 0.030 \end{pmatrix}$.
  • 57. Google's approach: Example 3. M (in Example 3) is column-stochastic. M (in Example 3) has 1 as an eigenvalue. The normalised eigenvector of M (in Example 3) having eigenvalue 1 is $x = [0.200 \; 0.200 \; 0.285 \; 0.285 \; 0.030]^T$. The importance scores of the web pages (in Example 3) are $x_1 = 0.200$, $x_2 = 0.200$, $x_3 = 0.285$, $x_4 = 0.285$, $x_5 = 0.030$. The scores are all positive. The scores are unique even though the web has disconnected subwebs.
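The effect of the Google matrix on the disconnected web of Example 3 can be checked exactly. The sketch below confirms that the quoted score vector satisfies Mx = x, and that one of the eigenvectors of the plain link matrix, (1/2, 1/2, 0, 0, 0), is no longer fixed once the term mS is added. The helper name `apply_google` is hypothetical.

```python
from fractions import Fraction as F

# Link matrix of Example 3.
A = [[0, 1, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 1, F(1, 2)],
     [0, 0, 1, 0, F(1, 2)],
     [0, 0, 0, 0, 0]]
m = F(15, 100)   # so the damping factor 1 - m is 0.85
n = len(A)

def apply_google(x):
    """Return Mx for M = (1 - m)A + mS; every entry of Sx equals sum(x)/n."""
    mean = sum(x) / n
    return [(1 - m) * sum(A[i][j] * x[j] for j in range(n)) + m * mean
            for i in range(n)]

# Score vector quoted for Example 3, written as exact fractions.
x = [F(200, 1000), F(200, 1000), F(285, 1000), F(285, 1000), F(30, 1000)]
fixed = apply_google(x) == x

# An eigenvector of A alone, which M does not leave fixed.
old = [F(1, 2), F(1, 2), F(0), F(0), F(0)]
old_still_fixed = apply_google(old) == old
print(fixed, old_still_fixed)
```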
  • 58. Google's approach: Mathematics. Definition: A matrix P is said to be positive if all elements of P are positive.
  • 59. Google's approach: Mathematics. Theorem: If a square matrix P is positive and column-stochastic, then any eigenvector of P with eigenvalue 1 has either all components positive or all components negative. Theorem: If a square matrix P is positive and column-stochastic, then the eigenspace of P corresponding to the eigenvalue 1 has dimension 1.
  • 60. Google's approach: Mathematics. Properties of the Google matrix. Let M be the Google matrix of a web without dangling nodes, with $0 < m \le 1$. M is positive. M is column-stochastic. 1 is an eigenvalue of M. The eigenspace of M corresponding to the eigenvalue 1 has dimension 1. (Continued in next slide.)
  • 61. Google's approach: Mathematics. Properties of the Google matrix (continued). M has an eigenvector corresponding to the eigenvalue 1 with all positive components. M has a unique eigenvector $x = [x_1 \; x_2 \; \ldots \; x_n]^T$ corresponding to the eigenvalue 1 such that $x_i > 0$ for $i = 1, 2, \ldots, n$ and $x_1 + x_2 + \cdots + x_n = 1$.
  • 62. Computational scheme in Google's approach
  • 63. Computational scheme. Notations: Let W be a web with n pages and no dangling nodes. Let A be the link matrix of the web W. Let $1 - m$ be the damping factor. Let u be the n-component column vector with all entries equal to $\frac{1}{n}$. Let $x^{(0)}$ be some n-component column vector with positive components and $\|x^{(0)}\|_1 = 1$. Let q be the normalised importance score vector of the web W.
  • 64. Computational scheme. The scheme: Generate the sequence $x^{(1)}, x^{(2)}, \ldots$ of column vectors using the iteration $x^{(r+1)} = (1 - m)Ax^{(r)} + mu$. Then $q = \lim_{r \to \infty} x^{(r)}$.
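The iteration x(r+1) = (1 − m)A x(r) + m u can be sketched in plain Python. The function name `importance_scores`, the stopping tolerance `tol` and the cap `max_iter` are assumptions added for this illustration; the slides only specify the update rule and the limit.

```python
def importance_scores(A, m=0.15, tol=1e-10, max_iter=1000):
    """Iterate x <- (1 - m) * A x + m * u, starting from the uniform vector.

    Stops when successive iterates differ by less than tol in the 1-norm
    (the stopping rule is an assumption, not taken from the slides).
    """
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        new_x = [(1 - m) * sum(A[i][j] * x[j] for j in range(n)) + m / n
                 for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            break
    return x

# Link matrix of Example 1 (no dangling nodes).
A = [[0.0,   0.0, 1.0, 0.5],
     [1 / 3, 0.0, 0.0, 0.0],
     [1 / 3, 0.5, 0.0, 0.5],
     [1 / 3, 0.5, 0.0, 0.0]]

scores = importance_scores(A)
print([round(s, 3) for s in scores])
```

Only the product with the sparse link matrix A is needed at each step; the dense matrix S never has to be formed, which is what makes the scheme practical for a large web.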
  • 65. Computational scheme: Example. Compute the importance score vector of the web in Example 1. Notations: $n = 4$, $A = \begin{pmatrix} 0 & 0 & 1 & \frac{1}{2} \\ \frac{1}{3} & 0 & 0 & 0 \\ \frac{1}{3} & \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{3} & \frac{1}{2} & 0 & 0 \end{pmatrix}$, $m = 0.15$, $u = \left[\frac{1}{4} \; \frac{1}{4} \; \frac{1}{4} \; \frac{1}{4}\right]^T$.
  • 66. Computational scheme: Example. We choose $x^{(0)} = \left[\frac{1}{4} \; \frac{1}{4} \; \frac{1}{4} \; \frac{1}{4}\right]^T$. The next two slides show the computations of $x^{(1)}$ and $x^{(2)}$.
  • 67. Computational scheme: Example. $x^{(1)} = (1 - m)Ax^{(0)} + mu = (1 - 0.15)\,A \left[\tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4}\right]^T + 0.15 \left[\tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4}\right]^T = [0.3562 \; 0.1083 \; 0.3208 \; 0.2146]^T$.
  • 68. Computational scheme: Example. $x^{(2)} = (1 - m)Ax^{(1)} + mu = (1 - 0.15)\,A \, [0.3562 \; 0.1083 \; 0.3208 \; 0.2146]^T + 0.15 \left[\tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4} \; \tfrac{1}{4}\right]^T = [0.4014 \; 0.1384 \; 0.2757 \; 0.1845]^T$.
  • 69. Computational scheme: Example. The values of $x^{(3)}$, $x^{(4)}$, etc. are tabulated in the next slide. Note that $x^{(11)}$ and $x^{(12)}$ agree to four decimal places, so further iterations do not change the scores at this precision.
  • 70. Computational scheme: Example. Components of the iterates $x^{(r)}$:
        r     x1(r)    x2(r)    x3(r)    x4(r)
        0     0.2500   0.2500   0.2500   0.2500
        1     0.3562   0.1083   0.3208   0.2146
        2     0.4014   0.1384   0.2757   0.1845
        3     0.3502   0.1512   0.2884   0.2101
        4     0.3720   0.1367   0.2903   0.2010
        5     0.3698   0.1429   0.2864   0.2010
        6     0.3664   0.1422   0.2884   0.2030
        7     0.3689   0.1413   0.2880   0.2018
        8     0.3681   0.1420   0.2878   0.2021
        9     0.3680   0.1418   0.2880   0.2021
        10    0.3682   0.1418   0.2879   0.2020
        11    0.3681   0.1418   0.2880   0.2021
        12    0.3681   0.1418   0.2880   0.2021
  • 71. Computational scheme: Example. The importance scores of the pages in Example 1 are: $x_1 = 0.3681$, $x_2 = 0.1418$, $x_3 = 0.2880$, $x_4 = 0.2021$.
  • 72. Computational scheme: Mathematics. Power method to find an eigenvector of a matrix G. Start with an initial guess (initial approximation) $x^{(0)}$. Generate successive approximations $x^{(r)}$ by the iteration $x^{(r)} = Gx^{(r-1)}$, or equivalently, $x^{(r)} = G^r x^{(0)}$. For large r, the vector $x^{(r)}$ is a good approximation to an eigenvector of G. The power method produces successive approximations to the eigenvector corresponding to the eigenvalue of G that is largest in absolute value (the dominant eigenvalue).
  • 73. Computational scheme: Mathematics. Modified power method to find an eigenvector of a matrix G. Let $x^{(r)} = G^r x^{(0)}$, for $r = 1, 2, \ldots$. Then $x^{(r)}$ may diverge to infinity or may decay to the zero vector. A better iteration scheme is $x^{(r)} = \frac{Gx^{(r-1)}}{\|Gx^{(r-1)}\|}$, where $\|\cdot\|$ is some vector norm.
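The normalised power iteration can be sketched as follows. The function name `power_method`, the fixed iteration count and the choice of the 1-norm are assumptions made for this illustration (the slides allow any vector norm), and the 2 × 2 test matrix is a made-up example, not one from the slides.

```python
def power_method(G, x0, iterations=50):
    """Repeat x <- Gx / ||Gx||_1 for a fixed number of iterations."""
    x = list(x0)
    for _ in range(iterations):
        y = [sum(G[i][j] * x[j] for j in range(len(x))) for i in range(len(G))]
        norm = sum(abs(t) for t in y)   # 1-norm, assumed to be nonzero
        x = [t / norm for t in y]
    return x

# Illustrative matrix with eigenvalues 3 and 1; the dominant eigenvector
# is a multiple of (1, 1).
G = [[2.0, 1.0],
     [1.0, 2.0]]
x = power_method(G, [1.0, 0.0])
print(x)
```

Without the division by the norm, the entries of the iterates here would grow roughly like 3^r; normalising keeps every iterate at unit 1-norm while leaving its direction unchanged.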
  • 74. Computational scheme: Mathematics. Power method applied to the Google matrix. We apply the power method to compute the importance score vector of a web. In general, the power method yields the importance score eigenvector only if 1 is the dominant eigenvalue of the Google matrix. However, one can prove directly that the power method converges to the importance score eigenvector, without first showing that 1 is the dominant eigenvalue of the Google matrix. See the next few slides.
  • 75. Computational scheme: Mathematics. Power method applied to the Google matrix. Let M be the Google matrix of a web. We have $M = (1 - m)A + mS$. Let $x^{(r)}$ be a column vector with positive components and $\|x^{(r)}\|_1 = 1$. Since the components of $x^{(r)}$ sum to 1, every entry of $Sx^{(r)}$ equals $\frac{1}{n}$, that is, $Sx^{(r)} = u$. Hence $x^{(r+1)} = Mx^{(r)} = ((1 - m)A + mS)x^{(r)} = (1 - m)Ax^{(r)} + mSx^{(r)} = (1 - m)Ax^{(r)} + mu$.
  • 76. Computational scheme: Mathematics. Definition: The 1-norm of a vector v is $\|v\|_1 = |v_1| + |v_2| + \cdots + |v_n|$.
  • 77. Computational scheme: Mathematics. Theorem: Let P be a positive column-stochastic $n \times n$ real matrix and let V be the subspace of $\mathbb{R}^n$ consisting of vectors v such that $\sum_j v_j = 0$. Then: (1) $Pv \in V$ for any $v \in V$. (2) $\|Pv\|_1 \le c\,\|v\|_1$ for any $v \in V$, where $c = \max_{1 \le j \le n} \left|1 - 2 \min_{1 \le i \le n} P_{ij}\right| < 1$.
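The contraction constant c of the theorem can be computed for the Google matrix of Example 1 (m = 0.15), and both conclusions checked on one sample vector of V. This is a sketch of a single instance, not a proof; the sample vector `v` and the helper `norm1` are choices made for illustration.

```python
from fractions import Fraction as F

# Link matrix of Example 1 and its Google matrix P = (1 - m)A + mS.
A = [[0,       0,       1, F(1, 2)],
     [F(1, 3), 0,       0, 0],
     [F(1, 3), F(1, 2), 0, F(1, 2)],
     [F(1, 3), F(1, 2), 0, 0]]
m, n = F(3, 20), 4
P = [[(1 - m) * A[i][j] + m * F(1, n) for j in range(n)] for i in range(n)]

# c = max over columns j of |1 - 2 * (smallest entry in column j)|.
c = max(abs(1 - 2 * min(P[i][j] for i in range(n))) for j in range(n))

def norm1(v):
    """1-norm: sum of absolute values of the components."""
    return sum(abs(t) for t in v)

v = [F(1), F(-1), F(0), F(0)]   # components sum to 0, so v lies in V
Pv = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]

print(float(c), sum(Pv), float(norm1(Pv)), float(c * norm1(v)))
```

Because the difference of two normalised score vectors lies in V, a constant c < 1 is what forces successive iterates of the power method to approach a single limit.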
  • 78. Computational scheme: Mathematics. Theorem: Every positive column-stochastic matrix P has a unique vector q with positive components such that $Pq = q$ and $\|q\|_1 = 1$. The vector q can be computed as $q = \lim_{r \to \infty} P^r x_0$ for any initial guess $x_0$ with positive components such that $\|x_0\|_1 = 1$.