Write the link matrix and compute the importance score vector
by using the Google PageRank algorithm, if the web consists of
4 webpages with the following links: 1 -> 2, 2 -> 1, 4 -> 3,
3 -> 4 and 2 -> 3.
[0 1 8 1/2 0 8 0 0 0 0 1 0 0 1 0]
How would you modify the webgraph by adding a single edge such
that all webpages have the same importance score?
Solution
Ok. Let's see. You have 4 webpages with the following
links:
1 -> 2, 2 -> 1, 4 -> 3, 3 -> 4, 2 -> 3
Let's write them in sorted order so they are easier to follow:
1 -> 2, 2 -> 1, 2 -> 3, 3 -> 4, 4 -> 3
Well, let's calculate the adjacency matrix for this graph
(the entry in row i, column j is 1 if page i links to page j):
So, A = [0 1 0 0
         1 0 1 0
         0 0 0 1
         0 0 1 0]
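As a small illustration, here is a Python sketch that builds this adjacency matrix from the edge list. It assumes pages 1 to 4 are stored at list indices 0 to 3; the name `edges` is just an illustrative choice.

```python
# Edges from the question as (source, target) pairs, pages numbered 1..4
edges = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)]
n = 4

# A[i][j] = 1 if page i+1 links to page j+1 (row = source, column = target)
A = [[0] * n for _ in range(n)]
for src, dst in edges:
    A[src - 1][dst - 1] = 1

for row in A:
    print(*row)
```

Each printed row matches the corresponding row of A above.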
I assume that by link matrix you mean the transition
matrix, which looks as follows:
So, L = [0 1/2 0 0
         1 0   0 0
         0 1/2 0 1
         0 0   1 0]
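So, the link matrix is calculated from the number of outbound
links: entry L(i, j) is 1/C(j) if page j links to page i and 0
otherwise, where C(j) is the total number of outbound links on page j.
For example, page 2 links to pages 1 and 3, so L(1, 2) = L(3, 2) = 1/2.
Therefore the values in every column must sum to 1.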
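The same rule as a short Python sketch: each link j -> i contributes 1/C(j) to column j, row i. Exact fractions (`fractions.Fraction` from the standard library) keep the 1/2 entries readable. This assumes every page has at least one outbound link, which holds here; a page with none would leave an all-zero column that would need separate handling.

```python
from fractions import Fraction

edges = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)]
n = 4

# C(j): number of outbound links on each page
out_degree = [0] * n
for src, _ in edges:
    out_degree[src - 1] += 1

# L[i][j] = 1/C(j) if page j+1 links to page i+1 (column = source, row = target)
L = [[Fraction(0)] * n for _ in range(n)]
for src, dst in edges:
    L[dst - 1][src - 1] = Fraction(1, out_degree[src - 1])

# Every column sums to 1: each page splits its whole vote among its links
assert all(sum(L[i][j] for i in range(n)) == 1 for j in range(n))

for row in L:
    print(*(str(x) for x in row))
```

The printed rows reproduce the link matrix L above.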
Now, we want to calculate the importance score for the pages
using the Google PageRank algorithm.
The original PageRank algorithm was given by the
following equation:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where
PR(A) is the PageRank of page A,
PR(Ti) is the PageRank of pages Ti which link to page A,
C(Ti) is the number of outbound links on page Ti and
d is a damping factor which can be set between 0 and 1.
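For reference, the same equation can be written for all pages at once with the link matrix L, because the A-th entry of L·PR is exactly the sum of PR(Ti)/C(Ti) over the pages Ti linking to A. The second line is the common normalized variant with (1-d)/N in place of (1-d). Both are linear in PR with the same matrix I - dL (invertible for 0 <= d < 1 since L is column-stochastic), so their solutions differ only by the factor N.

```latex
\begin{aligned}
(L\,\mathbf{PR})_A &= \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)} \\
\text{original:}\quad \mathbf{PR} &= (1-d)\,\mathbf{1} + d\,L\,\mathbf{PR}
  \;\Rightarrow\; \mathbf{PR} = (1-d)\,(I - dL)^{-1}\mathbf{1} \\
\text{normalized:}\quad \mathbf{PR} &= \tfrac{1-d}{N}\,\mathbf{1} + d\,L\,\mathbf{PR}
  \;\Rightarrow\; \mathbf{PR} = \tfrac{1-d}{N}\,(I - dL)^{-1}\mathbf{1}
\end{aligned}
```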
Ok, let's apply it to our example. We first have to make an
initial guess for the page ranks.
Say PR(2) = 1 and PR(4) = 1, just as an assumption.
The damping factor is usually set to 0.85.
So, as per the equation,
PR(1) = (1 - 0.85) + 0.85 * (PR(2) / 2)
Page 2 is the only page linking to page 1, which is why we
take the page rank of page 2 and divide it by the total
number of outbound links on page 2, i.e. 2.
So, PR(1) = 0.15 + 0.85 * 0.5 = 0.575
Similarly, updating the pages in order and always using the most
recently computed values:
PR(2) = 0.15 + 0.85 * (PR(1) / 1) = 0.15 + 0.85 * 0.575 = 0.639
PR(3) = 0.15 + 0.85 * (PR(2) / 2 + PR(4) / 1) = 0.15 + 0.85 * (0.639 / 2 + 1) = 1.272
PR(4) = 0.15 + 0.85 * (PR(3) / 1) = 0.15 + 0.85 * 1.272 = 1.231
Now, we repeat these calculations (iterations) until the page
rank scores stop changing. It doesn't matter which guesses you
start with: as long as d < 1, you always end up with the same
page rank scores.
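A minimal Python sketch of these repeated iterations, assuming we keep the original (1-d) form, sweep the pages in the order 1, 2, 3, 4 and always use the most recently computed values (a Gauss-Seidel style update). The names `in_links` and `out_count` and the stopping tolerance 1e-10 are illustrative choices.

```python
# Incoming links and outbound-link counts C(p) for the original graph
in_links = {1: [2], 2: [1], 3: [2, 4], 4: [3]}
out_count = {1: 1, 2: 2, 3: 1, 4: 1}
d = 0.85

# Start from 1 everywhere; in the first sweep only PR(2) and PR(4)
# are read before being updated, matching the guesses in the text
pr = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}

for _ in range(1000):
    change = 0.0
    for page in (1, 2, 3, 4):
        # Original formula PR(A) = (1-d) + d * sum(PR(Ti) / C(Ti)),
        # using the most recently computed value of each PR(Ti)
        new = (1 - d) + d * sum(pr[t] / out_count[t] for t in in_links[page])
        change = max(change, abs(new - pr[page]))
        pr[page] = new
    if change < 1e-10:  # stop once no score moves by more than 1e-10
        break

print({p: round(v, 4) for p, v in pr.items()})
```

With this (1-d) form the scores settle at about 0.3346, 0.4344, 1.6654 and 1.5656, which sum to 4; dividing by N = 4 gives 0.0837, 0.1086, 0.4163 and 0.3914.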
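I am skipping the remaining iterations and writing down the final
page rank scores. Note that these values come from the normalized
form PR(A) = (1-d)/N + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) with
N = 4, which is what the MATLAB code at the end uses, so they sum to 1.
With the (1-d) form above, every score is exactly 4 times larger
(the scores sum to 4), and the ranking of the pages is the same.
PR(1) = 0.0837
PR(2) = 0.1086
PR(3) = 0.4163
PR(4) = 0.3914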
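To reproduce these numbers, here is a pure-Python sketch of the same power iteration that the MATLAB code at the end of this answer runs: start from PR = 1/N and repeat PR = d*L*PR + (1-d)/N until the Euclidean change drops below 1e-6. The function name `pagerank` and its parameters are illustrative, not taken from any library.

```python
# Link matrix L of the original graph: column j holds the outbound links of page j
L = [[0.0, 0.5, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 0.5, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]


def pagerank(L, d=0.85, tol=1e-6, max_iter=1000):
    """Iterate PR = d*L*PR + (1-d)/N from PR = 1/N until the change is below tol."""
    n = len(L)
    pr = [1.0 / n] * n
    for _ in range(max_iter):
        new = [d * sum(L[i][j] * pr[j] for j in range(n)) + (1 - d) / n
               for i in range(n)]
        # Euclidean norm of the change, like norm(PR - prev_PR) in MATLAB
        delta = sum((a - b) ** 2 for a, b in zip(new, pr)) ** 0.5
        pr = new
        if delta < tol:
            break
    return pr


scores = pagerank(L)
print([round(s, 4) for s in scores])  # [0.0837, 0.1086, 0.4163, 0.3914]
```

Because every column of L sums to 1, each iteration keeps the total score at 1, which is why these scores sum to 1.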
You can see above that the page rank of page 1 is the lowest and
that of page 3 is the highest.
The reason is that page 3 has two incoming links, from page
2 and page 4. Other things being equal, the more incoming links
a page has, the higher its page rank.
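Now, you may ask: page 4 also has only one incoming link, just
like pages 1 and 2, so why does it have a higher score? Because
page 4's incoming link comes from page 3, which has a very high
score, and page 3 gives its whole vote to page 4 since that is its
only outbound link. Page 1, in contrast, gets only half of the vote
of page 2, which has a low score. So, when a page with a high score
votes for another page through a link, that page also gets a high
page rank score.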
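A quick numeric check of this argument: the final scores listed above sum to 1, so they satisfy the formula with (1-d) replaced by (1-d)/N = 0.15/4 = 0.0375. Plugging them in for pages 4 and 1:

```latex
\begin{aligned}
PR(4) &= 0.0375 + 0.85 \cdot \frac{PR(3)}{1} = 0.0375 + 0.85 \cdot 0.4163 \approx 0.3914 \\
PR(1) &= 0.0375 + 0.85 \cdot \frac{PR(2)}{2} = 0.0375 + 0.85 \cdot 0.0543 \approx 0.0837
\end{aligned}
```

Page 4 receives the whole vote of a high-scoring page, while page 1 receives only half the vote of a low-scoring one.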
b) Now, you want to add a single edge so that all pages get
the same score.
The answer is to add an outgoing edge from page 4 to page 1.
The links now look as follows:
1 -> 2, 2 -> 1, 2 -> 3, 3 -> 4, 4 -> 1, 4 -> 3
And the link matrix becomes:
L = [0 1/2 0 1/2
     1 0   0 0
     0 1/2 0 1/2
     0 0   1 0]
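Page 4 now splits its vote between pages 1 and 3, so page 3 loses
exactly the half vote that page 1 was missing. As a result, every
row of L also sums to 1, i.e. each page receives a total weight of
exactly 1 from its incoming links, so a vector with equal entries for
all pages satisfies the PageRank equation. The page rank scores are
as follows: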
PR(1) = 0.2500
PR(2) = 0.2500
PR(3) = 0.2500
PR(4) = 0.2500
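A small Python check of why this works, as a sketch: with the edge 4 -> 1 added, every row of the link matrix (not only every column) sums to 1, so one step of PR = d*L*PR + (1-d)/N maps the uniform vector to itself. Since d < 1 the fixed point is unique, so it must be the PageRank vector. `L_new` is just an illustrative name.

```python
# Link matrix after adding the edge 4 -> 1 (column j = outbound links of page j)
L_new = [[0.0, 0.5, 0.0, 0.5],
         [1.0, 0.0, 0.0, 0.0],
         [0.0, 0.5, 0.0, 0.5],
         [0.0, 0.0, 1.0, 0.0]]
n = len(L_new)
d = 0.85

col_sums = [sum(L_new[i][j] for i in range(n)) for j in range(n)]
row_sums = [sum(row) for row in L_new]
print(col_sums)  # [1.0, 1.0, 1.0, 1.0]: each page gives away a total vote of 1
print(row_sums)  # [1.0, 1.0, 1.0, 1.0]: each page receives a total weight of 1

# Apply one step of PR = d*L*PR + (1-d)/N to the uniform vector
uniform = [1.0 / n] * n
image = [d * sum(L_new[i][j] * uniform[j] for j in range(n)) + (1 - d) / n
         for i in range(n)]
print(all(abs(a - b) < 1e-12 for a, b in zip(image, uniform)))  # True
```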
Below, I have also pasted MATLAB code implementing the
PageRank algorithm (power iteration on PR = d*L*PR + (1-d)/N).
Refer to it, analyze it and experiment with it as well.
% Link matrix for part (b), with the edge 4 -> 1 added.
% For part (a), use L = [0 0.5 0 0; 1 0 0 0; 0 0.5 0 1; 0 0 1 0];
L = [0 0.5 0 0.5;
     1 0   0 0;
     0 0.5 0 0.5;
     0 0   1 0];
N = size(L, 1);                 % number of pages
PR = (1/N)*ones(N, 1);          % define PageRank vector for t = 0
d = 0.85;                       % define damping factor
iter = 1;
delta_PR = Inf;                 % set initial error to infinity
while delta_PR > 1e-6           % iterate until error is less than 1e-6
    tic;
    prev_PR = PR;                          % save previous PageRank vector (t-1)
    PR = d*L*PR + ((1-d)/N)*ones(N, 1);    % calculate new PageRank vector (t)
    delta_PR = norm(PR - prev_PR);         % calculate new error
    t(iter) = toc;
    str = sprintf('iteration %d: delta_PR=%g, time=%11.4g', iter, delta_PR, t(iter));
    disp(str);
    iter = iter + 1;
end
disp(PR);                       % final PageRank scores
% Closed-form check: solve (I - d*L) * PR = ((1-d)/N) * ones(N, 1) directly
powerRank = (eye(N) - d*L) \ (((1-d)/N)*ones(N, 1));