Facts
• Developed by Larry Page and Sergey Brin in 1998
• Patent assigned to Stanford University
• Trademark of Google
• Backbone of Google Search Engine Technology
• Research paper: http://infolab.stanford.edu/~backrub/google.html
What is PageRank
• Link Analysis Algorithm
• Ranks pages based on the number of other pages that link to them
• Gives an indication of the relative importance of a page
• Hence, determines an appropriate SERP (Search Engine Results Page)
listing
• Calculated from the number and weight of backlinks
Definition
PageRank works by counting the number and quality of links to a page
to determine a rough estimate of how important the page is. The
underlying assumption is that more important pages are likely to receive
more links from other websites.
“We assume page A has pages B, C, D which point to it. The parameter
d is a damping factor which can be set between 0 and 1. We usually set d to
0.85. Also L(A) is the number of outbound links going out of page A.
The PageRank of a page A is given as follows:
PR(A) = (1-d) + d(PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D))”
PageRank forms a probability distribution over web pages, so the PageRanks
of all web pages sum to 1. Strictly, this holds when the first term is
(1-d)/N for N pages (and every page has outbound links); with the (1-d)
form above, the PageRanks sum to N, so the average is 1.
What is the damping factor?
• The theory is that an imaginary surfer who is randomly clicking on
links will eventually stop clicking. The probability, at any step, that
the person will continue is the damping factor d.
Observe A
• It has only inbound links and no outbound links.
• The link from D to A is called a dangling link: simply a link that
points to a page with no outgoing links.
• Dangling links affect the model because it is not clear where their
weight should be distributed, and there are a large number of them.
• Because dangling links do not affect the ranking of any other page
directly, we simply remove them from the system until all the
PageRanks are calculated.
Calculating PageRank
The PageRank of a page A, where N is the total number of pages, is as follows:
PR(A) = (1-d)/N + d(PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D))
• The PR of each page depends on the PR of
pages pointing to it.
• We don’t know what PR those pages have
until the pages pointing to them have their
PR calculated.
Solution
• PageRank can be calculated using a
simple iterative algorithm
• It means we can calculate one page’s PR
without knowing the final value of the PR of
the other pages
In this example each node has an
equal weight of 1 initially, which is
divided equally among its
outgoing links
So we got lucky. What if we start with PR = 0?
PR(A) = 0.15 + 0.85*0 = 0.15
PR(B) = 0.15 + 0.85*0.15 = 0.2775
AGAIN,
PR(A) = 0.15 + 0.85*0.2775 = 0.385875
PR(B) = 0.15 + 0.85*0.385875 = 0.47799375
AND AGAIN,
PR(A) = 0.15 + 0.85*0.47799375 = 0.5562946875
PR(B) = 0.15 + 0.85*0.5562946875 = 0.622850484375
AND SO ON, TILL PR REACHES 1
It really doesn’t matter whether the starting PR is 1, 0, or any other number; it will eventually settle at 1.0
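The limit of the iteration above can also be checked by hand. The short derivation below assumes the same two-page setup (A and B link only to each other, d = 0.85) and solves for the value at which both update equations hold at once.

```latex
\begin{align*}
PR(A) &= 0.15 + 0.85\,PR(B), \qquad PR(B) = 0.15 + 0.85\,PR(A) \\
PR(A) &= 0.15 + 0.85\,\bigl(0.15 + 0.85\,PR(A)\bigr) = 0.2775 + 0.7225\,PR(A) \\
PR(A) &= \frac{0.2775}{1 - 0.7225} = 1, \qquad PR(B) = 0.15 + 0.85 \cdot 1 = 1
\end{align*}
```

Each full round multiplies the remaining distance from 1 by 0.85 * 0.85 = 0.7225, which is why the starting value does not matter.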
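Let's run the code
#include <stdio.h>

int main(void)
{
    double d = 0.85;          /* damping factor */
    double a = 0, b = 0;      /* starting PageRank of pages A and B */
    int i = 40;               /* number of iterations */

    while (i-- > 0) {
        printf("a: %5f b: %5f\n", a, b);
        a = (1 - d) + d * b;  /* A is linked from B */
        b = (1 - d) + d * a;  /* B is linked from A; uses the updated a */
    }
    printf("Average PageRank= %4f\n", (a + b) / 2);
    return 0;
}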
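Now let's try another example
#include <stdio.h>

int main(void)
{
    double d = 0.85;                  /* damping factor */
    double a = 0, b = 0, c = 0, e = 0; /* starting PageRank of each page */
    int i = 40;                       /* number of iterations */

    while (i-- > 0) {
        printf("a: %5f b: %5f c: %5f e: %5f\n", a, b, c, e);
        /* each page collects a share of the rank of the pages linking to it */
        a = (1 - d) + d * ((b / 3) + (c / 3) + (e / 3));
        b = (1 - d) + d * ((c / 2) + (e / 2));
        c = (1 - d) + d * (a);
        e = (1 - d) + d * ((c / 2) + (a / 2));
    }
    printf("Average PageRank= %4f\n", (a + b + c + e) / 4);
    return 0;
}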
Issues with PageRank
• Favors older documents over newer ones
• Pages that redirect to the main page itself, raising their rank –
spoofed PageRank
• Search optimizers selling high PageRanks to
webmasters
• Cloaking – showing different content to Google than
to users
• Link exchange – “I’ll add you if you add me”
• Buying links – buying links to your website
• Keyword stuffing – links in whitespace
• Bot writing – automatically updating, editing, and copying
content
Some applications beyond Google
• Dynamic Price Setting
• Programmable Networks
• Stock market Trading
• Opinion polls
• Web Mining
• Theme based Ranking
• Reputation system for ecommerce
• Collaborative Filtering
• Business Intelligence