PageRank

  1. Google PageRank Algorithm. Abhav Luthra, 7th Semester, Computer Science Engineering
  2. Facts • Developed by Larry Page and Sergey Brin in 1998 • Patented by Stanford University • Trademark of Google • Backbone of Google Search Engine technology • Research paper: http://infolab.stanford.edu/~backrub/google.html
  3. What is PageRank? • A link analysis algorithm • Ranks pages based on the number of other pages that link to them • Gives an indication of the relative importance of a page • Hence determines an appropriate SERP (Search Engine Results Page) listing • Calculated from the number and weight of back links
  4. Back Links • Inbound links • Outbound links
  5. Definition: PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the page is. The underlying assumption is that more important pages are likely to receive more links from other websites. “We assume page A has pages B, C, D which point to it. The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. Also, L(A) is the number of outbound links going out of page A. The PageRank of a page A is given as follows: PR(A) = (1 - d) + d (PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D))” PageRank forms a probability distribution over web pages, so the sum of all web pages' PageRanks will be 1. (Strictly, this holds when the first term is (1 - d)/N for N pages; with the form quoted here the PageRanks sum to N and average 1.)
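The formula can be evaluated for a single page once the ranks and out-link counts of the pages pointing to it are known. The following is a minimal sketch in C; the function name `pagerank_of` and the two parallel arrays are illustrative assumptions, not part of the original slides.

```c
#include <assert.h>
#include <stddef.h>

/* PR(A) = (1 - d) + d * sum_i PR(T_i) / L(T_i)
 * rank[i]      : current PageRank of the i-th page linking to A
 * out_links[i] : number of outbound links on that page, L(T_i)
 * The function name and array layout are illustrative. */
double pagerank_of(const double *rank, const int *out_links, size_t n, double d)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(out_links[i] > 0);  /* a page that links to A has at least one out-link */
        sum += rank[i] / out_links[i];
    }
    return (1.0 - d) + d * sum;
}
```

For example, with three inbound pages that each have rank 1 and have 2, 1 and 3 outbound links, and d = 0.85, the call returns roughly 1.708. A page with no inbound links gets the floor value 1 - d.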
  6. What is the damping factor? • The theory is that an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the surfer will continue is the damping factor d.
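To get a feel for what d = 0.85 means for the random surfer, the sketch below assumes each click is an independent decision to continue with probability d (an assumption of this example). Under that assumption the chance of still clicking after k steps is d^k, and the expected number of clicks is d / (1 - d). Both function names are hypothetical.

```c
#include <assert.h>

/* Probability that the random surfer is still clicking after k steps,
 * if each step continues independently with probability d. */
double still_clicking(double d, int k)
{
    assert(d >= 0.0 && d < 1.0 && k >= 0);
    double p = 1.0;
    for (int i = 0; i < k; i++)
        p *= d;
    return p;
}

/* Expected number of clicks before stopping:
 * sum over k >= 1 of d^k, which equals d / (1 - d). */
double expected_clicks(double d)
{
    assert(d >= 0.0 && d < 1.0);
    return d / (1.0 - d);
}
```

With d = 0.85 the surfer follows between five and six links on average before giving up.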
  7. Observe A • It has only inbound links and no outbound links • The link from D to A is called a dangling link: a link that points to a page with no outgoing links • Dangling links affect the model because it is not clear where their weight should be distributed, and there are a large number of them • Because dangling links do not affect the ranking of any other page directly, we simply remove them from the system until all the PageRanks are calculated
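Before dangling links can be set aside, the pages they point to have to be found. A minimal sketch, assuming the link graph is stored as a small adjacency matrix (the matrix layout, the size `N` and the function name `find_dangling` are choices made for this example):

```c
#include <assert.h>

#define N 4  /* pages A, B, C, D in a hypothetical graph */

/* links[i][j] != 0 means page i links to page j.
 * Writes 1 into dangling[i] when page i has no outgoing links,
 * and returns how many such pages exist. */
int find_dangling(int links[N][N], int dangling[N])
{
    int count = 0;
    assert(links && dangling);
    for (int i = 0; i < N; i++) {
        int out = 0;
        for (int j = 0; j < N; j++)
            out += (links[i][j] != 0);
        dangling[i] = (out == 0);
        count += dangling[i];
    }
    return count;
}
```

Any link whose target is flagged here is a dangling link in the sense used above.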
  8. Calculating PageRank: the PageRank of a page A is PR(A) = (1 - d)/N + d (PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D)), where N is the total number of pages • The PR of each page depends on the PR of the pages pointing to it • We don't know what PR those pages have until the pages pointing to them have their PR calculated
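One update of this normalized formula over a whole graph might look as follows. This is a sketch under stated assumptions: a three-page graph invented for the example, an adjacency matrix, and no dangling pages. `NPAGES` and `pagerank_step` are hypothetical names.

```c
#include <assert.h>

#define NPAGES 3  /* hypothetical graph: A -> B, A -> C, B -> C, C -> A */

/* One synchronous update of PR(p) = (1 - d)/N + d * sum PR(q)/L(q)
 * over all pages q that link to p. links[q][p] != 0 means q links to p.
 * Every page is assumed to have at least one outbound link. */
void pagerank_step(int links[NPAGES][NPAGES], const double pr[NPAGES],
                   double next[NPAGES], double d)
{
    int out[NPAGES];
    for (int q = 0; q < NPAGES; q++) {
        out[q] = 0;
        for (int p = 0; p < NPAGES; p++)
            out[q] += (links[q][p] != 0);
        assert(out[q] > 0);  /* no dangling pages in this sketch */
    }
    for (int p = 0; p < NPAGES; p++) {
        double sum = 0.0;
        for (int q = 0; q < NPAGES; q++)
            if (links[q][p])
                sum += pr[q] / out[q];
        next[p] = (1.0 - d) / NPAGES + d * sum;
    }
}
```

If every page starts at 1/N, the values still add up to 1 after each call, which is the sense in which this form of PageRank is a probability distribution.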
  9. Solution • PageRank can be calculated using a simple iterative algorithm • This means we can calculate a page's PR without knowing the final PR values of the other pages • In this example each node initially has an equal weight of 1, which is divided equally among its outgoing links
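The starting step described here, where every node holds weight 1 and splits it equally among its outgoing links, can be sketched as below. The edge-list representation, the 16-page limit and the name `distribute` are assumptions made for the example; the slide's own graph is not reproduced.

```c
#include <assert.h>

struct link { int from, to; };  /* hypothetical edge-list representation */

/* Give every page weight 1, then split each page's weight equally among
 * its outgoing links. received[p] ends up holding the sum of 1/L(q)
 * over the pages q linking to p, i.e. the bracketed term of the formula
 * when all ranks are 1. */
void distribute(const struct link *links, int n_links, int n_pages,
                double *received)
{
    int out[16] = {0};  /* out-link counts; this sketch supports up to 16 pages */
    assert(n_pages > 0 && n_pages <= 16);
    for (int i = 0; i < n_links; i++)
        out[links[i].from]++;
    for (int p = 0; p < n_pages; p++)
        received[p] = 0.0;
    for (int i = 0; i < n_links; i++)
        received[links[i].to] += 1.0 / out[links[i].from];
}
```

For the made-up graph A -> B, A -> C, B -> C, C -> A, page A receives 1, B receives 0.5 and C receives 1.5. Applying (1 - d) + d * received[p] to each page then gives the first iteration of the ranking.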
  10. So we got lucky. What if the initial PR = 0?
      PR(A) = 0.15 + 0.85*0 = 0.15
      PR(B) = 0.15 + 0.85*0.15 = 0.2775
      Again:
      PR(A) = 0.15 + 0.85*0.2775 = 0.385875
      PR(B) = 0.15 + 0.85*0.385875 = 0.47799375
      And again:
      PR(A) = 0.15 + 0.85*0.47799375 = 0.5562946875
      PR(B) = 0.15 + 0.85*0.5562946875 = 0.622850484375
      ...and so on until PR → 1. It really doesn't matter whether the initial PR is 1, 0, or any other number; it will eventually settle at 1.0.
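The claim that the starting value does not matter can be checked by running the same two-page update until the values stop moving. A minimal sketch follows; the stopping tolerance `tol` and the function name `settle_two_pages` are additions for this example and do not appear in the slides.

```c
#include <assert.h>

/* Two pages that link only to each other, as in the worked numbers above.
 * Starts both at `start`, repeats the update until neither value moves
 * by more than `tol`, and returns PR(A). *iters receives the number of
 * rounds taken (pass a null pointer to ignore it). */
double settle_two_pages(double start, double d, double tol, int *iters)
{
    double a = start, b = start;
    int n = 0;
    assert(d >= 0.0 && d < 1.0 && tol > 0.0);
    for (;;) {
        double old_a = a, old_b = b;
        a = (1.0 - d) + d * b;  /* A is linked from B, and L(B) = 1 */
        b = (1.0 - d) + d * a;  /* B is linked from A, and L(A) = 1 */
        n++;
        double da = a - old_a, db = b - old_b;
        if (da < 0) da = -da;
        if (db < 0) db = -db;
        if (da <= tol && db <= tol)
            break;
    }
    if (iters)
        *iters = n;
    return a;
}
```

Starting from 0 or from 40 with d = 0.85, the returned value is within rounding distance of 1; only the number of rounds differs.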
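  11. Let's run the code
      #include <stdio.h>

      int main(void) {
          double d = 0.85;        /* damping factor */
          double a = 0, b = 0;    /* initial PageRank of pages A and B */
          int i = 40;             /* number of iterations */
          while (i-- > 0) {
              printf("a: %5f b: %5f\n", a, b);
              a = (1 - d) + d * b;   /* A is linked from B */
              b = (1 - d) + d * a;   /* B is linked from A */
          }
          printf("Average PageRank= %4f\n", (a + b) / 2);
          return 0;
      }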
  12. PageRank eventually settles at 1 in the long run
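  13. Now let's try another example
      #include <stdio.h>

      int main(void) {
          double d = 0.85;                  /* damping factor */
          double a = 0, b = 0, c = 0, e = 0;
          int i = 40;                       /* number of iterations */
          while (i-- > 0) {
              printf("a: %5f b: %5f c: %5f e: %5f\n", a, b, c, e);
              /* divisors as given on the slide; the link diagram they
                 refer to is not part of this transcript */
              a = (1 - d) + d * ((b / 3) + (c / 3) + (e / 3));
              b = (1 - d) + d * ((c / 2) + (e / 2));
              c = (1 - d) + d * (a);
              e = (1 - d) + d * ((c / 2) + (a / 2));
          }
          printf("Average PageRank= %4f\n", (a + b + c + e) / 4);
          return 0;
      }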
  14. Issues with PageRank • Favors older documents over newer ones • Pages can redirect to the main page itself, raising their rank – spoofed PageRank • Search optimizers selling high-PageRank links to webmasters
  15. • Cloaking – showing different content to Google than to users • Link exchange – “I'll add you if you add me” • Buying links – paying for links to your website • Keyword stuffing – links hidden in whitespace • Bot writing – automatically updating, editing and copying content
  16. Some applications beyond Google • Dynamic Price Setting • Programmable Networks • Stock market Trading • Opinion polls • Web Mining • Theme based Ranking • Reputation system for ecommerce • Collaborative Filtering • Business Intelligence