21 Apr 2014 • 0 likes • 821 views

Internet

Technology

Corrected the previously wrong definition of the damping factor.

Yifan Li

- 1. Google PageRank Yifan Li GA DC Data Science, 19 April 2014
- 2. Outline: What is PageRank; Why it is important; History of PageRank; Understand PageRank; Simplified PageRank algorithm; Current state of the art
- 3. What is PageRank: PageRank is a link analysis algorithm that assigns a numerical weight to each web page in order to measure its relative importance. It is based on the map of hyperlinks between pages, and it is an effective way to prioritize the results of web keyword searches.
- 4. Why it is important: • When Page and Brin met, search engines typically ranked highest the pages with the highest keyword density, so people could game the system by repeating the same phrase over and over to climb the search results. • PageRank gives a rough estimate of how important a website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
- 5. History of PageRank: • PageRank was developed by Google founders Larry Page and Sergey Brin at Stanford. PageRank is patented by Stanford, and the name PageRank likely comes from Larry Page. • PageRank is now one of about 200 ranking factors that Google uses to determine a page's popularity. Even though PageRank is no longer directly important for SEO (Search Engine Optimization) purposes, backlinks from more popular websites continue to push a web page higher in search rankings.
- 6. Understand PageRank: PageRank is a probability distribution that represents the likelihood that a person randomly clicking on links will arrive at any particular page.
- 7. Understand PageRank (cont.): Picture a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back", but who eventually gets bored and starts again on another random page. The damping factor d is the probability, at any step, that the surfer will continue surfing; (1 - d) is the probability, at each page, that the surfer gets bored and requests another random page. Google uses d = 0.85. Without damping, all web surfers in the slide's example graph would eventually end up on pages A, B, or C, and all other pages would have PageRank zero. A page can have a high PageRank if many pages point to it, or if a few pages with a high PageRank point to it.
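- 8. Simplified PageRank algorithm: Assume four web pages: A, B, C and D. Each page begins with an estimated PageRank of 0.25. L(A) is defined as the number of links going out of page A. If pages B, C and D all link to A, the PageRank of page A is the sum of each linking page's PageRank divided by its number of outbound links: $PR(A) = \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)}$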
- 9. Simplified PageRank algorithm (cont.): Assume page A has pages B, C, D, ... which point to it. The parameter d is a damping factor which can be set between 0 and 1; it is usually set to 0.85. The PageRank of a page A is given as follows:
- 10. State of the art: • PageRank is now one of about 200 ranking factors that Google uses to determine a page's popularity. Google Panda is one of the other strategies Google now relies on to rank the popularity of pages. Even though PageRank is no longer directly important for SEO (Search Engine Optimization) purposes, backlinks from more popular websites continue to push a web page higher in search rankings.
- 11. Thanks!
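
As a supplement to slides 8 and 9, the iteration they describe can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the deck's own code. The function name `pagerank`, the four-page `links` graph, and the handling of pages without outbound links are all hypothetical. The update rule assumed here is PR(p) = (1 - d)/N + d * sum of PR(q)/L(q) over the pages q that link to p, which keeps the values a probability distribution as on slide 6. Some presentations write the first term as (1 - d) without dividing by N, and the deck's exact formula is not preserved in the transcript.

```python
def pagerank(links, d=0.85, iterations=50):
    """Power-iteration sketch of PageRank.

    links maps each page to the list of pages it links to; every target
    must also appear as a key. Assumed update rule:
        PR(p) = (1 - d) / N + d * sum(PR(q) / L(q) for q linking to p)
    """
    pages = sorted(links)
    n = len(pages)
    # Every page starts with the same estimate, 1/N (0.25 for four pages).
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # The "bored surfer" share: jump to any page with probability (1 - d).
        new_rank = {p: (1.0 - d) / n for p in pages}
        for q in pages:
            targets = links[q]
            if targets:
                # q passes an equal share of its rank along each outbound link.
                share = d * rank[q] / len(targets)
                for p in targets:
                    new_rank[p] += share
            else:
                # Assumption: a page with no outbound links spreads its rank
                # evenly over all pages so that no probability is lost.
                for p in pages:
                    new_rank[p] += d * rank[q] / n
        rank = new_rank
    return rank


# Hypothetical link graph, chosen only for illustration.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}

ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Calling `pagerank(links, d=1.0, iterations=1)` performs one step of the undamped, simplified version from slide 8. In this graph nothing links to D, so with d = 0.85 its rank settles at the bored-surfer share (1 - d)/N.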