QAB Term 1

Markov Chains and Google Inc.




                      GUSTAVO ARGUELLO
                       KUNDAN BHADURI
                         VERITY NOBLE




                        IMBA NOV 2010 N1

                       IE BUSINESS SCHOOL
                       MARIA DE MOLINA 11
                       MADRID 28002 SPAIN
QAB Term 1 Project: Markov Chains and Google Inc.

Table of Contents

Implementing Markov Chains with Google PageRank
Issues to be addressed
Techniques that may be used to overcome the problem of solving such a large system
Exhibit 1: A sample 4-state Markov chain with transition probabilities
Exhibit 2: Sample 4X4 transition Matrix
Exhibit 3: Explaining the basis of Markov’s chain
Exhibit 4: Demonstrating the stable state values using simple matrix multiplication
Exhibit 5: Calculating the steady state eigen values πA and πE
Exhibit 6: The improved Google PageRank algorithm
Exhibit 7: PageRank of the search string ‘Techbend blog’
Exhibit 8: The correlation between a webpage and the rest of the web
Exhibit 9: KundanBhaduri.com and its links to other sites
Exhibit 10: Applying Markov Chain method to calculate the PageRank for ‘TechBend blog’
Exhibit 11: Computing a small Eigen value with Power Method




IMBA NOV 2010 N1: Gustavo Arguello | Kundan Bhaduri | Verity Noble                                                                                                Page | 1

Implementing Markov Chains with Google PageRank

In its most basic form, a homogeneous Markov chain (Exhibit 1) refers to a series of events or actions that follow
one another, where the transition from one state to the next is memoryless: it depends only on the current state and
not on how that state was reached. More formally, a Markov chain is a collection of random variables {Xt} with the
property that, given the current state, the future is conditionally independent of the past [1]. The probabilities of
moving between states are collected in a square matrix known as the Transition Matrix. We can therefore classify a
problem as solvable by the theory of Markov chains if it has the following characteristics:

       a) At any point in time, each object is in one and exactly one defined state. At the end of the period,
          the object can move to a new state or remain in its original state [2].
       b) The objects move between states based on the transition probabilities (Exhibit 2), which depend only on the
          current state. The probabilities of moving from a given state to all possible states (including itself)
          sum to one.
       c) The transition probabilities (e.g. of going from A to B) remain constant over time.
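
Characteristic (b) can be checked mechanically. The following is a minimal sketch in Python, assuming the matrix is stored as a list of rows; the function name is_transition_matrix is hypothetical, and the sample matrix is the two-page example that appears in Exhibit 4.

```python
def is_transition_matrix(P, tol=1e-9):
    """Return True if P is square, non-negative and every row sums to 1."""
    n = len(P)
    for row in P:
        if len(row) != n:                 # must be square
            return False
        if any(p < 0 for p in row):       # probabilities cannot be negative
            return False
        if abs(sum(row) - 1.0) > tol:     # each row must sum to one
            return False
    return True


P = [[0.3, 0.7],
     [0.4, 0.6]]
print(is_transition_matrix(P))
```

A matrix that fails this check (for example, a row summing to 1.1) does not describe a valid Markov chain.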

In order to develop an understanding of how to solve a Markov chain, assume that the simple 2-state chain in
Exhibit 3 describes a simple website. From the homepage (E), a user clicks a link that leads her to page (A) 70% of
the time, while the remaining 30% of the time she clicks a link that keeps her on the same page (E).
Similarly, once the user is on page (A), 40% of the time she clicks a link back to (E) and the remaining 60%
of the time she clicks a link that keeps her on the same page (A). The Markov chain can help us find the
probability of a random user being on either page after any number of iterations of this chain. The website
administrator might use this information to decide which page to focus on in order to maximise ad
revenue. Please note that Google’s implementation of the Markov chain is that of a non-absorbing Markov chain.

In order to solve this problem, we start by using the tree method to calculate the 2nd level probability Pij(2), i.e. the
probability of going from node i to node j in exactly two steps, where i and j are each either E or A, as given in
Exhibit 4. Here we observe that the probability of being on page A after two steps is 63% if the user started at E and
64% if she started at A. If we continue in this way for up to 7 iterations, we find that the probability values have
reached a steady state (to six decimal places) and no longer change.
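
The repeated multiplication can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper mat_mul is a hypothetical name, and the matrix rows are assumed to be ordered E then A, as in the website example.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


P = [[0.3, 0.7],   # from E: stay on E, move to A
     [0.4, 0.6]]   # from A: move to E, stay on A

power = P
for step in range(2, 13):
    power = mat_mul(power, P)          # power now holds P raised to `step`
    print(step, [[round(x, 6) for x in row] for row in power])
```

Both rows of the printed matrices approach the same pair of values, which is what it means for the chain to forget its starting page.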

In order to find the steady state probability values of both webpages, we use the steady state equation π =
π*P and solve it as shown in Exhibit 5. This establishes the steady state probabilities πA and πE as approximately
0.64 and 0.36 respectively. (This report loosely refers to these values as Eigen values; strictly, they are the
components of the eigenvector of P associated with the eigenvalue 1.) Therefore, we can recommend that it is wiser to
spend advertising effort on page A, since in the long run it is about 1.75 times as likely to attract clicks as page E.
As we progress towards looking at how Google ranks pages according to their relevance, it will be interesting to note
that these values play a significant part.
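
For a two-state chain the steady state has a simple closed form, which the sketch below evaluates with exact fractions. The function name two_state_steady_state and its parameter names are hypothetical; the formula follows from πA = p_ea*πE + (1 − p_ae)*πA together with πE + πA = 1.

```python
from fractions import Fraction


def two_state_steady_state(p_ea, p_ae):
    """Steady state (pi_E, pi_A) of a two-state chain.

    p_ea: probability of moving from E to A
    p_ae: probability of moving from A to E
    Balance condition: p_ae * pi_A = p_ea * pi_E, with pi_E + pi_A = 1.
    """
    total = p_ea + p_ae
    return p_ae / total, p_ea / total


pi_E, pi_A = two_state_steady_state(Fraction(7, 10), Fraction(4, 10))
print(pi_E, pi_A)                  # 4/11 7/11
print(float(pi_E), float(pi_A))
```

The exact values 4/11 and 7/11 agree with the stable state values reached by repeated multiplication in Exhibit 4.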

Markov chains have significant use in industrial research, organizational behaviour, financial market analysis, human
resource planning, marketing forecasting, etc. A very interesting use of Markov chains has been in the music industry.
As early as the 1950s, composers used Markov chains to study the pattern of notes in popular songs [3] and
thereby create new music sequences based on the studied musical notes.

The example of linked webpages that we discussed above can now be extrapolated to calculate the probability of
arriving at any webpage for a given search criterion, if the entire World Wide Web is considered as one large,
connected, memoryless chain. Based on the relevance criterion, we can estimate the highest relevance factor, and
therefore any page’s utility rank for a search string. This is the rationale behind Google’s patented PageRank
algorithm.



[1] Weisstein, Eric W. "Markov Chain." From MathWorld - A Wolfram Web Resource. http://mathworld.wolfram.com/MarkovChain.html
[2] Tamara Lynn Anthony, Rice University: Markov Chains
[3] Verbeurgt Karsten, Dinolfo Michael, Fayer Mikhail: Extracting Patterns in Music for Composition via Markov Chains



Google’s PageRank algorithm [4] is a stochastic algorithm that determines the significance of a page relative to a search
string. This is not the only factor that Google uses to rank pages, but it is an important one. For Google (or for a web
administrator), the PageRank of a page denotes the probability of a random web surfer reaching that page after
clicking on many links. The PageRanks form a probability distribution over web pages, which is why the sum of the
PageRanks of all pages is 1. Refer to Exhibit 6 for a mathematical representation of the PageRank algorithm. Essentially,
the Google PageRank method will rank those pages higher (i.e. as more important) that receive links from other highly
ranked or more important pages.
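
To make the idea concrete, the sketch below applies the PageRank recurrence PR(A) = (1 − d)/N + d*(PR(T1)/C(T1) + … + PR(Tn)/C(Tn)), as published by Brin and Page, to a tiny web. The three-page link graph, the function name pagerank and the fixed iteration count are assumptions made purely for illustration; the sketch also assumes every page has at least one outbound link.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate the PageRank recurrence on a dict {page: [pages it links to]}.

    Assumes every page has at least one outbound link (no dangling nodes).
    """
    pages = list(links)
    n_pages = len(pages)
    pr = {p: 1.0 / n_pages for p in pages}       # start from a uniform guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that links to `page`.
            inbound = sum(pr[t] / len(links[t]) for t in pages if page in links[t])
            new_pr[page] = (1 - d) / n_pages + d * inbound
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
for page, score in sorted(ranks.items()):
    print(page, round(score, 4))
```

In this toy graph page C ranks highest because it receives links from both other pages, while B, which receives only half of A's vote, ranks lowest.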

Let us explain the algorithm with a real-life example. One of the co-authors of this report is an active technology
blogger and writes a blog called “The TechBend” at www.KundanBhaduri.com. Exhibit 7 shows that the Google
PageRank of the search string “Techbend blog” is highest for www.KundanBhaduri.com, which thus appears at the top of
Google’s search results. Interestingly, although there are other professional sites and blogs with domain names such as
www.TechBend.com, they do not figure anywhere close to the top of the search results on Google. Let us
explore how this was achieved through the application of Markov chains.

Holistically, the internet as we know it is a connected graph of interlinked webpages (Exhibit 8). It will therefore have
an extremely large transition probability matrix. One look at Exhibit 9 tells us that for the homepage of The
Techbend to rank high on Google’s PageRank, its Eigen value has to be higher than that of all other competing webpages
that have the same context. More specifically, the transition probabilities into the page have to be high from those
nodes (webpages) in the matrix which themselves have high Eigen values. In other words, the probability of reaching
our target page will be high when coming from another high-probability page. We tested this logic with Exhibits 3 and
5 where we saw that A achieved a higher Eigen value because it was more probable to arrive at A from E or to remain
on A itself. This logic is at the core of Google’s PageRank.

In our example, www.KundanBhaduri.com achieves a higher PageRank by being linked with other highly
prominent websites such as TechCrunch, Engadget and TED. Since these sites enjoy a high PageRank and link
back to The Techbend blog, the overall probability of a random surfer arriving at www.KundanBhaduri.com
is higher than it is for www.TechBend.com. This is explained by a higher Eigen value (Exhibit 10) and therefore a
higher PageRank for The Techbend. An important factor that needs to be emphasized here is that what matters is not
just the number of links that a webpage exchanges with other pages, but the relative importance of those links in the
universe of all such links.

Issues to be addressed

However, since the internet is an extremely large set of nodes (over 1 trillion) [5], there are some issues that need to
be addressed to make the Markov chain model functional for Google PageRank. Firstly, the calculation of the Eigen
vector for such a large (and growing) matrix is non-trivial. We will address this issue in the second part of the report.

Other than that, the issues related to handling dangling nodes (i.e. dead pages) and calculating an appropriate
damping factor are significant. The damping factor refers to the probability that the random user will not abruptly end
the session (by either exiting the browser or typing a new URL). In order to avoid a situation of creating an absorbing
Markov chain, pages with no outbound links are assumed to link out to all other pages in the collection. Their
PageRank scores are therefore divided evenly amongst all other pages.

Calculating the preliminary transition matrix of the web is also a significant challenge given the massive size of the
World Wide Web. A workaround to this problem is to ‘guess’ the transition matrix and then progressively
correct its values. Since Google recalculates the PageRanks every time it crawls the web, the approximation
error decreases with each iteration.

[4] Hwai-Hui Fu, Dennis K. J. Lin and Hsien-Tang Tsai (Dept. of Bus. Administration, Shu-Te University): Applied Stochastic Models in Business and Industry
[5] Alpert, Jesse, and Nissan Hajaj. "We Knew the Web Was Big..." Official Google Blog. 25 July 2008. Web. 06 Feb. 2011.



Techniques that may be used to overcome the problem of solving such a large system

Now that we understand how Google was able to apply a form of Markov chain modelling to create its PageRank system,
we will address one of the most significant problems it faced: solving the system π = π P. For a small matrix, this
equation can quickly be solved exactly. When the web was much smaller, Google could compute the steady state vector
of 26 million pages in about 2 hours [6]. The resulting computation would then be used for a fixed period of time. However,
because of the sheer size of the World Wide Web, which Google asserts now contains over 1 trillion unique URLs [7],
the resulting stochastic matrix will now contain over a trillion rows and columns.

Additionally, given the dynamics of Web 2.0, it would no longer be efficient for Google to use the stale data from these
computations for a fixed time interval. “Today, Google downloads the web continuously, collecting updated page
information and re-processing the entire web-link graph several times per day” [8]. In sum, the ever-changing and
ever-expanding nature of the World Wide Web and its content, coupled with the search engine’s commitment to provide the
best information available, greatly compounds the problem of solving the aforementioned system.

If you think about it, the resulting matrix of the web, with its more than a trillion columns and rows, is going to be
composed mostly of zeroes, given that most webpages link to only a tiny number of other web pages. In fact, a 2004
study shows that the average number of out-links from a given webpage is just 52, so on average only about 52 of the
roughly one trillion elements in a row are non-zero [9]. This means that the web matrix is very sparse.

In order to solve this problem, one of the main tools that can be used (or a variation thereof that Google appears to have
implemented) is called “The Power Method” or “Power Iteration”. Applied to the Google matrix, this method converges
to the PageRank vector; in other words, it ultimately helps us define the weighting or importance of our webpages
relative to the entire matrix. The power method is an iterative process for approximating the dominant eigenvalue of a
matrix and its corresponding eigenvector. “Eigenvectors of a square matrix are the non-zero vectors
which, after being multiplied by the matrix, remain proportional to the original vector" [10]. In order to implement this
method, we must assume that our matrix, which we will now refer to as matrix A, has a dominant eigenvalue with
corresponding dominant eigenvectors. (For the system π = π P, in which π is a row vector, A is the transpose of P and π
is written as a column vector.) The dominant eigenvector of a matrix is an eigenvector corresponding to the
eigenvalue of largest magnitude of that matrix. In order to approximate a dominant eigenvector we choose an initial
approximation of one of the dominant eigenvectors of A, which we will call π0. Then we can form the following sequence [11]:

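\[
\begin{aligned}
\pi_1 &= A \pi_0 \\
\pi_2 &= A \pi_1 = A(A \pi_0) = A^2 \pi_0 \\
\pi_3 &= A \pi_2 = A(A^2 \pi_0) = A^3 \pi_0 \\
      &\;\;\vdots \\
\pi_k &= A \pi_{k-1} = A(A^{k-1} \pi_0) = A^k \pi_0
\end{aligned}
\]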

For large values of k, this sequence provides a good approximation of the dominant eigenvector of matrix A. The method
requires successive iterations until some convergence criterion is satisfied. With our dominant eigenvector, we can find our
dominant eigenvalue using the Rayleigh quotient, as follows [12]:

[6] Alpert, Jesse, and Nissan Hajaj. "We Knew the Web Was Big..." Official Google Blog. 25 July 2008. Web. 06 Feb. 2011. <http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html>.
[7] Ibid.
[8] Ibid.
[9] Anuj Nanavati, Arindam Chakraborty, David Deangelis, Hasrat Godil, and Thomas D’Silva, An investigation of documents on the World Wide Web, http://www.iit.edu/~dsiltho/Investigation.pdf, December 2004.
[10] "Eigenvalues and Eigenvectors." Wikipedia, the Free Encyclopedia. 27 Sept. 2010. Web. 10 Feb. 2011. http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors.
[11] Larson, Ron, David C. Falvo, and Bruce H. Edwards. Elementary Linear Algebra. Boston: Houghton Mifflin, 2004. 550-58. Print.
[12] Ibid.




\[
\lambda = \frac{A\pi \cdot \pi}{\pi \cdot \pi}
\]

“In cases for which the power method generates a good approximation of a dominant eigenvector, the Rayleigh quotient
provides a correspondingly good approximation of the dominant eigenvalue” [13].
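
The power method and the Rayleigh quotient can be sketched together in plain Python. This is an illustrative sketch rather than Google's implementation: the function names, the tolerance and the choice of test matrix (the transpose of the two-page matrix P from Exhibit 4) are all assumptions. Each iterate is rescaled so that its entries sum to one in absolute value, which keeps the numbers from growing or shrinking without changing the direction of the vector.

```python
def mat_vec(A, x):
    """Multiply matrix A (list of rows) by column vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def power_method(A, x0, tol=1e-12, max_iter=1000):
    """Approximate the dominant eigenvalue and eigenvector of A."""
    x = x0
    for _ in range(max_iter):
        y = mat_vec(A, x)
        scale = sum(abs(v) for v in y)           # rescale so |entries| sum to 1
        y = [v / scale for v in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    eigenvalue = dot(mat_vec(A, x), x) / dot(x, x)   # Rayleigh quotient
    return eigenvalue, x


# Transpose of P = [[0.3, 0.7], [0.4, 0.6]], so that A * pi = pi.
A = [[0.3, 0.4],
     [0.7, 0.6]]
eigenvalue, pi = power_method(A, [0.5, 0.5])
print(round(eigenvalue, 6), [round(v, 6) for v in pi])
```

For this matrix the dominant eigenvalue is 1 and the eigenvector reproduces the steady state probabilities of pages E and A.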

One of the unique features of the Google matrix, as we briefly mentioned before, is that the total number of nonzero
elements in a given row is quite small (due to the small number of hyperlinks that a given webpage might contain) (Exhibit
11). Since each iteration of the power method only involves multiplying this sparse matrix by a vector, an iteration is
considered very cheap [14].
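
The saving from sparsity can be illustrated by storing only the out-links of each page instead of a full matrix. In the sketch below, the function name sparse_step and the three-page link structure are hypothetical; one step costs time proportional to the number of links rather than to the square of the number of pages.

```python
def sparse_step(pi, out_links):
    """One step pi_next = pi * H without ever building H.

    out_links maps a page index to the list of page indices it links to,
    so H[i][j] = 1 / len(out_links[i]) when page i links to page j.
    Pages with no out-links are skipped here (their weight is dropped).
    """
    nxt = [0.0] * len(pi)
    for i, targets in out_links.items():
        if not targets:
            continue
        share = pi[i] / len(targets)     # page i splits its weight evenly
        for j in targets:
            nxt[j] += share
    return nxt


out_links = {0: [1, 2], 1: [2], 2: [0]}   # hypothetical 3-page web
pi = sparse_step([1 / 3, 1 / 3, 1 / 3], out_links)
print(pi)
```

With an average of about 52 out-links per page, each step touches roughly 52 entries per page instead of a trillion.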

Another necessary technique Google implemented to make this system solvable was the fix to the dangling node problem.
What happens when a user arrives at a webpage that does not link out to another webpage? Does our random surfer
become absorbed by this webpage and never leave? This is the dangling node problem: unless we do something to correct
the situation, our Markov chain would treat these nodes as absorbing states. Suppose the transition matrix of the web,
before any correction, is called Matrix H. To correct for dangling nodes, we could create a new matrix S = H + dw, where
d is a column vector that identifies dangling nodes, with an entry of 1 if the node is dangling and 0 otherwise, and w is
a row vector (w1, w2, …, wn) that determines where our random surfer will go next in order not to become absorbed. One
way of assigning values to this row vector is to say that there is equal probability that our surfer will land on any of
the n webpages that exist, so the row vector w would look like this: (1/n, 1/n, …, 1/n). Whilst there are other ways to
assign w, this is the most common, and is sufficient for our purposes.
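
The correction S = H + dw is easy to express in code for a small example. The sketch below assumes a dense list-of-rows matrix and the uniform choice of w described above; the function name fix_dangling and the three-page matrix are hypothetical.

```python
def fix_dangling(H):
    """Return S = H + d*w, with w uniform and d marking all-zero rows of H."""
    n = len(H)
    d = [1 if sum(row) == 0 else 0 for row in H]   # 1 for each dangling page
    w = [1.0 / n] * n                               # equal chance of any page
    return [[H[i][j] + d[i] * w[j] for j in range(n)] for i in range(n)]


H = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]   # page 2 has no outbound links
S = fix_dangling(H)
for row in S:
    print(row)
```

Only the dangling row changes: it becomes (1/3, 1/3, 1/3), so every row of S sums to one and no page can absorb the surfer.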

Another important technique that may be used by Google to help solve the system is the inclusion of a damping factor. The
damping factor accounts for the possibility that a web surfer may at any time choose not to follow the
links available on a given webpage and instead type in a URL in order to go to a page that is outside the current
chain. In fact, Brin and Page reference the damping factor in their original paper on Google (submitted while at Stanford):
“The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85” [15].

While the damping factor is intended to model the behaviour of a random web surfer, it also serves the additional
purpose of speeding up convergence of the power method. This is because the ratio of the two eigenvalues largest in
magnitude determines how quickly the method converges [16]. It has been proven that the second largest
eigenvalue of the Google matrix is less than or equal in magnitude to the damping factor used [17], while the largest
eigenvalue is 1. The power method therefore converges more quickly the further the damping factor is below 1. According
to Rebecca Wills, only 29 iterations are required for the difference between iterates to become less than 10^-2 when
using a damping factor of 0.85, whereas the number of iterations goes up to 44 when the damping factor is raised to
0.90 [18]. Hence, the damping factor makes this complex system quicker to solve by reducing the number of iterations
necessary to compute the PageRank vector.
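
These iteration counts are consistent with a simple back-of-the-envelope estimate. The sketch below assumes, as a simplification that is not taken from the cited sources, that the error shrinks by a factor equal to the damping factor on every iteration, so the required number of iterations is the smallest k with damping^k at or below the tolerance. The function name iterations_needed is hypothetical.

```python
import math


def iterations_needed(damping, tolerance):
    """Smallest integer k such that damping**k <= tolerance."""
    return math.ceil(math.log(tolerance) / math.log(damping))


print(iterations_needed(0.85, 1e-2))   # 29
print(iterations_needed(0.90, 1e-2))   # 44
```

Under the same assumption, tightening the tolerance or raising the damping factor closer to 1 increases the count quickly, which is why the choice of 0.85 is a practical compromise.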

While solving this enormous system is certainly no easy task for Google, especially at the speed that it
might require, the company has been able to overcome these significant obstacles through the unique application of
certain existing mathematical algorithms.



[13] Larson, Ron, David C. Falvo, and Bruce H. Edwards. Elementary Linear Algebra. Boston: Houghton Mifflin, 2004. 550-58. Print.
[14] Wills, Rebecca. “Google’s PageRank: The Math Behind the Search Engine.” The Mathematical Intelligencer 28.4 (Fall 2006): 6-11.
[15] Brin, S., and Page, L. "The Anatomy of a Large-scale Hypertextual Web Search Engine." Computer Networks and ISDN Systems 30.1-7 (1998): 107-17. Print.
[16] Gene H. Golub and Charles F. Van Loan, Matrix Computations, 3rd ed., The Johns Hopkins University Press, 1996.
[17] Taher H. Haveliwala and Sepandar D. Kamvar, The second eigenvalue of the Google matrix, Tech. report, Stanford University, 2003.
[18] Wills, Rebecca. “Google’s PageRank: The Math Behind the Search Engine.” The Mathematical Intelligencer 28.4 (Fall 2006): 6-11.



Exhibit 1: A sample 4-state Markov chain with transition probabilities


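[Figure: directed graph of four states (1, 2, 3 and 4) with edges labelled by the transition probabilities P11, P12, P23, P24, P34 and P41]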




Exhibit 2: Sample 4X4 transition Matrix




Exhibit 3: Explaining the basis of Markov’s chain [19]

[19] Image taken from http://en.wikipedia.org/wiki/Markov_chain



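Exhibit 4: Demonstrating the stable state values using simple matrix multiplication

\[
P = \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix},
\qquad P_{ij}(2) = (P^2)_{ij}
\]

\[
P^2 = \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix}
      \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix}
    = \begin{pmatrix} 0.37 & 0.63 \\ 0.36 & 0.64 \end{pmatrix}
\]

Higher powers of P (first row: starting from E; second row: starting from A; columns: to E, to A):

         First row                 Second row
Power    to E       to A           to E       to A
P^3      0.363      0.637          0.364      0.636
P^4      0.3637     0.6363         0.3636     0.6364
P^5      0.36363    0.63637        0.36364    0.63636
P^6      0.363637   0.636363       0.363636   0.636364
P^7      0.363636   0.636364       0.363636   0.636364     Stable state values
P^8      0.363636   0.636364       0.363636   0.636364     Stable state values
P^9      0.363636   0.636364       0.363636   0.636364     Stable state values
P^10     0.363636   0.636364       0.363636   0.636364     Stable state values
P^11     0.363636   0.636364       0.363636   0.636364     Stable state values
P^12     0.363636   0.636364       0.363636   0.636364     Stable state values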





Exhibit 5: Calculating the steady state eigen values πA and πE


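\[
\pi = \pi P
\]

Therefore,

\[
\begin{pmatrix} \pi_E & \pi_A \end{pmatrix}
= \begin{pmatrix} \pi_E & \pi_A \end{pmatrix}
  \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix}
\]

Solving these two equations:

\[
\begin{aligned}
1.\quad \pi_E &= 0.3\,\pi_E + 0.4\,\pi_A \\
2.\quad \pi_A &= 0.7\,\pi_E + 0.6\,\pi_A
\end{aligned}
\]

Also, we know that:

\[
3.\quad \pi_E + \pi_A = 1
\]

Since equations 1 and 2 are equivalent (each reduces to 0.7 πE = 0.4 πA), we solve equations 2 and 3 together:

\[
\pi_A = 0.7\,(1 - \pi_A) + 0.6\,\pi_A
\]

Or,
\[
\pi_A = \frac{0.7}{1.1} \approx 0.636
\]

And,
\[
\pi_E = 1 - \pi_A \approx 0.364
\]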




Exhibit 6: The improved Google PageRank algorithm

\[
PR(A) = (1 - d) \cdot \frac{1}{N}
      + d \cdot \left( \frac{PR(T_1)}{C(T_1)} + \frac{PR(T_2)}{C(T_2)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
\]

                       Where:

                   •   PR(A) is the PageRank of page A
                   •   PR(Ti) is the PageRank of pages Ti that link to page A
                   •   C(Ti) is the number of outbound links on page Ti
                   •   n is the total number of all pages that link to page A
                   •   N is the total number of all pages on the web
                   •   d is the damping factor

It is noteworthy that there is an adjusting damping factor involved in the calculation. The above equation represents
the final version of the PageRank algorithm, with the damping factor incorporated within the first term on
the RHS of the equation.





Exhibit 7: PageRank of the search string ‘Techbend blog’




Exhibit 8: The correlation between a webpage and the rest of the web [20]

The importance of these links determines the overall importance of your webpage to the PageRank algorithm.

[20] Laure Ninove, Cristobald de Kerchove, Paul Van Dooren: Université Catholique de Louvain, http://www.esat.kuleuven.be/scd/golub/presentations/Gene_PVD.pdf


 Exhibit 9: KundanBhaduri.com and its links to other sites




[Figure: the homepage of KundanBhaduri.com, which hosts the blog The TechBend, shown with links to TechCrunch, Engadget and TED (each labelled "Very high PageRank") and to the rest of the Internet]



Exhibit 10: Applying Markov Chain method to calculate the PageRank for ‘TechBend blog’

Following is the probability matrix that shows the likelihood of a user clicking on a page to arrive at the homepage of
another website when she is searching for the string “TechBend blog”. All site names here refer to their respective
homepages, for the purpose of Markov chain analysis.




From \ To              KundanBhaduri.com   TechBend.com   Engadget.com   TED.com   TechCrunch.com   …   Rest of the Internet
KundanBhaduri.com      0.6                 0.3            0.01           0.03      0.01             …   …
TechBend.com           0.42                0.1            0.12           0.01      0.11             …   …
Engadget.com           0.65                0.02           0.1            0.21      0.01             …   …
TED.com                0.54                0.22           0.1            0         0.09             …   …
TechCrunch.com         0.64                0.17           0.13           0.01      0                …   …
…                      0.59                0.31           0.02           0.04      0.01             …   …
Rest of the Internet   …                   …              …              …         …                …   …




                               Transition Probabilities of
                         KundanBhaduri.com and TechBend.com

For the stable-state vector π we have: π = π*P                                    (1)

We assume the following steady-state probabilities (the components of the eigenvector π):

Webpage              Steady-state probability
KundanBhaduri.com    πA
TechBend.com         πB
Engadget.com         πC
TED.com              πD
TechCrunch.com       πE

Therefore, using (1), we get:

πA = πA*0.6 + πB*0.42 + πC*0.65 + πD*0.54 + πE*0.64 + …*0.59 + …                  (2)

πB = πA*0.3 + πB*0.1 + πC*0.02 + πD*0.22 + πE*0.17 + …*0.31 + …                   (3)

Every coefficient in equation (2) is larger than the corresponding coefficient in equation (3) (0.6 > 0.3,
0.42 > 0.1, 0.65 > 0.02, and so on). It follows that πA > πB, provided that no other webpages on the internet
are more important (i.e. have a higher probability rank) than the pages described in the table above.
Therefore, we conclude that KundanBhaduri.com will have a higher PageRank than TechBend.com for the search
term ‘TechBend blog’.
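
As a rough numerical check of this comparison, the sketch below solves π = π*P by repeated multiplication for the five named pages only. It rests on two assumptions that are not part of the exhibit: the rest of the internet is ignored, and each row of the visible 5x5 block is rescaled so that it sums to one. The function names `normalise_rows` and `stationary_distribution` are illustrative only.

```python
# Visible 5x5 block of the table above (rows: current page, columns: next page).
PAGES = ["KundanBhaduri.com", "TechBend.com", "Engadget.com", "TED.com", "TechCrunch.com"]
RAW = [
    [0.60, 0.30, 0.01, 0.03, 0.01],
    [0.42, 0.10, 0.12, 0.01, 0.11],
    [0.65, 0.02, 0.10, 0.21, 0.01],
    [0.54, 0.22, 0.10, 0.00, 0.09],
    [0.64, 0.17, 0.13, 0.01, 0.00],
]


def normalise_rows(matrix):
    """Assumption: rescale each row to sum to 1, ignoring all other pages."""
    return [[value / sum(row) for value in row] for row in matrix]


def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Iterate pi <- pi * P from a uniform start until pi stops changing."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[k] * P[k][j] for k in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


P = normalise_rows(RAW)
pi = stationary_distribution(P)
for name, value in zip(PAGES, pi):
    print(f"{name:20s} {value:.4f}")
```

Under these assumptions the first component (πA) comes out larger than the second (πB), in line with the argument above. The exact values depend on the rescaling and should not be read as real PageRank scores.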





Exhibit 11: Computing a small Eigen value with Power Method


We know that: π = π*P
For a hypothetical transition matrix P of order 20X20 (shown below with unnormalized integer weights), notice
that most of the entries are zero. This considerably reduces the total cost of computing the π*P value, since
every term that multiplies a zero entry contributes nothing to the sum.



                      1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
                      0  0  4  0  0  0  4  0  0  4  0  0  9  0  0  7  0  0  0  1
                      0  9  0  0  6  0  0 12  0  8  0  0  8  0  0  5  0  0  2  0
                      8  0  7  0  0  8  0  0  4  0  2  0  2  0  5  0  0  6  0  0
                      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  5  0  5  3
                      0  0  0  0  0  0  0  0  0  0  0  0  0  7  0  4  0  0  0  0
                      0  0  0  0  0  0  0  0  0  0  0  3  0  0  0  0  0  0  0  0
                      0  0  0  0  0  0  0  0  0  4  0  0  0  0  0  0  0  0  0  0
                      0  0  0  0  0  0  0  5  0  0  0  0  0  0  0  0  0  0  0  0
                      0  0  0  6  0  0  0  0  6  0  0  0  0  0  0  0  0  0  0  0
                      0  0  5  0  0  8  0  0  0  0  0  0  0  0  0  0  0  0  0  0
                      0  0  0  0  0  0  0  0  0  0  0  0  0  6  0  0  0  0  0  0
                      0  0  0  0  0  0  8  0  8  0  0  0  0  0  0  0  0  0  0  0
                      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  7  0
                      0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0
                      0  0  0  0  0  0  0  0  5  0  7  8  0  6  0  6  0  8  1  0
                      0  0  0  0  0  0  0  8  0  0  0  0  9  0  0  0  2  0  1  0
                      0  0  0  0  0  8  0  0  0  0  0  0  0  0  7  0  0  0  0  0
                      0  0  0  0  0  0  5  0  0  0  0  0  0  0  0  0  0  0  0  0
                      0  0  0  0  0  0  0  0  0  5  5  0  0  0  0  0  8  0  0  7




Therefore $\pi_A = \sum_{k} \pi_k \, P_{kj}$ with j = 1, where k runs over all 20 rows of the matrix.

Since most of the terms in this sum are zero, we only need to account for rows 1 and 4 of the table above, the
only rows with a non-zero entry in column 1. Therefore, πA = 1 * πA + 8 * πD

Exploiting the zero entries in this way makes each step of solving a large Markov transition probability
matrix far cheaper.
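
The saving described above can be sketched in a few lines of Python. This is an illustration only, not the method Google uses: the matrix is the hypothetical 20X20 table from this exhibit, and the helper names `to_sparse_columns`, `sparse_entry` and `dense_entry` are made up for the example. The idea is to store only the non-zero entries of each column, so that one component of π*P touches just those entries.

```python
# Hypothetical 20x20 weight matrix copied from the exhibit (unnormalized).
ROWS = """
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 4 0 0 0 4 0 0 4 0 0 9 0 0 7 0 0 0 1
0 9 0 0 6 0 0 12 0 8 0 0 8 0 0 5 0 0 2 0
8 0 7 0 0 8 0 0 4 0 2 0 2 0 5 0 0 6 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 5 3
0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 4 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 6 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0
0 0 5 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0
0 0 0 0 0 0 8 0 8 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0
0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 5 0 7 8 0 6 0 6 0 8 1 0
0 0 0 0 0 0 0 8 0 0 0 0 9 0 0 0 2 0 1 0
0 0 0 0 0 8 0 0 0 0 0 0 0 0 7 0 0 0 0 0
0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 5 5 0 0 0 0 0 8 0 0 7
"""
dense = [[int(x) for x in line.split()] for line in ROWS.strip().splitlines()]


def to_sparse_columns(matrix):
    """For each column j, keep only the (row k, weight) pairs that are non-zero."""
    cols = {j: [] for j in range(len(matrix))}
    for k, row in enumerate(matrix):
        for j, weight in enumerate(row):
            if weight != 0:
                cols[j].append((k, weight))
    return cols


def sparse_entry(pi, cols, j):
    """Component j of pi * P, visiting only the stored non-zero entries."""
    return sum(pi[k] * weight for k, weight in cols[j])


def dense_entry(pi, matrix, j):
    """Same component computed the naive way, visiting every row."""
    return sum(pi[k] * matrix[k][j] for k in range(len(matrix)))


sparse_cols = to_sparse_columns(dense)
pi = [1.0 / 20] * 20           # assumed uniform starting vector
print(sparse_cols[0])          # non-zero entries of column 1: rows 1 and 4 (0-based 0 and 3)
print(sparse_entry(pi, sparse_cols, 0), dense_entry(pi, dense, 0))
```

Both functions return the same number; the sparse version simply skips the rows whose entry in column 1 is zero, which is the reduction to πA = 1 * πA + 8 * πD described above.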





The Google Pagerank algorithm - How does it work?

  • 3. Implementing Markov Chains with Google PageRank

In its most basic form, a homogeneous Markov chain (Exhibit 1) refers to a series of events or actions that follow one another, in which the transition from one state to the next is memoryless: the next state depends only on the current state, not on how the chain arrived there. More formally, a Markov chain is a collection of random variables {Xt} with the property that, given the current state, the future is conditionally independent of the past.[1] The probabilities of moving between states are collected in a square matrix known as the Transition Matrix. Therefore, we can classify a problem as solvable by the theory of Markov chains if it has the following characteristics:
a) At any point in time, each object is in one and exactly one defined state. At the end of the period, the object can move to a new state or remain in its original state.[2]
b) The objects move between states based on transition probabilities (Exhibit 2) that depend only on the current state. The probabilities of moving from a given state to all possible states sum to one.
c) The transition probabilities (of going from A to B) remain constant over time.

To develop an understanding of how to solve a Markov chain, assume that the simple 2-state chain in Exhibit 3 describes a simple website. From the homepage (E), a user clicks a link that leads her to page (A) 70% of the time; the remaining 30% of the time she clicks a link that keeps her on the same page (E). Similarly, once the user is on page (A), 40% of the time she clicks a link back to (E), and the remaining 60% of the time she clicks a link that keeps her on page (A). The Markov chain can help us find the probability of a random user being on either page after any number of iterations of this chain.
The website administrator might want to use this information to decide which page to focus on in order to maximise ad revenue. Please note that Google’s implementation of the Markov chain is that of a non-absorbing Markov chain.

To solve this problem, we start by using the tree method to calculate the second-level probability $P_{ij}^{(2)}$, i.e. the probability of going from node i to node j in two steps, where i and j each stand for E or A, as given in Exhibit 4. Here we observe that the probability of being on page A after two steps is 63% if the user started on E and 64% if she started on A. If we continue in this way for up to 7 iterations, the probability values (to six decimal places) reach a steady state and no longer change. To find the steady-state probabilities of both webpages, we use the steady-state equation π = π*P and solve it as shown in Exhibit 5. This gives πA and πE as approximately 0.63 and 0.37 respectively (more precisely, 7/11 ≈ 0.636 and 4/11 ≈ 0.364). Strictly speaking, these are the entries of the left eigenvector of P for the eigenvalue 1, although this report refers to them loosely as the pages’ Eigen values. We can therefore recommend spending advertising effort on page A, since in the long run it is nearly twice as likely to attract clicks as page E. As we progress towards looking at how Google ranks pages according to their relevance, it will be interesting to note that these values play a significant part.

Markov chains have significant use in industrial research, organizational behaviour, financial markets analysis, human resource planning, marketing forecasting and so on. A very interesting use of Markov chains has been in the music industry. As early as the 1950s, composers used Markov chains to study the pattern of notes in popular songs[3] and thereby create new music sequences based on the studied notes.
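The two-page example (E, A) discussed above can be checked numerically. The following is a minimal sketch in plain Python, not the authors' own calculation: the helper names `mat_mul`, `mat_power` and `steady_state_two_state` are hypothetical, and the closed form used for the steady state is simply the solution of π = π*P together with πE + πA = 1 for a two-state chain.

```python
# Two-page chain from the example; the state order is (E, A).
P = [[0.3, 0.7],   # from E: stay on E 30%, move to A 70%
     [0.4, 0.6]]   # from A: move to E 40%, stay on A 60%


def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(M, steps):
    """Return M raised to the power `steps` (steps >= 1)."""
    result = M
    for _ in range(steps - 1):
        result = mat_mul(result, M)
    return result


def steady_state_two_state(M):
    """Solve pi = pi * M with pi_E + pi_A = 1 for a 2x2 chain."""
    p_ea = M[0][1]  # probability of moving E -> A
    p_ae = M[1][0]  # probability of moving A -> E
    pi_a = p_ea / (p_ea + p_ae)
    return 1.0 - pi_a, pi_a


P2 = mat_power(P, 2)     # two-step probabilities (tree method result)
P12 = mat_power(P, 12)   # rows are almost identical by this point
pi_e, pi_a = steady_state_two_state(P)

print("P^2  =", P2)
print("P^12 =", P12)
print("steady state (E, A) =", round(pi_e, 6), round(pi_a, 6))
```

Both rows of the high power approach the same pair of values, which is why the starting page stops mattering in the long run.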
The example of linked webpages discussed above can now be extrapolated to calculate the probability of arriving at any webpage for a given search criterion, if the entire World Wide Web is considered as one large, connected, memoryless chain. Based on the relevance criterion, we can estimate the highest relevance factor, and therefore any page’s utility rank for a search string. This is the rationale behind Google’s patented PageRank algorithm.

[1] Weisstein, Eric W. "Markov Chain." From MathWorld - A Wolfram Web Resource. http://mathworld.wolfram.com/MarkovChain.html
[2] Tamara Lynn Anthony, Rice University: Markov Chains
[3] Verbeurgt Karsten, Dinolfo Michael, Fayer Mikhail: Extracting Patterns in Music for Composition via Markov Chains
  • 4. Google’s PageRank algorithm[4] is a stochastic algorithm that determines the significance of a page relative to a search string. It is not the only factor that Google uses to rank pages, but it is an important one. For Google (or for a web administrator), the PageRank of a page denotes the probability of a random web surfer reaching that page after clicking on many links. The PageRanks form a probability distribution over web pages, which is why the PageRanks of all pages sum to 1. Refer to Exhibit 6 for a mathematical representation of the PageRank algorithm. Essentially, the Google PageRank method ranks a page higher (i.e. as more important) when it is linked to by other highly ranked, important pages.

Let us explain the algorithm with a real-life example. One of the co-authors of this report is an active technology blogger and writes a blog called “The TechBend” at www.KundanBhaduri.com. Exhibit 7 shows that the Google PageRank of the search string “Techbend blog” is highest for www.KundanBhaduri.com, which thus appears at the top of Google’s search results. Interestingly, there are other professional sites and blogs with domain names such as www.TechBend.com, yet they do not figure anywhere close to the top of the search results on Google. Let us explore how this was achieved using the application of Markov chains.

Holistically, the internet as we know it is a connected graph of interlinked webpages (Exhibit 8). It therefore has an extremely large transition probability matrix. One look at Exhibit 9 tells us that for the homepage of The Techbend to rank high on Google’s PageRank, its Eigen value (its steady-state probability π) has to be higher than those of all other competing webpages that have the same context. More specifically, the page needs high-probability connections from nodes (webpages) in the matrix that themselves have high Eigen values.
In other words, the probability of reaching our target page will be high when the surfer is coming from another high-probability page. We tested this logic with Exhibits 3 and 5, where we saw that A achieved a higher Eigen value because it was more probable to arrive at A from E, or to remain on A itself. This logic is at the core of Google’s PageRank. In our example, www.KundanBhaduri.com achieves a higher PageRank through its links with other highly prominent websites such as Techcrunch, Engadget and TED. Since these sites enjoy a high PageRank, their links back to The Techbend Blog make the overall probability of a random surfer arriving at www.KundanBhaduri.com higher than it is for www.TechBend.com. This is explained by a higher Eigen value (Exhibit 10) and therefore a higher PageRank for The Techbend. An important point to emphasize here is that what matters is not just the number of links that a webpage exchanges with others but the relative importance of those links in the universe of all such links.

Issues to be addressed

However, since the internet is an extremely large set of nodes (over 1 trillion)[5], there are some issues that need to be addressed to make the Markov chain model functional for Google PageRank. Firstly, the calculation of the eigenvector for such a large (and growing) matrix is non-trivial. We will address this issue in the second part of the report. Beyond that, handling dangling nodes (i.e. dead-end pages with no outbound links) and choosing an appropriate damping factor are significant issues. The damping factor is the probability that the random user keeps following links rather than abruptly ending the session (by either exiting the browser or typing a new URL). To avoid creating an absorbing Markov chain, pages with no outbound links are assumed to link out to all other pages in the collection. Their PageRank scores are therefore divided evenly amongst all other pages.
Calculating the preliminary transition matrix of the web is also a significant challenge given the massive size of the World Wide Web. A workaround is to start from a ‘guess’ of the transition matrix and then progressively correct its values. Since Google recalculates the PageRanks every time it crawls through the web, the error in its approximation decreases with each iteration.

[4] Hwai-Hui Fu, Dennis K. J. Lin and Hsien-Tang Tsai (Dept. of Bus. Administration, Shu-Te University): Applied Stochastic Models in Business and Industry
[5] Alpert, Jesse, and Nissan Hajaj. "We Knew the Web Was Big..." Official Google Blog. 25 July 2008. Web. 06 Feb. 2011.
  • 5. Techniques that may be used to overcome the problem of solving such a large system

Now that we understand how Google was able to apply a form of Markov chain modelling to create its PageRank system, we will address one of the most significant problems it faced: solving the system π = π P. For a small matrix we can quickly find exact solutions to this equation. When the web was much smaller, Google could compute the steady state vector of 26 million pages in about 2 hours[6]. The result would then be used for a fixed period of time. However, the World Wide Web is now so large (Google asserts that the number of websites is over the 1 trillion mark[7]) that the resulting stochastic matrix will contain over a trillion rows and columns. Additionally, given the dynamics of Web 2.0, it would no longer be efficient for Google to use stale data from these computations for a fixed time interval. “Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day”[8]. In sum, the ever-changing and ever-expanding nature of the World Wide Web and its content, coupled with the search engine’s commitment to provide the best information available, only compounds the problem of solving the aforementioned system.

The resulting matrix of the web, with its more than a trillion columns and rows, is going to be composed mostly of zeroes, given that most webpages link to only a tiny number of other web pages. In fact, a 2004 study shows that the average number of out-links from a given webpage is just 52, hence on average only 52 of the roughly one trillion elements in a row are non-zero.[9] This means that the web matrix is very sparse.
In order to solve this problem, one of the main tools that can be used (or a variation thereof that Google appears to have implemented) is called “The Power Method” or “Power Iteration”. Applied to the Google matrix, this method converges to the PageRank vector; in other words, it ultimately helps us define the weighting or importance of our webpages relative to the entire matrix. The power method is an iterative process for approximating eigenvalues; we will use it to find our dominant eigenvalue and eigenvector. “Eigenvectors of a square matrix are the non-zero vectors which, after being multiplied by the matrix, remain proportional to the original vector".[10]

To implement this method, we must assume that our matrix, which we will now refer to as matrix A, has a dominant eigenvalue with corresponding dominant eigenvectors. A dominant eigenvector of a matrix is an eigenvector corresponding to the eigenvalue of largest magnitude of that matrix. To approximate a dominant eigenvector we choose an initial approximation of one of the dominant eigenvectors of A, which we will call $\pi_0$. Then we can form the following sequence[11]:

$$\pi_1 = A\pi_0$$
$$\pi_2 = A\pi_1 = A(A\pi_0) = A^2\pi_0$$
$$\pi_3 = A\pi_2 = A(A^2\pi_0) = A^3\pi_0$$
$$\vdots$$
$$\pi_k = A\pi_{k-1} = A(A^{k-1}\pi_0) = A^k\pi_0$$

For large values of k, this method provides a good approximation of the dominant eigenvector of matrix A. The iteration is repeated until some convergence criterion is satisfied. With our dominant eigenvector, we can find our dominant eigenvalue using the Rayleigh quotient, as follows[12]:

[6] Alpert, Jesse, and Nissan Hajaj. "We Knew the Web Was Big..." Official Google Blog. 25 July 2008. Web. 06 Feb. 2011. <http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html>.
[7] Ibid.
[8] Ibid.
[9] Anuj Nanavati, Arindam Chakraborty, David Deangelis, Hasrat Godil, and Thomas D’Silva, An investigation of documents on the World Wide Web, http://www.iit.edu/~dsiltho/Investigation.pdf, December 2004.
[10] "Eigenvalues and Eigenvectors." Wikipedia, the Free Encyclopedia. 27 Sept. 2010. Web. 10 Feb. 2011. http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors.
[11] Larson, Ron, David C. Falvo, and Bruce H. Edwards. Elementary Linear Algebra. Boston: Houghton Mifflin, 2004. 550-58. Print.
[12] Ibid.
  • 6. $$\lambda = \frac{A\pi \cdot \pi}{\pi \cdot \pi}$$

“In cases for which the power method generates a good approximation of a dominant eigenvector, the Rayleigh quotient provides a correspondingly good approximation of the dominant eigenvalue”[13].

One of the unique features of the Google matrix, as we briefly mentioned before, is that the total number of non-zero elements in a given row is quite small, due to the small number of hyperlinks that a given webpage might contain (Exhibit 11). Since all our computations involve this sparse matrix multiplied by vectors, an iteration of the power method is considered very cheap[14].

Another necessary technique Google implemented to make this system solvable was the fix to the dangling node problem. What happens when a user arrives at a webpage that does not link out to any other webpage? Does our random surfer become absorbed by this webpage and never leave? This is the dangling node problem: our Markov chain would treat these nodes as absorbing states unless we do something to correct the situation. Suppose the Google matrix is called matrix H. To correct for this, we can create a new matrix S = H + dw, where d is a column vector that identifies dangling nodes, with entry 1 if the node is dangling and 0 otherwise, and w is a row vector (w1, w2, …, wn) that determines where our random surfer goes next so as not to become absorbed. One way of assigning values to this row vector is to say that there is equal probability that our surfer will land on any of the n webpages that exist, so the row vector w would look like this: $(1/n, 1/n, \ldots, 1/n)$. Whilst there are other ways to assign w, this is the most common, and is sufficient for our purposes.

Another important technique that may be used by Google to help solve the system is the inclusion of a damping factor.
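Before the damping factor is discussed in detail, the power method and Rayleigh quotient introduced earlier in this section can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions rather than Google's implementation: the function names are hypothetical, the matrix `A` is simply the transpose of the two-page transition matrix from the E/A example (so that π can be treated as a column vector, as in the sequence π_k = A π_{k-1}), each iterate is rescaled so its entries sum to 1 (a step added here to keep the numbers bounded), and the dominant eigenvalue is assumed to be positive.

```python
def mat_vec(A, x):
    """Multiply matrix A (list of rows) by column vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dot(x, y):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))


def power_method(A, x0, tol=1e-10, max_iter=1000):
    """Approximate a dominant eigenvector of A and its eigenvalue.

    Repeats x <- A x (rescaled to sum to 1) until successive iterates
    differ by less than `tol`, then applies the Rayleigh quotient
    (A x . x) / (x . x) to estimate the dominant eigenvalue.
    """
    x = list(x0)
    for iteration in range(1, max_iter + 1):
        y = mat_vec(A, x)
        scale = sum(abs(v) for v in y)
        y = [v / scale for v in y]
        change = sum(abs(a - b) for a, b in zip(y, x))
        x = y
        if change < tol:
            break
    eigenvalue = dot(mat_vec(A, x), x) / dot(x, x)
    return x, eigenvalue, iteration


# Transpose of the two-page transition matrix, state order (E, A).
A = [[0.3, 0.4],
     [0.7, 0.6]]

vector, eigenvalue, iterations = power_method(A, [0.5, 0.5])
print("dominant eigenvector:", [round(v, 6) for v in vector])
print("dominant eigenvalue :", round(eigenvalue, 6))
print("iterations used     :", iterations)
```

For this small matrix the iterates settle on the same steady-state values as Exhibit 5, and the Rayleigh quotient returns a dominant eigenvalue of about 1, as expected for a stochastic matrix.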
The damping factor accounts for the possibility that a web surfer may at any time choose not to follow the links available on the current webpage and instead type in a URL to go to a page outside the current chain. In fact, Brin and Page reference the damping factor in their original paper on Google (submitted while at Stanford): “The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85”[15]. While the damping factor is intended to model the behaviour of a random web surfer, it also serves the additional purpose of speeding up convergence of the power method. This is because the ratio of the two eigenvalues of largest magnitude determines how quickly the method converges[16]. It has been proven that the magnitude of the second largest eigenvalue of the Google matrix is less than or equal to the damping factor used[17], while the largest eigenvalue of a stochastic matrix is 1, so the smaller the damping factor, the faster the power method converges. According to Rebecca Wills, only 29 iterations are required for the difference between iterates to become less than $10^{-2}$ when using a damping factor of 0.85; the number of iterations goes up to 44 when the damping factor is raised to 0.90[18]. Hence, the damping factor makes this complex system faster to solve by reducing the number of iterations needed to compute the PageRank vector.

Solving this enormous system is certainly no easy task for Google, especially at the speed it requires, but the company has been able to overcome these significant obstacles through the application of existing mathematical algorithms.

[13] Larson, Ron, David C. Falvo, and Bruce H. Edwards. Elementary Linear Algebra. Boston: Houghton Mifflin, 2004. 550-58. Print.
[14] Wills, Rebecca. “Google’s PageRank: The Math Behind the Search Engine.” The Mathematical Intelligencer 28.4 (Fall 2006): 6-11.
[15] Brin, S., and L. Page. "The Anatomy of a Large-scale Hypertextual Web Search Engine." Computer Networks and ISDN Systems 30.1-7 (1998): 107-17. Print.
[16] Gene H. Golub and Charles F. Van Loan, Matrix Computations, 3rd ed., The Johns Hopkins University Press, 1996.
[17] Taher H. Haveliwala and Sepandar D. Kamvar, The second eigenvalue of the Google matrix, Tech. report, Stanford University, 2003.
[18] Wills, Rebecca. “Google’s PageRank: The Math Behind the Search Engine.” The Mathematical Intelligencer 28.4 (Fall 2006): 6-11.
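The dangling-node fix and the damping factor described in this section can be combined in a short sketch. Everything specific here is an assumption made for illustration: the four-page web in `links` is hypothetical, the function names are invented, and the damped update π ← d·πS + (1 − d)/n is the commonly cited uniform-teleportation form rather than anything this report confirms about Google's production system. The iteration counts printed for this tiny graph are not expected to match the 29 and 44 quoted from Wills, which refer to a different matrix.

```python
def build_s_matrix(links, n):
    """Build the row-stochastic matrix S = H + d*w.

    Rows of pages with outbound links split probability evenly over
    their targets (matrix H). Rows of dangling pages are replaced by
    the uniform row w = (1/n, ..., 1/n).
    """
    S = []
    for page in range(n):
        targets = links.get(page, [])
        if targets:
            row = [0.0] * n
            for target in targets:
                row[target] += 1.0 / len(targets)
        else:
            row = [1.0 / n] * n  # dangling node: jump anywhere
        S.append(row)
    return S


def pagerank(links, n, damping=0.85, tol=1e-8, max_iter=10000):
    """Power iteration on the damped matrix; returns (vector, iterations)."""
    S = build_s_matrix(links, n)
    pi = [1.0 / n] * n
    for iteration in range(1, max_iter + 1):
        new_pi = []
        for j in range(n):
            follow = sum(pi[i] * S[i][j] for i in range(n))
            # With probability `damping` follow a link, otherwise
            # jump to a page chosen uniformly at random.
            new_pi.append(damping * follow + (1.0 - damping) / n)
        change = sum(abs(a - b) for a, b in zip(new_pi, pi))
        pi = new_pi
        if change < tol:
            break
    return pi, iteration


# Hypothetical web: page 3 has no outbound links (a dangling node).
links = {0: [1, 2], 1: [2], 2: [0], 3: []}

pi_085, iters_085 = pagerank(links, 4, damping=0.85)
pi_090, iters_090 = pagerank(links, 4, damping=0.90)
print("d = 0.85:", [round(v, 4) for v in pi_085], "iterations:", iters_085)
print("d = 0.90:", [round(v, 4) for v in pi_090], "iterations:", iters_090)
```

Comparing the two printed iteration counts gives a feel for how the damping factor affects convergence on a given graph, while the returned vectors remain probability distributions that sum to 1.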
  • 7. Exhibit 1: A sample 4-state Markov chain with transition probabilities
[Figure: four states numbered 1 to 4, joined by arrows labelled with the transition probabilities P11, P12, P23, P24, P34 and P41]

Exhibit 2: Sample 4X4 transition Matrix

Exhibit 3: Explaining the basis of Markov’s chain[19]

[19] Image taken from http://en.wikipedia.org/wiki/Markov_chain
  • 8. Exhibit 4: Demonstrating the stable state values using simple matrix multiplication

$$P = \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix}, \qquad P_{ij}^{(2)} = (P^2)_{ij}$$

$$P^2 = \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix} \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix} = \begin{pmatrix} 0.37 & 0.63 \\ 0.36 & 0.64 \end{pmatrix}$$

Power | Row 1 | Row 2
P^3 | 0.363, 0.637 | 0.364, 0.636
P^4 | 0.3637, 0.6363 | 0.3636, 0.6364
P^5 | 0.36363, 0.63637 | 0.36364, 0.63636
P^6 | 0.363637, 0.636363 | 0.363636, 0.636364
P^7 | 0.363636, 0.636364 | 0.363636, 0.636364
P^8 | 0.363636, 0.636364 | 0.363636, 0.636364
P^9 | 0.363636, 0.636364 | 0.363636, 0.636364
P^10 | 0.363636, 0.636364 | 0.363636, 0.636364
P^11 | 0.363636, 0.636364 | 0.363636, 0.636364
P^12 | 0.363636, 0.636364 | 0.363636, 0.636364

Stable State Values: P^7 through P^12.
  • 9. Exhibit 5: Calculating the steady state eigen values πA and πE

$$\pi = \pi P$$

Therefore,

$$\begin{pmatrix} \pi_E & \pi_A \end{pmatrix} = \begin{pmatrix} \pi_E & \pi_A \end{pmatrix} \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix}$$

Solving these two equations:
1. $\pi_E = 0.3\,\pi_E + 0.4\,\pi_A$
2. $\pi_A = 0.7\,\pi_E + 0.6\,\pi_A$
Also, we know that:
3. $\pi_E + \pi_A = 1$
Since equations 1 & 2 are equivalent, solving equations 2 and 3 together:
$$\pi_A = 0.7\,(1 - \pi_A) + 0.6\,\pi_A$$
Or, $\pi_A \approx 0.63$ (that is, 0.7/1.1 = 0.636…)
And, $\pi_E \approx 0.37$ (that is, $1 - \pi_A$ = 0.363…)

Exhibit 6: The improved Google PageRank algorithm

$$PR(A) = (1 - d) \cdot \frac{1}{N} + d \cdot \left( \frac{PR(T_1)}{C(T_1)} + \frac{PR(T_2)}{C(T_2)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$

Where:
• PR(A) is the PageRank of page A
• PR(Ti) is the PageRank of pages Ti that link to page A
• C(Ti) is the number of outbound links on page Ti
• n is the total number of all pages that link to page A
• N is the total number of all pages on the web.

It is noteworthy that there is an adjusting damping factor d involved in the calculation. The above equation represents the final version of the PageRank algorithm, with the damping factor incorporated within the first argument on the RHS of the equation.
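The formula in Exhibit 6 can be applied page by page on a toy example. The sketch below assumes the commonly cited normalized form PR(A) = (1 − d)/N + d·Σ PR(Ti)/C(Ti); the three-page web in `web` and the function name `pagerank_by_formula` are hypothetical, and every page is given at least one outbound link so that no dangling-node correction is needed.

```python
def pagerank_by_formula(outlinks, d=0.85, sweeps=100):
    """Repeatedly apply PR(A) = (1 - d)/N + d * sum(PR(T)/C(T)).

    `outlinks` maps each page to the list of pages it links to.
    T ranges over the pages that link to A, and C(T) is the number
    of outbound links on T.
    """
    pages = sorted(outlinks)
    N = len(pages)
    pr = {page: 1.0 / N for page in pages}  # start from equal ranks
    for _ in range(sweeps):
        new_pr = {}
        for a in pages:
            inbound = [t for t in pages if a in outlinks[t]]
            shared = sum(pr[t] / len(outlinks[t]) for t in inbound)
            new_pr[a] = (1 - d) / N + d * shared
        pr = new_pr
    return pr


# Hypothetical web: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank_by_formula(web)

for page in sorted(ranks):
    print(page, round(ranks[page], 4))
print("sum of PageRanks:", round(sum(ranks.values()), 6))
```

In this toy web, page C collects links from both A and B and ends up with the highest rank, and the ranks sum to 1, consistent with PageRank being a probability distribution over pages.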
  • 10. Exhibit 7: PageRank of the search string ‘Techbend blog’

Exhibit 8: The correlation between a webpage and the rest of the web[20]
The importance of these links determines the overall importance of your webpage to the PageRank algorithm.

[20] Laure Ninove, Cristobald de Kerchove, Paul Van Dooren: Université Catholique de Louvain http://www.esat.kuleuven.be/scd/golub/presentations/Gene_PVD.pdf
  • 11. Exhibit 9: KundanBhaduri.com and its links to other sites
[Figure: the homepage of KundanBhaduri.com, which hosts the blog The TechBend, linked with TechCrunch (very high PageRank), Engadget (very high PageRank), TED (very high PageRank) and the rest of the Internet]
  • 12. Exhibit 10: Applying Markov Chain method to calculate the PageRank for ‘TechBend blog’

Following is the probability matrix that shows the likelihood of a user clicking on a page to arrive at the homepage of another website when she is searching for the string “TechBend blog”. All site names here refer to their respective homepages, for the purpose of Markov chain analysis.

Transition Probabilities of KundanBhaduri.com and TechBend.com (rows: current page; columns: next page)

From \ To | KundanBhaduri.com | TechBend.com | Engadget.com | TED.com | TechCrunch.com | … | Rest of the Internet
KundanBhaduri.com | 0.6 | 0.3 | 0.01 | 0.03 | 0.01 | … | …
TechBend.com | 0.42 | 0.1 | 0.12 | 0.01 | 0.11 | … | …
Engadget.com | 0.65 | 0.02 | 0.1 | 0.21 | 0.01 | … | …
TED.com | 0.54 | 0.22 | 0.1 | 0 | 0.09 | … | …
TechCrunch.com | 0.64 | 0.17 | 0.13 | 0.01 | 0 | … | …
… | 0.59 | 0.31 | 0.02 | 0.04 | 0.01 | … | …
Rest of the Internet | … | … | … | … | … | … | …

For the stable-state matrix:

$$\pi = \pi P \qquad (1)$$

We assume:

Webpage | Eigen Value
KundanBhaduri.com | πA
TechBend.com | πB
Engadget.com | πC
TED.com | πD
TechCrunch.com | πE

Therefore using (1), we get:

$$\pi_A = 0.6\,\pi_A + 0.42\,\pi_B + 0.65\,\pi_C + 0.54\,\pi_D + 0.64\,\pi_E + 0.59 \cdot \ldots + \ldots \qquad (2)$$
$$\pi_B = 0.3\,\pi_A + 0.1\,\pi_B + 0.02\,\pi_C + 0.22\,\pi_D + 0.17\,\pi_E + 0.31 \cdot \ldots + \ldots \qquad (3)$$

It is clear from equations (2) and (3) that πA >> πB, considering that there are no other webpages on the internet that are more important (i.e. have a higher probability rank) than the pages described in the above table. Therefore, we conclude that KundanBhaduri.com will have a higher PageRank than TechBend.com for the search term ‘TechBend blog’.
  • 13. QAB Term 1 Project: Markov Chains and Google Inc. Exhibit 11: Computing a small Eigen value with Power Method We know that: π = π*P For a hypothetical π of the order 20X20, notice that most of the nodes are zero. This considerably reduces the total cost of computing the π*P value, since sum of all the zero valued π row/column values will be zero. 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 4 0 0 4 0 0 9 0 0 7 0 0 0 1 0 9 0 0 6 0 0 12 0 8 0 0 8 0 0 5 0 0 2 0 8 0 7 0 0 8 0 0 4 0 2 0 2 0 5 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 5 3 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 8 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 7 8 0 6 0 6 0 8 1 0 0 0 0 0 0 0 0 8 0 0 0 0 9 0 0 0 2 0 1 0 0 0 0 0 0 8 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 0 0 0 0 0 8 0 0 7 Therefore πA = ∑ ∗ for value of j = 1 and k belongs to a value between πA to πB Since most of the values of the above terms are zero, we only need to count for rows 1 and 4 from the table above. Therefore, πA = 1 * πA + 8 * πD This helps us solve a large Markov transition probability matrix in a trivial way. IMBA NOV 2010 N1: Gustavo Arguello | Kundan Bhaduri | Verity Noble Page | 12
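The saving that Exhibit 11 describes, skipping the zero entries when computing π*P, can be illustrated with a small sketch. The five-page matrix below is hypothetical (it is not the 20X20 table from the exhibit), and the storage scheme, a dictionary of non-zero entries per row, is just one simple way to represent a sparse matrix in plain Python.

```python
def sparse_vec_mat(pi, rows, n):
    """Compute pi * P using only the non-zero entries of P.

    `rows[i]` maps column j to the non-zero value P[i][j].
    Returns the product and the number of multiplications performed.
    """
    result = [0.0] * n
    multiplications = 0
    for i, entries in rows.items():
        if pi[i] == 0.0:
            continue  # a zero weight contributes nothing
        for j, value in entries.items():
            result[j] += pi[i] * value
            multiplications += 1
    return result, multiplications


def dense_vec_mat(pi, rows, n):
    """Same product computed over all n * n entries, for comparison."""
    dense = [[rows.get(i, {}).get(j, 0.0) for j in range(n)] for i in range(n)]
    return [sum(pi[i] * dense[i][j] for i in range(n)) for j in range(n)]


# Hypothetical 5-page transition matrix with 7 non-zero entries.
n = 5
rows = {
    0: {1: 0.5, 2: 0.5},
    1: {2: 1.0},
    2: {0: 1.0},
    3: {0: 0.5, 4: 0.5},
    4: {3: 1.0},
}
pi = [0.2] * n

sparse_result, used = sparse_vec_mat(pi, rows, n)
dense_result = dense_vec_mat(pi, rows, n)
print("pi * P          :", [round(v, 4) for v in sparse_result])
print("multiplications :", used, "instead of", n * n)
```

With only 52 non-zero entries per row on average, the same idea applied to the web matrix means the work per iteration grows with the number of links rather than with the square of the number of pages.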