PAGERANK
Made by: Mohammad Islam Ansari, Hritik and Navneet
ACKNOWLEDGEMENT
• We would like to give special thanks to our mathematics teachers, who
taught us the knowledge needed to make this project a success. We are
thankful to the NIT Hamirpur Academic Block for offering such interesting
courses in our semester. This course helped broaden our horizons and
enabled us to understand the importance of a practical approach.
• This project is a blend of the academic knowledge from our course and
practical knowledge.
• We have greatly benefited from this course, and its knowledge will surely help
us in the future.
What is PageRank?
• PageRank is an algorithm used by Google Search to rank
web pages in its search results. It is named after
Larry Page, one of the founders of Google. PageRank
works by counting the number of links to a page to
determine a rough estimate of how important the
website is. PageRank is no longer the only algorithm
used by Google, but it was the first algorithm
used by the company and it is the best known.
PageRank is a link analysis algorithm: it assigns a
numerical weight to each element of a hyperlinked
set of documents, with the purpose of ‘measuring’ its
relative importance within the set.
Life before PageRank
Life after PageRank
Algorithm
The PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly
clicking on links will arrive at any particular page. PageRank can be calculated for collections of documents of any
size. It is assumed that the distribution is evenly divided among all documents in the collection at the
beginning of the computational process. The PageRank computations require several passes, called
"iterations", through the collection to adjust approximate PageRank values to more closely reflect the
theoretical true value.
A probability is expressed as a numeric value between 0 and 1. A 0.5 probability is commonly expressed as a
"50% chance" of something happening. Hence, a document with a PageRank of 0.5 means there is a 50%
chance that a person clicking on a random link will be directed to said document.
Simplified formula of PageRank
Assume a small universe of four web pages: A, B, C, and D. Links from a page to itself are ignored.
Multiple outbound links from one page to another page are treated as a single link. PageRank is initialized
to the same value for all pages i.e. ¼ or 0.25.
Fig: Simple diagram of the web pages A, B, C and D. An arrow indicates a
link from one page to another.
Here there is 1 outbound link from A, 1 from B, 3 from C, and 1 from D. Each page divides its PageRank
equally among its outbound links, so the PageRank of A will be
$PR(A) = \frac{PR(B)}{1} + \frac{PR(C)}{3} + \frac{PR(D)}{1}$ …….. (eq. 1)
Calculating the PageRank via Iterations
• Suppose that we have the same web graph as shown
in slide 7.
• At iteration 1 the PageRank of every website is
equal, i.e. 1/n.
• At iteration 2 the PageRank of a website (say A) depends
on the inbound links from other websites to it, i.e.
$PR(A) = \frac{1/4}{3}$
• The same goes for the other websites and the following
iterations, as shown in the table.
• Each iteration stores the updated PageRank of the
web pages.
• Here C has the highest PageRank.
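The update described above can be sketched in Python. This is a minimal illustration, not the authors' program: the link structure in `links` is a hypothetical graph chosen only to match the outbound-link counts stated earlier (1 from A, 1 from B, 3 from C, 1 from D), and the function name `pagerank_step` is our own.

```python
def pagerank_step(links, ranks):
    """One synchronous update: every page shares its rank equally among its out-links."""
    new_ranks = {page: 0.0 for page in links}
    for page, targets in links.items():
        share = ranks[page] / len(targets)  # assumes every page has at least one out-link
        for target in targets:
            new_ranks[target] += share
    return new_ranks


# Hypothetical graph (assumption): A, B and D link to C; C links to A, B and D.
links = {"A": ["C"], "B": ["C"], "C": ["A", "B", "D"], "D": ["C"]}

# Iteration 1: every page starts with 1/n.
ranks = {page: 1 / len(links) for page in links}

# Iteration 2: apply the update once.
ranks = pagerank_step(links, ranks)
print(ranks)
```

With this assumed graph only C links to A, so after one update PR(A) = (1/4)/3, while C collects the full rank of A, B and D and has the highest value.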
Limitations of Iterative method
• It is not suitable for a large no. of websites (thousands or more).
• This method takes time.
• Making the iteration table by hand is troublesome.
PageRank with matrix representation
We can use matrix operations instead of the page-by-page iterative approach, because
a matrix operation updates all pages at the same time. There are 3 methods:
• Power method
• Steady-state method
• Random surfer method
1. Power method
• Consider the web graph shown. The transition matrix formed
from the graph is H, as shown.
• According to the power method:
$v_1 = H v$
$v_2 = H v_1 = H^2 v$
…
$v_n = H^n v$
v: the column matrix that contains the initial PageRank of the pages.
H: the transition matrix based on the web graph.
• We can measure the error ε as the difference between the
adjacent PageRank vectors v_{n+1} and v_n. If this error is small
enough, then the algorithm terminates,
i.e. $\lVert v_{n+1} - v_n \rVert < \varepsilon$
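A minimal sketch of the power method with the ε stopping rule, using NumPy. The 3-page matrix `H` is a hypothetical example (the deck's own web graph is not reproduced here), and measuring the difference with the 1-norm is our choice; the slide does not say which norm is meant.

```python
import numpy as np


def power_method(H, eps=1e-9, max_iter=1000):
    """Repeat v <- H v until successive vectors differ by less than eps."""
    n = H.shape[0]
    v = np.full(n, 1.0 / n)  # initial PageRank: 1/n for every page
    for _ in range(max_iter):
        v_next = H @ v
        if np.linalg.norm(v_next - v, 1) < eps:  # 1-norm is an assumption
            return v_next
        v = v_next
    return v


# Hypothetical graph: 1->2, 1->3, 2->3, 3->1.
# Column j holds the out-link probabilities of page j, so each column sums to 1.
H = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])

v = power_method(H)
print(v)
```

For this assumed graph the iteration settles at roughly (0.4, 0.2, 0.4).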
2. Steady-state method
• The steady-state method transforms the problem into an eigenvector-eigenvalue problem.
The steady state is reached when the eigenvalue is equal to 1,
i.e. Hx = x
• H: transition matrix; x: final PageRank vector of the given pages.
• Here we have to solve an eigenvalue-eigenvector problem where the eigenvalue
is 1, because applying the H matrix then changes nothing (we end up with
the same x vector), which means that this is the steady state. What is essential is that
there is no need for initialization (no need for the 1/n values) and no need to
iterate.
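As a sketch of the steady-state approach, the eigenvector for eigenvalue 1 can be read off directly with `numpy.linalg.eig`. The matrix `H` is again a hypothetical 3-page example, and rescaling the eigenvector so its entries sum to 1 is our normalization choice.

```python
import numpy as np

# Hypothetical graph: 1->2, 1->3, 2->3, 3->1 (column j = out-links of page j).
H = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])

vals, vecs = np.linalg.eig(H)

# Pick the eigenvalue closest to 1 and its eigenvector (a column of vecs).
k = np.argmin(np.abs(vals - 1.0))
x = np.real(vecs[:, k])

# Eigenvectors are only defined up to scale; rescale so the entries sum to 1.
x = x / x.sum()
print(x)
```

No initial vector and no iteration loop are needed; for this assumed graph the result is again roughly (0.4, 0.2, 0.4).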
3. Random surfer model
Assumptions
• The importance of a web page is measured by its popularity, i.e. how many
incoming links it has.
• PageRank can be defined as the probability that a random surfer, who
starts on a random page and follows hyperlinks, visits the given page.
The transition matrix (H) is useful here as well. In this method we also
multiply v by H until it reaches the stationary state,
i.e. Hv = v
Again we end up with the same result as in the steady-state method.
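The random surfer can also be simulated directly. The sketch below is an illustration under stated assumptions: the graph in `links` is hypothetical, the function name `simulate_surfer` is ours, and the visit frequencies only approximate the stationary vector because the walk is random.

```python
import random


def simulate_surfer(links, steps, seed=0):
    """Follow random out-links for `steps` clicks and return visit frequencies."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    page = rng.choice(sorted(links))    # start on a random page
    visits = {p: 0 for p in links}
    for _ in range(steps):
        page = rng.choice(links[page])  # click one out-link uniformly at random
        visits[page] += 1
    return {p: count / steps for p, count in visits.items()}


# Hypothetical graph: 1->2, 1->3, 2->3, 3->1.
links = {1: [2, 3], 2: [3], 3: [1]}
freq = simulate_surfer(links, steps=100_000)
print(freq)
```

For this assumed graph the frequencies come out close to the vector (0.4, 0.2, 0.4) that satisfies Hv = v.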
Problems
The models we have discussed so far are not perfect: there are some
cases where they fail completely and do not give correct
results. These cases are
• Dangling nodes (nodes with no outgoing link)
• Disconnected nodes
Dangling nodes (nodes with no
outgoing link)
• A dangling node does not have any outgoing links.
• Now if we try to find the rank of the web pages:
Fig: The dangling node is 3
• In this case the rank of every page is 0. Something is not right: page 3 has 2 incoming links, so it
must have some importance!
• So in the case of a dangling node our method fails.
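A small numerical sketch shows how the rank drains away. The graph is a hypothetical one consistent with the description (page 3 has two incoming links and no outgoing link); the original figure is not reproduced here.

```python
import numpy as np

# Hypothetical graph: 1->2, 1->3, 2->3. Page 3 is dangling,
# so column 3 of H is all zeros and H is no longer column stochastic.
H = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])

v = np.full(3, 1.0 / 3.0)
for step in range(1, 4):
    v = H @ v       # rank that reaches page 3 is never passed on
    print(step, v)
```

Each multiplication loses the rank held by the dangling page, and for this assumed graph every entry is 0 after three steps.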
Disconnected nodes
• A random surfer who starts in the first connected component has no way
of getting to web page 5, since nodes 1 and 2 have no links to node 5
that the surfer can follow. Linear algebra does not help either.
Fig: Disconnected nodes
• In this case, too, the methods discussed so far fail.
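One way to see why linear algebra does not help is that the eigenvalue 1 stops being unique. The sketch below uses a hypothetical 5-page graph with two components, {1, 2} and {3, 4, 5}; the exact links are our assumption, chosen only so that nodes 1 and 2 cannot reach node 5.

```python
import numpy as np

# Hypothetical graph: 1<->2 in one component, 3->4->5->3 in the other.
H = np.zeros((5, 5))
for src, dst in [(1, 2), (2, 1), (3, 4), (4, 5), (5, 3)]:
    H[dst - 1, src - 1] = 1.0  # column = source page, row = target page

# Count how many eigenvalues are (numerically) equal to 1.
vals = np.linalg.eigvals(H)
count = int(np.sum(np.isclose(vals, 1.0)))
print(count)

# Each component has its own steady state, so Hx = x has no unique answer.
u = np.array([0.5, 0.5, 0.0, 0.0, 0.0])
w = np.array([0.0, 0.0, 1 / 3, 1 / 3, 1 / 3])
print(np.allclose(H @ u, u), np.allclose(H @ w, w))
```

Both `u` and `w` (and any mix of them) satisfy Hx = x, so the ranking depends on where the surfer happens to start.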
Damping Factor – Final PageRank model
• To solve the problems discussed above, the concept of a damping factor is
introduced in this model. This model also follows the assumptions of the random surfer model.
• The damping factor is the probability that the random surfer leaves the given page and navigates to
a completely new one.
• So the final PageRank formula is
G = (1-d)H + dB
G: the PageRank matrix or Google matrix.
d: damping factor
H: transition matrix
B: the n × n matrix with all entries equal to 1/n, i.e. (shown for n = 3)
$B = \frac{1}{n}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$
G = (1-d)H + dB
• If d is high, the random surfer navigates to new pages (teleportation) quite often.
If d is low, the random surfer tends to follow links instead of teleporting.
For example, with d = 0.15 there is a 15% chance that the surfer leaves the page and an 85% chance
that the surfer follows a link on the given webpage.
• Here M, the Google matrix written G above, will have the same features as H.
dB: Sometimes, with a small probability d, the surfer leaves the
current page and navigates to another one; this is also called
“teleportation”.
If we have n websites, then the probability of the surfer landing on
any particular page is 1/n. That’s why B has the 1/n term.
(1-d)H: Most of the time, with probability (1-d),
the surfer follows a link on the given
page and visits one of the neighbors
of the current page.
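The formula G = (1-d)H + dB can be built in a few lines of NumPy. This is a sketch under assumptions: the 5-page matrix `H` is a hypothetical disconnected graph, d = 0.15 follows the 15%/85% split mentioned above, and the function name `google_matrix` is ours.

```python
import numpy as np


def google_matrix(H, d=0.15):
    """Mix link-following (H) with uniform teleportation (B)."""
    n = H.shape[0]
    B = np.full((n, n), 1.0 / n)  # every entry is 1/n
    return (1 - d) * H + d * B


# Hypothetical disconnected graph: 1<->2 and 3->4->5->3.
H = np.zeros((5, 5))
for src, dst in [(1, 2), (2, 1), (3, 4), (4, 5), (5, 3)]:
    H[dst - 1, src - 1] = 1.0

G = google_matrix(H)
print(G.min() > 0)    # every page can now be reached from every page
print(G.sum(axis=0))  # each column still sums to 1
```

Note that a dangling page gives an all-zero column in H, so the matching column of G would sum to d rather than 1. A common remedy, which the slides do not spell out, is to replace such a column with 1/n before applying the formula.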
Problem
• The M matrix will be enormous, with a huge number of rows and columns, so we cannot handle it
directly.
• That’s why we use the power method approximation.
• Instead of initializing the v vector with entries 1/n, we use the value 1 for every entry. For a random
matrix this would not be any faster, but the transition matrix underlying the Google matrix is sparse:
a given node has only a small no. of outgoing links, so it works fine.
• In view of everything discussed above, we conclude that:
• Fact: The PageRank vector for a web graph with transition matrix A (H above) and damping
factor p (d above) is the unique probabilistic eigenvector of the matrix M corresponding to
the eigenvalue 1.
• Perron-Frobenius theorem
If M is a positive (all entries are greater than 0) and column stochastic matrix (the entries of each
column sum to 1), which is true in this case, then according to the Perron-Frobenius theorem:
• 1 is an eigenvalue of multiplicity one.
• 1 is the largest eigenvalue: all the other eigenvalues have absolute value smaller than 1.
• the eigenvectors corresponding to the eigenvalue 1 have either only positive entries or only
negative entries. In particular, for the eigenvalue 1 there exists a unique eigenvector with the sum
of its entries equal to 1.
Intuitively, the matrix M "connects" the graph and gets rid of the dangling nodes. A node with no
outgoing edges now has a positive probability of moving to any other node.
• Power method convergence theorem
If M is a positive, column stochastic matrix (the entries of each column sum to 1),
let w be its eigenvector corresponding to the eigenvalue 1 whose entries sum to 1.
Then the sequence v, Mv, M^2 v, …, M^k v converges to w, where v is the initial vector
with all entries equal to 1/n. Here w stores the PageRank of
all the websites in the WWW.
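The two theorems can be checked numerically on a small example. The sketch below is illustrative only: the graph is the same hypothetical disconnected 5-page graph, d = 0.15 is assumed, and the iteration deliberately starts from a non-uniform vector (the slides start from 1/n) to show that the limit does not depend on the start.

```python
import numpy as np

d, n = 0.15, 5

# Hypothetical disconnected graph: 1<->2 and 3->4->5->3.
H = np.zeros((n, n))
for src, dst in [(1, 2), (2, 1), (3, 4), (4, 5), (5, 3)]:
    H[dst - 1, src - 1] = 1.0
M = (1 - d) * H + d * np.full((n, n), 1.0 / n)

# Perron-Frobenius: 1 is the largest eigenvalue, all others are smaller in modulus.
mods = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
print(mods)

# Power method: v, Mv, M^2 v, ... converges to the eigenvector w.
v = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # surfer starts on page 1
for _ in range(200):
    v = M @ v
w = v
print(w)
```

Because every page in this assumed graph has exactly one incoming and one outgoing link, all five pages end up with the same PageRank, and page 5 is no longer unreachable from pages 1 and 2.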
Pseudo Code
BEGIN
  LOOP i ← 0 to N - 1
    LOOP j ← 0 to N - 1
      Initialize S
      IF (S ≠ 0) THEN
        A[i][j] = 1/S
      ELSE
        A[i][j] = 0
    END LOOP
  END LOOP
  Initialize Val, Vec, R
  PRINT Vec
  R (4 × 1) = {1, 1, 1, 1}
  PRINT R
  T ← 0
  WHILE T < 7 LOOP
    Rank = A × R
    R = Rank
    T = T + 1
  END LOOP
  PRINT R
END

HERE N IS THE NO. OF NODES
A IS AN EMPTY MATRIX
S IS THE NO. OF CONNECTIONS (READ FROM INPUT)
VAL IS THE EIGENVALUES, WHICH ARE FOUND MATHEMATICALLY OR MANUALLY
VEC IS THE EIGENVECTORS, FOUND MANUALLY
R IS A MATRIX OF ORDER 4 × 1 WITH ALL ENTRIES 1
T IS THE ITERATION COUNTER
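PROGRAM
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Jun 22 22:45:15 2020
@author: team17
"""
import numpy as np

n = int(input("no of nodes "))
a = np.zeros((n, n))             # transition matrix, filled from the input
for i in range(n):
    for j in range(n):
        s = int(input())         # no. of connections for entry (i, j); 0 means no link
        a[i][j] = 1 / s if s != 0 else 0

val, vec = np.linalg.eig(a)      # eigenvalues and eigenvectors of a
print(vec)

r = np.full((n, 1), 1.0)         # initial rank vector with all entries 1
print(r)

t = 0
while t < 7:                     # 7 power-method iterations
    ra = np.dot(a, r)            # new rank = a x r
    r = ra
    t = t + 1
print(ra)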
BIBLIOGRAPHY
• Photos taken from (www.google.com)
• Information regarding topic from (www.wikipedia.com),
(globalsoftwaresupport.com)
THE TEAM
• A Mathematics project by GROUP 17
• Haritik Thakur, Navneet Sharma, Mohammad Islam Ansari
