PageRank
Rank search results: 1, 2, …
Problem Statement
• Express the internet as a graph $G(V, E)$:
$V = \{\text{webpages}\}$, $E = \{(u, v) \mid u \text{ links to } v\}$
• Goal: find a reasonable $\mathrm{Rank}: V \to \mathbb{N}$
Transition(click) probability
• Each click on a link $(u, v)$ takes you from $u$ to $v$.
• Assign probabilities to the out-edges of each page. Example: all out-links equally likely.
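A minimal sketch of the "equally likely" assignment, assuming pages are numbered from 0 and links are given as `(u, v)` pairs. The function name `transition_matrix` and the 3-page web are hypothetical, and pages without out-links are not handled.

```python
from fractions import Fraction

def transition_matrix(num_pages, links):
    """Return a column-stochastic P with P[v][u] = probability of clicking from u to v.

    Assumption: every out-link of a page is equally likely, and every page
    has at least one out-link (dangling pages are not handled here).
    """
    out_degree = [0] * num_pages
    for u, _ in links:
        out_degree[u] += 1
    P = [[Fraction(0)] * num_pages for _ in range(num_pages)]
    for u, v in links:
        P[v][u] += Fraction(1, out_degree[u])
    return P

# Hypothetical 3-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
links = [(0, 1), (0, 2), (1, 2), (2, 0)]
P = transition_matrix(3, links)

# Each column is the click distribution of one page, so it sums to 1.
column_sums = [sum(P[v][u] for v in range(3)) for u in range(3)]
```

Exact fractions are used here only so that the column sums come out as exactly 1; floats would work the same way up to rounding.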
Transition
• $T = t$: $(0,\ 0,\ 0,\ 0.2,\ 0.8,\ 0)$
• $T = t + 1$: $(0,\ 0.2,\ 0.2,\ 0.4,\ 0,\ 0.2)$
• A transition just "reallocates" the probability mass among the pages.
Steady distribution
• Questions:
Is there any long-term behavior after many clicks?
Is it unique? (Does it depend on the initial state?)
• If there is, we call the limiting distribution the steady distribution.
Meaning: the probability of being at a given page in the long run.
• Steady probabilities look like a good basis for ranking.
Transition matrix
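• Transition (adjacency) matrix $P$, with $P^n x_0 = X_n$ (r.v./pmf):
$$P = \begin{pmatrix}
0 & 0 & 0 & 1/4 & 0 & 1 \\
0 & 0 & 0 & 1/4 & 0 & 0 \\
1/3 & 0 & 0 & 0 & 1/2 & 0 \\
1/3 & 0 & 1 & 1/4 & 0 & 0 \\
1/3 & 0 & 0 & 1/4 & 1/2 & 0 \\
0 & 1 & 0 & 0 & 0 & 0
\end{pmatrix}$$
(each column is a transition distribution)
• Properties
$P^T \mathbf{1} = \mathbf{1}$, so $\lambda_0 = 1$
$|\lambda_i| \le 1$ (spectral radius = 1)
$P\pi = \pi \succeq 0$ (a non-negative eigenvector exists; Frobenius theorem)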
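The properties can be checked exactly on the 6-page example. The sketch below assumes the slide's matrix reads as written in the code, with column $j$ holding the click probabilities out of page $j$. The vector `pi` is a candidate found by solving $P\pi = \pi$ by hand, and the helper name `step` is hypothetical.

```python
from fractions import Fraction as F

# Assumed reading of the slide's 6x6 example (column j = clicks out of page j).
P = [
    [0,       0, 0, F(1, 4), 0,       1],
    [0,       0, 0, F(1, 4), 0,       0],
    [F(1, 3), 0, 0, 0,       F(1, 2), 0],
    [F(1, 3), 0, 1, F(1, 4), 0,       0],
    [F(1, 3), 0, 0, F(1, 4), F(1, 2), 0],
    [0,       1, 0, 0,       0,       0],
]

def step(P, x):
    """One transition: returns the matrix-vector product P x."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]

n = len(P)
# P^T 1 = 1 means that every column sums to 1.
column_sums = [sum(P[i][j] for i in range(n)) for j in range(n)]

# Candidate steady distribution, obtained by solving P pi = pi by hand.
pi = [F(6, 41), F(3, 41), F(7, 41), F(12, 41), F(10, 41), F(3, 41)]
is_fixed_point = step(P, pi) == pi
```

Under this reading, page 4 (probability 12/41) would rank first and pages 2 and 6 (3/41 each) last.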
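Frobenius theorem
• $A_{n \times n} \succeq 0 \implies \lambda_0 \ge |\lambda_{i \ne 0}|$ and $Ax = \lambda_0 x$ has a solution $x \succeq 0$
(for a non-negative matrix $A \ne 0$, the eigenvalue of largest modulus is non-negative and has a non-negative eigenvector)
• Case: transition matrix $P$
Let $P^T y = \lambda y$ and $|y_k| = \max_i \{|y_i|\}$. Then
$$|\lambda|\,|y_k| = |\lambda y_k| = |(P^T y)_k| = \Big|\sum_i p_{ik} y_i\Big| \le \sum_i p_{ik} |y_i| \le \sum_i p_{ik} |y_k| = |y_k|$$
(the last equality uses $\sum_i p_{ik} = 1$), so
$$|\lambda| \le \frac{|y_k|}{|y_k|} = 1 = \lambda_0$$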
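Perron-Frobenius theorem
• $A_{n \times n} \succ 0 \implies \lambda_0 > |\lambda_{i \ne 0}|$ and $Ax = \lambda_0 x$ has a solution $x \succ 0$
(the largest eigenvalue of a positive matrix is strictly larger in modulus than all others and has a positive eigenvector)
• Case: positive transition matrix $P$
1. Any solution of $Px = x$ must have $x \succ 0$ (or $x \prec 0$).
2. $x$ is unique once normalized by $\mathbf{1}^T x = 1$. Otherwise a second solution $x'$ would make $x - x'$ a solution of $Px = x$ whose entries sum to 0, so neither $x - x' \succ 0$ nor $x - x' \prec 0$ could hold, contradicting 1.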
Power method
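• Iterate $x_{n+1} = \dfrac{A x_n}{\lVert A x_n \rVert}$ until convergence; $x_\infty$ is an eigenvector of the largest eigenvalue.
• Note: numerically, $A$ can be treated as diagonalizable, so with $x_0 = c_0 v_0 + c_1 v_1 + c_2 v_2 + \cdots$:
$$x_n = \frac{\lambda_0^n}{\text{Norms}} \left( c_0 v_0 + \left(\frac{\lambda_1}{\lambda_0}\right)^n c_1 v_1 + \left(\frac{\lambda_2}{\lambda_0}\right)^n c_2 v_2 + \cdots \right)$$
• Case: $P \succ 0$. Transition matrices are self-normalized (1-norm):
$\mathbf{1}^T x_0 = 1$, $x_{n+1} = P x_n$, $P^\infty x_0 = \pi$ (steady distribution)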
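A minimal sketch of the power method with 1-norm normalization. The 3×3 matrix is a hypothetical positive, column-stochastic example, and the tolerance and iteration cap are illustrative choices.

```python
def power_method(A, x0, tol=1e-12, max_iter=10_000):
    """Power iteration with 1-norm normalization; returns the limiting vector."""
    x = list(x0)
    for _ in range(max_iter):
        y = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        norm = sum(abs(v) for v in y)
        y = [v / norm for v in y]
        # Stop once successive iterates agree to within tol.
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

# Hypothetical positive transition matrix (each column sums to 1).
P = [
    [0.5, 0.2, 0.3],
    [0.3, 0.6, 0.3],
    [0.2, 0.2, 0.4],
]
pi = power_method(P, [1.0, 0.0, 0.0])

# Residual of P pi = pi, to confirm pi is (numerically) the steady distribution.
residual = max(abs(sum(p * v for p, v in zip(row, pi)) - pi_i)
               for row, pi_i in zip(P, pi))
```

For a transition matrix the normalization step changes nothing beyond rounding, since $P x_n$ already sums to 1. It is kept so the same function works for a general matrix $A$.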
Adjust to positive matrix
• Make all transitions positive "with taste":
$P' = \beta P + (1 - \beta) Q$, $0 < \beta < 1$, $q_{ij} = \dfrac{1}{N}$
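As a sketch of this adjustment, the function below mixes $P$ with the uniform matrix $Q$. The name `damp`, the 3-page cycle, and the value $\beta = 0.85$ are illustrative assumptions.

```python
def damp(P, beta):
    """Return P' = beta * P + (1 - beta) * Q with q_ij = 1/N."""
    n = len(P)
    return [[beta * P[i][j] + (1 - beta) / n for j in range(n)]
            for i in range(n)]

# Hypothetical 3-page cycle 0 -> 1 -> 2 -> 0: column-stochastic with many zeros.
P = [
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
P_damped = damp(P, beta=0.85)  # beta = 0.85 is an illustrative choice

all_positive = all(p > 0 for row in P_damped for p in row)
column_sums = [sum(P_damped[i][j] for i in range(3)) for j in range(3)]
```

Every entry of `P_damped` is at least $(1 - \beta)/N$, so the result is positive and its columns still sum to 1.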
Steady distribution
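• When does $P \succeq 0$ also have a steady distribution?
• $\exists k \in \mathbb{N}.\ P' = P^k \succ 0$
$\implies P^n x_0 = (P^k)^m P^l x_0 = (P')^m x_l \to \pi$ (writing $n = km + l$)
• What does $P^k \succ 0$ mean?
$P^k e_i \succ 0$ ($e_i = [0, 0, 0, \dots, 1, \dots, 0]$)
$\implies$ it is possible to surf between any two pages $\implies$ strongly connected (irreducible)
$\implies$ all pages are possible at time $k$ $\implies$ aperiodic
• aperiodic $\land$ irreducible $\iff \exists k.\ (P^k \succ 0)$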
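The condition $\exists k.\ P^k \succ 0$ can be tested mechanically. The sketch below tracks only the zero pattern of the powers, and stops at Wielandt's bound $k \le (N-1)^2 + 1$ for primitive $N \times N$ matrices. The function name and both 2-page examples are hypothetical.

```python
def first_positive_power(P):
    """Return the smallest k with P^k > 0 entrywise, or None if there is none.

    Only the zero pattern matters, so powers are tracked as boolean matrices.
    A primitive n x n matrix has P^k > 0 for some k <= (n - 1)**2 + 1
    (Wielandt's bound), so the search stops there.
    """
    n = len(P)
    base = [[P[i][j] > 0 for j in range(n)] for i in range(n)]
    power = base
    for k in range(1, (n - 1) ** 2 + 2):
        if all(all(row) for row in power):
            return k
        power = [[any(power[i][m] and base[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
    return None

# Irreducible but periodic: the surfer alternates between the two pages forever.
two_cycle = [[0, 1],
             [1, 0]]
# Same cycle, but page 0 also links to itself, which breaks the periodicity.
lazy_cycle = [[0.5, 1],
              [0.5, 0]]

k_periodic = first_positive_power(two_cycle)
k_aperiodic = first_positive_power(lazy_cycle)
```

The pure 2-cycle never yields a positive power, while adding a single self-link gives $P^2 \succ 0$.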
Intuition behind theorems
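• Think of $P$ as a linear interpolation transformation.
• $P = IP$; think of $I$ as the initial basis $B_0 = [b_1^0, b_2^0, b_3^0]$
$B_n = B_0 P^n$, $B_0 P = B_1 = [b_1^1, b_2^1, b_3^1]$
• Feasible distribution area: $\Delta_n \propto |B_n| = |P|^n$ (with $|\cdot|$ the absolute value of the determinant)
• $0 \le |P| < 1 \implies \Delta_\infty = 0$
Except when $\Delta_\infty$ converges to an edge ($\operatorname{rank} \Delta_\infty > 0$), we have $b_k^\infty = \pi$ (like the Nested Interval Theorem).
• Note: to have a steady $\pi$, $P^k \succ 0$ is not necessary.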
Application of the Perron-Frobenius theorem: Markov Chain Monte Carlo
Markov Chain Monte Carlo
• Idea: construct a transition process $P$ with the desired steady distribution $\pi$.
$X_n = P^n e_i$; for large $k$, $X_k, X_{k+1}, X_{k+2}, \dots \sim \pi$ (not independent)
Here $P\,\cdot$ means applying a transition (clicking a link), not calculating probabilities.
• Note: if the domain $D(\pi)$ of $\pi$ is not finite ($|D(\pi)| \not< \infty$), then $P_{|D(\pi)| \times |D(\pi)|} = [p_1, p_2, p_3, \cdots]$
is an infinite matrix whose columns are the transition distributions $p_i$.
• To construct $p_i$, use a known distribution $q_i$ with accept-reject (not exactly; see next slide).
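MCMC
• $\pi_j\, p_j(i) = \pi_i\, p_i(j),\ \forall i, j \implies P\pi = \pi$
• $P\pi = \pi$ and $P \succ 0 \implies \pi$ is the steady distribution.
• $\pi_j\, \alpha_{ij}\, q_j(i) = \pi_i\, \alpha_{ji}\, q_i(j)$
Note: we do not really set $p_i \equiv (q_i, \alpha)$.
Instead, we make $\pi_j\, \alpha_{ij}\, q_j(i) = \pi_i\, \alpha_{ji}\, q_i(j)$ hold pairwise.
We do not even need to know what $P$ is.
• $q_i : D(\pi) \to \mathbb{R}^+$ with positive density $\implies P \succ 0$
• Let $\max(\alpha_{ij}, \alpha_{ji}) = 1$ for efficiency (Metropolis-Hastings).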
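On a small finite domain the pairwise condition can be verified exactly. The sketch below assumes the usual Metropolis-Hastings acceptance $\alpha_{ij} = \min\!\big(1, \tfrac{\pi_i q_i(j)}{\pi_j q_j(i)}\big)$ for a move $j \to i$, which is one way to get $\max(\alpha_{ij}, \alpha_{ji}) = 1$. The target `pi`, the uniform proposal `q`, and the function name are hypothetical.

```python
from fractions import Fraction as F

def metropolis_hastings_matrix(pi, q):
    """Build P from target pi and proposal q, where q[j][i] = q_j(i).

    A move j -> i is accepted with alpha_ij = min(1, pi_i q_i(j) / (pi_j q_j(i))).
    Column j of P is the resulting transition distribution p_j.
    """
    n = len(pi)
    P = [[F(0)] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j and q[j][i] > 0:
                alpha = min(F(1), (pi[i] * q[i][j]) / (pi[j] * q[j][i]))
                P[i][j] = alpha * q[j][i]
        # Rejected proposals leave the chain at j.
        P[j][j] = 1 - sum(P[i][j] for i in range(n) if i != j)
    return P

pi = [F(1, 6), F(1, 3), F(1, 2)]            # hypothetical target
q = [[F(1, 3)] * 3 for _ in range(3)]       # uniform proposal
P = metropolis_hastings_matrix(pi, q)

# Pairwise (detailed) balance: pi_j p_j(i) == pi_i p_i(j) for all i, j.
balanced = all(pi[j] * P[i][j] == pi[i] * P[j][i]
               for i in range(3) for j in range(3))
# Consequence: P pi == pi.
stationary = [sum(P[i][j] * pi[j] for j in range(3)) for i in range(3)] == pi
```

The matrix is built here only to check the claim. A sampler never needs it, since it only evaluates $\alpha$ for the pair at hand.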
MCMC algorithm
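• 1. Initialize $x_0 \in D(\pi)$, $n = 0$
2. Draw $x'$ from $q_{x_n}$ and $u$ from $U(0, 1)$
3. IF $u < \alpha(x' \mid x_n)$: $x_{n+1} := x'$ (accept)
ELSE: $x_{n+1} := x_n$ (reject: the chain stays put and the current state counts again)
4. $n := n + 1$; GO to step 2
• Then, for large $k$, $x_k, x_{k+1}, x_{k+2}, \dots \sim \pi$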
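The loop can be sketched for a finite domain as follows. This version assumes a uniform (hence symmetric) proposal, so $\alpha$ reduces to $\min(1, w_{x'}/w_x)$ for unnormalized weights $w$. It follows the standard Metropolis-Hastings convention of recording the current state again when a proposal is rejected. The weights, seed, and burn-in length are illustrative.

```python
import random
from collections import Counter

def metropolis_hastings(weights, num_samples, seed=0):
    """Sample states 0..N-1 with probability proportional to weights.

    Assumptions: the proposal is uniform over all states (symmetric), so the
    acceptance probability is min(1, w[x'] / w[x]); all weights are positive.
    """
    rng = random.Random(seed)
    n = len(weights)
    x = rng.randrange(n)
    chain = []
    for _ in range(num_samples):
        proposal = rng.randrange(n)
        if rng.random() < min(1.0, weights[proposal] / weights[x]):
            x = proposal      # accept
        chain.append(x)       # on rejection the current state is recorded again
    return chain

weights = [1.0, 2.0, 3.0]     # unnormalized target, pi = (1/6, 1/3, 1/2)
chain = metropolis_hastings(weights, num_samples=60_000)
burn_in = 1_000               # illustrative choice of k
counts = Counter(chain[burn_in:])
freqs = [counts[s] / (len(chain) - burn_in) for s in range(3)]
```

Only ratios of weights are used, so the normalizing constant of $\pi$ is never needed.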
Simulated annealing
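• Consider a series of transition strategies $P_{[0]}, P_{[1]}, \dots, P_{[n]}, \dots$
and distributions $\pi_n$ with $P_{[n]} \pi_n = \pi_n$ that satisfy
(1) $\forall n.\ \exists k.\ (P_{[n]})^k \succ 0$
(2) $\lim_{n \to \infty} \pi_n = \pi$
Then $P_{[n]} \cdots P_{[2]} P_{[1]} P_{[0]}\, d \to \pi$ as $n \to \infty$ (in practice this requires the strategies to change slowly enough).
• Set $p_{[n],i}(j) > p_{[n],j}(i)$ and $\lim_{n \to \infty} \dfrac{p_{[n],j}(i)}{p_{[n],i}(j)} = 0$
if $f(j) > f(i)$, with (1) satisfied.
Then $\pi = e_{x^*}$, where $x^* = \arg\max_x f(x)$.
$\Rightarrow \prod_{n=0}^{\infty} P_{[n]}$ becomes an optimization strategy!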
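A minimal sketch of such a cooling chain on a finite domain. The ring-shaped neighborhood, the schedule $T_n = 1/\log(n+2)$, the Metropolis-style acceptance $e^{\Delta f / T_n}$ for downhill moves, and the objective `f` are all illustrative assumptions, not choices taken from the slides.

```python
import math
import random

def simulated_annealing(f, num_states, steps=5_000, seed=0):
    """Maximize f over states 0..num_states-1 with a cooling Metropolis chain.

    Assumptions: the neighbors of x are x-1 and x+1 (wrapping around), and the
    temperature follows T_n = 1 / log(n + 2).
    """
    rng = random.Random(seed)
    x = rng.randrange(num_states)
    best = x
    for n in range(steps):
        temperature = 1.0 / math.log(n + 2)
        proposal = (x + rng.choice((-1, 1))) % num_states
        delta = f(proposal) - f(x)
        # Uphill moves are always accepted; downhill moves get rarer as T drops,
        # so the ratio of downhill to uphill probabilities tends to 0.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            x = proposal
        if f(x) > f(best):
            best = x
    return best

def f(x):
    """Hypothetical objective with its maximum at x = 7."""
    return -(x - 7) ** 2

best = simulated_annealing(f, num_states=20)
```

The function returns the best state visited rather than the final state. At any finite temperature the chain can still step off the maximum, even though the long-run distribution concentrates on $x^*$.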

Page rank - from theory to application

  • 3. Problem Statement • Express internet as a graph 𝐺 𝑉, 𝐸 𝑉 = 𝑤𝑒𝑏𝑝𝑎𝑔𝑒𝑠 , 𝐸 = 𝑢, 𝑣 |𝑢 𝑙𝑖𝑛𝑘𝑠 𝑡𝑜 𝑣 • Goal: find a reasonable 𝑹𝒂𝒏𝒌: 𝑉 ⟼ ℕ
  • 4. Transition(click) probability • Each click on links (𝑢, 𝑣) take you from 𝑢 to 𝑣 • Assigned probabilities to out-edges. Ex: Equal likely
  • 5. Transition • 𝑇 = 𝑡 0 0 0 .2 .8 0 • 𝑇 = 𝑡 + 1 0 .2 .2 .4 0 .2 • It’s just “Reallocate” !
  • 6. Steady distribution • Questions : Are there any long-term behavior after many of clicks ? Is it unique ? ( related to initial state? ) • If there is, we call the limiting distribution steady dist. meaning: probability that stay at certain page in long run • Seems like steady probabilities are good for ranking.
  • 7. Transition matrix • Transition (Adjacency) matrix 𝑃 𝑃 𝑛 𝑥0 = 𝑋 𝑛 (r.v./pmf) 104/1000 004/1000 02/10003/1 004/1103/1 02/14/1003/1 000010 • Properties 𝑃 𝑇 1 = 1, 𝜆0 = 1 𝝀𝒊 ≤ 𝟏 (spectral radius = 1) 𝑷𝝅 = 𝝅 ≽ 𝟎 (exist non-negative e.v.) (Frobenius theorem) Transition distribution
  • 8. Frobenius theorem • 𝐴 𝑛×𝑛 ≽ 0 ⟹ 𝜆0 ≥ 𝜆𝑖≠0 AND 𝐴𝑥 = 𝜆0 𝑥 has 𝑥 ≽ 0 (largest 𝜆 of non-negative matrix is non-negative and has non-negative e.vector ) (𝐴 ≠ 0) • Case: transition matrix 𝑃 let 𝑃 𝑇 𝑦 = 𝜆𝑦, 𝑦 𝑘 = 𝑚𝑎𝑥{ 𝑦𝑖 } 𝝀 𝒚 𝒌 = 𝜆𝑦 𝑘 = 𝑃 𝑇 𝑦 𝑘 = 𝑖 𝑝𝑖𝑘 𝑦𝑖 ≤ 𝑖 𝑝𝑖𝑘 𝑦𝑖 ≤ 𝑖 𝑝𝑖𝑘 𝑦𝑖 = |𝒚 𝒌| 𝜆 ≤ 𝑦 𝑘 𝑦 𝑘 = 1 = 𝜆0
  • 9. Perron-Frobenius theorem • 𝐴 𝑛×𝑛 ≻ 0 ⟹ 𝜆0 > 𝜆𝑖≠0 AND 𝐴𝑥 = 𝜆0 𝑥 has 𝑥 ≻ 0 (largest 𝜆 of positive matrix is strictly larger and has positive eigenvector) • Case: positive transition matrix 𝑃 1. 𝑃𝑥 = 𝑥 must have 𝑥 ≻ 0 (or ≺ 0) 2. 𝑥 is unique with 1 𝑇 𝑥 = 1 otherwise 𝑥 − 𝑥′ ≻ 0 not hold
  • 10. Power method • Iterate 𝑥 𝑛+1 = 𝐴𝑥 𝑛 𝐴𝑥 𝑛 until converge, 𝑥∞ is a eigenvector of largest eigenvalue • Note: Numerically, 𝐴 is always diagonalizable 𝑥 𝑛 = 𝜆0 𝑛 𝑁𝑜𝑟𝑚𝑠 𝑐0 𝑣0 + 𝜆1 𝜆0 𝑛 𝑐1 𝑣1 + 𝜆2 𝜆0 𝑛 𝑐2 𝑣2 + ⋯ • Case: 𝑃 ≻ 0, transition matrix are self-normed (1-norm) 1 𝑇 𝑥0 = 1, 𝑥 𝑛+1 = 𝑃𝑥 𝑛, 𝑃∞ 𝑥0 = 𝜋 (steady distribution)
  • 11. Adjust to positive matrix • Make transitions positive with taste 𝑃′ = 𝛽𝑃 + 1 − 𝛽 𝑄, 0 < 𝛽 < 1, 𝑞𝑖𝑗 = 1 𝑁
  • 12. Steady distribution • When does 𝑃 ≽ 0 also has steady distribution ? • ∃𝑘 ∈ ℕ. 𝑃′ = 𝑃 𝑘 ≻ 0 ⟹ 𝑃 𝑛 𝑥0 = 𝑃 𝑘 𝑚 𝑃 𝑙 𝑥0 = 𝑃′ 𝑚 𝑥𝑙 ⟶ 𝜋 • What 𝑃 𝑘 ≻ 0 means ? 𝑃 𝑘 𝑒𝑖 ≻ 0 (𝑒𝑖 = [0,0,0, … 1 … , 0]) ⟹ possible to surf between pages ⟹ strongly connected (irreducible) ⟹ all pages are possible at time k ⟹ aperiodic • aperiodic ∧ irreducible ⟺ ∃𝑘. (𝑃 𝑘 ≻ 0)
  • 13. Intuition behind theorems • Think 𝑃 as linear interpolation transformation • 𝑃 = 𝐼𝑃, think 𝐼 as initial basis 𝐵0 = 𝑏1 0 , 𝑏2 0 , 𝑏3 0 𝐵 𝑛 = 𝐵0 𝑃 𝑛 , 𝐵0 𝑃 = 𝐵1 = , , • 𝑓𝑒𝑎𝑠𝑖𝑏𝑙𝑒 𝑑𝑖𝑠𝑡. 𝑎𝑟𝑒𝑎, Δ 𝑛 ∝ 𝐵 𝑛 = 𝑃 𝑛 • 0 ≤ 𝑃 < 1 ⟹ Δ∞ = 0 Except Δ∞ converge to edges (𝑟𝑎𝑛𝑘 Δ∞ > 0) we have 𝑏 𝑘 ∞ = 𝜋 (like Nested Interval Thm) • Note: To have steady 𝜋, 𝑃 𝑘 ≻ 0 is not necessary.
  • 15. Markov Chain Monte Carlo • Idea: construct a transition process 𝑃 with desired steady dist. 𝜋 𝑋 𝑛 = 𝑃 𝑛 𝑒𝑖 ,with large 𝑘. 𝑋 𝑘, 𝑋 𝑘+1, 𝑋 𝑘+2 ⋯ ∽ 𝜋 (not independent) 𝑃 ∙ is apply transition (click link) not probability calculation • Note: 𝐷 𝜋 ≮ ∞ ⟹ 𝑃 𝐷 𝜋 ×𝐷 𝜋 = 𝑝1, 𝑝2, 𝑝3 ⋯ , a infinite matrix with each column are transition distribution 𝒑𝒊 • To construct 𝒑𝒊 , use known distribution 𝒒𝒊 with accept-reject (Not exactly, see next slide) 𝐷 𝜋
  • 16. MCMC • 𝜋𝑗 𝒑𝒋 𝒊 = 𝜋𝑖 𝒑𝒊 𝒋 , ∀𝑖, 𝑗 ⟹ 𝑷𝜋 = 𝜋 • 𝑷𝜋 = 𝜋 AND 𝑷 ≻ 0 ⟹ 𝜋 is steady dist. • 𝜋𝑗 𝜶𝒊𝒋 𝒒𝒋 𝒊 = 𝜋𝑖 𝜶𝒋𝒊 𝒒𝒊 𝒋 Note: not really make 𝒑𝒊 ≡ 𝒒𝒊, 𝜶 Instead, we make 𝜋𝑗 𝜶𝒊𝒋 𝒒𝒋 𝒊 = 𝜋𝑖 𝜶𝒋𝒊 𝒒𝒊 𝒋 (pairwise). Don’t even need to know what 𝑷 is. • 𝒒𝒊: 𝐷(𝜋) ⟼ 𝑅+with positive density ⟹ 𝑷 ≻ 0 • let 𝑚𝑎𝑥 𝜶𝒊𝒋, 𝜶𝒋𝒊 = 1 for efficiency (Metropolis-Hastings)
  • 17. MCMC algorithm • 1. Initialize 𝑥0 ∈ 𝐷 𝜋 , 𝑛 = 0 2. draw 𝑥′ from 𝑞 𝑥 𝑛 ; 𝑢 from 𝑈 0, 1 3. IF 𝑢 < 𝛼 𝑥′ 𝑥 𝑛 # accepting 𝑥 𝑛+1 ≔ 𝑥′ 𝑛 ≔ 𝑛 + 1 GO step 2. GO step 2. • Then 𝑥 𝑘, 𝑥 𝑘+1, 𝑥 𝑘+2 ⋯ ~𝜋
  • 18. Simulated annealing • Consider series of transition strategies 𝑃[0], 𝑃[1], … 𝑃[𝑛] … and distributions 𝑃[𝑛] 𝜋 𝑛 = 𝜋 𝑛 Satisfy (1). ∀𝑛. ∃𝑘. 𝑃[𝑛] 𝑘 ≻ 0 (2). 𝑙𝑖𝑚 𝑛→∞ 𝜋 𝑛 = 𝜋 then 𝑃 𝑛 … 𝑃 2 𝑃 1 𝑃 0 𝑑 → 𝜋 𝑎𝑠 𝑛 → ∞ • Set 𝑝 𝑛 𝑖 𝑗 > 𝑝 𝑛 𝑗 𝑖 and 𝑙𝑖𝑚 𝑛→∞ 𝑝 𝑛 𝑗 𝑖 𝑝 𝑛 𝑖 𝑗 = 0 if 𝑓 𝑗 > 𝑓(𝑖) and with (1) satisfied. we have : 𝜋 = 𝑒 𝑥∗ 𝑤ℎ𝑒𝑟𝑒 𝑥∗ = 𝑎𝑟𝑔𝑚𝑎𝑥 𝑥 𝑓(𝑥) . ⇒ 𝒏=𝟎 ∞ 𝑷 𝒏 becomes a optimization strategy!!