How Does Google Google?

David F. Gleich
Computer Science
Purdue University

A journey into the wondrous mathematics
behind your favorite websites
1
Mathematics underlies an
enormous number of the
websites we use every day!
2
1.  Google’s PageRank

2.  Multi-armed bandits and
internet experiments
3
4
Larry Page
Sergey Brin

•  Created a web-search algorithm
called “backrub”
•  Spun-off a company “Googol”
based on the paper

•  The importance of a page is
determined by the importance of
pages that link to it.
Lawrence Page, Sergey Brin, Rajeev Motwani, Terry
Winograd. “The PageRank Citation Ranking: Bringing
Order to the Web.” TR, Stanford InfoLab, 1999

5
A websearch primer
1.  Crawl webpages
2.  Analyze webpage text (information retrieval)
3.  Analyze webpage links
4.  Fit over 200 measures to human evaluations
5.  Produce rankings
6.  Continuously update
6
Pages, nodes, incoming links,
outgoing links, and “importance”
7
[Figure: pages a, b, and c link to a central page. Captions: “Important” pages that link to me! and “Important” pages that link to Purdue!]
8
Tim Davis and Yifan Hu
Sparse Matrix Gallery
http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html
1000 vertices on
8.5-by-11 paper
1,000,000,000,000
vertices (one trillion)

Paper the size of
Manhattan island
(23 sq miles)?
The web
10
We need something better!
11
A wee web-graph: link
counting is too easy to game!
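[Figure: directed graph on six nodes. Node 1 links to nodes 2, 3, and 4 (weight 1/3 each); node 2 links to nodes 3 and 6 (weight 1/2 each); node 3 links to node 4; nodes 4 and 5 link to each other; node 6 has no outgoing links.]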
12
A wee web-graph: link
counting is too easy to game!
The importance of a page is determined by the importance of pages that link to it.

$$
\begin{aligned}
x_1 &= 0 \\
x_2 &= \tfrac{1}{3} x_1 \\
x_3 &= \tfrac{1}{3} x_1 + \tfrac{1}{2} x_2 \\
x_4 &= \tfrac{1}{3} x_1 + x_3 + x_5 \\
x_5 &= x_4 \\
x_6 &= \tfrac{1}{2} x_2
\end{aligned}
$$
13
The importance of a page is determined
by the importance of pages that link to it
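$$
x_i = \sum_{j \in B_i} \frac{1}{d_j} x_j
$$

•  $x_i$: “Importance” of page i
•  $x_j$: “Importance” of page j
•  $B_i$: “Back-links from page i” (why it was called Backrub!)
•  $d_j$: Number of links page j uses (out-degree in graph theory)

Example: pages 1 and 2 link to page 3 with weights 1/3 and 1/2, so

$$
x_3 = \tfrac{1}{3} x_1 + \tfrac{1}{2} x_2
$$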
14
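The back-link sum can be sketched in a few lines of Python. This is only an illustration: the `links` dictionary is a hypothetical encoding of the six-page example graph (page to the pages it links to), and `propagate` is a made-up helper name, not anything from the talk.

```python
# Hypothetical encoding of the six-page example: page -> pages it links to.
links = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}

def propagate(x, links):
    """Apply x_i = sum over back-links j of x_j / d_j exactly once."""
    new_x = {page: 0.0 for page in links}
    for j, targets in links.items():
        for i in targets:
            # Page j splits its importance evenly over its d_j out-links.
            new_x[i] += x[j] / len(targets)
    return new_x

x = {page: 1.0 for page in links}  # arbitrary starting importances
y = propagate(x, links)
print(y[3])  # equals 1/3 * x[1] + 1/2 * x[2]
```

Looping over out-links rather than back-links gives the same sums and avoids building the sets B_i explicitly.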
We can rewrite this equation in a more
mathematically convenient way
$$
\begin{aligned}
x_1 &= 0\,x_1 + 0\,x_2 + 0\,x_3 + 0\,x_4 + 0\,x_5 + 0\,x_6 \\
x_2 &= \tfrac{1}{3}x_1 + 0\,x_2 + 0\,x_3 + 0\,x_4 + 0\,x_5 + 0\,x_6 \\
x_3 &= \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2 + 0\,x_3 + 0\,x_4 + 0\,x_5 + 0\,x_6 \\
x_4 &= \tfrac{1}{3}x_1 + 0\,x_2 + 1\,x_3 + 0\,x_4 + 1\,x_5 + 0\,x_6 \\
x_5 &= 0\,x_1 + 0\,x_2 + 0\,x_3 + 1\,x_4 + 0\,x_5 + 0\,x_6 \\
x_6 &= 0\,x_1 + \tfrac{1}{2}x_2 + 0\,x_3 + 0\,x_4 + 0\,x_5 + 0\,x_6
\end{aligned}
$$
15
or

$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
=
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 1/2 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1/2 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
$$

And even more conveniently!

$$
x = Px
$$
Element k in column m = "probability" of
going from node m to node k
16
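A minimal sketch of how the matrix P could be assembled from a link list, assuming the same hypothetical `links` dictionary for the six-page example. Exact fractions are used so the entries print as on the slide; the column sums show which pages pass on all of their importance.

```python
from fractions import Fraction

# Hypothetical encoding of the six-page example: page -> pages it links to.
links = {1: [2, 3, 4], 2: [3, 6], 3: [4], 4: [5], 5: [4], 6: []}
n = len(links)

# P[k-1][m-1] is the "probability" of going from page m to page k.
P = [[Fraction(0)] * n for _ in range(n)]
for m, targets in links.items():
    for k in targets:
        P[k - 1][m - 1] = Fraction(1, len(targets))

for row in P:
    print(" ".join(f"{str(entry):>3}" for entry in row))

# A column sums to 1 when its page has out-links, and to 0 otherwise.
column_sums = [sum(P[k][m] for k in range(n)) for m in range(n)]
print(column_sums)
```

Only the column for page 6 sums to zero, because page 6 has no outgoing links.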
The matrix P for websites
shows a lot of structure
Every dot is a non-zero element indicating a link
The matrices are sparse and generally have block structure
Block structure can be exploited to speed up the ranking algorithm
17
But this idea doesn’t work for
the wee web-graph
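Nodes 1, 4 and 5
determine everything!

$$
\begin{aligned}
x_1 &= 0 \\
x_2 &= \tfrac{1}{3} x_1 \\
x_3 &= \tfrac{1}{3} x_1 + \tfrac{1}{2} x_2 \\
x_4 &= \tfrac{1}{3} x_1 + x_3 + x_5 \\
x_5 &= x_4 \\
x_6 &= \tfrac{1}{2} x_2
\end{aligned}
$$

$$
\begin{aligned}
x_1 &= 0 \\
x_2 &= \tfrac{1}{3} x_1 = 0 \\
x_3 &= \tfrac{1}{3} x_1 + \tfrac{1}{2} x_2 = 0 \\
x_4 &= \tfrac{1}{3} x_1 + x_3 + x_5 = x_5 \\
x_5 &= x_4 \\
x_6 &= \tfrac{1}{2} x_2 = 0
\end{aligned}
$$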
18
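The failure can be checked numerically. The sketch below hard-codes the six-page matrix P with exact fractions (the helper name `apply` is made up for illustration) and shows two things: any vector of the form (0, 0, 0, t, t, 0) satisfies x = Px, and repeatedly applying P to a uniform start drives pages 1, 2, 3, and 6 to zero while pages 4 and 5 keep swapping values.

```python
from fractions import Fraction as F

# The six-page matrix P before any fix (column for page 6 is all zeros).
P = [
    [0, 0, 0, 0, 0, 0],
    [F(1, 3), 0, 0, 0, 0, 0],
    [F(1, 3), F(1, 2), 0, 0, 0, 0],
    [F(1, 3), 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, F(1, 2), 0, 0, 0, 0],
]

def apply(P, x):
    """Matrix-vector product P x using plain Python lists."""
    return [sum(p * v for p, v in zip(row, x)) for row in P]

# Only pages 4 and 5 carry importance in a solution of x = Px.
t = F(1, 2)
fixed_point = [0, 0, 0, t, t, 0]
print(apply(P, fixed_point) == fixed_point)  # True

# Starting from a uniform guess, record four applications of P.
x = [F(1, 6)] * 6
trace = []
for _ in range(4):
    x = apply(P, x)
    trace.append(x)
print(trace[2])
print(trace[3])
```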
But this idea doesn’t work for
the wee web-graph
•  Node 1: “lonely”
•  Nodes 4 and 5: “mutual admiration societies”
•  Node 6: “anti-social”
These nodes need to be “fixed” to get a
reliable and useful ranking!
19
The gang of four to the rescue
Andrei Markov
Oskar Perron
Georg Frobenius
Richard von Mises
20
Let’s fix it up and force node 6 to
choose, or link to everyone
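Before:

$$
P = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 1/2 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1/2 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

After node 6 links to everyone:

$$
P = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1/6 \\
1/3 & 0 & 0 & 0 & 0 & 1/6 \\
1/3 & 1/2 & 0 & 0 & 0 & 1/6 \\
1/3 & 0 & 1 & 0 & 1 & 1/6 \\
0 & 0 & 0 & 1 & 0 & 1/6 \\
0 & 1/2 & 0 & 0 & 0 & 1/6
\end{bmatrix}
$$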
21
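The fix for node 6 amounts to replacing every all-zero column of P with a uniform column. A minimal sketch, where `fix_dangling` is a hypothetical helper name and the matrix is the six-page example written with exact fractions:

```python
from fractions import Fraction as F

# The six-page matrix P before the fix.
P = [
    [0, 0, 0, 0, 0, 0],
    [F(1, 3), 0, 0, 0, 0, 0],
    [F(1, 3), F(1, 2), 0, 0, 0, 0],
    [F(1, 3), 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, F(1, 2), 0, 0, 0, 0],
]

def fix_dangling(P):
    """Return a copy of P in which every all-zero column becomes uniform 1/n."""
    n = len(P)
    fixed = [row[:] for row in P]
    for m in range(n):
        if all(P[k][m] == 0 for k in range(n)):
            for k in range(n):
                fixed[k][m] = F(1, n)
    return fixed

fixed = fix_dangling(P)
print([row[5] for row in fixed])  # the column for page 6 after the fix
```

After the fix every column sums to 1, so no importance leaks out of the graph.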
Taxation is the way to
representation!
If a page is a good page, then
it’ll still be a good page if
we “tax” the importance
from a, b, and c, the pages that link to it

We can redistribute the
taxed amounts to all nodes,
including lonely nodes!
22
The importance of a page is determined
by the importance of pages that link to it*
* After tax and any benefits
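$$
x_i = \sum_{j \in B_i} \alpha \frac{x_j}{d_j} + (1 - \alpha) b_i
$$

•  $\alpha \, x_j / d_j$: The total importance that page j contributes to page i
•  $b_i$: Benefits to page i
•  $\alpha$: sets the taxation rate of all pages (every contribution is scaled by $\alpha$)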
23
$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \alpha
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1/6 \\
1/3 & 0 & 0 & 0 & 0 & 1/6 \\
1/3 & 1/2 & 0 & 0 & 0 & 1/6 \\
1/3 & 0 & 1 & 0 & 1 & 1/6 \\
0 & 0 & 0 & 1 & 0 & 1/6 \\
0 & 1/2 & 0 & 0 & 0 & 1/6
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
+ (1 - \alpha)
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \end{bmatrix}
$$

Perron and Frobenius showed the new
equation always has a unique solution

$$
x = \alpha P x + (1 - \alpha) b
$$
24
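One standard way to see why a unique solution exists is sketched below. It assumes that every column of P sums to 1 (as after the node-6 fix) and that 0 ≤ α < 1; it is a textbook linear-algebra argument, not necessarily the route taken in the talk.

```latex
\begin{aligned}
x = \alpha P x + (1-\alpha) b
  \;&\Longleftrightarrow\; (I - \alpha P)\, x = (1-\alpha)\, b, \\
\|\alpha P\|_1 = \alpha < 1
  \;&\Longrightarrow\; I - \alpha P \text{ is invertible}, \\
x &= (1-\alpha)\,(I - \alpha P)^{-1} b
   = (1-\alpha) \sum_{k=0}^{\infty} \alpha^k P^k b .
\end{aligned}
```

Here the matrix 1-norm is the largest column sum, which equals 1 for a nonnegative matrix whose columns each sum to 1, so the series converges.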
What von Mises and Richardson showed
is that guess, check, and correct works!

$$
x^{(\text{new})} = \alpha P x^{(\text{old})} + (1 - \alpha) b
$$

$$
x^{(\text{start})} = \begin{bmatrix} 0.17 \\ 0.17 \\ 0.17 \\ 0.17 \\ 0.17 \\ 0.17 \end{bmatrix}
\quad
x^{(1)} = \begin{bmatrix} 0.05 \\ 0.10 \\ 0.17 \\ 0.38 \\ 0.19 \\ 0.12 \end{bmatrix}
\quad
x^{(2)} = \begin{bmatrix} 0.04 \\ 0.06 \\ 0.10 \\ 0.36 \\ 0.36 \\ 0.08 \end{bmatrix}
\quad
x^{(\text{later})} = \begin{bmatrix} 0.03 \\ 0.04 \\ 0.06 \\ 0.43 \\ 0.39 \\ 0.05 \end{bmatrix}
$$
25
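The guess, check, and correct loop is short enough to sketch directly. The slide does not state α or b; the values below (α = 0.85 and a uniform b of 1/6 per page) are assumptions, chosen because they reproduce the listed iterates after rounding to two decimals.

```python
ALPHA = 0.85  # assumed taxation parameter; not stated on the slide
N = 6

# The fixed six-page matrix P (node 6 links to everyone).
P = [
    [0, 0, 0, 0, 0, 1/6],
    [1/3, 0, 0, 0, 0, 1/6],
    [1/3, 1/2, 0, 0, 0, 1/6],
    [1/3, 0, 1, 0, 1, 1/6],
    [0, 0, 0, 1, 0, 1/6],
    [0, 1/2, 0, 0, 0, 1/6],
]
b = [1 / N] * N  # assumed uniform "benefits" vector

def step(x):
    """One update: x_new = ALPHA * P x_old + (1 - ALPHA) * b."""
    return [
        ALPHA * sum(p * v for p, v in zip(row, x)) + (1 - ALPHA) * b_i
        for row, b_i in zip(P, b)
    ]

x_start = [1 / N] * N
x1 = step(x_start)
x2 = step(x1)

# Keep iterating; the error shrinks roughly like ALPHA ** k.
x_limit = x2
for _ in range(200):
    x_limit = step(x_limit)

for name, vec in [("x(1)", x1), ("x(2)", x2), ("limit", x_limit)]:
    print(name, [round(v, 2) for v in vec])
```

Under these assumptions the limit agrees with the last vector on the slide to two decimals, with pages 4 and 5 holding most of the importance.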
26
There’s still a lot of work left to
do to make a search engine
Make it fast!
Watch out for spam
Watch out for manipulation
Personalize

Experiment!
27
1.  Google’s PageRank

2.  Multi-armed bandits and
internet experiments
28
http://adamlofting.com/736/drawn-multi-armed-bandit-experiments/multi-armed-bandit/
Not this!
29
http://upload.wikimedia.org/wikipedia/en/8/82/Las_Vegas_slot_machines.jpg
This!
•  Pays out $0.92/dollar
•  Pays out $0.98/dollar
•  Pays out $0.95/dollar
•  Pays out $0.99/dollar
30
What in the heck does a multi-armed
bandit have to do with Google?
31
What in the heck does a multi-armed
bandit have to do with Google?
•  Pays out $0.92/view
•  Pays out $0.66/view
•  Pays out $0.91/view to show ads
•  Pays out -$0.02/view to hide ads
32
How to optimize your website
without exploiting the bandits
Try condition A 100 times, find 45 “wins”
Try condition B 100 times, find 85 “wins”
Try condition C 100 times, find 10 “wins”
…
Choose the best!
33
This field has some of the
best terminology

Explore

Exploit

Regret
34
This field has some of the
best terminology

Explore – Visiting Las Vegas!

Exploit – Your new winning strategy!

Regret – That you didn’t quit after
winning the first round
35
This field has some of the
best terminology

Explore – Testing slot machines/experiments for their reward
Exploit – Playing the best reward you’ve found so far
Regret – How much you lost due to exploration
36
How to optimize your website
without exploiting the bandits
Try condition A 100 times, find 45 “wins”
Try condition B 100 times, find 85 “wins”
Try condition C 100 times, find 10 “wins”
…
Choose the best!
Pure
exploration!
We only exploit our findings at the end!
37
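The "explore everything, then choose the best" procedure can be simulated in a few lines. This is a sketch under stated assumptions: the win rates in `true_rates` are hypothetical values picked to echo the counts on the slide, and `explore_then_commit` is a made-up name for the procedure.

```python
import random

def explore_then_commit(true_rates, trials_per_arm, total_plays, seed=0):
    """Try every condition equally often, then play the apparent best."""
    rng = random.Random(seed)
    wins = {arm: 0 for arm in true_rates}
    plays = {arm: 0 for arm in true_rates}

    def pull(arm):
        plays[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1

    # Pure exploration: the same number of trials for every condition.
    for arm in true_rates:
        for _ in range(trials_per_arm):
            pull(arm)

    # Exploit only at the end: commit to the best observed win rate.
    best = max(true_rates, key=lambda arm: wins[arm] / plays[arm])
    for _ in range(total_plays - trials_per_arm * len(true_rates)):
        pull(best)
    return best, plays, wins

# Hypothetical true win rates, echoing 45, 85, and 10 wins per 100 trials.
true_rates = {"A": 0.45, "B": 0.85, "C": 0.10}
best, plays, wins = explore_then_commit(true_rates, trials_per_arm=100, total_plays=1000)
print(best, plays)
```

All 200 plays spent on A and C during exploration are the price of this strategy, which is what the regret discussion later quantifies.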
How to optimize your website
exploiting the bandits
Pure exploration:
Try condition A 5 times, find 4 wins
Try condition B 5 times, find 4 wins
Try condition C 5 times, find 2 wins

Exploit our knowledge:
Try condition A 7 times, find 3 wins
Try condition B 7 times, find 5 wins
Try condition C 1 time, find 0 wins

Condition     A     B     C
Est. Return   0.58  0.75  0.33
38
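The estimated returns in the table are pooled win rates over both rounds. A small sketch of that arithmetic, with the trial and win counts taken from the slide and the variable names invented for illustration:

```python
# (trials, wins) per condition for each round, as listed on the slide.
rounds = [
    {"A": (5, 4), "B": (5, 4), "C": (5, 2)},  # equal exploration
    {"A": (7, 3), "B": (7, 5), "C": (1, 0)},  # more plays for the leaders
]

totals = {}
for results in rounds:
    for condition, (trials, wins) in results.items():
        prev_trials, prev_wins = totals.get(condition, (0, 0))
        totals[condition] = (prev_trials + trials, prev_wins + wins)

# Estimated return = total wins / total trials.
estimates = {c: wins / trials for c, (trials, wins) in totals.items()}
for condition, estimate in estimates.items():
    print(condition, round(estimate, 2))
```

Condition C gets only 6 plays in total, so its estimate is the least certain of the three.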
The goal of these problems is to construct
optimal strategies to minimize regret
Regret: how much you left “on the table” by exploring

$$
\text{regret} = E[\text{play best always}] - E[\text{plays made based on data}]
$$

A zero-regret strategy is one where
regret(T trials) is sublinear in T
as the number of plays T → ∞

$$
\text{regret}_{\text{100-each}} = \frac{255}{300} - \frac{140}{300} = 0.38
$$

$$
\text{regret}_{\text{30-mixed}} = \frac{25.5}{30} - \frac{0.45 \times 12 + 0.85 \times 12 + 0.1 \times 6}{30} = 0.31
$$
39
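Both regret figures can be reproduced with a short calculation. As on the slide, the sketch treats the win rates observed in the 100-each experiment (0.45, 0.85, 0.10) as the true rates, which is an assumption; `per_play_regret` is a hypothetical helper name.

```python
# Win rates treated as the truth (assumption: taken from the 100-each run).
true_rates = {"A": 0.45, "B": 0.85, "C": 0.10}

def per_play_regret(plays):
    """(expected wins if always playing the best - expected wins) / plays."""
    total = sum(plays.values())
    best_rate = max(true_rates.values())
    expected = sum(true_rates[c] * n for c, n in plays.items())
    return (best_rate * total - expected) / total

regret_100_each = per_play_regret({"A": 100, "B": 100, "C": 100})
regret_30_mixed = per_play_regret({"A": 12, "B": 12, "C": 6})
print(round(regret_100_each, 2), round(regret_30_mixed, 2))
```

The mixed schedule has lower regret per play because it gives condition C, the worst one, a smaller share of the plays.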
[The bandit problem] was formulated during the [second
world] war, and efforts to solve it so sapped the energies
and minds of Allied analysts that the suggestion was
made that the problem be dropped over Germany, as the
ultimate instrument of intellectual sabotage.	

Peter Whittle (Whittle, 1979)
Discussion of “Bandit processes and dynamical allocation indices”
Their importance to website optimization,
advertising, and recommendation has
rejuvenated research on these problems
with fascinating new questions. 
40
Math is everywhere, and
especially in your favorite
websites!
Matrices and probability are
key ingredients.
41
PageRank on Wikipedia
Top 10 articles on Wikipedia with highest PageRank:

α = 0.50          α = 0.85                α = 0.99
United States     United States           C:Contents
C:Living people   C:Main topic classif.   C:Main topic classif.
France            C:Contents              C:Fundamental
Germany           C:Living people         United States
England           C:Ctgs. by country      C:Wikipedia admin.
United Kingdom    United Kingdom          P:List of portals
Canada            C:Fundamental           P:Contents/Portals
Japan             C:Ctgs. by topic        C:Portals
Poland            C:Wikipedia admin.      C:Society
Australia         France                  C:Ctgs. by topic
42
