Peters matthew periodictableseo

Modern On Page Factors
1
SMX Advanced
Matthew Peters, PhD
matt@moz.com @mattthemathman

4
“Relevance” vs “Ranking”
Conceptually, “relevance” determination and “ranking” can be thought of as two
different steps (even if they are implemented as one in a search engine).

5
Relevance

6
1. Relevance
2. Ranking

7
Is this page relevant to “philadelphia phillies”?

9
query-body similarity: 0.74
query-title similarity: 0.8
query-H1 similarity: 1.0
etc …
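
One common way to produce per-field scores like the ones above is cosine similarity between bag-of-words count vectors. The following is a minimal sketch, assuming whitespace tokenization and raw term counts; the page fields and the `cosine_similarity` helper are hypothetical, and the numbers it yields are illustrative, not the values shown on the slide.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple assumption)."""
    return text.lower().split()


def cosine_similarity(query, document):
    """Cosine similarity between the raw term-count vectors of two strings."""
    q, d = Counter(tokenize(query)), Counter(tokenize(document))
    dot = sum(q[t] * d[t] for t in q)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    d_norm = math.sqrt(sum(v * v for v in d.values()))
    return dot / (q_norm * d_norm) if q_norm and d_norm else 0.0


# Hypothetical page fields, invented for illustration only.
page = {
    "title": "Philadelphia Phillies official site",
    "h1": "Philadelphia Phillies",
    "body": "The Phillies play baseball in Philadelphia at Citizens Bank Park",
}
query = "philadelphia phillies"

# One similarity score per field, in the spirit of query-title, query-H1, etc.
scores = {field: cosine_similarity(query, text) for field, text in page.items()}
```

In this toy input the H1 matches the query exactly, so it scores highest; longer fields dilute the match and score lower.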

10
Measuring query-document similarity
Goal: given query + document string, compute “similarity”

11
See “Introduction to Information Retrieval” by Manning et al:
http://nlp.stanford.edu/IR-book/
> 700
papers
Goal: given query + document string, compute “similarity”

12
“philadelphia phillies”
In this context “document” can also refer to title tag, meta description, H1, etc.
0.74

13
Query Model
tokenization
normalization (stemming)
query expansion
intent
0.74

14
Query Model
tokenization
query expansion
intent
Document Model
tokenization
vector space representation
language model
0.74

15
Query Model
tokenization
query expansion
intent
Document Model
tokenization
vector space representation
language model
Scoring function
0.74

16
Query representation
Language identification
Word segmentation
(Japanese, Chinese)
Tokenization + normalization
{reviews, reviewer, reviewing} -> review
Spelling correction
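
The tokenization and normalization step can be sketched as follows. This is a toy suffix stripper written for this example, not the Porter stemmer and not what any search engine actually runs; the `SUFFIXES` rule list and the function names are assumptions chosen so that the slide's example collapses to "review".

```python
import re

# Toy rule set (an assumption for this sketch); checked in this order.
SUFFIXES = ("ing", "er", "s")


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Tokenize, then map each token to its crude stem."""
    return [crude_stem(token) for token in tokenize(text)]


stems = [crude_stem(word) for word in ("reviews", "reviewer", "reviewing")]
```

A real stemmer handles many more cases (and avoids over-stripping words such as "paper"), so treat this only as an illustration of the idea.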

17
Word segmentation
(Japanese, Chinese)
Query expansion
User intent (transactional,
navigational, informational)
Local
Classification
(images, video, news)
Spelling correction

18
Word segmentation
(Japanese, Chinese)
Query expansion
User intent
(transactional, navigational, informational)
Local
Classification
(images, video, news)
Topic Model (LDA)
Entity extraction
Spelling correction

Document representation
TF-IDF
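
A minimal sketch of a TF-IDF document representation, assuming raw term frequency and an inverse document frequency of log(N / df). Many weighting variants exist and the slides do not say which one was used; the three documents and the `tfidf_vectors` helper are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in counts.items()}
        )
    return vectors


# Hypothetical mini-collection, invented for illustration.
docs = [
    "search engine optimization guide",
    "search engine ranking factors",
    "walking guide for beginners",
]
vectors = tfidf_vectors(docs)
```

Terms that appear in few documents ("optimization") receive more weight than terms shared across documents ("search").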

TF-IDF Language Model
P(optimization | search, engine)
>>
P(walking | search, engine)
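
The inequality above can be illustrated with a maximum-likelihood trigram estimate over a tiny corpus. This is a sketch under stated assumptions: the corpus is invented, the `conditional_probability` helper is hypothetical, and no smoothing is applied, so an unseen continuation gets probability exactly zero.

```python
from collections import Counter


def conditional_probability(tokens, context, word):
    """Maximum-likelihood estimate of P(word | context) for a two-word context."""
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    # Count every occurrence of the context that is followed by some word.
    context_total = sum(n for (a, b, _), n in trigrams.items() if (a, b) == context)
    if context_total == 0:
        return 0.0
    return trigrams[(context[0], context[1], word)] / context_total


# Invented toy corpus; a real model would be estimated from far more text.
corpus = (
    "search engine optimization tips . "
    "search engine optimization basics . "
    "search engine marketing ."
).split()

p_optimization = conditional_probability(corpus, ("search", "engine"), "optimization")
p_walking = conditional_probability(corpus, ("search", "engine"), "walking")
```

Here "optimization" follows "search engine" in two of three occurrences, while "walking" never does.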

Probability Ranking Principle
P(R = 1 | d, q) or P(R = 0 | d, q)
TF-IDF Language Model
P(optimization | search, engine)
>>
P(walking | search, engine)

Which method performs best?
What are the characteristics of sites that rank highly?
14,000+ keywords
Top 50 results
600,000 URLs
Google-US, no personalization
March 2013
Mean Spearman Correlation
Remember: “correlation is not causation”
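
A mean Spearman correlation of this kind can be sketched as follows: for each keyword, correlate a feature's values with the result positions, then average across keywords. The sign convention (positions are negated so a positive value means higher feature values near the top) and all function names are assumptions of this sketch, not details confirmed by the slides.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_spearman(serps):
    """serps: one list of feature values per keyword, ordered by SERP position."""
    correlations = []
    for features in serps:
        # Negate positions so position 1 is the largest value (assumed convention).
        positions = [-(i + 1) for i in range(len(features))]
        correlations.append(spearman(features, positions))
    return sum(correlations) / len(correlations)
```

For example, `mean_spearman([[0.9, 0.5, 0.7, 0.2]])` describes one keyword whose feature values mostly decrease down the page, which gives a strongly positive correlation.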

Which method performs best?
We tried a few different types of smoothing for the language model,
Dirichlet worked best (Zhai and Lafferty SIGIR 2001)
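
For reference, Dirichlet smoothing estimates p(w | d) = (c(w, d) + mu * p(w | C)) / (|d| + mu), where c(w, d) is the term count in the document, |d| the document length, and p(w | C) the term's probability in the whole collection. A minimal query-likelihood sketch follows; the value of `mu`, the toy documents, and the choice to skip terms unseen in the collection are assumptions, not settings reported in the slides.

```python
import math
from collections import Counter


def dirichlet_score(query, document, collection, mu=2000.0):
    """Log query likelihood under a Dirichlet-smoothed unigram document model."""
    doc_counts = Counter(document)
    coll_counts = Counter(collection)
    score = 0.0
    for term in query:
        p_coll = coll_counts[term] / len(collection)
        if p_coll == 0:
            continue  # term unseen in the collection: skipped in this sketch
        p_doc = (doc_counts[term] + mu * p_coll) / (len(document) + mu)
        score += math.log(p_doc)
    return score


# Invented toy documents; the collection is simply their concatenation.
doc_a = "movie reviews and ratings for new movie releases".split()
doc_b = "walking shoes for long city walks".split()
collection = doc_a + doc_b
query = ["movie", "reviews"]

# A small mu suits these very short documents (an assumption for the demo).
score_a = dirichlet_score(query, doc_a, collection, mu=10.0)
score_b = dirichlet_score(query, doc_b, collection, mu=10.0)
```

Because of smoothing, the document that never mentions the query terms still gets a finite (but lower) score instead of a probability of zero.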

Impact of stemming
Porter stemmer provided a slight increase in correlations

These correlations are still relatively low compared to other factors

Query: “movie reviews”
50 results + 450 random pages

Query: “movie reviews”
50 results + 450 random pages
For each query: 500 pages
10% relevant
90% irrelevant

50 results + 450 random pages
For each query: 500 pages
10% relevant
90% irrelevant

URL ID    PA    In SERP?
86        92    1
355       90    0
…         …     …
27        18    0

URL ID    Language Model    In SERP?
213       0.97              1
156       0.95              1
…         …                 …
355       0.06              0

P@50 is the “Precision of the top 50 results”. It is the percentage of top 50
results by PA/Language Model that are actually in the SERP.
Top 50 ranked
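
The P@50 computation can be sketched as follows: sort one query's pages by a score (PA or the language model score), keep the top k, and measure what fraction of them are in the SERP. The `precision_at_k` helper is hypothetical, and the demo uses only the three rows visible in each table above, with k = 2 instead of 50.

```python
def precision_at_k(pages, k=50):
    """pages: (score, in_serp) pairs for one query.

    Returns the fraction of the k highest-scoring pages that are in the SERP.
    """
    top = sorted(pages, key=lambda page: page[0], reverse=True)[:k]
    if not top:
        return 0.0
    return sum(in_serp for _, in_serp in top) / len(top)


# Visible rows from the tables: (score, In SERP?) per URL.
by_pa = [(92, 1), (90, 0), (18, 0)]
by_language_model = [(0.97, 1), (0.95, 1), (0.06, 0)]

pa_precision = precision_at_k(by_pa, k=2)
lm_precision = precision_at_k(by_language_model, k=2)
```

With the full 500 pages per query and k = 50, a perfect score means the top 50 pages by that metric are exactly the 50 SERP results.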

Takeaways
Implication: Query-document similarity is based on decades of
research. It’s immune to algorithm change.

Takeaways
Action item: With sophisticated query and document models, no
need to optimize separately for similar words, e.g. “movie
reviews” vs “movie review”.

Takeaways
Action item: Each page is relevant to many different keywords,
so optimize each page for a broad set of related keywords,
instead of a single keyword.
Use case: Content creation. What keywords will this new blog
post target? Is it relevant to a set of queries?

Thanks for watching!
Matthew Peters
matt@moz.com @mattthemathman
35
