3. @RyanJones
The Future of SEO will be different
Source Article: https://www.blog.google/products/search/improving-search-next-20-years/
4. @RyanJones
The Future Is Now
Source Article: https://www.blog.google/products/search/introducing-google-discover/
5. @RyanJones
Why We Search Isn’t Changing
Search is about verbs. No, not just the words on a page.
Search is about helping users accomplish a task: to “do” something.
Photo Credit: Shutterstock.com
7. @RyanJones
How a Search Engine Works
Crawling (Googlebot)
• Discovery (finding URLs)
• Robots.txt, response time, crawl schedule, etc.
Indexing (Caffeine, WRS)
• Extracting content and other ranking signals
• Paragraph vectors, B+ trees, canonical, JavaScript, etc.
Retrieval (Relevance)
• How relevant is this to my query?
• Query understanding, entities, synonyms, etc.
Ranking (Post-Retrieval)
• ORDER BY pagerank DESC LIMIT 10
• Penalties, speed, HTTPS, mobile friendly, etc.
Note: Google doesn’t separate these stages. This is for illustration.
8. @RyanJones
Crawling, Indexing, Rendering, Oh My!
Diagram: Crawler → Index → Renderer (Chrome 41)
• Initial crawl: the crawler finds URLs.
• The renderer renders those URLs and finds new content generated by JavaScript.
• URLs found after rendering must be re-crawled.
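The two-pass flow in the diagram (crawl the raw HTML first, render later, then re-crawl whatever the renderer turned up) can be sketched as two queues. This is a toy model under stated assumptions: `html_links` and `js_links` are hypothetical dictionaries standing in for links visible in the raw HTML and links that only exist after JavaScript runs, and this sketch simply drains the crawl queue before rendering, which is a simplification of real scheduling.

```python
from collections import deque


def crawl_and_render(seeds, html_links, js_links):
    """Toy two-pass crawler: crawl raw HTML first, render later.

    html_links / js_links are hypothetical stand-ins mapping a URL to the
    links in its raw HTML and the links that only appear after rendering.
    """
    crawl_queue = deque(seeds)
    render_queue = deque()
    seen = set(seeds)
    crawled, rendered = [], []

    def enqueue(link):
        # Any URL not seen before goes back to the *crawl* queue.
        if link not in seen:
            seen.add(link)
            crawl_queue.append(link)

    while crawl_queue or render_queue:
        if crawl_queue:
            # Pass 1: fetch the HTML and follow only links present in it.
            url = crawl_queue.popleft()
            crawled.append(url)
            for link in html_links.get(url, []):
                enqueue(link)
            render_queue.append(url)  # rendering is deferred
        else:
            # Pass 2: render; JavaScript-generated links must be re-crawled.
            url = render_queue.popleft()
            rendered.append(url)
            for link in js_links.get(url, []):
                enqueue(link)
    return crawled, rendered


crawled, rendered = crawl_and_render(
    ["/"], html_links={"/": ["/about"]}, js_links={"/": ["/products"]}
)
print(crawled)  # ['/', '/about', '/products']
```

Note that "/products" is only crawled after "/" has been rendered, which is why JavaScript-only links are discovered later than plain HTML links.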
10. @RyanJones
What The Crawler Sees
The first step of Crawling is to discover what pages exist on the web.
• Start with known pages in the index
• Discover new links on those pages
• Augment with sitemap data
The crawler…
• Can’t crawl pages blocked with robots.txt
• Can only crawl pages accessible to anonymous users (no cookies, login, etc.)
• Respects canonicals, alternates, hreflangs, etc.
• Doesn’t see JavaScript content. This is only seen after the page is rendered.
Tip: Use Search Console to request crawls of pages, and check crawl stats in the index coverage report.
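The discovery steps above (start with known pages, add links found on them, augment with sitemap data, drop anything robots.txt blocks) can be sketched with Python’s standard `urllib.robotparser`. The robots.txt rules, URLs and the `discover` helper are hypothetical, and Python’s parser is not Googlebot’s, so edge-case matching may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
"""


def discover(known_pages, links_on_page, sitemap_urls, robots_lines, agent="Googlebot"):
    """Return the crawlable URL frontier (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_lines)

    frontier = set(known_pages)          # start with known pages
    for url in known_pages:              # discover links on those pages
        frontier.update(links_on_page.get(url, []))
    frontier.update(sitemap_urls)        # augment with sitemap data

    # Drop anything robots.txt disallows for this user agent.
    return sorted(url for url in frontier if parser.can_fetch(agent, url))


frontier = discover(
    known_pages=["https://example.com/"],
    links_on_page={"https://example.com/": ["https://example.com/cart/checkout",
                                             "https://example.com/blog/"]},
    sitemap_urls=["https://example.com/events/"],
    robots_lines=ROBOTS_TXT.splitlines(),
)
print(frontier)
# ['https://example.com/', 'https://example.com/blog/', 'https://example.com/events/']
```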
11. @RyanJones
Problems with Rendering
If your content is lazy loaded on user interaction, or otherwise requires a user action to appear, search engines will not see it.
The renderer does not:
• Click
• Hover
• Scroll
• Focus
• MouseOver
• Etc…
12. @RyanJones
“It’s not what you look at that matters, it’s what you see.”
- @jennyhalasz yesterday, quoting Henry David Thoreau
13. @RyanJones
Lazy Loading Images
<object data="Gilleys.png" type="image/png">
  <img srcset="320w.jpg 320w, 480w.jpg 480w, 800w.jpg 800w"
       sizes="(max-width: 320px) 280px, (max-width: 480px) 440px, 800px"
       src="" alt="best bar in dallas">
</object>
14. @RyanJones
An “index” is just a list.
Keyword | Score | Document ID (the web page)
Koe Wetzel | 5 | 282016
Josh Abbott | 9 | 146
William Clark Green | 7 | 7849
Casey Donahew | 8 | 648
Robert Earl Keen | 10 | 65467
Turnpike Troubadours | 2 | 38
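Stored keyword-first, the list above is an inverted index: look up a keyword, get back the documents that contain it, best score first. A minimal sketch using the rows from the table; `build_index` and `lookup` are hypothetical names, not any search engine’s API.

```python
from collections import defaultdict

# (keyword, score, document ID) rows from the table above.
ROWS = [
    ("Koe Wetzel", 5, 282016),
    ("Josh Abbott", 9, 146),
    ("William Clark Green", 7, 7849),
    ("Casey Donahew", 8, 648),
    ("Robert Earl Keen", 10, 65467),
    ("Turnpike Troubadours", 2, 38),
]


def build_index(rows):
    """Map each lowercased keyword to a postings list of (doc_id, score)."""
    index = defaultdict(list)
    for keyword, score, doc_id in rows:
        index[keyword.lower()].append((doc_id, score))
    for postings in index.values():
        postings.sort(key=lambda posting: posting[1], reverse=True)  # best score first
    return index


def lookup(index, keyword):
    # .get() does not insert missing keys into the defaultdict.
    return index.get(keyword.lower(), [])


index = build_index(ROWS)
print(lookup(index, "Robert Earl Keen"))  # [(65467, 10)]
```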
15. @RyanJones
“The Index” is more like “the indices”
There are multiple indexes.
There are also multiple features/stages of indexing. These include things like:
• Tokenization
• Sentence segmenting
• Spell checking
• Entities
• Natural language processing
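Two of those stages, sentence segmenting and tokenization, can be roughly approximated with regular expressions. This is a deliberately naive sketch (it would wrongly split after abbreviations like “Dr.”); production pipelines use trained models.

```python
import re


def split_sentences(text):
    """Naive sentence segmenter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase, then keep runs of letters/digits plus an optional apostrophe part."""
    return re.findall(r"[a-z0-9]+(?:['’][a-z]+)?", sentence.lower())


text = "The road goes on forever. The party never ends!"
for sentence in split_sentences(text):
    print(tokenize(sentence))
# ['the', 'road', 'goes', 'on', 'forever']
# ['the', 'party', 'never', 'ends']
```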
17. @RyanJones
Indexing
We typically think of indexing in tables, but it can also be done with vectors.
Keyword | Occurrences
The | 1557
Road | 98
Goes | 72
On | 435
Forever | 17
Kind of like a word cloud but with math.
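A word-count table like this can be treated as a vector with one dimension per vocabulary word. Once documents are vectors, comparing them becomes arithmetic. The sketch below uses cosine similarity as one common choice (not necessarily what any given engine uses); the vocabulary and sample strings are hypothetical.

```python
import math
from collections import Counter

VOCABULARY = ["the", "road", "goes", "on", "forever"]


def term_vector(tokens, vocabulary=VOCABULARY):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


doc = term_vector("the road goes on and on and on".split())
query = term_vector("road goes on".split())
print(doc)                         # [1, 1, 1, 3, 0]
print(round(cosine(doc, query), 3))  # 0.833
```

Words outside the vocabulary (“and” here) are simply ignored, which is one reason real systems use far larger vocabularies or learned vectors.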
18. @RyanJones
Tokens & N-Grams
If you haven’t climbed up to Enchanted Rock, drank a cold Shiner down in Luckenbach, taken your baby to the River Walk, then you ain’t seen My Texas yet.
Tokens (unigrams): if, you, haven’t, climbed, up, to, enchanted, rock, drank, a, cold, shiner, down, in, luckenbach, taken, your, baby, to, the, river, walk, then, you, ain’t, seen, my, texas, yet
Bigrams: if you, you haven’t, haven’t climbed, climbed up, up to, to enchanted, enchanted rock, rock drank, drank a, a cold, cold shiner, shiner down, down in, in luckenbach…
Trigrams: if you haven’t, you haven’t climbed, haven’t climbed up, climbed up to, up to enchanted, to enchanted rock, drank a cold, a cold shiner, cold shiner down, down in luckenbach, taken your baby, your baby to, baby to the, to the river, the river walk, river walk then, walk then you, then you ain’t, you ain’t seen, my texas yet
Note: I skipped a few bigrams and trigrams here, but you get the point – I hope.
Note 2: Lyrics, Josh Abbott Band – My Texas
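Generating n-grams from a token list is just a sliding window of width n. A minimal sketch (the `ngrams` name is ours, not a library function) that also produces the bigrams and trigrams skipped above:

```python
def ngrams(tokens, n):
    """Slide a window of width n across the tokens and join each window."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "if you haven't climbed up to enchanted rock".split()
print(ngrams(tokens, 2))
# ['if you', "you haven't", "haven't climbed", 'climbed up', 'up to', 'to enchanted', 'enchanted rock']
print(ngrams(tokens, 3)[:3])
# ["if you haven't", "you haven't climbed", "haven't climbed up"]
```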
19. @RyanJones
Indexing
Zipf’s law states that, given a large sample of words, the frequency of any word is inversely proportional to its rank in the frequency table.
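Written as a formula, with r a word’s rank, f(r) its frequency, and C a constant for the collection (C ≈ f(1), the frequency of the most common word):

```latex
f(r) \approx \frac{C}{r}
\quad\Longrightarrow\quad
f(2) \approx \tfrac{1}{2}\, f(1), \qquad f(3) \approx \tfrac{1}{3}\, f(1)
```

So the second most common word shows up roughly half as often as the most common one, the third roughly a third as often, and so on. A handful of words like “the” therefore dominate any collection.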
20. @RyanJones
A Quick TF-IDF Rant
IDF is a quick way of seeing which words offer little value (the, of, and, or).
TF-IDF adjusts this based on how often each word is used in a document.
You’ll notice there’s no actual subtraction in TF-IDF.
All of this stuff has to do with indexing and relevance – NOT RANKING.
22. @RyanJones
More on TF-IDF
Word | Term frequency | Document frequency
The | 44 | 3
road | 6 | 1
goes | 11 | 2
on | 16 | 3
forever | 4 | 2
Term frequency per document:
Word | Doc 1 | Doc 2 | Doc 3
The | 17 | 24 | 3
road | 6 | 0 | 0
goes | 3 | 0 | 8
on | 5 | 7 | 4
forever | 0 | 3 | 1
Word | IDF
The | 0
road | 1.098
goes | 0.405
on | 0
forever | 0.405
TF-IDF:
Word | Doc 1 | Doc 2 | Doc 3
The | 0 | 0 | 0
road | 6.588 | 0 | 0
goes | 1.215 | 0 | 3.24
on | 0 | 0 | 0
forever | 0 | 1.215 | 0.405
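The numbers in these tables are consistent with idf = ln(N / df), where N = 3 documents and df is the document frequency, and TF-IDF = term frequency × idf. A short sketch that recomputes them from the per-document counts. The slide rounds idf to three decimals before multiplying, so its products (e.g. 6.588 for “road”) differ slightly from the unrounded ones here.

```python
import math

# Per-document term counts (Doc 1, Doc 2, Doc 3) from the table above.
COUNTS = {
    "the": [17, 24, 3],
    "road": [6, 0, 0],
    "goes": [3, 0, 8],
    "on": [5, 7, 4],
    "forever": [0, 3, 1],
}


def tf_idf(counts):
    """Return {word: (idf, [tf-idf weight per document])}.

    Assumes every word appears in at least one document (df > 0).
    """
    n_docs = len(next(iter(counts.values())))
    result = {}
    for word, tfs in counts.items():
        df = sum(1 for tf in tfs if tf > 0)  # documents containing the word
        idf = math.log(n_docs / df)          # natural log matches the table
        result[word] = (idf, [tf * idf for tf in tfs])
    return result


for word, (idf, weights) in tf_idf(COUNTS).items():
    print(f"{word:8} idf={idf:.3f} weights={[round(w, 3) for w in weights]}")
```

Words that appear in every document (“the”, “on”) get idf = ln(1) = 0, so their weight is zero no matter how often they occur.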
23. @RyanJones
A Real-World Example
Word | Doc 1 | Doc 2 | Doc 3
car | 0.88 | 0.09 | 0.58
auto | 0.10 | 0.71 | 0
insurance | 0 | 0.71 | 0.70
best | 0.46 | 0 | 0.41
Data here is based on the Reuters-RCV1 collection included with my Introduction to Information Retrieval textbook.
Best car: Doc 1
Car insurance: Doc 3
Best insurance: Doc 3
Auto insurance: Doc 2
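The four answers above follow from a simple scoring rule: add up the weights of the query’s words in each document and pick the highest total. A sketch using the slide’s weights; `score` and `best_doc` are hypothetical helper names.

```python
# Term weights per document, from the table above.
WEIGHTS = {
    "car":       {"doc1": 0.88, "doc2": 0.09, "doc3": 0.58},
    "auto":      {"doc1": 0.10, "doc2": 0.71, "doc3": 0.0},
    "insurance": {"doc1": 0.0,  "doc2": 0.71, "doc3": 0.70},
    "best":      {"doc1": 0.46, "doc2": 0.0,  "doc3": 0.41},
}
DOCS = ["doc1", "doc2", "doc3"]


def score(query, doc):
    """Sum the weights of each query word in the document (unknown words add 0)."""
    return sum(WEIGHTS.get(word, {}).get(doc, 0.0) for word in query.lower().split())


def best_doc(query):
    return max(DOCS, key=lambda doc: score(query, doc))


for query in ["Best Car", "Car Insurance", "Best Insurance", "Auto Insurance"]:
    print(f"{query}: {best_doc(query)}")
```

For “best car”, for example, Doc 1 scores 0.88 + 0.46 = 1.34 against 0.99 for Doc 3 and 0.09 for Doc 2.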
25. @RyanJones
Mobile Indexing vs. Mobile Friendly
Indexing (what the crawler sees):
• Content on the page
• Crawlable links
• Markup
• Disavow/robots/etc.
Mobile Friendly:
• Viewport
• Font sizes
• Content scale
• Links/buttons clickable
• No overlays/popups
• Redirect/canonical/alternate
• Page speed
27. @RyanJones
Retrieval
Explicit signals: “What the user thinks they want”
• The user’s actual query
• Search operators
• Language used
Implicit signals: “What the user needs”
• Searcher intent
• Query type (informational, transactional, navigational)
• Synonyms
• Result type (images, web, etc.)
This is basically RankBrain at work.
Image source: @dannysullivan
https://twitter.com/dannysullivan/status/1044274915388481537
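To make “implicit signals” concrete, here is a toy, rule-based guess at query type. It is only an illustration: the cue words are hypothetical, and Google infers intent with learned models such as RankBrain, not a hand-written list like this.

```python
# Hypothetical cue words; a real system learns these signals from data.
TRANSACTIONAL = {"buy", "price", "cheap", "order", "coupon", "deal"}
NAVIGATIONAL = {"login", "homepage", "website"}


def query_type(query):
    """Guess informational / transactional / navigational from surface cues."""
    words = query.lower().split()
    if any(w in TRANSACTIONAL for w in words):
        return "transactional"
    if any(w in NAVIGATIONAL or w.endswith(".com") for w in words):
        return "navigational"
    return "informational"  # default: the user wants to learn something


for q in ["buy cowboy boots", "facebook login", "how does pagerank work"]:
    print(q, "->", query_type(q))
```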
28. @RyanJones
What Makes Up Ranking?
• RankBrain: Understanding the search query
• Panda: Content quality
• Penguin: Link spam
• Pigeon: Local spam
• Pirate: Copyright infringement
• Top Heavy: Ads
• Mobile Friendly
• Core factors: PageRank, on-site signals, authority
29. @RyanJones
PageRank
“Domain authority” is NOT a thing Google uses (at least not the way we think of it).
At a high level, the PageRank of a page is the sum of the PageRank of every page that links to it, each divided by the number of outbound links on that linking page.
The actual calculation uses a damping factor (d), usually around 0.85, to simulate users randomly jumping to another page instead of following a link.
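In symbols, a common normalized form is PR(A) = (1 − d)/N + d · Σ PR(T)/C(T), summed over every page T that links to A, where C(T) is T’s outbound link count and N is the number of pages (the original PageRank paper omits the /N). A minimal power-iteration sketch; the link graph is hypothetical, and giving pages with no outbound links an even spread of their rank is one of several common conventions.

```python
def pagerank(links, d=0.85, iterations=100):
    """Power iteration over {page: [pages it links to]}; every target must be a key."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with an even split
    for _ in range(iterations):
        # (1 - d) / n: the share every page gets from random jumps.
        new_rank = {page: (1 - d) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = d * rank[page] / len(outlinks)  # split across outbound links
                for target in outlinks:
                    new_rank[target] += share
            else:
                for target in pages:  # dangling page: spread its rank evenly
                    new_rank[target] += d * rank[page] / n
        rank = new_rank
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print({page: round(value, 3) for page, value in ranks.items()})
```

Page “c” ends up highest because it receives links from both other pages, while “b” only gets half of “a”’s vote.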
30. @RyanJones
What is A Core Algorithm Change?
Read my article on SEJ about the core algorithm update: https://www.searchenginejournal.com/what-is-a-google-broad-core-algorithm-update/264261
Example ranking factor order before a core update:
1. PageRank
2. Title tag
3. H1
4. Bold text
5. Internal anchors
6. Speed
7. HTTPS
8. Linking to wtfseo.com
After:
1. PageRank
2. H1
3. HTTPS
4. Internal anchors
5. Speed
6. Title tag
7. Bold text
8. Linking to wtfseo.com
Think of a core algorithm change as Google “shuffling” the order and importance of its hundreds of ranking factors.
In reality, it’s likely more than this: e.g., changing the damping (decay) value in the PageRank calculation, changing the retrieval method, or changing how synonyms are weighted or word vectors are calculated, etc.
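A toy way to see why a “shuffle” moves rankings even when no page changes: score pages as a weighted sum of factors, then change only the weights. The pages, factor values and weights below are all hypothetical.

```python
def rank(pages, weights):
    """Order pages by the weighted sum of their factor values, best first."""
    def score(factors):
        return sum(weights[name] * value for name, value in factors.items())
    return sorted(pages, key=lambda page: score(pages[page]), reverse=True)


# Hypothetical factor values for two pages (0 = weak, 1 = strong).
PAGES = {
    "page_a": {"title_tag": 1.0, "h1": 0.5, "https": 0.0},
    "page_b": {"title_tag": 0.2, "h1": 0.6, "https": 1.0},
}
BEFORE = {"title_tag": 3, "h1": 2, "https": 1}  # title tag matters most
AFTER = {"title_tag": 1, "h1": 3, "https": 2}   # same factors, reshuffled

print(rank(PAGES, BEFORE))  # ['page_a', 'page_b']
print(rank(PAGES, AFTER))   # ['page_b', 'page_a']
```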
31. @RyanJones
Quality Raters & Algorithm Changes
Quality Raters rate algorithm changes against each other. They’re part of how Google tests algorithm changes.
Quality Raters DO NOT:
• Penalize your website
• Affect your site’s ranking / relevance
32. @RyanJones
Penalties.
I made this image ~10 years ago. I miss Matt.
“It’s not a penalty every time rankings drop!”
Think of penalties (manual and algorithmic) as applied AFTER the retrieval and ranking phase.
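Read literally, “applied after retrieval and ranking” means a penalty reorders a list that has already been ranked. A hypothetical sketch of that idea (rank first, then demote); the demotion mechanism and numbers are illustrative only.

```python
def apply_penalties(ranked, demotions):
    """Demote already-ranked URLs.

    ranked:    URLs in the order retrieval + ranking produced.
    demotions: hypothetical {url: positions to push it down}.
    """
    def position(item):
        index, url = item
        penalty = demotions.get(url, 0)
        # Penalized URLs lose ties with unpenalized ones at the same position.
        return (index + penalty, penalty)

    return [url for _, url in sorted(enumerate(ranked), key=position)]


print(apply_penalties(["a.com", "b.com", "c.com", "d.com"], {"a.com": 2}))
# ['b.com', 'c.com', 'a.com', 'd.com']
```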
33. @RyanJones
Who to follow / where to learn more.
ME! Follow Me! @RyanJones.
(seriously, what did you expect here?)
But also, buy these books