Infinite Loops Dirty Architecture And Too Many Indexed URLs

Dawn Anderson's Brighton SEO deck from April 2014. It looks at crawlability issues on large sites, in particular infinite URLs / infinite loops, dirty architecture and too many indexed URLs.

I wrote a blog post / article for the Brighton SEO newspaper that covers the information in this deck in much more detail.

It is here:

http://bit.ly/Ss6Lf1

Published in: Marketing, Technology, Design

  1. INFINITE LOOPS, DIRTY ARCHITECTURE & crawl rank. Dawn Anderson
  2. CAME to THIS INDUSTRY VIA A DIFFERENT ROUTE
  3. I decided to add an additional dimension to the site TO ‘EXPLODE’ NATURAL SEARCH TRAFFIC
  4. 1.5 Million URLs
  5. Crawl Rate Going Down, Indexation Levels Going Up
  6. GOOGLE: Only crawling 0.1% of our pages per day
  7. Infinite Loop Definition: An infinite loop is a sequence of instructions in a computer program which loops endlessly, either due to the loop having no terminating condition, having one that can never be met, or one that causes the loop to start over.
  8. PENGUIN & PANDA updates came along
  9. TOO MANY URLS = SEO DEATH. ‘WE’RE ALL DOOMED’
  10. CRAWL Budget: Roughly proportionate to PageRank. Pages with a lot of links get crawled more. Still applies in current search landscape
  11. CRAWL Rank: A ranking metric for ‘no’ to ‘low’ PageRank pages?? Pages crawled more often rank higher. Get ‘low’ to ‘no’ PageRank pages crawled more than competitors = YOU WIN
  12. CRAWL OPTIMISATION: FIND OUT WHERE Googlebot goes AND KEEP WATCHING
  13. CHECK & MONITOR for over-indexation. 500 Page Website, 50,00 URLs in Google: YOU MAY HAVE DODGY CODE
  14. Check THOROUGHLY, Name & Categorise XML Sitemaps: Shoes.sitemap.xml, Dresses.sitemap.xml, tshirts.sitemap.xml, yoursite.sitemap.xml
  15. DON’T BE AFRAID of hard 404’s. Use 410’s where you can. Giraffe. AVOID soft 404’s
  16. ENSURE THAT Dynamic variables / parameters are checked for validation. Don’t render to just any old thing with a ‘200 OK’ response code or return a soft 404. HOW WILL YOU KNOW IF THERE’S A PROBLEM? You won’t
  17. AVOID A ‘JUMBLE SALE’ BUT
  18. ‘Herd’ Googlebot: Use Robots.txt, nofollows, sitemaps, nav paths & cross module internal linking
  19. Get Those Low Level Pages Crawled - Often. Whichever way you can. Pass equity to Siblings as Well as children
  20. Visit the internal links section on GWT (Google Webmaster Tools): Most Important Page 1, Most Important Page 2, Most Important Page 3. IS THIS YOUR BLOG?? HOPE NOT
  21. CANONICALISATION: In web search and search engine optimization (SEO), URL canonicalization deals with web content that has more than one possible URL. Having multiple URLs for the same web content can cause problems for search engines - specifically in determining which URL should be shown in search results.[2] Example: • http://wikipedia.com • http://www.wikipedia.com • http://www.wikipedia.com/ • http://www.wikipedia.com/?source=asdf All of these URLs point to the homepage of Wikipedia, but a search engine will only consider one of them to be the canonical form of the URL. (source - Wikipedia)
  22. Deal Well With Near & near duplicate content Via canonicalization, 301’s & Content Build Out
  23. STOP LYING & ‘GET FRESH’. Genuine ‘last modified dates’ are ALL important - FORGET PRIORITY
  24. "It's not that Google will penalize you, it's the opportunity cost for dirty architecture based on a finite crawl budget" (A.J. Kohn) (BLIND FIVE YEAR OLD) REMEMBER THIS
  25. Me @dawnieando
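
To put the figures from the 1.5 million URLs slide and the 0.1% crawl slide side by side, here is a back-of-the-envelope sketch in Python. It assumes the 0.1% refers to the same 1.5 million URLs and that the daily crawl stays constant; the deck does not state either explicitly, and the variable names are illustrative.

```python
# Figures taken from the slides; the link between them is an assumption.
TOTAL_URLS = 1_500_000          # "1.5 Million URLs"
CRAWLED_PER_THOUSAND = 1        # "0.1% of our pages per day" = 1 in 1,000

# Integer arithmetic avoids floating-point rounding noise.
pages_per_day = TOTAL_URLS * CRAWLED_PER_THOUSAND // 1000
days_for_full_pass = TOTAL_URLS // pages_per_day

print(pages_per_day)        # 1500
print(days_for_full_pass)   # 1000
```

Under those assumptions a single full pass over the site would take roughly 1,000 days, which is the opportunity cost the closing quote refers to.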
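
The advice on hard 404s, 410s and validating dynamic parameters can be sketched as a small status-picking function. This is a minimal illustration in Python, not code from the deck: the category sets and the name `status_for_category` are hypothetical, and a real site would look these values up in its own catalogue.

```python
from http import HTTPStatus

# Hypothetical data: categories that exist now, and ones removed for good.
KNOWN_CATEGORIES = {"shoes", "dresses", "tshirts"}
RETIRED_CATEGORIES = {"hats"}


def status_for_category(slug: str) -> HTTPStatus:
    """Pick the response status for a requested category parameter.

    Only exact, known values get '200 OK'. Anything else gets a hard
    error status instead of a rendered page (which would be a soft 404).
    """
    if slug in KNOWN_CATEGORIES:
        return HTTPStatus.OK
    if slug in RETIRED_CATEGORIES:
        return HTTPStatus.GONE        # 410: removed on purpose
    return HTTPStatus.NOT_FOUND       # 404: never existed or unknown
```

Matching is deliberately exact: accepting variants such as `Shoes` or `shoes ` with a 200 would create extra URLs for the same content, which is the over-indexation problem the deck warns about.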
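
The Wikipedia example on the canonicalisation slide can be made concrete with a small normaliser. The sketch below uses Python's standard `urllib.parse`; the choice of `www.wikipedia.com` as the preferred host and of `source` as a parameter safe to drop are assumptions made for this example, and the result would typically feed a 301 redirect target or a canonical tag rather than be an end in itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for this example only.
BARE_HOST = "wikipedia.com"
CANONICAL_HOST = "www.wikipedia.com"
IGNORED_PARAMS = {"source"}  # parameters that do not change the content


def canonical_url(url: str) -> str:
    """Map URL variants of the same page onto one canonical form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST
    path = parts.path or "/"  # treat an empty path as the root
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    # The fragment is dropped: it is never sent to the server.
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))


variants = [
    "http://wikipedia.com",
    "http://www.wikipedia.com",
    "http://www.wikipedia.com/",
    "http://www.wikipedia.com/?source=asdf",
]
canonical_forms = {canonical_url(u) for u in variants}
print(canonical_forms)  # {'http://www.wikipedia.com/'}
```

All four variants from the slide collapse to a single URL, so only one of them needs to be crawled and indexed.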
