This document discusses technical SEO techniques including prefetching and prerendering to improve page load times, using pushState to allow AJAX content to be crawled by search engines, and using crawling and grepping tools to analyze websites for patterns. It recommends using tools like HTTrack to crawl websites and grepWin to search crawled pages for text matches or regex patterns in order to discover things like analytics code, structured data, nofollow links, and more.
10. 1. prefetch and prerender
Prefetch
Downloads the file you request and holds it until the user clicks. You
can request multiple files.
<link rel="prefetch" href="/images/big.jpeg">
<meta http-equiv="Link" content="</images/big.jpeg>; rel=prefetch">
[HTTP header] Link: </images/big.jpeg>; rel=prefetch
Prerender
Downloads and fully renders the page (including
CSS, JavaScript etc.) and holds for 30 seconds, waiting for the
click.
<link rel="prerender" href="http://example.org/index.html">
https://developer.mozilla.org/en/docs/Link_prefetching_FAQ
https://developers.google.com/chrome/whitepapers/prerender
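The HTTP header form of prefetch can be generated server-side. A minimal sketch, assuming a Python back end; the function name and the second path are hypothetical, and several link values are joined with commas in a single Link header:

```python
def prefetch_link_header(paths):
    """Build a Link header value asking the browser to prefetch each path."""
    # Each entry has the form </path>; rel=prefetch, as in the header example above.
    return ", ".join(f"<{path}>; rel=prefetch" for path in paths)


# Hypothetical usage: one image from the slide plus one made-up stylesheet.
header_value = prefetch_link_header(["/images/big.jpeg", "/css/checkout.css"])
print("Link:", header_value)
```

How the value gets attached to the response depends on the framework in use, so that part is left out.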
11. 1. prefetch and prerender
Areas where you could make use of prefetch or prerender:
Checkout areas
Simple multi-page forms
Multi-page articles
12. 1. prefetch and prerender
Use analytics to identify areas where you can use prefetch or prerender
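One possible way to do this, sketched under assumptions: export (current page, next page) pairs from your analytics package and look for pages where a single next page dominates. The sample paths, the function name and the 50% threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def top_next_pages(transitions, min_share=0.5):
    """Return {page: (most common next page, share of exits)} for pages
    where one next page accounts for at least min_share of transitions."""
    by_page = defaultdict(Counter)
    for current, nxt in transitions:
        by_page[current][nxt] += 1

    result = {}
    for page, counts in by_page.items():
        nxt, hits = counts.most_common(1)[0]
        share = hits / sum(counts.values())
        if share >= min_share:
            result[page] = (nxt, share)
    return result


# Made-up sample data standing in for an analytics export.
sample = [
    ("/checkout/basket", "/checkout/address"),
    ("/checkout/basket", "/checkout/address"),
    ("/checkout/basket", "/"),
    ("/article?page=1", "/article?page=2"),
    ("/", "/about"),
    ("/", "/contact"),
    ("/", "/products"),
]
candidates = top_next_pages(sample)
for page, (nxt, share) in candidates.items():
    print(f"{page} -> {nxt} ({share:.0%})")
```

Pages with a predictable next step (checkout, multi-page articles) surface as candidates, while a home page with scattered exits does not.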
13. 1. prefetch and prerender
Implementation requires consideration and testing. The system
should not be overused and can cause side effects:
Analytics problems – registering page views incorrectly
Bandwidth problems – overuse could slow down your site
Bandwidth problems – think about external sites too
14. 2. SEO friendly AJAX
AJAX can be good for user experience, bad for search
Previous hashbang #! solution was rubbish
HTML5 includes a JavaScript function called pushState()
Address bar URL, title and history can be changed
If your web application fails in browsers with scripting
disabled, Jakob Nielsen’s dog will come to your house
and shit on your carpet.
15. pushState
All major browsers now support pushState
Bing recommends pushState over hashbangs
Google encourages webmasters to look into it
16. pushState
Get the speed benefit of AJAX
Allow users to bookmark and link to AJAX content
Allow users to use their back button
Keeps content accessible to all, including search engines
17. 3. Supercharge your crawling
SEOs need to crawl web sites
Screaming Frog is awesome
Power Mapper is also awesome
There are many others
Most have big limitations
http://www.screamingfrog.co.uk/seo-spider/
http://www.powermapper.com/
21. Grepping
Grep allows us to search for a string of
characters using regex patterns
Blekko allows you to grep the whole internet:
http://blekko.com/webgrep?status=completed
22. Blekko Grep The Web
Thing 872,652 URLs 1,173 Domains
Person 101,078,860 URLs 56,603 Domains
Product 31,437,841 URLs 33,539 Domains
1,664,598 Domains
23. Crawl and grep
Conventional crawlers:
Crawl files
Check for patterns
Discard files
Crawl and grep:
Crawl files
Save files
Check for patterns
24. How to do it
Install HTTrack
http://www.httrack.com/
Crawl the site using HTTrack and save the files
Install grepWin
http://tools.tortoisesvn.net/grepWin.html
Grep the pages using grepWin.
Search for text matches or regex patterns
Build a library of regex patterns for future use
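The grep step can also be scripted instead of run through grepWin. A minimal sketch, assuming the crawl was saved as .html or .htm files under one folder; the function name is hypothetical and this is not how grepWin itself works:

```python
import re
from pathlib import Path


def grep_files(root, pattern, suffixes=(".html", ".htm")):
    """Yield (path, match_count) for every saved page that matches pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            # Crawled pages come in mixed encodings, so undecodable bytes are skipped.
            text = path.read_text(encoding="utf-8", errors="ignore")
            count = len(regex.findall(text))
            if count:
                yield path, count
```

For example, calling grep_files on the folder HTTrack wrote to, with the pattern rel=["']canonical["'], would list each saved page that contains a canonical link element along with how many times it appears.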
25. What you can discover
Check server logs
Grep competitor pages
Grep link prospects
Pagination with rel="prev" and rel="next"
Authorship with rel="author"
Schema.org markup
Other structured data
Iframes
Nofollow links
Analytics tracking code
Like buttons
Twitter cards, Facebook Markup
Anything else that you want, whenever you want
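A starting point for such a pattern library, as a rough sketch: the regexes below are illustrative assumptions, not exhaustive detectors (for example, they expect quoted attribute values, schema.org in microdata form, and the classic UA-style Google Analytics property ID).

```python
import re

# Hypothetical pattern library; extend or tighten each entry as needed.
PATTERNS = {
    "rel_prev_next": r'<link[^>]+rel=["\'](?:prev|next)["\']',
    "rel_author": r'rel=["\']author["\']',
    "schema_org": r'itemtype=["\']https?://schema\.org/[^"\']+["\']',
    "iframe": r'<iframe\b',
    "nofollow": r'<a\b[^>]*rel=["\'][^"\']*nofollow[^"\']*["\']',
    "analytics_ua_id": r'UA-\d{4,10}-\d{1,4}',
    "twitter_card": r'<meta[^>]+name=["\']twitter:card["\']',
    "open_graph": r'<meta[^>]+property=["\']og:[^"\']+["\']',
}
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in PATTERNS.items()}


def audit_html(html):
    """Return {pattern name: number of matches} for one page of HTML."""
    return {name: len(regex.findall(html)) for name, regex in COMPILED.items()}
```

Running audit_html over every saved page gives a per-page count for each feature, which can then be totalled across a site or a set of competitor pages.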
26. Grepping found…
Top 100 pages for “car insurance” on Google.co.uk
“car insurance” used 2011 times. Average 20 times per page.
6 pages use schema.org, 4 of them for product
52 use canonical link element, 2 use it twice!
1 site uses prefetch
40% of pages on The Daily Mail website contain the word
“immigrant” (sample of 2772 pages)
6% on The Guardian website (sample 3442 pages)
F*** used on 7% and C*** used on 3.5% of Guardian pages
Never used on The Daily Mail
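Figures like these can be produced with a few lines once the page text is loaded. A minimal sketch, assuming each page is already a plain string; the function name and sample strings are hypothetical, and matching is a simple case-insensitive substring count:

```python
import re


def term_stats(pages, term):
    """Return (total uses, average uses per page, share of pages using term)."""
    regex = re.compile(re.escape(term), re.IGNORECASE)
    counts = [len(regex.findall(text)) for text in pages]
    if not counts:
        return 0, 0.0, 0.0
    total = sum(counts)
    pages_with_term = sum(1 for c in counts if c > 0)
    return total, total / len(counts), pages_with_term / len(counts)


# Made-up sample pages, not data from the crawls described above.
sample_pages = [
    "Car insurance and car insurance",
    "home insurance",
    "CAR INSURANCE",
    "nothing",
]
total, average, share = term_stats(sample_pages, "car insurance")
print(total, average, share)
```

The first two values correspond to the "used N times, average per page" figures, and the third to the "percentage of pages containing the word" figures.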
27. Lorem ipsum dolor sit
Default text to use during design is “Lorem ipsum dolor…”
I checked the Conservative website. They have one page
which seems to be neglected: