Tune in for Portent SEO Marianne Sweeny’s January webinar: “How to SEO a Terrific – and Profitable – User Experience.” Learn how search engine algorithms are now incorporating IA, UX and content strategy, as well as methods for directing Google, Bing & Co. to perform better for your users.
4. What We’ll Cover Today
Locked in Technology
• Information Retrieval model
• Term frequency
Search Engine Optimization
• Algorithms
• User dependency
Search engine retaliation
• Panda
• Penguin
User Experience and SEO
• Interaction Design
• Site Structure
• Navigation
• Content
• Links
Bs of New SEO
Key takeaways
6. IR is Locked-in Technology
Digital Maoism: Panda focuses on abstraction more than real people
Computational-ism: Penguin sees a world that can be understood by computer processes with humans as sub-processes
7. Traditional IR Model
System side
• Acquisition (documents, objects)
• Representation (indexing, ...)
• File organization (indexed documents)
User side
• Problem (information need)
• Representation (question)
• Query (search formulation)
Matching (searching) joins the two sides and produces Retrieved objects, with feedback
9. Term Frequency/Document Frequency
t = how many times the query appears in a document
T = total number of terms in a document
D = set of all documents
d = number of documents with that term
Pick the document with the highest
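The formula on this slide did not survive extraction. As a minimal sketch, assuming the common TF-IDF weighting (t / T) × log(|D| / d) built from the variables defined above, the scoring could look like the following. The function names and the toy corpus are hypothetical, and this is not necessarily the exact formula the slide showed.

```python
import math

def tf_idf(term, doc, corpus):
    """Score one term for one document as (t / T) * log(|D| / d)."""
    t = doc.count(term)                              # times the term appears in this document
    T = len(doc)                                     # total number of terms in this document
    d = sum(1 for other in corpus if term in other)  # number of documents with the term
    if t == 0 or d == 0:
        return 0.0
    return (t / T) * math.log(len(corpus) / d)

def best_document(query, corpus):
    """Return the index of the document with the highest summed score."""
    scores = [sum(tf_idf(term, doc, corpus) for term in query) for doc in corpus]
    return max(range(len(corpus)), key=scores.__getitem__)

# Hypothetical toy corpus, already split into lowercase terms.
corpus = [
    "panda update targets thin content".split(),
    "penguin update targets link spam and link farms".split(),
    "user experience and content strategy".split(),
]
print(best_document(["link", "spam"], corpus))
```

Dividing by T discounts longer documents, and the log(|D| / d) factor weights rare terms higher, which matches the behavior described in the speaker notes.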
10. All Time Algorithm Hits
The sounds of crying and teeth grinding inspire me…
11. Hilltop Algorithm
• Quality of links more important than quantity of
links
• Segmentation of corpus into broad topics
• Selection of authority sources within these
topic areas
12. Topic-Sensitive PageRank
Pre-query calculation of factors based on
subset of corpus
• Context of term use in document
• Context of term use in history of queries
• Context of term use by user submitting
query
Consolidation of Hypertext Induced Topic
Selection [HITS] and PageRank
17. A Year in the Life of Panda Updates
February 2012
• Expanded image results (ALT text)
• More accurate detection “official pages”
• Improved safe search (NS)
• Freshness (3)
• Non-custom 404 detection
• Use of Web history for refinement
• Demote link analysis (NS)
March 2012
• Symbol indexing
• Google+ profile indexing
• Speed up offline Panda processing
• High quality sites algorithm update (NS)
• IMG search refinements
• Anchor text refinement (Penguin preview)
• Freshness (video, “stale results”)
• Synonyms update
• Site quality detection update (Panda)
June/July 2012
• New improved ranking function (NS)
• Improved image search (on topic)
• Improved safe search (NS)
• Find more high quality content from trusted sources (2)
• Freshness improvements
• Data refresh for Panda high quality sites algorithm
• Page quality: help finding high quality pages with unique content
• Panda launch Japan/Korea
• Freshness bug fix
18. More Search Updates 2012
August/September 2012
• Find more high quality content from trusted sources
• Term proximity scoring
• Handling stale content: more granular, based on document age
• <title> and content semantic match improvement
• Bug fix to way links are used in ranking (Penguin)
• Panda high quality sites data refresh (2)
• Improved video indexing (NS)
• Improved cursor aware predictions (CN, JPN, KO)
• PDF optimization
• Results based on document freshness
October/November 2012
• Related query refinements
• Parked domain detection algorithm
• Deeper indexing (more “long-tail” documents)
• Wider auto-complete suggestions (search suggest)
• Fresher, more complete blog search results
• Duplicate content detection refinement (better original content signals)
• Better image freshness determination (news queries)
• Top Results code refinements: not too many from same site, e.g. host crowding, easier to maintain (and quicker to change)
19. “Shut up and stop thinking kid. Talk like that can get you de-indexed.”
“Sarge, this link war doesn’t make sense?”
“Hey, how come you’re not wearing pants?”
20. Penguin Basics
Spam detection update/algorithm
Targets “over optimization”
• Anchor text distribution
• Inbound link distribution
• Cluster analysis
Good links are (according to Google)
• From respected sites/brands/people
• Exist on popular pages
• Provide value to user
• Are contained in the page text (not sidebar components)
• Are not duplicated too many times throughout linking site
30. Content
• Intersection of what you have/do with how
customers look for what you have/do
• Use online tools to mine customer search
behavior around offerings
• Check to see if you have relevant content
and fill gaps
32. Content: Relational Content Modeling
• Guided Tours: built on analysis of other user
pathways and knowledge of corpus
• Produced Views: page of assembled content
items focused on a single subject
• Task List Drop Downs: “I Want To…” links to
pages of assembled content focused on single
common task
33. Content: Relational Content Modeling
• Related Links: related as in “next steps” not
what Marketing wants to be a next step
• Best Bets: editorially assigned result that may
not be chosen by the search engine
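A Best Bets layer for site search can be sketched in a few lines. This is a minimal illustration, assuming a hand-maintained table of editorial picks; the table contents, the URLs and the function name are all hypothetical and do not come from any particular search product.

```python
# Hypothetical editorial table: normalized query -> page an editor wants shown first.
BEST_BETS = {
    "return policy": "/help/returns",
    "store hours": "/locations",
}

def search_with_best_bets(query, engine_results, best_bets=BEST_BETS):
    """Put the editor's pick first, then the engine's results without duplicates."""
    normalized = " ".join(query.lower().split())  # lowercase and collapse whitespace
    pick = best_bets.get(normalized)
    if pick is None:
        return list(engine_results)               # no editorial pick: engine order stands
    return [pick] + [url for url in engine_results if url != pick]

print(search_with_best_bets("Return  Policy", ["/blog/policy-news", "/help/returns"]))
```

The editorial pick is shown first even when the engine ranked it low or did not return it at all, which is the point of a Best Bet.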
34. Links
Links are intended as a human-mediated
relationship between sites (not a one night stand)
Source and destination must share contextual
relationship
Create a positive experience with links to
information users may not find, may not know
they need
Build link-based relationship model of relevance
• Create or find Authority resources (on and off site)
• Connect to HUBS
36. Be Focused
• Be realistic about what you can place visibly for
• Let every page be about some “thing”
• Focus the user attention
• Maximize the real estate to maximize user focus
37. Be Social
Google+ & Google Local
Other Social channels (FB, Twitter, LinkedIn)
38. Be Connected
• Connect to related content on and off the site
• Make it easy for users to connect to you
• RSS feed/subscription
• Alerts
• Social channels
• Link building is out, link “earning” is in
• Keep them engaged and on the site
• Make sure site search works
39. Be Aware
Search engine Webmaster accounts to look at
• Indexing profile
• Links to the site
Competitive Landscape
• Review competitors’ placement for certain phrases
• Use SEMrush or Majestic SEO to review key pages
against yours
Emerging Authorities
• Look at the SERP for your phrases: whoever holds the top
spots is an authority to the search engine
Customer behavior
• Site analytics
• Google Trends & Yahoo Clues
40. Key Takeaways
• Search engines are not smart
• Search engine companies have an agenda
• User Experience was an early influencer of relevance
• User Experience is a recent influencer of relevance
• Search is a user experience
• Creating an optimal user experience is the new SEO
IR’s locked-in legacies are centered on text deconstruction, the capacity for sequential instructions to derive meaning, and a reliance on systems that do not scale well and, while incorporating human behavior, do not fully understand it.
“Whenever a computer is imagined to be intelligent, what is really happening is that humans have abandoned aspects of the subject at hand in order to remove from consideration whatever the computer is blind to.” (Jaron Lanier, You Are Not A Gadget)
Slide from LIS 544 / IMT 542 / INSC 544 by Jeff Huang (lazyjeff@uw.edu) and Shawn Walker (stw3@uw.edu)
The document with the highest proportion of terms which are part of the query is most relevant
• Documents containing more of the term(s) scored higher
• Longer documents discounted
• Rare terms weighted higher
PageRank: based on Citation model
• Based on Random Surfer model
• Links to page must be votes for quality (surf-worthy)
• PageRank is a pre-query valuation
• Based on number of links to the page: 1 link = 1 vote
• Most votes wins top placement
• Has no relationship to the subject of the query
Failings soon uncovered
• Link farms
• Googlehacks
Computes PR based on a set of representational topics [augments PR with content analysis]
• Topics derived from the Open Directory Project (DMOZ)
• Uses a set of ranking vectors: pre-query selection of topics + at-query comparison of the similarity of query to topics
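The two-step idea in these notes (rank vectors computed per topic before the query, then blended by query-to-topic similarity at query time) can be sketched as follows. This is a simplified illustration, not Google's implementation: the link graph, the topic seed pages and the query weights are hypothetical, and every link target is assumed to appear as a key in the graph.

```python
def pagerank(links, teleport=None, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    teleport maps pages to random-jump probabilities; uniform when omitted.
    """
    pages = list(links)
    if teleport is None:
        teleport = {p: 1.0 / len(pages) for p in pages}
    rank = {p: teleport.get(p, 0.0) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * teleport.get(p, 0.0) for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Page with no outlinks: hand its rank back via the jump distribution.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport.get(p, 0.0)
        rank = new_rank
    return rank

def topic_sensitive_rank(links, topic_pages, query_topic_weights):
    """Blend per-topic rank vectors by the query's similarity to each topic."""
    topic_ranks = {}
    for topic, seeds in topic_pages.items():
        # Pre-query step: random jumps land only on this topic's seed pages.
        teleport = {p: 1.0 / len(seeds) for p in seeds}
        topic_ranks[topic] = pagerank(links, teleport)
    # At-query step: weight each topic's vector by its similarity to the query.
    return {
        page: sum(w * topic_ranks[t][page] for t, w in query_topic_weights.items())
        for page in links
    }

# Hypothetical four-page web with two topics.
links = {
    "sports-hub": ["team-a", "recipes-hub"],
    "team-a": ["sports-hub"],
    "recipes-hub": ["pasta", "sports-hub"],
    "pasta": ["recipes-hub"],
}
topic_pages = {"sports": ["sports-hub"], "cooking": ["recipes-hub"]}

sports = topic_sensitive_rank(links, topic_pages, {"sports": 1.0, "cooking": 0.0})
cooking = topic_sensitive_rank(links, topic_pages, {"sports": 0.0, "cooking": 1.0})
print(sports["team-a"] > sports["pasta"], cooking["pasta"] > cooking["team-a"])
```

Calling pagerank(links) with no teleport argument gives the plain, query-independent PageRank from the previous slide; biasing the jump distribution toward a topic's seed pages is what makes the score topic-sensitive.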
Using the Internet: Skill Related Problems in User Online Behavior; van Deursen & van Dijk; 2009
Pew Internet Trust Study of Search engine behavior
http://www.pewinternet.org/Reports/2012/Search-Engine-Use-2012/Summary-of-findings.aspx
• 56% constructed poor queries
• 55% selected irrelevant results 1 or more times
• 38% overwhelmed by amount of information in results
• 34% found critical information missing from results
Panda update: started rolling out in February 2011
Rumored to be response to JC Penney and Overstock incidents
• Click through
• Bounce Rate
• Conversion
February 2011: algorithm focused on content quality - originally thought to be aimed at content farms
June 2011: update to identify scraped or duplicated content
October 2011: unannounced update to rectify sites “unfairly impacted” by original updates
January 2012: sites with too much ad space above the fold are devalued
The slide lists approximately 10% of the changes that Google told us about, and what they tell us about likely represents .10% of the changes that they actually make. (source: http://insidesearch.blogspot.com)
Re: freshness bug fix: “This change turns off a freshness algorithm component in certain cases when it should not be affecting the search results.”
Will serve up the newer document when choosing between two (from a given site)
http://insidesearch.blogspot.com/2011/11/search-using-your-terms-verbatim.html
Since then, we’ve received a lot of requests for a more deliberate way to tell Google to search using your exact terms. We’ve been listening, and starting today you’ll be able to do just that through verbatim search. With the verbatim tool on, we’ll use the literal words you entered without making normal improvements such as
• making automatic spelling corrections
• personalizing your search by using information such as sites you’ve visited before
• including synonyms of your search terms (matching “car” when you search [automotive])
• finding results that match similar terms to those in your query (finding results related to “floral delivery” when you search [flower shops])
• searching for words with the same stem like “running” when you’ve typed [run]
• making some of your terms optional, like “circa” in [the scarecrow circa 1963]
• Overly repetitive anchor text (“manipulative repetitive anchor text”)
• Blog comments filled with spam (reviews/comments that contain links to “spam”) – Google’s definition of spam similar to Supreme Court for porn…cannot articulate, just knows it when it sees it
• Obscene content
• Web “clusters” – multiple Web sites on the same host, from same domain owner, linking to article in artificial way
Targets “exact match” keyword-ed links or aggressive anchor text to Google
Sites penalized had “moneyed keywords” in 65% of their incoming links
Obviously aimed at the long-standing practice of outsourcing link building to 3rd world countries and the weed-like growth of useless directories (i.e. link farms)
Too many links from “related” sites
• Same niche
• Same domain host
• Same domain owner
Standard signals
• Stuffed <title> and metaDescription
• Hidden text
• Unrelated links on and pointing to the page
• Computer generated text (i.e. dynamically rendered product pages)
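To check a site against the anchor text figures in these notes, the share of “moneyed keyword” anchors among inbound links can be computed directly. This is a minimal sketch for auditing your own link profile, not Penguin's actual detection logic; the anchor list, the keyword set and the function name are hypothetical.

```python
from collections import Counter

def anchor_text_profile(inbound_anchors, money_keywords):
    """Summarize how concentrated a site's inbound anchor text is."""
    normalized = [anchor.strip().lower() for anchor in inbound_anchors]
    if not normalized:
        raise ValueError("need at least one inbound anchor")
    counts = Counter(normalized)
    total = len(normalized)
    # Links whose anchor text is exactly one of the commercial phrases.
    money = sum(n for anchor, n in counts.items() if anchor in money_keywords)
    top_anchor, top_count = counts.most_common(1)[0]
    return {
        "money_share": money / total,   # fraction of exact-match commercial anchors
        "top_anchor": top_anchor,       # single most repeated anchor text
        "top_share": top_count / total,
    }

# Hypothetical inbound anchors exported from a link analysis tool.
anchors = [
    "cheap flights", "Cheap Flights", "cheap flights", "Acme Travel",
    "acmetravel.example", "this review", "cheap flights", "book flights",
    "cheap flights", "click here",
]
profile = anchor_text_profile(anchors, {"cheap flights", "book flights"})
print(profile)
```

A money_share approaching the 65% seen on penalized sites, or one anchor dominating top_share, suggests the profile looks manufactured rather than earned.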
We move from trying to optimize search engine behavior to optimizing what the search engines consume
Move from search engine optimization to information optimization
• Focus
• Collaborative
• Connected
• Current
• Where am I?
• What is here?
• Where do I go next?
What the user wants is
• Relevance
• No recriminations
• A starting point
Let the user drive
Empower exploration
Tools
• Facets
• Filters
• More Like This…
Jared Spool did a site search study some time ago that found users successful 37% of the time when using site search and 50+% of the time when navigating
Users don’t like navigation at the outset but will use it if contextual and in a form that they can influence
Users gravitate towards search box
Search suggest takes them where you know is best
Tools
• Suggestions as query is entered
• At page search box
• On search page
• Spell check/correction
• Best Bets
• Augmented Search results
• Awards
• Display PageRank score
• Sharing
• User Ratings
Distance reflects relevance
• URL Depth: the further from the homepage, the less important it must be
• Click Distance: the further from an authority page, the less important it must be
Page Structure Now a Factor
• Google Page Segmentation Patent: Determining Semantically Distinct Regions of a Document
• Based on eye-tracking studies and user behavior
• Similar Yahoo patent
URLs
• Keywords found in URLs are weighted for relevance
• Hyphens are the best separators
Tools
• Flat structure within CMS
• Analytics
• Cross linking
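Both distance measures are easy to audit on your own site. The sketch below is a minimal illustration, assuming a crawl has already produced a map of internal links; the site map, the example URL and the function names are hypothetical.

```python
from collections import deque
from urllib.parse import urlparse

def url_depth(url):
    """Number of path segments between the homepage and this URL."""
    return len([segment for segment in urlparse(url).path.split("/") if segment])

def click_distances(links, start):
    """Breadth-first search: fewest clicks from start to every reachable page."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:       # first visit is the shortest path
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

# Hypothetical internal link map: page -> pages it links to.
site = {
    "/": ["/products/", "/about/"],
    "/products/": ["/products/widgets/"],
    "/products/widgets/": ["/products/widgets/blue-widget/"],
    "/about/": ["/products/widgets/blue-widget/"],  # cross link shortens the path
}
print(url_depth("https://www.example.com/products/widgets/blue-widget/"))
print(click_distances(site, "/")["/products/widgets/blue-widget/"])
```

In this toy site the product page sits three folders deep by URL but only two clicks from the homepage, because of the cross link from the About page. Cross linking and a flatter structure reduce click distance without changing URLs.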
Stewart Brand, in his book “How Buildings Learn,” advised waiting to put in walkways around the building so that you can see where the pathways form on the grass and ground
Users will tell you how they want to get to content
Users suffer “navigational blindness”
• They are not going to be there that long
• Search crack
• Fish-eyed view
Users lose orientation on the site
• Help them to stay grounded without using Back Button
• Try not to be too clever with labeling: clear, common sense labels for sections
• Do not give them too much (Paradox of Choice)
• Entice them to explore
Tools
• Google Analytics visitor flow
• Google Insights for Search
• Breadcrumbs
• HTML site map
Tools:
• Core Metadata: 20-30 terms that represent intersection between client objectives and how their customers search for the product/service
• Content analytics: top pages, bounce rate, visitor flow
• Content audit: keep/kill/revise
• Relational content model: Next Steps as well as More Information using: guided tours, Best Bets, produced view, etc.
Not all links are created equal. Links between pages that share context are worth more (Hilltop and HITS algorithms)
DMOZ feeds the Google Directory and is rumored to be the ontology of the Web
Be realistic about what you can place visibly for
• Keyword research
• Search behavior research
• No thing can be every thing to every one
Let every page be about some “thing”
Focus the user attention
• They don’t really notice the sidebars. Do not put important stuff there
• Newspaper model – broad topic lead paragraph to more detail
• Include important links inline (including navigation)
Maximize the real estate to maximize user focus
Not all social channels are equal
What is the intrinsic value to share, comment, like, engage…
If it barks, sings, dances, plays, changes whatever, annotate with something the search engine can crawl, deconstruct, associate with surrogate and store in the index
We’re smart, search engines are a tool
The agenda is about money from advertising and local tagging
Structured things are easier to find and the Web is not structured
Analytics tell us what, not why – user research tells us why
Need is an experience – need to know is a state of being