How is SEO changing to support microdata like Schema.org? And why is this metadata good for information retrieval and organic search engine optimization?
In this introductory guest lecture for the University of Washington, I present some of the problems in information retrieval for unstructured content ("blobs") and how to solve these challenges using Schema.org microdata to define "entities".
There's a simple Schema.org markup exercise to expose students to the basics as well as jokes about horror movies, The Simpsons, Keanu Reeves, and even Joss Whedon just to keep things light-hearted and fun.
You can learn more about Jonathon Colman at http://www.jonathoncolman.org/
SEO in the Age of Entities: Using Schema.org for Findability
3. INFO 498: Content Strategy (week #7)
From Blobs to Structured Data
SEO in the Age of Entities
Jonathon Colman, @jcolman
In-House SEO for REI
www.REI.com
5. What is content?
If you boil away all the formatting, what’s
left?
Just text?
If so, then why isn’t full text search good
enough to find what you’re looking for?
What could work better than that?
And what can we do to content to support
its findability?
7. Huh? Wikipedia
is a source?
https://www.facebook.com/pages/The-Bus-That-Couldnt-Slow-Down/114241625259749
8. Oh, it’s via a synonym
redirect to…
http://en.wikipedia.org/w/index.php?title=The_Bus_That_Couldn%27t_Slow_Down&redirect=no
9. Joss Whedon was a
co-writer? WTF?!
http://en.wikipedia.org/wiki/Speed_(1994_film)
10. What is a document?
How can you tell what a document is
about?
How can you tell one document from
another?
What sort of signals do documents give us
that help us derive their meaning?
Do you know them when you see them?
11. veniam, quis nostrud exerci tation ullamcorper suscipit l
ommodo consequat. Duis autem vel eum iriure dolor in h
ate velit esse molestie consequat, vel illum dolore eu feu
os et accumsan et iusto odio dignissim qui blandit praes
augue duis dolore te feugait nulla facilisi. Nam liber tem
d option congue nihil imperdiet doming id quod mazim p
Typi non habent claritatem insitam; est usus legentis in
em. Investigationes demonstraverunt lectores legere me
s. Claritas est etiam processus dynamicus, qui sequitur m
tudium lectorum. Mirum est notare quam littera gothica
us parum claram, anteposuerit litterarum formas human
decima et quinta decima. Eodem modo typi, qui nunc no
ant sollemnes in futurum. Lorem ipsum dolor sit amet, c
ing elit, sed diam nonummy nibh euismod tincidunt ut la
m erat volutpat. Ut wisi enim ad minim veniam, quis nost
orper suscipit lobortis nisl ut aliquip ex ea commodo con
m iriure dolor in hendrerit in vulputate velit esse molestie
eu feugiat nulla facilisis at vero eros et accumsan et iusto
praesent luptatum zzril delenit augue duis dolore te feu
ber tempor cum soluta nobis eleifend option congue nihi
d mazim placerat facer possim assum. Typi non habent cl
This is a Blob.
gentis in iis qui facit eorum claritatem. Investigationes de
s legere me lius quod ii legunt saepius. Claritas est etiam
icus, qui sequitur mutationem consuetudium lectorum. M
ittera gothica, quam nunc putamus parum claram, antep
humanitatis per seacula quarta decima et quinta decima
12. Lorem ipsum: A Study in Dolor Sit Amet
Author: Melissa Weaver
Date: February 18, 2012
Language: Latin, English
Publisher: UW Husky Press
Keywords: consectetuer, adipiscing, elit, sed, diam
Abstract: Nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat
volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit
lobortis nisl ut aliquip ex ea commodo consequat.
Chapter 1: Hendrerit in Vulputate
Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse
molestie consequat, vel illum dolore eu feugiat nulla facilisis at
vero eros et accumsan et iusto odio dignissim qui blandit praesent
This uses Entities.
luptatum zzril delenit augue duis dolore te feugait nulla facilisi.
Nam liber tempor cum soluta nobis eleifend option congue nihil
imperdiet doming id quod mazim placerat facer possim assum...
13. The Problem with Blobs
Unstructured content is useful, but only to
a point
It’s hard to scan, skim, and easily make
sense of – both for humans and robots
It’s hard to search against, particularly in a
crowded collection with lots of competing
content containing similar information
What should a search engine pay
attention to in order to help the user?
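To make the crowded-collection problem concrete, here is a minimal sketch of naive full-text search over blobs. The three documents and the `full_text_search` helper are invented for illustration; real search engines do far more than substring matching.

```python
# Hypothetical blobs: only the first one is about the movie.
BLOBS = {
    "doc1": "Speed is a 1994 action film about a bus that cannot slow down.",
    "doc2": "Tips for improving your typing speed and accuracy.",
    "doc3": "The city lowered the speed limit for every bus route downtown.",
}


def full_text_search(blobs, query):
    """Return ids of blobs that contain every query term as a substring."""
    terms = query.lower().split()
    return [
        doc_id
        for doc_id, text in blobs.items()
        if all(term in text.lower() for term in terms)
    ]


one_term = full_text_search(BLOBS, "speed")
two_terms = full_text_search(BLOBS, "speed bus")
print(one_term)   # every blob mentions "speed"
print(two_terms)  # the movie and the traffic notice still collide
```

Nothing in the text itself tells the search which blob is a film and which is a traffic notice. That is the gap structured data is meant to fill.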
14. HTML metadata
Metadata is “data about data”, right?
In HTML, we can express metadata like:
<title>The Problem With Blobs</title>
<meta name="description" content="An overview of why blobs are tricky things to deal with." />
<meta name="keywords" content="blob, entity, seo, content strategy, info498" />
Unfortunately, that’s not going to be good
enough. But why not? Let’s see…
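As a rough illustration of what a machine can recover from tags like these, the sketch below reads the title and meta tags with Python's standard `html.parser`. The class and function names are my own, and the sample markup is a cleaned-up copy of the slide's snippet.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collects the <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs:
            self.metadata[attrs["name"]] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_head_metadata(html):
    parser = HeadMetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


SAMPLE = """
<title>The Problem With Blobs</title>
<meta name="description" content="An overview of why blobs are tricky things to deal with." />
<meta name="keywords" content="blob, entity, seo, content strategy, info498" />
"""

metadata = extract_head_metadata(SAMPLE)
print(metadata)
```

Everything that comes back is a flat, page-level string. Nothing here says which part of the page is an author, a date, or a price.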
16. How can we do better?
Real metadata – in this case, “microdata”.
17. What is Schema.org?
A shared markup vocabulary, expressed here as
microdata, agreed upon by Google, Bing, and Yahoo
Uses relatively simple on-page code to
turn blobs of content into structured data
Once structured, this content becomes
interoperable in other systems – you can
display that data wherever the standards
are accepted
Here’s an example…
19. Controlled entities help searchers
Documents can be documents, authors
can be authors, products can be products,
and prices can be prices.
Each of these entities has a definition in
Schema.org and markup that you can use to
define a blob as being actual data.
So if Homer doesn’t know the name of the
movie “Speed”, he can still find it with
searches for its subject, the actors, the
year it came out, the director, etc.
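Homer's search can be sketched as a lookup over typed fields instead of raw text. This is a toy, not how any search engine actually works: the `Movie` class, the two-film catalog, and `find_movies` are all assumptions, with field names only loosely modeled on Schema.org's Movie type.

```python
from dataclasses import dataclass, field


@dataclass
class Movie:
    name: str
    year: int
    director: str
    actors: list = field(default_factory=list)
    about: list = field(default_factory=list)


# Illustrative catalog; the subject keywords are made up for the demo.
CATALOG = [
    Movie("Speed", 1994, "Jan de Bont",
          ["Keanu Reeves", "Sandra Bullock", "Dennis Hopper"],
          ["bus", "bomb"]),
    Movie("The Matrix", 1999, "The Wachowskis",
          ["Keanu Reeves", "Laurence Fishburne", "Carrie-Anne Moss"],
          ["simulation", "hackers"]),
]


def find_movies(catalog, *, actor=None, year=None, director=None, about=None):
    """Return the movies that match every facet that was supplied."""
    results = []
    for movie in catalog:
        if actor is not None and actor not in movie.actors:
            continue
        if year is not None and movie.year != year:
            continue
        if director is not None and movie.director != director:
            continue
        if about is not None and about not in movie.about:
            continue
        results.append(movie)
    return results


# Homer remembers the actor and the bus, but not the title.
matches = find_movies(CATALOG, actor="Keanu Reeves", about="bus")
print([movie.name for movie in matches])  # ['Speed']
```

Because each value is known to be an actor, a year, or a subject, the query never has to guess what a matching word means.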
20. Exercise: Use the “Article” schema
Go to http://schema.org/Article
Look at the entities and the code sample
at the bottom
Pick appropriate content from the IAI
Library, such as
http://iainstitute.org/en/learn/research/a_simplified_model_for_facet_analysis.php
“View Source” and try marking it up with
Schema.org microdata
21. Partial potential results
<div itemscope itemtype="http://schema.org/Article">
  <h1 itemprop="name">A Simplified Model for Facet Analysis</h1>
  <!-- Nested items need an itemprop that ties them to their parent -->
  <div itemprop="author" itemscope itemtype="http://schema.org/Person">
    <h2 itemprop="name">Dr. Louise Spiteri</h2>
    <a itemprop="url" href="http://dal.academia.edu/LouiseSpiteri">http://dal.academia.edu/LouiseSpiteri</a><br />
    <div itemprop="affiliation" itemscope itemtype="http://schema.org/Organization">
      Faculty of Management<br />
      School of Library and Information Studies<br />
      <span itemprop="name">Dalhousie University</span><br />
      <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
        <span itemprop="addressLocality">Halifax</span><br />
        <span itemprop="addressRegion">Nova Scotia</span> NS
        <span itemprop="postalCode">B3H 3J5</span><br />
        <span itemprop="addressCountry">Canada</span>
      </div><br />
      Voice: <span itemprop="telephone">(902) 494-2473</span><br />
      Fax: <span itemprop="faxNumber">(902) 494-2451</span>
    </div><br />
  </div>
</div>
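To see how a consumer might turn markup like this back into data, here is a minimal sketch of a microdata reader built on Python's standard `html.parser`. It is an assumption-laden toy, not a spec-compliant parser: `MicrodataParser` and `extract_microdata` are hypothetical names, it only reads element text and `href` values, and it ignores `meta`, `link`, and `itemref`. The sample is a shortened, corrected version of the exercise markup.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class MicrodataParser(HTMLParser):
    """Builds nested dicts from itemscope/itemtype/itemprop attributes."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found in the document
        self._stack = []  # one entry per open (non-void) element

    def _current_item(self):
        for entry in reversed(self._stack):
            if entry["item"] is not None:
                return entry["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        owner = self._current_item()
        entry = {"tag": tag, "item": None, "prop": None, "owner": owner, "text": []}
        prop = attrs.get("itemprop")
        if "itemscope" in attrs:
            # A new item: attach it to its parent, or keep it at top level.
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop and owner is not None:
                owner["properties"].setdefault(prop, []).append(item)
            else:
                self.items.append(item)
            entry["item"] = item
        elif prop and owner is not None:
            if tag == "a":
                owner["properties"].setdefault(prop, []).append(attrs.get("href", ""))
            else:
                entry["prop"] = prop  # value is the element's text, set on close
        if tag not in VOID_TAGS:
            self._stack.append(entry)

    def handle_data(self, data):
        for entry in self._stack:
            if entry["prop"] is not None:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, saving any text values.
        for index in range(len(self._stack) - 1, -1, -1):
            if self._stack[index]["tag"] == tag:
                closing = self._stack[index:]
                del self._stack[index:]
                for entry in closing:
                    if entry["prop"] is not None:
                        value = " ".join("".join(entry["text"]).split())
                        entry["owner"]["properties"].setdefault(entry["prop"], []).append(value)
                return


def extract_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """
<div itemscope itemtype="http://schema.org/Article">
  <h1 itemprop="name">A Simplified Model for Facet Analysis</h1>
  <div itemprop="author" itemscope itemtype="http://schema.org/Person">
    <h2 itemprop="name">Dr. Louise Spiteri</h2>
    <div itemprop="affiliation" itemscope itemtype="http://schema.org/Organization">
      <span itemprop="name">Dalhousie University</span><br />
      <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
        <span itemprop="addressLocality">Halifax</span>
      </div>
    </div>
  </div>
</div>
"""

items = extract_microdata(SAMPLE)
author = items[0]["properties"]["author"][0]
print(items[0]["type"], "by", author["properties"]["name"][0])
```

The nesting in the output mirrors the nesting in the markup: the article has an author, the author has an affiliation, and the affiliation has an address.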
22. How to test
Use Google’s Rich Snippets Testing Tool:
http://www.google.com/webmasters/tools/richsnippets
23. Sample test output
For this example blog post:
http://homebiss.blogspot.com/2011/11/markup-blogger-schemaorg-examples.html
The Google Rich Snippets Testing Tool
shows this output, which includes some
use of Schema.org:
http://www.google.com/webmasters/tools/richsnippets?url=http%3A%2F%2Fhomebiss.blogspot.com%2F2011%2F11%2Fmarkup-blogger-schemaorg-examples.html&view=
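The test link above is just the tool's address plus the page URL, percent-encoded, in a `url` query parameter. A small sketch of building such a link with Python's standard library follows. `build_test_url` is a made-up helper, and the tool address is copied from the slide, so it may not resolve today.

```python
from urllib.parse import urlencode

# Tool address as given on the slide; treat it as an assumption.
TOOL_URL = "http://www.google.com/webmasters/tools/richsnippets"


def build_test_url(page_url):
    """Return a testing-tool link for page_url, with the URL percent-encoded."""
    return TOOL_URL + "?" + urlencode({"url": page_url, "view": ""})


test_url = build_test_url(
    "http://homebiss.blogspot.com/2011/11/markup-blogger-schemaorg-examples.html"
)
print(test_url)
```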
24. What did we just learn?
Schema.org is frakkin’ verbose.
Entities can cascade poly-hierarchically
There are many “right” approaches
Not all entities need to be expressed
Not all entities provide value
Still, it’s hard to know when to stop
In your case, you’re done when the quarter’s over.
25. Common Schema.org entities
Thing > Person
Thing > Organization
Thing > CreativeWork > Article
See also: Blog, BlogPosting, NewsArticle, ScholarlyArticle
Thing > CreativeWork > MediaObject
See also: AudioObject, ImageObject, VideoObject
Thing > Place
See full list at
http://schema.org/docs/full.html
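The "Thing > ..." notation is a parent chain, which can be sketched as a simple lookup table. Only the types named on this slide are included, and the `PARENT_TYPE` table and both helpers are hypothetical names for illustration.

```python
# Child type -> parent type, limited to the hierarchy shown on the slide.
PARENT_TYPE = {
    "Person": "Thing",
    "Organization": "Thing",
    "Place": "Thing",
    "CreativeWork": "Thing",
    "Article": "CreativeWork",
    "MediaObject": "CreativeWork",
}


def type_path(type_name):
    """Return the chain from the root type down to type_name."""
    path = [type_name]
    while path[-1] in PARENT_TYPE:
        path.append(PARENT_TYPE[path[-1]])
    return " > ".join(reversed(path))


def is_a(type_name, ancestor):
    """True if type_name is ancestor or inherits from it."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT_TYPE.get(current)
    return False


article_path = type_path("Article")
print(article_path)  # Thing > CreativeWork > Article
```

A more specific type inherits the properties of its ancestors, which is why an Article can use `name` even though `name` is defined on Thing.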
26. Constraints to consider
Helping more people find more things is
great, right?
But in the Real World™:
Assume that there’s a cost to do this
Assume that there’s a cost for maintenance
Assume that the standards will change
Assume that there are other priorities
Assume that conflicts and dependencies exist
27. Takeaways
Jon likes horror movies and The Simpsons
Blobs aren’t evil, just misunderstood!
Structured data entities help define blobs
Structured data entities make blobs easier to
understand, learn from, index, and find
Metadata, microdata, and other methods can be
used to create these entities
SEO standards (such as Schema.org) are
emerging to support entities in popular
search engines.
28. Many thanks!
Jonathon Colman
In-House SEO for REI
Home: about.me/jcolman
Twitter: @jcolman
Pssssst! So you wanna learn
more about SEO? See
http://www.seomoz.org/beginners-guide-to-seo