Leveraging the semantic web meetup, Semantic Search, Schema.org and more
Aug 2, 2013
A history and description of the adoption of Semantic Search by the major search and social engines. Covers schema.org, the knowledge graph and status to date (July 30, 2013). Presented from a search engine point of view.
1. Leveraging the Semantic Web, Schema.org, Semantic Search and more
San Diego Semantic Web Meetup
By: Barbara Starr
Twitter: @BarbaraStarr
Email: bstarr@algebraixData.com
2. • Pursued a doctorate in Artificial Intelligence from South
Africa in the 80's.
• Recruited to build intelligent/predictive trading systems
on Wall Street
• Migrated to government-based contracts, several of
which turned into real world products like
– SIRI (PAL from DARPA)
– WATSON (Acquaint - IBM Watson Labs was a team
member)
• From the vantage of a semantic technologist, I keenly
watched the evolution of the Semantic Web.
• “Shocked into the real world” when working as a
consultant @ Overstock
• Today – SVP Product management AlgebraixData
Meta Information
ME
By: Barbara Starr
Twitter: @BarbaraStarr
Email: bstarr@algebraixData.com
Linkedin: http://www.linkedin.com/in/barbarastarr
My favorite author:
Isaac Asimov
Favorite book:
I, Robot
Favorite character:
MULTIVAC
3. Additional Metainformation
For the purpose of this talk:
same-as
MY ROBOT or Artificially Intelligent Entity or Search Engine
OWL
I explain things
from a Search
Engine Point of
View!
4. SEARCH ENGINE POINT OF VIEW
How can I exploit
metadata or
“semantic
search”??
5. SEARCH ENGINE POINT OF VIEW
RICH SNIPPETS 2009
tiles
SearchMonkey 2008
I can directly extract
information to
enhance SERP displays
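A minimal sketch of what "directly extracting information" can look like, assuming a page marked up with schema.org microdata. The sample HTML, the class name ItempropExtractor and the helper extract_props are illustrative assumptions, not part of the talk, and a real crawler would handle nesting, itemscope boundaries and many more cases.

```python
from html.parser import HTMLParser


class ItempropExtractor(HTMLParser):
    """Collect flat itemprop -> value pairs from microdata markup (sketch only)."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None  # itemprop whose text content is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "content" in attrs:
            # e.g. <meta itemprop="priceCurrency" content="USD">
            self.props[prop] = attrs["content"]
        else:
            self._current = prop
            self.props[prop] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.props[self._current] += data.strip()

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current = None


# Hypothetical product page fragment, invented for this example.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Desert Pale Ale</span>
  <meta itemprop="priceCurrency" content="USD">
  <span itemprop="price">9.99</span>
</div>
"""


def extract_props(html):
    """Return the itemprop values found in an HTML string."""
    parser = ItempropExtractor()
    parser.feed(html)
    return parser.props
```

Values such as the name and price recovered this way are the kind of fields a rich snippet could display next to a result.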
7. SEARCH ENGINE POINT OF VIEW
I can provide direct
answers to queries by
searching on
consumed, verified and
validated information
8. SEARCH ENGINE POINT OF VIEW
I can even aggregate
answers or deduce
them (like a timeline of
events)
9. SEARCH ENGINE POINT OF VIEW
I can even use it in
conjunction with
machine learning
techniques, e.g. to
train other
components
I can detect
relevancy
signals, i.e. what
content to show
to what
audience
I can use it to
assist in
interpreting a
user query
Penn Treebank tagset
10. SEARCH ENGINE POINT OF VIEW
Really interesting in terms
of exposing long tail
content too. It makes
things findable for me
when pages are published
with structured markup!
I meant the
beer brewer
in Arizona
11. SEARCH ENGINE POINT OF VIEW
I’m a Search Engine Robot
I could really use
this stuff. And it
is like the tower
of babel out
there!
Microdata
Microformats
RDFa
Multiple conflicting
vocabularies that I will
have to align internally
and multiple syntax
formats as well.
Prior to Schema.org (i.e. June 2011)
GoodRelations for e-commerce
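To make the "align internally" burden concrete, here is a small sketch of mapping terms from several vocabularies onto one internal key. The mapping table, the prefixes and the function name align are assumptions made for illustration; a real alignment layer would be far larger and would also have to reconcile differing semantics, not just names.

```python
# Illustrative alignment table: external term -> internal field name.
# The selection of terms is an assumption for this sketch, not an official mapping.
TERM_ALIGNMENT = {
    "schema:name": "name",
    "gr:name": "name",
    "og:title": "name",
    "hproduct:fn": "name",
    "schema:description": "description",
    "gr:description": "description",
    "og:description": "description",
}


def align(record):
    """Map a record keyed by vocabulary terms onto internal field names.

    Returns (aligned, unknown): the aligned fields, and the terms that
    have no entry in the alignment table.
    """
    aligned, unknown = {}, []
    for term, value in record.items():
        key = TERM_ALIGNMENT.get(term)
        if key is None:
            unknown.append(term)
        else:
            # Keep the first value seen for each internal field.
            aligned.setdefault(key, value)
    return aligned, unknown
```

Every new vocabulary adds rows to a table like this, which is part of why a single mandated vocabulary is attractive from the search engine's side.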
13. What has been the history?
Percentage of URLs with embedded metadata in various formats
Five-fold increase between
March, 2009 and October,
2010
Another five-fold increase
between October 2010 and
January, 2012
RDFa exploded in 2012. Source: Peter Mika, Yahoo
14. Current state of metadata on the Web
• 31% of webpages, 5% of domains contain some metadata
– Analysis of the Bing Crawl (US crawl, January, 2012)
– RDFa is most common format
• By URL: 25% RDFa, 7% microdata, 9% microformat
• By eTLD (PLD): 4% RDFa, 0.3% microdata, 5.4% microformat
– Adoption is stronger among large publishers
• Especially for RDFa and microdata
• See also
– P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012
– H. Mühleisen, C. Bizer. Web Data Commons - Extracting Structured Data
from Two Large Web Corpora, LDOW 2012
16. Timeline of RDFa and Semantic Web Adoption
As of Semtech 2011
Inevitable passage of
Semantic Web adoption –
culminating in schema.org
17. SEARCH ENGINE POINT OF VIEW
Align and consume
many vocabularies
that may not be of
interest to search
engines?
Rather mandate vocabulary and syntax: microdata
A Search Engine
alliance has the power
to MANDATE
vocabulary and syntax!
Initial alliance: Google, Yahoo, Bing. Then Yandex and subsequently Pinterest
21. SEARCH ENGINE POINT OF VIEW
Make sure you are
not cloaking by
feeding one set of
information to me
and another to
human users!
Ensure your data
feeds match
information with
the structured
markup or
“metadata” on
your web pages.
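One way to check that a data feed and the on-page markup agree is a simple field-by-field comparison. This is a hedged sketch: the field names, the sample records and the function find_mismatches are hypothetical, and how search engines actually compare feeds with markup is not described in the talk.

```python
def find_mismatches(feed_record, page_markup, fields=("name", "price", "availability")):
    """Compare a feed record with values extracted from page markup.

    Both arguments are plain dicts. Comparison ignores case and
    surrounding whitespace. Returns {field: (feed_value, page_value)}
    for every field that differs.
    """
    mismatches = {}
    for field in fields:
        feed_value = str(feed_record.get(field, "")).strip().lower()
        page_value = str(page_markup.get(field, "")).strip().lower()
        if feed_value != page_value:
            mismatches[field] = (feed_record.get(field), page_markup.get(field))
    return mismatches


# Hypothetical data: the feed and the page disagree on price.
feed = {"name": "Desert Pale Ale", "price": "9.99", "availability": "InStock"}
page = {"name": "Desert Pale Ale ", "price": "12.99", "availability": "instock"}
```

Running such a check before publishing a feed catches the mismatches the slide warns about.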
22. SEARCH ENGINE POINT OF VIEW
Serving
RELEVANT
ANSWERS are
IMPERATIVE!
& central to my
very being!
25. SEARCH ENGINE POINT OF VIEW
Adding context in
search verticals really
helps me serve up
relevant information
(Seriously increases my
recall), as does
geospatial information.
Consumed information -
Structured Data Dashboard
Google’s “SearchVerticals”
Notice any correlations?
I would advise you to!
26. OH! and be sure to
check out Moore's law
SEARCH ENGINE POINT OF VIEW
I also have a pretty
good understanding of
big data and web
intelligence so I can
leverage them!
SIRI
“Amazing fact: same
amount of computing to
answer one Google Search
query as all the computing
done -- in flight and on the
ground -- for the entire
Apollo program!”
27. SEARCH ENGINE POINT OF VIEW
I can leverage
metadata for
better image
search
SIRI
I can combine it with
computer vision
techniques.
I can enhance
user’s shopping
experience.
29. SEARCH ENGINE POINT OF VIEW
Know rather than
Recognize?
INTRODUCING THE KNOWLEDGE GRAPH
Symbolic
reasoning vs
stochastic
reasoning (the latter is
more like NLP or
PageRank)
30. SEARCH ENGINE POINT OF VIEW
♫
Folks finding answers
on my page never
even have to click
through to yours!
And speaking of
the knowledge
graph or
knowledge
carousel!
I can even now
start to derive
associations or
relationships
between entities.
31. SEARCH ENGINE POINT OF VIEW
Check out this great highlighter.
The information is available
only to me and not to any other
search or social engines!
Can you believe I have been
accused of hijacking semantic
markup?
I find it so helpful that I
would really like to be
able to keep all that
validated, verified
information to myself!
32. SEARCH ENGINE POINT OF VIEW
And extended my data
highlighter to include the
following types of entities
(check your webmaster
tools for this)
I have since created
the structured markup
helper! (And added
support for JSON-LD as
well as microdata)
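For readers who have not seen JSON-LD, the following sketch builds a schema.org description as a JSON-LD script block. The function name to_jsonld_script and the sample values (including the example.com URL) are assumptions for illustration; this is not the output of the markup helper itself.

```python
import json


def to_jsonld_script(item_type, properties):
    """Serialize a schema.org item as a JSON-LD <script> block (sketch)."""
    doc = {"@context": "http://schema.org", "@type": item_type}
    doc.update(properties)
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical event description; the URL is a placeholder.
snippet = to_jsonld_script(
    "Event",
    {"name": "San Diego Semantic Web Meetup", "url": "http://example.com/meetup"},
)
```

Unlike microdata, the description sits in one block rather than being woven through the visible HTML.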
33. SEARCH ENGINE POINT OF VIEW
They are also leveraging it
in their newly released
graph search!
Not only that, they are even
building an entity graph not
dissimilar from my
knowledge graph!
My social counterparts
have been leveraging
structured markup
(RDFa) for their
Open Graph protocol for
quite some time.
The Open Graph Protocol enables you to
integrate your Web pages into the social graph
Example of crowdsourced entity graph info source: places
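As a rough illustration of how Open Graph markup can be consumed, the sketch below collects og: properties from meta tags. The class name OpenGraphParser, the helper parse_open_graph and the sample page are assumptions; only the og:title, og:type and og:url property names come from the Open Graph protocol.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs (sketch only)."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if prop.startswith("og:") and "content" in attrs:
            # Store the key without its "og:" prefix.
            self.og[prop[3:]] = attrs["content"]


def parse_open_graph(html):
    """Return the Open Graph properties found in an HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.og


# Hypothetical page head for a place, invented for this example.
SAMPLE_HEAD = """
<head>
  <meta property="og:title" content="Desert Brewing Co." />
  <meta property="og:type" content="place" />
  <meta property="og:url" content="http://example.com/brewery" />
  <meta name="description" content="not an og property" />
</head>
"""
```

The type and url values are what let a page be treated as a node in a graph rather than as a bag of words.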
34. SEARCH ENGINE POINT OF VIEW
My social counterparts
ought to have a field day in
terms of both targeted
advertising and in creating
engaging user experiences
by leveraging their more
recent innovations.
35. SEARCH ENGINE POINT OF VIEW
Knowledge Graphs are
now ubiquitous, and
the term has become
common vernacular!
LINKED IN SNAPSHOTS
ADDED PUBMED
Knowledge Graph
Knowledge Graph
36. SEARCH ENGINE POINT OF VIEW
I am starting to use
hashtags in search so
I can merge topics
and entities in graphs,
like some of my social
counterparts!
37. SEARCH ENGINE POINT OF VIEW
I am even now measuring
my trending “entities” in my
top charts, rather than
“strings”.
38. SEARCH ENGINE POINT OF VIEW
LIST IS GROWING FAST!
LATEST DRAFT ON ACTION
TYPES – July 2013
Via publicvocabs@w3
39. SEARCH ENGINE POINT OF VIEW
Check the list to see
what is coming out
next! Schema.org is
dynamic and is
growing!
Mark up information not
yet consumed by search
engines to get the
advantage of extra lift
when it is adopted.
40. SEARCH ENGINE POINT OF VIEW
Thank you for your
time!
And just a by-the-bye,
this technology is still in
its nascent stages. Can
you imagine what I will
be able to do soon?
Barbara Starr
Email: bstarr@AlgebraixData
Twitter: @BarbaraStarr
Resources to help you!
Make sure to use
them wisely!
Remember, if you want
to make the search
engines happy, put
yourself in their shoes!
PageRank is now only 1
of over 200 signals that
Google uses!
41. Resources at this point in time
Caveat: Some training may be required for some of the tools
Programming Languages:
JavaScript: Microdatajs
Live microdata
PHP: Microdataphp
Ruby: RDF Microdata
RDF Lib plugin
PerlRuby: RDF Microdata Gem
Mida
Java: Sindice any23 library
Publishing
Form Based tools:
Schema Creator
Microdata generator
Standalone tools
Web.instadata
Editors:
Topbraid Composer
Protege
Platforms:
Drupal
Joomla
WordPress (about 7 of them)
Virtuoso
Topbraid Composer
Validators, Testers and More:
Check.rdfa.info
Sindice Inspector
Rich Snippets Testing Tool
Bing Validator
Structured data Linter
Online parser/viewer and RSS generator
Validator.nu
Google Structured Data Tester
44. Other Semantic Web Resources
OpenCalais – Can extract information about people, places and things
AlchemyAPI – named entity extraction, topic recognition, keyword tagging, more ….
Cogito – Expert System
Franz Inc. – Gruff
Pool Party
JSON-LD playground
YAHOO! Glimmer
Many More….
For more info contact:
Barbara Starr
Twitter: @BarbaraStarr
Email: bstarr@algebraixdata.com
Linkedin: http://www.linkedin.com/in/barbarastarr
45. By Barbara Starr
Twitter: @BarbaraStarr
Linkedin: http://www.linkedin.com/in/barbarastarr
E-mail: bstarr@algebraixdata.com
Bye for now