It is often mistakenly thought that Google does natural language processing in its search results; as of 2018, it still does not. This presentation looks at how Google started, its historical approach to language, and how it is working toward NLP. It also covers the newer machine learning methods that support the "strings to things" interpretation of text and voice, and how Rank Brain plays into all of this.
7. #Ungagged #Vegas @schachin Kristine Schachinger
Google Goes To Work
http://infolab.stanford.edu/pub/papers/google.pdf
8. #Ungagged #Vegas @schachin Kristine Schachinger
In 2018 …
Roughly half of the world's population or
3.8 billion people use the internet every day.
9. #Ungagged #Vegas @schachin Kristine Schachinger
Google
processes
TRILLIONS
of queries a
year & has
indexed
BILLIONS
of Websites.
10. #Ungagged #Vegas @schachin Kristine Schachinger
IN 2015, THERE WERE
2,834,650,000,000 Google searches with
an average 7,766,000,000 searches a day.
11. #Ungagged #Vegas @schachin Kristine Schachinger
That breaks down to …
an average of 7.7 billion searches per day,
or roughly 90,000 search queries per second.
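As a quick check of the arithmetic, here is a minimal sketch that divides the 2015 total quoted on the previous slide by days and then by seconds. It assumes a 365-day year and evenly spread traffic; the variable names are illustrative only. The result lands near 90,000 queries per second.

```python
# Total Google searches in 2015, as quoted on the slide above.
searches_2015 = 2_834_650_000_000

# Assumption: a 365-day year with traffic spread evenly over time.
days_per_year = 365
seconds_per_day = 24 * 60 * 60  # 86,400

per_day = searches_2015 / days_per_year
per_second = per_day / seconds_per_day

print(f"{per_day:,.0f} searches per day")
print(f"{per_second:,.0f} searches per second")
```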
16. #Ungagged #Vegas @schachin Kristine Schachinger
Unstructured data (or unstructured information) is information that
either does not have a pre-defined data model or is not organized in a
pre-defined manner. Unstructured information is typically text-heavy,
but may contain data such as dates, numbers, and facts as well.
https://www.google.co.uk/search?q=definition+unstructured+data&oq=definition+unstructured+data&aqs=chrome..69i57j0l5.5175j0j7&sourceid=chrome&ie=UTF-8
20. #Ungagged #Vegas @schachin Kristine Schachinger
TF-IDF
Term Frequency Inverse
Document Frequency
i.e., how often a keyword appears on a page,
weighted against how common it is across all pages
https://moz.com/blog/7-advanced-seo-concepts
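To make TF-IDF concrete, here is a minimal sketch of the textbook formula (term frequency multiplied by the log of inverse document frequency). It is not Google's implementation; the three sample "pages" and the function name `tf_idf` are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document (textbook TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # Term frequency times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical sample pages.
docs = [
    "small dogs are friendly dogs",
    "small cats are quiet",
    "large dogs are loud",
]
scores = tf_idf(docs)
for term, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

A word that appears on every page ("are") scores zero, while words that are frequent on one page and rarer elsewhere score higher.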
21. #Ungagged #Vegas @schachin Kristine Schachinger
As queries number in the trillions
unstructured data becomes inefficient.
Data needs structure.
23. #Ungagged #Vegas @schachin Kristine Schachinger
So Google moved from
Relational Databases to
Knowledge Graphs.
Knowledge Graphs
24. #Ungagged #Vegas @schachin Kristine Schachinger
NOTE
Knowledge Graphs
DO NOT EQUAL
THE KNOWLEDGE GRAPH
Knowledge Graphs
25. #Ungagged #Vegas @schachin Kristine Schachinger
“Graph-based knowledge representation has been
researched for decades and the term knowledge
graph does not constitute a new technology.
Rather, it is a buzzword reinvented by Google
and adopted by other companies and academia to
describe different knowledge representation
applications.”
Knowledge Graphs
http://ceur-ws.org/Vol-1695/paper4.pdf
26. #Ungagged #Vegas @schachin Kristine Schachinger
Enter Semantic Search
and TensorFlow
https://web.archive.org/web/20090516213508/http://blog.searchenginewatch.com/090512-201139
31. #Ungagged #Vegas @schachin Kristine Schachinger
Google Squared
“Google Squared returns search results in a spreadsheet format. It structures
the unstructured data on web pages. So a search for Small Dogs returns
results with names, description, size, weight, origin, etc., in columns and
rows.” ~TechCrunch
https://techcrunch.com/2009/05/12/what-is-google-squared-it-is-how-google-will-crush-wolfram-alpha-exclusive-video/
32. #Ungagged #Vegas @schachin Kristine Schachinger
https://searchengineland.com/up-close-google-squared-19313
Before the Knowledge Graph
33. #Ungagged #Vegas @schachin Kristine Schachinger
Google Squared
“Call it structured data if you like,
I call it a surefire recipe for making
a bad dog buying decision.”
https://readwrite.com/2009/06/03/google_squared_is_live_who_knew_structured_data_co/
34. #Ungagged #Vegas @schachin Kristine Schachinger
Google Kills Google Squared.
RIP Google Squared 2009-2011
35. #Ungagged #Vegas @schachin Kristine Schachinger
(Knowledge Graphs)
“…quite possibly ...
one of Google's significant achievements”
Nathania Johnson of Search Engine Watch
https://web.archive.org/web/20090516213508/http://blog.searchenginewatch.com/090512-201139
Knowledge Graphs
37. #Ungagged #Vegas @schachin Kristine Schachinger
The Holy Grail
of Search?
NLP
(Natural Language
Processing)
38. #Ungagged #Vegas @schachin Kristine Schachinger
“Strings to Things”
But Google doesn’t process
Natural Language.
39. #Ungagged #Vegas @schachin Kristine Schachinger
Google Squared was an early stage of Google moving
search from strings (unstructured data),
or the “bag of words” approach,
to “things” (structured data).
“Strings to Things”
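A minimal sketch of why the “bag of words” approach loses meaning: once text is reduced to word counts, word order is gone, so two sentences that mean opposite things look identical. The function name and sample sentences are illustrative only.

```python
from collections import Counter

def bag_of_words(text):
    """Reduce text to unordered word counts."""
    return Counter(text.lower().split())

a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")

# Same words, same counts: the two sentences are indistinguishable.
print(a == b)  # True
```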
40. #Ungagged #Vegas @schachin Kristine Schachinger
“Things” are known objects
with known (or learned) relationships.
“Strings to Things”
41. #Ungagged #Vegas @schachin Kristine Schachinger
https://searchengineland.com/up-close-google-squared-19313
Before THE Knowledge Graph – Wonder Wheel
44. #Ungagged #Vegas @schachin Kristine Schachinger
Knowledge Graphs are based on known relationships.
THE Knowledge Graph is Google’s graph database.
THE Knowledge Graph
45. #Ungagged #Vegas @schachin Kristine Schachinger
The Knowledge Graph (Google) is seeded by things known.
Instead of just text without meaning, The KG is a relational
graph with known objects and mapped relationships.
THE Knowledge Graph
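A minimal sketch of the idea of known objects with mapped relationships, stored as subject/predicate/object triples. The entities, predicate names, and the `related` helper are hypothetical; this does not reflect how Google stores its graph.

```python
from collections import defaultdict

# Hypothetical facts as (subject, predicate, object) triples.
triples = [
    ("Eiffel Tower", "located_in", "Paris"),
    ("Eiffel Tower", "instance_of", "Landmark"),
    ("Paris", "capital_of", "France"),
]

# Index the triples by subject so each entity becomes a node
# with outgoing, labeled edges.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

def related(entity, predicate):
    """Return every object linked to `entity` by `predicate`."""
    return [obj for pred, obj in graph.get(entity, []) if pred == predicate]

print(related("Eiffel Tower", "located_in"))
print(related("Paris", "capital_of"))
```

Unlike a bare string, each entity here carries relationships that can be followed, for example from the landmark to its city and on to its country.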
47. #Ungagged #Vegas @schachin Kristine Schachinger
"Four years ago this July, Google
acquired Metaweb,
bringing Freebase and
linked open data to Google,"
wrote Google software engineer Barak Michener.
http://www.eweek.com/database/google-releases-cayley-open-source-graph-database
THE Knowledge Graph Seeds
48. #Ungagged #Vegas @schachin Kristine Schachinger
Also includes trusted
sources such as the
CIA World Factbook, Wikipedia,
Wikidata, etc.
http://www.eweek.com/database/google-releases-cayley-open-source-graph-database
THE Knowledge Graph Seeds
49. #Ungagged #Vegas @schachin Kristine Schachinger
Why the Knowledge Graph?
To help better match user intent.
To understand what users want.
THE Knowledge Graph
50. #Ungagged #Vegas @schachin Kristine Schachinger
The Knowledge Graph enables you to search for things, people or places
that Google knows about—landmarks, celebrities, cities, sports teams,
buildings, geographical features, movies, celestial objects, works of art
and more—and instantly get information that’s relevant to your query
THE Knowledge Graph
54. #Ungagged #Vegas @schachin Kristine Schachinger
Knowledge Graph entities
The Knowledge Graph has millions of entries that describe real-world entities like people, places, and things. These
entities form the nodes of the graph.
The following are some of the types of entities found in the Knowledge Graph:
Book
BookSeries
EducationalOrganization
Event
GovernmentOrganization
LocalBusiness
Movie
MovieSeries
MusicAlbum
MusicGroup
MusicRecording
Organization
Periodical
Person
Place
SportsTeam
TVEpisode
TVSeries
VideoGame
VideoGameSeries
WebSite
THE Knowledge Graph ENTITIES
55. #Ungagged #Vegas @schachin Kristine Schachinger
Entities + Relationships =
THE Knowledge Graph
THE Knowledge Graph
58. #Ungagged #Vegas @schachin Kristine Schachinger
Google as an Answer Engine
https://www.google.com/search/howsearchworks/responses/#?modal_active=none
60. #Ungagged #Vegas @schachin Kristine Schachinger
Hummingbird
The name was derived from
the speed and accuracy of the hummingbird.
“Strings to Things”
61. #Ungagged #Vegas @schachin Kristine Schachinger
Hummingbird Arrives in 2013
Google moves from matching keyword terms to
trying to process natural language queries.
“Strings to Things”
62. #Ungagged #Vegas @schachin Kristine Schachinger
But Google doesn’t process
Natural Language very well.
“Strings to Things”
64. #Ungagged #Vegas @schachin Kristine Schachinger
KEY FACTOR word2vec:
Vector space models (VSMs) represent (embed)
words in a continuous vector space where
semantically similar words are mapped to nearby
points ('are embedded nearby each other').
Hummingbird
https://www.tensorflow.org/tutorials/representation/word2vec
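A minimal sketch of what "mapped to nearby points" means in practice. The three-dimensional vectors below are invented for illustration (real word2vec embeddings are learned from text and have hundreds of dimensions), and cosine similarity is one common way to measure closeness.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, not real word2vec output.
embeddings = {
    "puppy": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

puppy_dog = cosine_similarity(embeddings["puppy"], embeddings["dog"])
puppy_car = cosine_similarity(embeddings["puppy"], embeddings["car"])

print(f"puppy vs dog: {puppy_dog:.2f}")
print(f"puppy vs car: {puppy_car:.2f}")
```

Semantically similar words end up with a similarity close to 1, while unrelated words score much lower.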
65. #Ungagged #Vegas @schachin Kristine Schachinger
Embedded Word Model
Hummingbird
https://www.tensorflow.org/tutorials/representation/word2vec
66. #Ungagged #Vegas @schachin Kristine Schachinger
“…words that appear in the same contexts share semantic meaning. The
different approaches that leverage this principle can be divided into two
categories: count-based methods (e.g. Latent Semantic Analysis),
and predictive methods (e.g. neural probabilistic language models).”
Hummingbird
https://www.tensorflow.org/tutorials/representation/word2vec
70. #Ungagged #Vegas @schachin Kristine Schachinger
Hummingbird adds a
semantic layer to the
search algorithms that
handles synonyms and
close variants.
https://moz.com/blog/7-advanced-seo-concepts
71. #Ungagged #Vegas @schachin Kristine Schachinger
Hummingbird adds a
semantic layer to the
search algorithms that
uses “semantic distance
and term relationships”.
https://moz.com/blog/7-advanced-seo-concepts
72. #Ungagged #Vegas @schachin Kristine Schachinger
Hummingbird adds a
semantic layer to the
search algorithms that
uses “phrase-based
indexing and
co-occurrence.”
https://moz.com/blog/7-advanced-seo-concepts
73. #Ungagged #Vegas @schachin Kristine Schachinger
Page Segmentation.
This part of the
algorithm determines
meaning through
placement.
https://moz.com/blog/7-advanced-seo-concepts
74. #Ungagged #Vegas @schachin Kristine Schachinger
Entity Salience.
This part of the
algorithm determines
meaning through known
relationships.
https://moz.com/blog/7-advanced-seo-concepts
75. #Ungagged #Vegas @schachin Kristine Schachinger
So Hummingbird moves from
strict word-count-based modeling
(i.e., keyword counts) to
probabilistic modeling
(i.e., predictive interpretation)
via known word vectors.
Hummingbird
80. #Ungagged #Vegas @schachin Kristine Schachinger
BUT …
Google Search still doesn’t process
Natural Language.
This means we must add an “interpreter”.
83. #Ungagged #Vegas @schachin Kristine Schachinger
What is Structured Data?
Structured data for SEO purposes is on-page markup that
enables search engines to better understand the information
currently on your site’s web pages, and then use this information
to improve search result listings by better matching user intent.
84. #Ungagged #Vegas @schachin Kristine Schachinger
What is Structured Data?
Structured data is defined using schema, which acts as the
interpreter. It is the definition we add to the page using
schema code.
Google supports three formats:
• RDFa
• Microdata
• JSON-LD
85. #Ungagged #Vegas @schachin Kristine Schachinger
Schema
JSON-LD is the recommended schema format.
JSON-LD stands for JavaScript Object Notation for Linked Data.
It is a way to implement schema outside the HTML mark-up
structure. RDFa and Microdata require the code to be woven into
the HTML itself.
86. #Ungagged #Vegas @schachin Kristine Schachinger
Schema
The benefit is that it sits outside the HTML structure, which
makes it easier to write, implement, and maintain.
For a good breakdown of what JSON-LD is at the code level,
Portent’s JSON-LD Implementation Guide is very helpful.
https://www.portent.com/blog/seo/json-ld-implementation-guide.htm
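A minimal sketch of what a JSON-LD block looks like and how it sits in its own script tag instead of being woven through the HTML. The business details are placeholders, and `to_json_ld_script` is a hypothetical helper; check Google's guides for the properties each schema type actually requires.

```python
import json

# Placeholder business details; every value here is made up.
local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Coffee Shop",
    "telephone": "+1-555-0100",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Example Street",
        "addressLocality": "Las Vegas",
        "addressRegion": "NV",
    },
}

def to_json_ld_script(data):
    """Wrap a schema dict in the script tag used for JSON-LD."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'

snippet = to_json_ld_script(local_business)
print(snippet)
```

Because the whole definition lives in one block, it can be generated, updated, or removed without touching the visible page markup.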
88. #Ungagged #Vegas @schachin Kristine Schachinger
Schema
IMPORTANT! Test your JSON-LD.
Use the Google Structured Data Testing Tool.
https://search.google.com/structured-data/testing-tool
89. #Ungagged #Vegas @schachin Kristine Schachinger
Schema
NOTE: this tool only tells you whether the markup is syntactically valid,
NOT whether you are using the proper schema.
Make sure to check with Google’s Guides on schema implementation.
Improper use or implementation can result in a manual action.
• https://developers.google.com/search/docs/guides/intro-structured-data
• https://developers.google.com/search/docs/guides/prototype
90. #Ungagged #Vegas @schachin Kristine Schachinger
Schema
IMPORTANT! Your JSON-LD
content MUST match what is in
the page exactly.
If they differ, you will likely get a
manual action, as Google sees
this as cloaking.
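A rough sketch of a pre-publish sanity check for this rule: it pulls the JSON-LD out of a page and lists any string value that does not appear in the visible text. This is a crude heuristic under stated assumptions (exact substring matching, keys starting with "@" skipped); it is not how Google evaluates a page, and the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collects visible text and raw JSON-LD blocks from an HTML page."""
    def __init__(self):
        super().__init__()
        self.visible_text = []
        self.json_ld_blocks = []
        self._mode = "visible"  # "visible", "json_ld", or "hidden"

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            if dict(attrs).get("type") == "application/ld+json":
                self._mode = "json_ld"
                self.json_ld_blocks.append("")
            else:
                self._mode = "hidden"
        elif tag == "style":
            self._mode = "hidden"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._mode = "visible"

    def handle_data(self, data):
        if self._mode == "json_ld":
            self.json_ld_blocks[-1] += data
        elif self._mode == "visible":
            self.visible_text.append(data)

def string_values(node):
    """Yield every string value in parsed JSON-LD, skipping @-keys."""
    if isinstance(node, dict):
        for key, value in node.items():
            if not key.startswith("@"):
                yield from string_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from string_values(item)
    elif isinstance(node, str):
        yield node

def missing_from_page(html):
    """Return JSON-LD string values not found in the visible page text."""
    parser = PageParser()
    parser.feed(html)
    page_text = " ".join(" ".join(parser.visible_text).split())
    return [
        value
        for block in parser.json_ld_blocks
        for value in string_values(json.loads(block))
        if value not in page_text
    ]

# Hypothetical page whose markup lists a different phone number.
html = """
<html><body>
<h1>Example Coffee Shop</h1>
<p>Call us at +1-555-0100.</p>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "LocalBusiness",
 "name": "Example Coffee Shop", "telephone": "+1-555-0199"}
</script>
</body></html>
"""
print(missing_from_page(html))  # ['+1-555-0199']
```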
93. #Ungagged #Vegas @schachin Kristine Schachinger
We can act as the interpreter and help “teach”
Google what our site is about.
94. #Ungagged #Vegas @schachin Kristine Schachinger
Adding semantic mark-up
(structured data via schema) allows us to tell
Google what WE SAY our site is about and WHAT
RELATIONSHIPS we define within it.
95. #Ungagged #Vegas @schachin Kristine Schachinger
We can act as the interpreter and help “teach”
Google the context of our content.
97. #Ungagged #Vegas @schachin Kristine Schachinger
We can help give Google a clearer understanding.
That helps Google better answer
the questions users ask
and better surface our content for those users.
We give our data meaning
Google Understands
101. #Ungagged #Vegas @schachin Kristine Schachinger
Rank Brain is used for Unknown Queries where entity
meanings/relationships are unclear or unknown.
106. #Ungagged #Vegas @schachin Kristine Schachinger
Why?
Google does not use NLP
(Natural Language Processing) in Search
Rank Brain.
107. #Ungagged #Vegas @schachin Kristine Schachinger
Uses Structured Data, Entities, & Known Relationships
Person, Place, Thing = Noun = Entities.
Nouns (people, places, and things) are what we call entities. Entities are
known to Google and their meaning is defined in the databases Google references.
Rank Brain.
108. #Ungagged #Vegas @schachin Kristine Schachinger
• Words go in.
• Each word is assigned a mathematical address (a vector) in a vector space.
• Similar and related words sit close to each other in that space.
• Words are retrieved based on your query and the words located nearest the “best fit” vector.
• These word “interpretations” are used to return results.
• If the relationships are weak or unknown, enter Rank Brain.
• Behind the scenes, data is continually fed into the machine learning process to make those results more relevant the next time.
Rank Brain – Known Relationships.
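The steps above can be sketched as a toy lookup: find the stored word nearest the query vector, and flag a fallback when the best match is weak. This is purely illustrative and not Google's or Rank Brain's implementation; the two-dimensional vectors, the 0.9 threshold, and the `best_fit` name are all assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical word "addresses" in a two-dimensional vector space.
word_vectors = {
    "dog": [0.9, 0.1],
    "puppy": [0.8, 0.2],
    "car": [0.1, 0.9],
}

def best_fit(query_vector, vectors, threshold=0.9):
    """Return (word, score, needs_fallback) for the nearest stored word.

    needs_fallback is True when even the best match is weak, which is
    where the slides say a system like Rank Brain would step in.
    """
    word, score = max(
        ((w, cosine_similarity(query_vector, v)) for w, v in vectors.items()),
        key=lambda pair: pair[1],
    )
    return word, score, score < threshold

print(best_fit([1.0, 0.0], word_vectors))  # strong match, no fallback
print(best_fit([0.5, 0.5], word_vectors))  # weak match, fallback flagged
```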
109. #Ungagged #Vegas @schachin Kristine Schachinger
Rank Brain Also Uses Users’ Queries & Clicks
to Help It Understand Query Intent.
114. #Ungagged #Vegas @schachin Kristine Schachinger
Just write in natural and conversational language.
Create holistic content.
115. #Ungagged #Vegas @schachin Kristine Schachinger
Write holistic content.
Use terms that are semantically related.
Google gives a detailed explanation here: https://www.youtube.com/watch?v=vzoe2G5g-w4&feature=youtu.be&t=32m19s
116. #Ungagged #Vegas @schachin Kristine Schachinger
Write holistic content.
DOES YOUR CONTENT HAVE DEPTH AND WIDTH?
Google gives a detailed explanation here: https://www.youtube.com/watch?v=vzoe2G5g-w4&feature=youtu.be&t=32m19s
117. #Ungagged #Vegas @schachin Kristine Schachinger
Takeaways.
• Think Search Queries NOT Simple Keywords
• Write in natural, conversational language
• Write holistic content
• Focus on depth and breadth with related terms
• Add Structured Data