This document provides an overview of Google BERT and what it means for SEOs and marketers. Some key points:
- BERT uses bidirectional transformers to better understand the context of words in search queries and content. It helps Google resolve ambiguity and understand nuanced language.
- BERT was first introduced as an academic research paper in 2018 and was quickly adopted by Google and other major tech companies to improve natural language understanding.
- While BERT only impacts around 10% of queries, it represents a major improvement in Google's ability to understand user intent and has important implications for SEO, international search, and conversational search.
3. @dawnieando
• A Google algorithmic
update
• Google announced BERT to the
organic search world in a VERY
geeky way
• Mentions that 15% of queries
seen every day are brand new
• Touches on ‘The Vocabulary
Problem’ (many ways of querying
the same thing)
October 2019 - Welcome To Search, BERT
4. @dawnieando
• Probably the biggest
improvement in search EVER
• The biggest change in search
in five years, since RankBrain
Fundamentally… Google BERT is
5. @dawnieando
• Layman’s terms: it can be
used to help Google better
understand the context of
words in search queries &
content
So, just what is the Google BERT update?
6. @dawnieando
• Used globally in all
languages on featured
snippets
• BERT to impact rankings for 1
in 10 queries
• Initially for English language
queries in US
The bottom line search announcement
7. @dawnieando
Dec 2019 – BERT expands internationally
• Over 70 languages
• Still only impacts 10% of
queries despite the
considerable expansion
• Still all featured snippets
globally
8. @dawnieando
• BERT deals with
ambiguity & ‘nuance’ in
queries & content
• Unlikely to impact short
queries
• More likely to impact
conversational queries
• Unlikely to impact
branded queries
Why Are Just 10% of Google Queries Impacted?
9. @dawnieando
• The SEO community is
abuzz
• BERT is a big deal
• Likened to ‘RankBrain’ in
some of the ‘interesting’
interpretations
• Some confusion around
‘what BERT is and what it
means for search’
SEOs React
10. @dawnieando
• A neural network-based
technique for natural language
processing pre-training
• An acronym for Bidirectional
Encoder Representations from
Transformers
BERT in Geek Speak
12. @dawnieando
• Search algorithm update
• Open source pre-trained model / framework for
natural language understanding
• Academic research paper
• Evolving tool for computational linguistics efficiency
• Beginning of MANY BERT’ish language models
Important: BERT is Many Things
13. @dawnieando
So What’s The Backstory?
Where did BERT come from?
Where did the need for BERT arise?
The impact of BERT for SEO & beyond?
What next?
14. @dawnieando
• Academic Paper
• Research Project by Devlin et al
• Published in October 2018, a
year before the update
• BERT: Pre-training of Deep
Bidirectional Transformers for
Language Understanding
BERT started as a research paper in 2018
15. @dawnieando
• Open sourced so anyone can
build a BERT
• BERT quickly created a sea-change
leap forward in natural language
understanding for information
retrieval
• Provided a pre-trained language
model which required only fine-
tuning
BERT Open Sourced in 2018
16. @dawnieando
The whole of the English
Wikipedia & The Books
Corpus combined.
Over 2,500 million words
BERT Has Been Pre-Trained On Many Words
17. @dawnieando
Vanilla BERT provides a pre-
trained starting-point layer for
neural networks across diverse
machine learning & natural
language tasks
The machine learning community got very
excited about BERT
18. @dawnieando
• BERT is fine-tuned on a variety of
downstream NLP tasks, including
question and answer datasets
BERT Can Be Fine-Tuned in A Short Space of Time
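To make the fine-tuning point above concrete, here is a minimal sketch (not from the original deck) assuming the Hugging Face transformers library; the checkpoint name, label count and toy batch are illustrative placeholders.

```python
# Minimal sketch: loading pre-trained BERT and adding a classification head
# for a downstream task (assumes the Hugging Face `transformers` library).
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# num_labels=2 is a placeholder for a hypothetical binary task (e.g. Q&A relevance)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# A toy batch; in real fine-tuning this would be a labelled downstream dataset
inputs = tokenizer(
    ["is this passage relevant to the question?", "completely unrelated text"],
    padding=True, truncation=True, return_tensors="pt",
)
labels = torch.tensor([1, 0])

# One forward pass; a Trainer / optimizer loop would repeat this over the dataset,
# nudging the pre-trained weights plus the thin task head (hence "fine-tuning")
outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)
```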
19. @dawnieando
• Vanilla BERT can be used ‘out of the box’
or fine-tuned
• Provides a great starting point & saves
huge amounts of time & money
• Those wishing to can ‘build upon’ and
improve BERT
BERT Saves Researchers Time AND Money
20. @dawnieando
• Microsoft – MT-DNN
• Facebook – RoBERTa
• XLNet
• ERNIE – Baidu
• Lots of other
contenders
Since 2018, Major Tech Companies Have Extended BERT
25. @dawnieando
Language models like
BERT help machines
understand the nuance
in a word’s context and
the cohesion of its
surrounding text
What Purpose Does BERT Serve & How?
26. @dawnieando
• Dates back over 60 years, to the Turing Test paper
• Aims at understanding the way words fit together with
structure and meaning
• NLU is connected to the field of linguistics (computational
linguistics)
• Over time, computational linguistics has increasingly
spilled over into a growing online web of content
What is Natural Language Understanding?
30. @dawnieando
“The meaning of a word is its use in a
language” (Ludwig Wittgenstein,
philosopher, 1953)
Image attribution: Moritz Nähr
(public domain)
Single Words Have No Meaning
31. @dawnieando
The word ‘like’ in this sentence is both a:
• (VBP): verb (non-3rd person, singular,
present)
• (IN): preposition or subordinating
conjunction
An Example of a Word’s Meaning Changing
• I -> PRP
• Like -> VBP
• That -> IN
• He -> PRP
• Is -> VBZ
• Like -> IN
• That -> DT
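A small illustration of the tagging above, assuming NLTK’s default Penn Treebank-style tagger (not part of the slides); exact tags can vary by tagger version.

```python
# Sketch: part-of-speech tagging the two uses of "like" with NLTK's default
# (Penn Treebank style) tagger. Requires: nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger').
import nltk

sentence = "I like that he is like that"
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# Expected (roughly): [('I','PRP'), ('like','VBP'), ('that','IN'),
#                      ('he','PRP'), ('is','VBZ'), ('like','IN'), ('that','DT')]
```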
33. @dawnieando
E.g. Verbs, nouns, adjectives
• Penn-treebank tagger -> 36
different parts of speech
• CLAWS7 (C7) -> 146 different
parts of speech
• Brown Corpus Tagger -> 81
different parts of speech
Words Become ‘Parts of Speech’ When Combined
34. @dawnieando
• He kicked the bucket
• I have yet to tick that off
my bucket list
• The bucket was filled with
water
The Meaning of The Word ‘Bucket’ Changes
35. @dawnieando
Words Need ’Text Cohesion’
The ‘glue’ which adds meaning
May historically be ‘stop words’
Surrounding words can change ‘intent’
They add ‘context’
36. @dawnieando
”Ambiguity is the greatest bottleneck to computational
knowledge acquisition, the killer problem of all natural
language processing.”
(Stephen Clark, formerly of Cambridge University & now a full-
time research scientist with Google DeepMind)
Ambiguity Is Problematic
37. @dawnieando
• Words with a similar meaning to something else
• Example: humorous, comical, hilarious, hysterical are ALL
synonyms of funny
Synonymous (Synonyms)
38. @dawnieando
Ambiguity & Polysemy
• Ambiguity is at a sentence level
• Polysemous words are arguably the
most problematic due to their ‘nuanced’
nature
39. @dawnieando
• Words usually with the
same root and multiple
meanings
• Example: “Run” has 396
Oxford English Dictionary
definitions
Polysemous (Polysemy)
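As a quick, hedged illustration of polysemy (mine, not the deck’s), WordNet via NLTK lists the distinct senses it holds for ‘run’; its sense inventory differs from the OED figure quoted above.

```python
# Sketch: counting WordNet senses for the polysemous word "run"
# (requires nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

senses = wn.synsets("run")
print(len(senses))                     # dozens of distinct senses
for s in senses[:3]:
    print(s.name(), "-", s.definition())
```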
41. @dawnieando
• Words spelt the same but with very different ‘root’
meanings
• Example: pen (writing implement), pen (pig pen)
• Example: rose (stood up / ascended), rose (flower)
• Example: bark (dog sound), bark (tree bark)
Homonyms
42. @dawnieando
Spelt differently with
VERY different
meanings but
sound exactly the
same
• Draft, draught
• Dual, duel
• Made, maid
• For, fore, four
• To, too, two
• There, their
• Where, wear, were
Homophones – Difficult To Disambiguate Verbally
46. @dawnieando
EXAMPLES
• Zipfian Distribution
• Firthian Linguistics
• Treebanks
• Language can be tied back to
mathematical spaces & algorithms
Language Has Natural Patterns & Phenomena
47. @dawnieando
Example: Zipfian Distribution (Power Law)
• The frequency of any
word in a collection is
inversely proportional to
its rank in the frequency
table
• Applies to any word
frequency ANYWHERE
• Image is 30 Wikipedias
48. @dawnieando
To illustrate Zipfian Distribution (most used words):
Rank  Word  Frequency of Use in a Corpus
1     the   1
2     be    1/2
3     to    1/3
4     of    1/4
5     and   1/5
6     a     1/6
7     in    1/7
8     that  1/8
9     have  1/9
10    I     1/10
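A minimal sketch (added here, not in the slides) of checking that rank-frequency pattern on any text with the Python standard library; the input file path is a placeholder.

```python
# Sketch: rank words by frequency and compare against the Zipfian expectation
# (the frequency at rank r is roughly the rank-1 frequency divided by r).
from collections import Counter

corpus = open("any_large_text_file.txt", encoding="utf-8").read().lower()  # placeholder corpus
counts = Counter(corpus.split())

ranked = counts.most_common(10)
top_freq = ranked[0][1]
for rank, (word, freq) in enumerate(ranked, start=1):
    print(f"{rank:2d} {word:10s} observed={freq:6d} zipf_expected={top_freq / rank:8.1f}")
```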
49. @dawnieando
“You shall know a word by the
company it keeps” (Firth, 1957)
Firthian Linguistics
One Such Phenomenon is Co-occurrence
50. @dawnieando
Words with similar meaning tend
to live near each other in a body
of text
A word’s ‘nearness’ can be
measured in mathematical vector
spaces – a context vector is the
‘company’ a word keeps
Distributional Relatedness & Firthian Linguistics
51. @dawnieando
Co-occurrence, Similarity & Relatedness
• Language models
are trained on
large bodies of
text to learn
‘distributional
similarity’ (co-
occurrence)
52. @dawnieando
Context Vectors & Word Embeddings
• And build vector
space models for word
embeddings
• Models learn the
weights of similarity &
relatedness distances
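A toy sketch of the idea (my own, not from the deck): build a tiny co-occurrence matrix with a sliding context window and compare words by cosine similarity; the mini corpus is invented for illustration.

```python
# Sketch: co-occurrence counts as crude context vectors, compared by cosine similarity.
from collections import defaultdict
import math

sentences = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
window = 2
cooc = defaultdict(lambda: defaultdict(int))
for s in sentences:
    tokens = s.split()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                cooc[w][tokens[j]] += 1          # count words seen in the window

def cosine(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# "cat" and "dog" keep similar company ("sat", "the", ...) so their vectors sit close
print(cosine(cooc["cat"], cooc["dog"]))
print(cosine(cooc["cat"], cooc["rug"]))
```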
54. @dawnieando
• He kicked the bucket
• I have yet to tick that off
my bucket list
• The bucket was filled with
water
Remember ‘bucket’ Without Text Cohesion?
55. @dawnieando
A Word’s Context Still Needed Gaps Filling
• Past models used
context-free
embeddings
• A moving
‘context window’
was used to gain
a word’s context
56. @dawnieando
But Even Then True Context Needs Both Sides of a
Word
• Past models were
‘uni-directional’
• The context
window moved
from left to right
or right to left
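A tiny illustrative snippet (not in the slides) of that difference: a left-to-right window only ever sees words on one side of the target, while bidirectional context covers both sides at once.

```python
# Sketch: what a left-to-right context window "sees" versus full bidirectional context.
tokens = "I accessed the bank account yesterday".split()
target = tokens.index("bank")
window = 2

left_to_right = tokens[max(0, target - window):target]               # only preceding words
bidirectional = tokens[max(0, target - window):target + window + 1]  # both sides

print(left_to_right)   # ['accessed', 'the']
print(bidirectional)   # ['accessed', 'the', 'bank', 'account', 'yesterday']
```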
61. @dawnieando
• Transformer is a big
deal
• Derived from a 2017
paper called ‘Attention
Is All You Need’ (Vaswani, A.,
Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A.N., Kaiser, Ł. and Polosukhin,
I., 2017)
What About The Transformer Part?
64. @dawnieando
River Bank or Financial Bank?
By identifying ‘cheque’ or
‘deposit’ in the company
of ‘bank’ BERT can
disambiguate from a ‘river’
bank
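A rough sketch of that disambiguation idea (added here, not from the slides) using contextual embeddings from a pre-trained BERT checkpoint via the Hugging Face transformers library; the sentences are illustrative and the exact similarity values will vary.

```python
# Sketch: the contextual embedding of "bank" differs between a financial and a river context.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]      # (tokens, 768)
    # locate the "bank" token and return its contextual vector
    idx = inputs["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids("bank"))
    return hidden[idx]

financial = bank_vector("I paid the cheque into my bank as a deposit")
river_a   = bank_vector("We sat on the bank of the river")
river_b   = bank_vector("The boat drifted towards the grassy bank")

cos = torch.nn.functional.cosine_similarity
print(cos(river_a, river_b, dim=0))      # higher: same 'river' sense
print(cos(financial, river_a, dim=0))    # lower: different senses
```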
65. @dawnieando
So Where Is BERT’s Value in Google Search?
• Named entity determination
• Textual entailment (next sentence prediction)
• Coreference resolution
• Question answering
• Word sense disambiguation
• Automatic summarization
• Polysemy resolution
68. @dawnieando
• A single word can change the whole intent of a query
• Conversational queries particularly so
• The ‘stop words’ are actually part of text cohesion
• Historically ‘stop words’ were often ignored
• The next sentence matters
BERT and Intent Understanding
69. @dawnieando
Example:
“I remember what my
Grandad said just
before he kicked the
bucket.”
Next Sentence Prediction (Textual Entailment)
Often the next sentence REALLY matters
71. @dawnieando
• There have been lots of improvements by others upon
BERT
• Google have likely improved dramatically on BERT too
• There were some issues with next-sentence prediction
• Facebook built RoBERTa
BERT Probably Doesn’t Resemble The Original BERT
Paper
72. @dawnieando
• Named entity determination
• Coreference resolution
• Question answering
• Word sense disambiguation
• Automatic summarization
• Polysemy resolution
Featured Snippets, Knowledge Graph & Web Page Extraction
Together
73. @dawnieando
• BERT has gone from mono-lingual to multilingual
• Other language-specific BERTs are being built
• Transformer was trained on international translations
• Language has transferable phenomena
BERT and International SEO
Expect Big Things
74. @dawnieando
• Deepset – German BERT
• CamemBERT – French BERT
• AlBERTo – Italian BERT
• RobBERT - Dutch RoBERTa model
BERT & International SEO
75. @dawnieando
• The challenges of Pygmalion
• Conversational search can now ‘scale’
• BERT takes away some of the human
labelling effort necessary
• Next sentence prediction could impact
assistants and clarifying questions
BERT and Conversational Search
Expect Big Things
76. @dawnieando
Semantic Heterogeneity Issues in Entity Oriented
Search (Semantic Search)
• Helps with anaphora & cataphora
resolution (resolving pronouns of entities)
• Helps with coreference resolution
• Helps with named entity determination
• Next sentence prediction could impact
assistants and clarifying questions
78. @dawnieando
• It’s supposed to be natural
• In the same way you can’t optimize for RankBrain,
you can’t optimize for BERT
• BERT is a tool / learning process in search for
disambiguation & contextual understanding of
words
• BERT is a ‘black-box’ algorithm
Why can’t you optimize for BERT?
79. @dawnieando
• Black-box algorithm
• Hugging Face coined the phrase
BERTology
• Now a field of study exploring why
BERT makes choices
• Some concerns over bias &
responsible AI
Black Box Algorithms & BERTology
80. @dawnieando
• Cluster together content and interlink well on topic & nuance
• Avoid ‘too-similar’ competing categories – merge them
• Consider not just the content in the page but the content in
the linked pages & sections
• Consider the content of the ‘whole domain’ as everything
contributes to co-occurrence
• Be extra vigilant when ‘pruning’
Utilising Co-Occurrence Strategically
Employ Relatedness
82. @dawnieando
Anyone can build a BERT to train their own
language processing system for a variety of
natural language understanding downstream
tasks.
Fine-tuning can be carried out in a short time
BERT represents a union of data science and SEO
Anyone Can Use BERT – BERT is a Tool
83. @dawnieando
• Automatic categorization & subcategorization of
content
• Automatic generation of meta-descriptions
• Automatic summarization of extracts & teasers
• Categorising user-generated content / posts
probably better than humans
How Could BERT Be Harnessed For Efficiency
in SEO? A Few Examples
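One hedged sketch of the categorisation idea above (mine, not the deck’s) using a Hugging Face zero-shot classification pipeline; the default checkpoint behind this pipeline is whichever Transformer model the library selects (not necessarily BERT itself), and the labels are hypothetical site categories.

```python
# Sketch: auto-categorising a piece of site content against candidate categories
# with a zero-shot classification pipeline (assumes the `transformers` library).
from transformers import pipeline

classifier = pipeline("zero-shot-classification")

snippet = "Our guide to compact mirrorless cameras for travel photography in 2020"
candidate_labels = ["photography", "travel", "finance", "recipes"]  # hypothetical categories

result = classifier(snippet, candidate_labels)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label:12s} {score:.2f}")
```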
84. @dawnieando
• J R Oakes - @jroakes
• Hamlet Batista - @hamletbatista
• Andrea Volpini - @cyberandy
• Gefen Hermesh - @ghermesh
SEOs Are Getting Busy With BERTishness
86. @dawnieando
• Original BERT was computationally expensive to
run
• ALBERT stands for A Lite BERT
• Increased efficiency
• ALBERT is BERT’s natural successor
• ALBERT is much leaner whilst providing similar
results
• A joint research work between Google & Toyota
ALBERT – BERT’s Successor
87. @dawnieando
Reformer (Google) – Transformer’s Successor
Understands a word’s context
from the perspective of a
‘whole novel’.
https://venturebeat.com/2020/01/16/googles-ai-language-model-reformer-can-process-the-entirety-of-novels/
88. @dawnieando
Growth has been huge in the natural language
processing community – see the current SuperGLUE
leaderboard
BERT Was Just The Start
• Google T5 is winning
• Even more
advanced
technology
• Transfer-learning
• Expect big things