This document provides an overview of Google BERT and what it means for SEOs and marketers. Some key points:
- BERT uses bidirectional transformers to better understand the context of words in search queries and content. It helps Google resolve ambiguity and understand nuanced language.
- BERT was first introduced as an academic research paper in 2018 and was quickly adopted by Google and other major tech companies to improve natural language understanding.
- While BERT only impacts around 10% of queries, it represents a major improvement in Google's ability to understand user intent and has important implications for SEO, international search, and conversational search.
3. @dawnieando
• A Google algorithmic
update
• Google announced BERT to the
organic search world in a VERY
geeky way
• Mentions that 15% of queries
seen every day are brand new
• Touches on ‘The Vocabulary
Problem’ (many ways of querying
the same thing)
October 2019 - Welcome To Search, BERT
4. @dawnieando
• Probably the biggest
improvement in search EVER
• The biggest change in search
in five years, since RankBrain
Fundamentally… Google BERT is
5. @dawnieando
• Layman’s terms: it can be
used to help Google better
understand the context of
words in search queries &
content
So, just what is the Google BERT update?
6. @dawnieando
• Used globally in all
languages on featured
snippets
• BERT to impact rankings for 1
in 10 queries
• Initially for English language
queries in US
The bottom line search announcement
7. @dawnieando
Dec 2019 – BERT expands internationally
• Over 70 languages
• Still only impacts 10% of
queries despite the
considerable expansion
• Still all featured snippets
globally
8. @dawnieando
• BERT deals with
ambiguity & ‘nuance’ in
queries & content
• Unlikely to impact short
queries
• More likely to impact
conversational queries
• Unlikely to impact
branded queries
Why Are Just 10% of Google Queries Impacted?
9. @dawnieando
• The SEO community is
abuzz
• BERT is a big deal
• Likened to ‘RankBrain’ in
some of the ‘interesting’
interpretations
• Some confusion around
‘what BERT is and what it
means for search’
SEOs React
10. @dawnieando
• A neural network-based
technique for natural language
processing pre-training
• An acronym for Bidirectional
Encoder Representations from
Transformers
BERT in Geek Speak
12. @dawnieando
• Search algorithm update
• Open source pre-trained model / framework for
natural language understanding
• Academic research paper
• Evolving tool for computational linguistics efficiency
• Beginning of MANY BERT’ish language models
Important: BERT is Many Things
13. @dawnieando
So What’s The Backstory?
Where did BERT come from?
Where did the need for BERT arise?
The impact of BERT for SEO & beyond?
What next?
14. @dawnieando
• Academic Paper
• Research Project by Devlin et al
• Published in October 2018, a
year before the update
• BERT: Pre-training of Deep
Bidirectional Transformers for
Language Understanding
BERT started as a research paper in 2018
15. @dawnieando
• Open sourced so anyone can
build a BERT
• BERT quickly created a sea-change
leap forward in natural language
understanding for information
retrieval
• Provided a pre-trained language
model which required only fine-
tuning
BERT Open Sourced in 2018
16. @dawnieando
The whole of the English
Wikipedia & The Books
Corpus combined.
Over 2,500 million words
BERT Has Been Pre-Trained On Many Words
17. @dawnieando
Vanilla BERT provides a pre-
trained starting-point layer for
neural networks across diverse
machine learning & natural
language tasks
The machine learning community got very
excited about BERT
18. @dawnieando
• BERT is fine-tuned on a variety of
downstream NLP tasks, including
question and answer datasets
BERT Can Be Fine-Tuned in A Short Space of Time
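To make the fine-tuning point above concrete, here is a minimal sketch (not from the original deck) assuming the Hugging Face transformers library; the checkpoint name, label count and toy batch are illustrative placeholders.

```python
# Minimal sketch: loading pre-trained BERT and adding a classification head
# for a downstream task (assumes the Hugging Face `transformers` library).
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# num_labels=2 is a placeholder for a hypothetical binary task (e.g. Q&A relevance)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# A toy batch; in real fine-tuning this would be a labelled downstream dataset
inputs = tokenizer(
    ["is this passage relevant to the question?", "completely unrelated text"],
    padding=True, truncation=True, return_tensors="pt",
)
labels = torch.tensor([1, 0])

# One forward pass; a Trainer / optimizer loop would repeat this over the dataset,
# nudging the pre-trained weights plus the thin task head (hence "fine-tuning")
outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)
```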
19. @dawnieando
• Vanilla BERT can be used ‘out of the box’
or fine-tuned
• Provides a great starting point & saves
huge amounts of time & money
• Those wishing to can ‘build upon’ and
improve BERT
BERT Saves Researchers Time AND Money
20. @dawnieando
• Microsoft – MT-DNN
• Facebook – RoBERTa
• XLNet
• ERNIE – Baidu
• Lots of other
contenders
Since 2018, Major Tech Companies Have Extended BERT
25. @dawnieando
Language models like
BERT help machines
understand the nuance
in a word’s context and
the cohesion of its
surrounding text
What Purpose Does BERT Serve & How?
26. @dawnieando
• Dates back over 60 years, to the Turing Test paper
• Aims at understanding the way words fit together with
structure and meaning
• NLU is connected to the field of linguistics (computational
linguistics)
• Over time, computational linguistics has increasingly
spilled over into a growing online web of content
What is Natural Language Understanding?
30. @dawnieando
“The meaning of a word is its use in a
language” (Ludwig Wittgenstein,
philosopher, 1953)
Image attribution: Moritz Nähr
(public domain)
Single Words Have No Meaning
31. @dawnieando
The word ‘like’ in this sentence is both a:
• (VBP): verb (non-3rd person, singular,
present)
• (IN): preposition or subordinating
conjunction
An Example of a Word’s Meaning Changing
• I -> PRP
• Like -> VBP
• That -> IN
• He -> PRP
• Is -> VBZ
• Like -> IN
• That -> DT
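A small illustration of the tagging above, assuming NLTK’s default Penn Treebank-style tagger (not part of the slides); exact tags can vary by tagger version.

```python
# Sketch: part-of-speech tagging the two uses of "like" with NLTK's default
# (Penn Treebank style) tagger. Requires: nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger').
import nltk

sentence = "I like that he is like that"
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# Expected (roughly): [('I','PRP'), ('like','VBP'), ('that','IN'),
#                      ('he','PRP'), ('is','VBZ'), ('like','IN'), ('that','DT')]
```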
33. @dawnieando
E.g. Verbs, nouns, adjectives
• Penn-treebank tagger -> 36
different parts of speech
• CLAWS7 (C7) -> 146 different
parts of speech
• Brown Corpus Tagger -> 81
different parts of speech
Words Become ‘Parts of Speech’ When Combined
34. @dawnieando
• He kicked the bucket
• I have yet to tick that off
my bucket list
• The bucket was filled with
water
The Meaning of The Word ‘Bucket’ Changes
35. @dawnieando
Words Need ’Text Cohesion’
The ‘glue’ which adds meaning
May historically be ‘stop words’
Surrounding words can change ‘intent’
They add ‘context’
36. @dawnieando
”Ambiguity is the greatest bottleneck to computational
knowledge acquisition, the killer problem of all natural
language processing.”
(Stephen Clark, formerly of Cambridge University & now a full-
time research scientist with Google DeepMind)
Ambiguity Is Problematic
37. @dawnieando
• Words with a similar meaning to something else
• Example: humorous, comical, hilarious, hysterical are ALL
synonyms of funny
Synonymous (Synonyms)
38. @dawnieando
Ambiguity & Polysemy
• Ambiguity is at a sentence level
• Polysemous words are arguably the
most problematic due to their ‘nuanced’
nature
39. @dawnieando
• Words usually with the
same root and multiple
meanings
• Example: “Run” has 396
Oxford English Dictionary
definitions
Polysemous (Polysemy)
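As a quick, hedged illustration of polysemy (mine, not the deck’s), WordNet via NLTK lists the distinct senses it holds for ‘run’; its sense inventory differs from the OED figure quoted above.

```python
# Sketch: counting WordNet senses for the polysemous word "run"
# (requires nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

senses = wn.synsets("run")
print(len(senses))                     # dozens of distinct senses
for s in senses[:3]:
    print(s.name(), "-", s.definition())
```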
41. @dawnieando
• Words spelt the same but with very different ‘root’
meanings
• Example: pen (writing implement), pen (pig pen)
• Example: rose (stood up / ascended), rose (flower)
• Example: bark (dog sound), bark (tree bark)
Homonyms
42. @dawnieando
Spelt differently with
VERY different
meanings but
sound exactly the
same
• Draft, draught
• Dual, duel
• Made, maid
• For, fore, four
• To, too, two
• There, their
• Where, wear, were
Homophones – Difficult To Disambiguate Verbally
46. @dawnieando
EXAMPLES
• Zipfian Distribution
• Firthian Linguistics
• Treebanks
• Language can be tied back to
mathematical spaces & algorithms
Language Has Natural Patterns & Phenomena
47. @dawnieando
Example: Zipfian Distribution (Power Law)
• The frequency of any
word in a collection is
inversely proportional to
its rank in the frequency
table
• Applies to any word
frequency ANYWHERE
• Image is 30 Wikipedias
48. @dawnieando
To illustrate Zipfian Distribution (most used words):
Rank  Word  Frequency of Use in a Corpus
1     the   1
2     be    1/2
3     to    1/3
4     of    1/4
5     and   1/5
6     a     1/6
7     in    1/7
8     that  1/8
9     have  1/9
10    I     1/10
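A minimal sketch (added here, not in the slides) of checking that rank-frequency pattern on any text with the Python standard library; the input file path is a placeholder.

```python
# Sketch: rank words by frequency and compare against the Zipfian expectation
# (the frequency at rank r is roughly the rank-1 frequency divided by r).
from collections import Counter

corpus = open("any_large_text_file.txt", encoding="utf-8").read().lower()  # placeholder corpus
counts = Counter(corpus.split())

ranked = counts.most_common(10)
top_freq = ranked[0][1]
for rank, (word, freq) in enumerate(ranked, start=1):
    print(f"{rank:2d} {word:10s} observed={freq:6d} zipf_expected={top_freq / rank:8.1f}")
```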
49. @dawnieando
“You shall know a word by the
company it keeps” (Firth, 1957)
Firthian Linguistics
One Such Phenomenon is Co-occurrence
50. @dawnieando
Words with similar meaning tend
to live near each other in a body
of text
A word’s ‘nearness’ can be
measured in mathematical vector
spaces – a context vector is the
‘company’ a word keeps
Distributional Relatedness & Firthian Linguistics
51. @dawnieando
Co-occurrence, Similarity & Relatedness
• Language models
are trained on
large bodies of
text to learn
‘distributional
similarity’ (co-
occurrence)
52. @dawnieando
Context Vectors & Word Embeddings
• And build vector
space models for word
embeddings
• Models learn the
weights of similarity &
relatedness distances
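A toy sketch of the idea (my own, not from the deck): build a tiny co-occurrence matrix with a sliding context window and compare words by cosine similarity; the mini corpus is invented for illustration.

```python
# Sketch: co-occurrence counts as crude context vectors, compared by cosine similarity.
from collections import defaultdict
import math

sentences = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
window = 2
cooc = defaultdict(lambda: defaultdict(int))
for s in sentences:
    tokens = s.split()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                cooc[w][tokens[j]] += 1          # count words seen in the window

def cosine(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# "cat" and "dog" keep similar company ("sat", "the", ...) so their vectors sit close
print(cosine(cooc["cat"], cooc["dog"]))
print(cosine(cooc["cat"], cooc["rug"]))
```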
54. @dawnieando
• He kicked the bucket
• I have yet to tick that off
my bucket list
• The bucket was filled with
water
Remember ‘bucket’ Without Text Cohesion?
55. @dawnieando
A Word’s Context Still Needed Gaps Filling
• Past models used
context-free
embeddings
• A moving
‘context window’
was used to gain
a word’s context
56. @dawnieando
But Even Then True Context Needs Both Sides of a
Word
• Past models were
‘uni-directional’
• The context
window moved
from left to right
or right to left
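A tiny illustrative snippet (not in the slides) of that difference: a left-to-right window only ever sees words on one side of the target, while bidirectional context covers both sides at once.

```python
# Sketch: what a left-to-right context window "sees" versus full bidirectional context.
tokens = "I accessed the bank account yesterday".split()
target = tokens.index("bank")
window = 2

left_to_right = tokens[max(0, target - window):target]               # only preceding words
bidirectional = tokens[max(0, target - window):target + window + 1]  # both sides

print(left_to_right)   # ['accessed', 'the']
print(bidirectional)   # ['accessed', 'the', 'bank', 'account', 'yesterday']
```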
61. @dawnieando
• Transformer is a big
deal
• Derived from a 2017
paper called ‘Attention
Is All You Need’ (Vaswani, A.,
Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A.N., Kaiser, Ł. and Polosukhin,
I., 2017)
What About The Transformer Part?
64. @dawnieando
River Bank or Financial Bank?
By identifying ‘cheque’ or
‘deposit’ in the company
of ‘bank’ BERT can
disambiguate from a ‘river’
bank
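A rough sketch of that disambiguation idea (added here, not from the slides) using contextual embeddings from a pre-trained BERT checkpoint via the Hugging Face transformers library; the sentences are illustrative and the exact similarity values will vary.

```python
# Sketch: the contextual embedding of "bank" differs between a financial and a river context.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]      # (tokens, 768)
    # locate the "bank" token and return its contextual vector
    idx = inputs["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids("bank"))
    return hidden[idx]

financial = bank_vector("I paid the cheque into my bank as a deposit")
river_a   = bank_vector("We sat on the bank of the river")
river_b   = bank_vector("The boat drifted towards the grassy bank")

cos = torch.nn.functional.cosine_similarity
print(cos(river_a, river_b, dim=0))      # higher: same 'river' sense
print(cos(financial, river_a, dim=0))    # lower: different senses
```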
65. @dawnieando
So Where Is BERT’s Value in Google Search?
• Named entity determination
• Textual entailment (next sentence prediction)
• Coreference resolution
• Question answering
• Word sense disambiguation
• Automatic summarization
• Polysemy resolution
68. @dawnieando
• A single word can change the whole intent of a query
• Conversational queries particularly so
• The ‘stop words’ are actually part of text cohesion
• Historically ‘stop words’ were often ignored
• The next sentence matters
BERT and Intent Understanding
69. @dawnieando
Example:
“I remember what my
Grandad said just
before he kicked the
bucket.”
Next Sentence Prediction (Textual Entailment)
Often the next sentence REALLY matters
71. @dawnieando
• There have been lots of improvements by others upon
BERT
• Google have likely improved dramatically on BERT too
• There were some issues with next-sentence prediction
• Facebook built RoBERTa
BERT Probably Doesn’t Resemble The Original BERT
Paper
72. @dawnieando
• Named entity determination
• Coreference resolution
• Question answering
• Word sense disambiguation
• Automatic summarization
• Polysemy resolution
Featured Snippets, Knowledge Graph & Web Page Extraction
Together
73. @dawnieando
• BERT has gone from mono-lingual to multilingual
• Other language-specific BERTs are being built
• Transformer was trained on international translations
• Language has transferable phenomena
BERT and International SEO
Expect Big Things
74. @dawnieando
• Deepset – German BERT
• CamemBERT – French BERT
• AlBERTo – Italian BERT
• RobBERT - Dutch RoBERTa model
BERT & International SEO
75. @dawnieando
• The challenges of Pygmalion
• Conversational search can now ‘scale’
• BERT takes away some of the human
labelling effort necessary
• Next sentence prediction could impact
assistants and clarifying questions
BERT and Conversational Search
Expect Big Things
76. @dawnieando
Semantic Heterogeneity Issues in Entity Oriented
Search (Semantic Search)
• Helps with anaphora & cataphora
resolution (resolving pronouns of entities)
• Helps with coreference resolution
• Helps with named entity determination
• Next sentence prediction could impact
assistants and clarifying questions
78. @dawnieando
• It’s supposed to be natural
• In the same way you can’t optimize for RankBrain,
you can’t optimize for BERT
• BERT is a tool / learning process in search for
disambiguation & contextual understanding of
words
• BERT is a ‘black-box’ algorithm
Why can’t you optimize for BERT?
79. @dawnieando
• Black-box algorithm
• Hugging Face coined the phrase
BERTology
• Now a field of study exploring why
BERT makes choices
• Some concerns over bias &
responsible AI
Black Box Algorithms & BERTology
80. @dawnieando
• Cluster together content and interlink well on topic & nuance
• Avoid ‘too-similar’ competing categories – merge them
• Consider not just the content in the page but the content in
the linked pages & sections
• Consider the content of the ‘whole domain’ as everything
contributes to co-occurrence
• Be extra vigilant when ‘pruning’
Utilising Co-Occurrence Strategically
Employ Relatedness
82. @dawnieando
Anyone can build a BERT to train their own
language processing system for a variety of
natural language understanding downstream
tasks.
Fine-tuning can be carried out in a short time
BERT represents a union of data science and SEO
Anyone Can Use BERT – BERT is a Tool
83. @dawnieando
• Automatic categorization & subcategorization of
content
• Automatic generation of meta-descriptions
• Automatic summarization of extracts & teasers
• Categorising user-generated content / posts
probably better than humans
How Could BERT Be Harnessed For Efficiency
in SEO? A Few Examples
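One hedged sketch of the categorisation idea above (mine, not the deck’s) using a Hugging Face zero-shot classification pipeline; the default checkpoint behind this pipeline is whichever Transformer model the library selects (not necessarily BERT itself), and the labels are hypothetical site categories.

```python
# Sketch: auto-categorising a piece of site content against candidate categories
# with a zero-shot classification pipeline (assumes the `transformers` library).
from transformers import pipeline

classifier = pipeline("zero-shot-classification")

snippet = "Our guide to compact mirrorless cameras for travel photography in 2020"
candidate_labels = ["photography", "travel", "finance", "recipes"]  # hypothetical categories

result = classifier(snippet, candidate_labels)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label:12s} {score:.2f}")
```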
84. @dawnieando
• J R Oakes - @jroakes
• Hamlet Batista - @hamletbatista
• Andrea Volpini - @cyberandy
• Gefen Hermesh - @ghermesh
SEOs Are Getting Busy With BERTishness
86. @dawnieando
• Original BERT was computationally expensive to
run
• ALBERT stands for A Lite BERT
• Increased efficiency
• ALBERT is BERT’s natural successor
• ALBERT is much leaner whilst providing similar
results
• A joint research work between Google & Toyota
ALBERT – BERT’s Successor
87. @dawnieando
Reformer (Google) – Transformer’s Successor
Understands a word’s context
from the perspective of a
‘whole novel’.
https://venturebeat.com/2020/01/16/googles-ai-language-model-reformer-can-process-the-entirety-of-novels/
88. @dawnieando
Growth has been huge in the natural language
processing community – see the current SuperGLUE
leaderboard
BERT Was Just The Start
• Google T5 is winning
• Even more
advanced
technology
• Transfer-learning
• Expect big things