1. Semantic Analysis in Language Technology
http://stp.lingfil.uu.se/~santinim/sais/2016/sais_2016.htm
Word Senses
Marina Santini
santinim@stp.lingfil.uu.se
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Spring 2016
2. Previous Lecture: Sentiment Analysis
• Affective meaning is a kind of connotational meaning
• The importance of sentiment lexicons
• Methods for the automatic expansion of manually-annotated lexicons
• A baseline algorithm: Naive Bayes
• Practical activity: ML-based sentiment classifier: movie reviews, product reviews, restaurant reviews…
• Results… uhm… somehow biased (short texts vs long texts); never a full positive polarity; etc.
Lecture 4: Word Senses
3. How is *meaning* handled in Semantic-Based LT Applications?
• Semantic Role Labelling/Predicate-Argument Structure
  • Main trend:
    • creation of annotated resources (PropBank, FrameNet, etc.);
    • use of supervised machine learning: classifiers are trained on annotated resources, such as PropBank and FrameNet.
• Sentiment Analysis
  • Main trends:
    • identification of sentiment-bearing features or identification of representative features for the problem
    • use of supervised learning → cf. the results of the NLTK classifier
• Word sense disambiguation (???)
• Information extraction (???)
• Question Answering (???)
• Ontologies (???)
4. Reminder: Glossary Entries
• Which concepts are the most salient in Lect 3?
• Update your Glossary…
6. Word Senses
• Master Students → NLP course
• Bachelor Students → Semantics
7. From Lect 2: PropBank & Selectional Restrictions
• PropBank is organized by word senses: word senses are different aspects of meaning of a word
• Selectional restrictions… we can use semantic constraints to disambiguate senses. Ex: eat, serve… bear me some patience…
8. Acknowledgements
Most slides borrowed from: Dan Jurafsky and James H. Martin
Some slides borrowed from D. Jurafsky and C. Manning and D. Radev (Coursera)
J&M (2015, draft): https://web.stanford.edu/~jurafsky/slp3/
9. Outline
• Word Meaning
• WordNet and Other Lexical Resources
• Selectional Restrictions
10. Logic: meaning representation: uppercase words!
• Constants
• Variables
• Predicates
• Boolean connectives
• Quantifiers
• Brackets and commas to group the symbols together
• Ex: A woman crosses Sunset Boulevard
Formal Semantics
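As one possible illustration of these building blocks (not taken from the lecture), the example sentence could be rendered in first-order logic with an uppercase constant for the street, a variable bound by an existential quantifier, two predicates joined by a Boolean connective, and brackets and a comma for grouping. The predicate names Woman and Crosses are assumptions chosen for readability:

```latex
\exists x \, \big( Woman(x) \wedge Crosses(x, SunsetBoulevard) \big)
```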
11. Definitions
• Lexical semantics is the study of the meaning of words and the systematic meaning-related connections between words.
• A word sense is the locus of word meaning; definitions and meaning relations are defined at the level of the word sense rather than wordforms.
• Homonymy is the relation between unrelated senses that share a form.
• Polysemy is the relation between related senses that share a form.
• Synonymy holds between different words with the same meaning.
• Hyponymy and hypernymy relations hold between words that are in a class inclusion relationship.
• Meronymy is a type of hierarchy that deals with part–whole relationships.
• WordNet is a large database of lexical relations for English.
13. Reminder: lemma and wordform
• A lemma or citation form
  • Same stem, part of speech, rough semantics
• A wordform
  • The "inflected" word as it appears in text

Wordform   Lemma
banks      bank
sung       sing
duermes    dormir
Cf. type/token ratio: a crude measure of lexical variety:
If a text is 1,000 words long, it is said to have 1,000 "tokens". But a lot of these words will be repeated, and there may be only, say, 400 different words in the text. "Types", therefore, are the different words. The ratio between types and tokens in this example would be 400/1,000 = 40%. (source: WordSmith Tools)
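The arithmetic above can be reproduced with a short Python sketch. It assumes a naive tokenizer (lowercasing and splitting on whitespace), which only approximates how WordSmith Tools or any other concordancer actually tokenizes text; the function name `type_token_ratio` is illustrative.

```python
def type_token_ratio(text: str) -> float:
    """Return the number of distinct word types divided by the number of tokens."""
    # Naive tokenization (an assumption): lowercase, then split on whitespace.
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    types = set(tokens)  # each distinct wordform counts once
    return len(types) / len(tokens)


if __name__ == "__main__":
    # 10 tokens but only 4 types (to, be, or, not): the same 40% as above.
    sample = "to be or not to be to be or not"
    print(f"{type_token_ratio(sample):.2f}")  # prints 0.40
```

Note that the ratio falls as texts get longer (function words keep repeating), which is one reason the measure is only crude.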
14. Lemmas have senses
• One lemma "bank" can have many meanings:
  • Sense 1: "…a bank can hold the investments in a custodial account…"
  • Sense 2: "…as agriculture burgeons on the east bank the river will shrink even more"
• Sense (or word sense)
  • A discrete representation of an aspect of a word's meaning.
• The lemma bank here has two senses
16. Homonymy
Homonyms: words that share a form but have unrelated, distinct meanings:
• bank1: financial institution, bank2: sloping land
• bat1: club for hitting a ball, bat2: nocturnal flying mammal
1. Homographs (bank/bank, bat/bat)
2. Homophones:
   1. Write and right
   2. Piece and peace
17. Homonymy causes problems for NLP applications
• Information retrieval
  • "bat care"
• Machine Translation
  • bat: murciélago (animal) or bate (for baseball)
• Text-to-Speech
  • bass (stringed instrument) vs. bass (fish)
• There would be no ambiguity for Speech to Text: why?
18. Polysemy
• 1. The bank was constructed in 1875 out of local red brick.
• 2. I withdrew the money from the bank
• Are those the same sense?
  • Sense 2: "A financial institution"
  • Sense 1: "The building belonging to a financial institution"
• A polysemous word has related meanings
  • Most non-rare words have multiple meanings
20. Metonymy or Systematic Polysemy: A systematic relationship between senses
• Lots of types of polysemy are systematic
  • School, university, hospital
  • All can mean the institution or the building.
• A systematic relationship:
  • Building ↔ Organization
• Other such kinds of systematic polysemy:
  • Author (Jane Austen wrote Emma) ↔ Works of Author (I love Jane Austen)
  • Tree (Plums have beautiful blossoms) ↔ Fruit (I ate a preserved plum)
21. How do we know when a word has more than one sense?
22. How do we know when a word has more than one sense?
• The "zeugma" test: Two senses of serve?
  • Which flights serve breakfast?
  • Does Lufthansa serve Philadelphia?
  • ?Does Lufthansa serve breakfast and San Jose?
• Since this conjunction sounds weird,
  • we say that these are two different senses of "serve"
23. Synonyms
• Words that have the same meaning in some or all contexts.
  • filbert / hazelnut
  • couch / sofa
  • big / large
  • automobile / car
  • vomit / throw up
  • water / H2O
• Two lexemes are synonyms
  • if they can be substituted for each other in all situations
  • If so, they have the same propositional meaning
24. Synonyms
• But there are few (or no) examples of perfect synonymy.
  • Even if many aspects of meaning are identical,
  • substitution may still not preserve acceptability based on notions of politeness, slang, register, genre, etc.
• Example:
  • Water/H2O
  • Big/large
  • Brave/courageous
(high-brow: Latinate words)
25. Synonymy is a relation between senses rather than words
• Consider the words big and large
• Are they synonyms?
  • How big is that plane?
  • Would I be flying on a large or small plane?
• How about here:
  • Miss Nelson became a kind of big sister to Benjamin.
  • ?Miss Nelson became a kind of large sister to Benjamin.
• Why?
  • big has a sense that means being older, or grown up
  • large lacks this sense
28. Antonyms
• Senses that are opposites with respect to one feature of meaning
• Otherwise, they are very similar!
  dark/light   short/long   fast/slow   rise/fall
  hot/cold     up/down      in/out
• More formally: antonyms can
  • define a binary opposition or be at opposite ends of a scale
    • long/short, fast/slow
  • be reversives:
    • rise/fall, up/down
29. Hyponymy and Hypernymy
• One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other
  • car is a hyponym of vehicle
  • mango is a hyponym of fruit
• Conversely hypernym/superordinate ("hyper is super")
  • vehicle is a hypernym of car
  • fruit is a hypernym of mango

Superordinate/hypernym   vehicle   fruit   furniture
Subordinate/hyponym      car       mango   chair
30. Hyponymy more formally
• Extensional:
  • The class denoted by the superordinate extensionally includes the class denoted by the hyponym
• Entailment:
  • A sense A is a hyponym of sense B if being an A entails being a B
• Hyponymy is usually transitive
  • (A hypo B and B hypo C entails A hypo C)
• Another name: the IS-A hierarchy
  • A IS-A B (or A ISA B)
  • B subsumes A
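Transitivity means an IS-A check has to follow hypernym links more than one step up. The Python sketch below stores only immediate hypernyms and computes the transitive closure on demand; the hierarchy is a hypothetical toy built from the car/vehicle, mango/fruit and chair/furniture examples, and the extra "artifact" level is invented purely to show a two-step inference.

```python
from __future__ import annotations

# Hypothetical toy hierarchy: each sense maps to its immediate hypernyms.
# car/vehicle, mango/fruit and chair/furniture come from the slides;
# "artifact" is an invented extra level added only to show transitivity.
HYPERNYMS = {
    "car": {"vehicle"},
    "mango": {"fruit"},
    "chair": {"furniture"},
    "vehicle": {"artifact"},
    "furniture": {"artifact"},
}


def all_hypernyms(sense: str) -> set[str]:
    """Return every sense reachable by following hypernym links (transitive closure)."""
    seen: set[str] = set()
    stack = list(HYPERNYMS.get(sense, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(HYPERNYMS.get(parent, ()))
    return seen


def is_hyponym(a: str, b: str) -> bool:
    """A is a hyponym of B (A IS-A B, i.e. B subsumes A) if B is one of A's hypernyms."""
    return b in all_hypernyms(a)


if __name__ == "__main__":
    print(is_hyponym("car", "vehicle"))   # direct link
    print(is_hyponym("car", "artifact"))  # inferred by transitivity
    print(is_hyponym("vehicle", "car"))   # the relation is not symmetric
```

Storing only direct links and deriving the rest keeps the resource small; WordNet itself is organized the same way, with hypernym pointers to the next level up.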
31. Hyponyms and Instances
• WordNet has both classes and instances.
• An instance is an individual, a proper noun that is a unique entity
  • San Francisco is an instance of city
• But city is a class
  • city is a hyponym of municipality...location...
36. How is "sense" defined in WordNet?
• The synset (synonym set), the set of near-synonyms, instantiates a sense or concept, with a gloss
• Example: chump as a noun with the gloss: "a person who is gullible and easy to take advantage of"
• This sense of "chump" is shared by 9 words:
  chump1, fool2, gull1, mark9, patsy1, fall guy1, sucker1, soft touch1, mug2
• Each of these senses has this same gloss
  • (Not every sense; sense 2 of gull is the aquatic bird)
(gullible = naive)
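A synset can be pictured as a record that pairs a set of (lemma, sense number) members with one shared gloss. The Python sketch below encodes the chump synset from this slide that way. The `Synset` class and the helper functions are hypothetical illustrations, not NLTK's API, and the gloss given for gull sense 2 paraphrases the slide rather than quoting WordNet.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Synset:
    """One sense: a set of (lemma, sense number) members sharing a single gloss."""
    members: tuple[tuple[str, int], ...]
    gloss: str


# The chump synset from this slide, with the sense numbers listed there.
CHUMP = Synset(
    members=(
        ("chump", 1), ("fool", 2), ("gull", 1), ("mark", 9), ("patsy", 1),
        ("fall guy", 1), ("sucker", 1), ("soft touch", 1), ("mug", 2),
    ),
    gloss="a person who is gullible and easy to take advantage of",
)

# Gull sense 2 belongs to a different synset; this gloss paraphrases the slide
# and is NOT WordNet's actual wording.
GULL_BIRD = Synset(members=(("gull", 2),), gloss="the aquatic bird")

SYNSETS = [CHUMP, GULL_BIRD]


def synset_for(lemma: str, sense: int) -> Optional[Synset]:
    """Return the synset containing the given lemma sense, if it is in the toy inventory."""
    for synset in SYNSETS:
        if (lemma, sense) in synset.members:
            return synset
    return None


def senses_of(lemma: str) -> list[Synset]:
    """Return every synset in which the lemma appears (one entry per sense)."""
    return [s for s in SYNSETS if any(name == lemma for name, _ in s.members)]


if __name__ == "__main__":
    print(len(CHUMP.members))          # the 9 words sharing the chump sense
    print(synset_for("gull", 1).gloss)
    print(len(senses_of("gull")))      # gull appears in two synsets
```

With the real database, the same lemma-to-sense-to-synset mapping is available through NLTK's `nltk.corpus.wordnet` module; the toy inventory here only makes the structure explicit.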
45. WordNet 3.0
• A hierarchically organized lexical database
• On-line thesaurus + aspects of a dictionary
  • Some other languages available or under development
    • (Arabic, Finnish, German, Portuguese…)

Category    Unique Strings
Noun        117,798
Verb        11,529
Adjective   22,479
Adverb      4,481
49. WordNet 3.0
• Where it is:
  • http://wordnetweb.princeton.edu/perl/webwn
• Libraries
  • Python: WordNet from NLTK
    • http://www.nltk.org/Home
  • Java:
    • JWNL, extJWNL on sourceforge
50. Synset
• MeSH (Medical Subject Headings)
  • 177,000 entry terms that correspond to 26,142 biomedical "headings"
• Hemoglobins
  Entry Terms: Eryhem, Ferrous Hemoglobin, Hemoglobin
  Definition: The oxygen-carrying proteins of ERYTHROCYTES. They are found in all vertebrates and some invertebrates. The number of globin subunits in the hemoglobin quaternary structure differs between species. Structures range from monomeric to a variety of multimeric arrangements
MeSH: Medical Subject Headings thesaurus from the National Library of Medicine
52. Uses of the MeSH Ontology
• Provide synonyms ("entry terms")
  • E.g., glucose and dextrose
• Provide hypernyms (from the hierarchy)
  • E.g., glucose ISA monosaccharide
• Indexing in MEDLINE/PubMED database
  • NLM's bibliographic database:
    • 20 million journal articles
    • Each article hand-assigned 10-20 MeSH terms
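One way the first two uses could feed retrieval is query expansion: normalize a query term to its heading, then add the entry terms and the broader heading. The Python sketch below is a hypothetical illustration built only from the glucose/dextrose/monosaccharide examples above; it does not use the real MeSH data files or any NLM API.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical miniature MeSH-like thesaurus built only from the slide's examples.
ENTRY_TERMS = {"glucose": {"dextrose"}}    # heading -> synonyms ("entry terms")
BROADER = {"glucose": {"monosaccharide"}}  # heading -> hypernyms (glucose ISA monosaccharide)


def preferred_heading(term: str) -> Optional[str]:
    """Map a heading or any of its entry terms to the heading itself."""
    for heading, synonyms in ENTRY_TERMS.items():
        if term == heading or term in synonyms:
            return heading
    return None


def expand_query(term: str) -> set[str]:
    """Return the heading for a term plus its entry terms and its direct hypernyms."""
    term = term.lower()
    heading = preferred_heading(term) or term
    return {heading} | ENTRY_TERMS.get(heading, set()) | BROADER.get(heading, set())


if __name__ == "__main__":
    print(sorted(expand_query("Dextrose")))  # prints ['dextrose', 'glucose', 'monosaccharide']
```

Adding the hypernym broadens recall at the cost of precision, so a real system might weight expanded terms lower than the original query term.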
56. Selectional Restrictions
Consider the two interpretations of:
  I want to eat someplace nearby.
a) Sensible: Eat is intransitive and "someplace nearby" is a location adjunct
b) Speaker is Godzilla, a monster that likes eating buildings!!! Eat is transitive and "someplace nearby" is a direct object
How do we know the speaker didn't mean b)?
Because the THEME of eating tends to be something edible
57. Selectional restrictions are associated with senses
• The restaurant serves green-lipped mussels.
  • THEME is some kind of food
• Which airlines serve Denver?
  • THEME is an appropriate location
(apply zeugma test)
58. Representing selectional restrictions
Instead of representing "eat" as:
$\exists e,x,y\; Eating(e) \wedge Agent(e,x) \wedge Theme(e,y)$
Just add a term stating that the THEME filler y must be edible:
$\exists e,x,y\; Eating(e) \wedge Agent(e,x) \wedge Theme(e,y) \wedge EdibleThing(y)$
And "eat a hamburger" becomes:
$\exists e,x,y\; Eating(e) \wedge Agent(e,x) \wedge Theme(e,y) \wedge EdibleThing(y) \wedge Hamburger(y)$
This representation is perfectly reasonable since the membership of y in the category Hamburger is consistent with its membership in the category EdibleThing, assuming …
Figure 22.6 (J&M): Evidence from WordNet that hamburgers are edible:
… => dish => nutriment, nourishment, nutrition… => food, nutrient => substance => matter => physical entity => entity
But this assumes we have a large knowledge base of facts about edible things and hamburgers and whatnot.
59. Let's use WordNet synsets to specify selectional restrictions
• The THEME of eat must be WordNet synset {food, nutrient}
  "any substance that can be metabolized by an animal to give energy and build tissue"
• Similarly
  THEME of imagine: synset {entity}
  THEME of lift: synset {physical entity}
  THEME of diagonalize: synset {matrix}
• This allows "imagine a hamburger" and "lift a hamburger",
• and correctly rules out "diagonalize a hamburger".
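The check described on this slide amounts to walking up the THEME's hypernym chain and testing whether the verb's required synset appears in it. The Python sketch below does this over a hypothetical toy hierarchy: from "dish" upward it follows the WordNet excerpt for hamburger on slide 58, the hamburger → dish link skips WordNet's intermediate levels, and placing "matrix" under "abstraction" is an illustrative assumption. Each synset has a single parent here, although real WordNet allows several.

```python
from __future__ import annotations

# Hypothetical toy hypernym links (one parent per synset, unlike real WordNet).
HYPERNYM = {
    "hamburger": "dish",               # shortcut: intermediate levels omitted
    "dish": "nutriment",               # {nutriment, nourishment, nutrition, ...}
    "nutriment": "food",               # {food, nutrient}
    "food": "substance",
    "substance": "matter",
    "matter": "physical entity",
    "physical entity": "entity",
    "matrix": "abstraction",           # assumed placement, for illustration only
    "abstraction": "entity",
}

# Selectional restriction on the THEME of each verb, as listed on the slide.
THEME_RESTRICTION = {
    "eat": "food",
    "imagine": "entity",
    "lift": "physical entity",
    "diagonalize": "matrix",
}


def hypernym_chain(synset: str) -> list[str]:
    """Return the synset followed by all of its hypernyms, most specific first."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def satisfies_restriction(verb: str, theme: str) -> bool:
    """True if the THEME synset is the verb's required synset or one of its hyponyms."""
    return THEME_RESTRICTION[verb] in hypernym_chain(theme)


if __name__ == "__main__":
    for verb in ("eat", "imagine", "lift", "diagonalize"):
        print(verb, "a hamburger:", satisfies_restriction(verb, "hamburger"))
```

Run as a script, the loop accepts eat, imagine and lift with hamburger and rejects diagonalize, mirroring the slide. The knowledge-base problem from slide 58 does not disappear; it is simply delegated to the hierarchy.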