Poster presented at the SemEval 2015 workshop. Our system clustered words based on their contexts in order to identify their underlying meanings or senses.
Duluth : Word Sense Discrimination in the Service of Lexicography
1. Duluth : Word Sense Discrimination in the Service of Lexicography
SemEval 2015 - Task 15
Corpus Pattern Analysis
Ted Pedersen
University of Minnesota, Duluth
tpederse@d.umn.edu
http://senseclusters.sourceforge.net
2. The Task?
Corpus Pattern Analysis
● CPA parsing: syntactic parsing and semantic role labeling
● CPA clustering: group together semantically similar contexts
● CPA lexicography: describe verb patterns based on syntax and semantics
4. Duluth systems
● Participated in Subtask 2
● Viewed as classical word sense discrimination (or induction) problem
– Given N target words in context, group into k clusters based on the similarity of the contexts
● Automatically discovered number of senses
● AKA SenseClusters
– http://senseclusters.sourceforge.net
5. Pre-processing
● Remove non-alphanumeric characters
● Convert all text to lower case
● Convert all numeric values to a single generic string
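
The three pre-processing steps above can be sketched in Python as follows. This is a minimal illustration, not the SenseClusters code itself: the placeholder string `num`, the function name `preprocess`, and the restriction to ASCII letters and digits are all assumptions, since the poster does not specify them.

```python
import re

# Hypothetical placeholder; the poster does not say which generic string was used.
NUM_TOKEN = "num"

def preprocess(text: str) -> str:
    """Apply the three steps from the slide to one context."""
    # 1. Replace every non-alphanumeric character (other than whitespace) with a space.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # 2. Convert all text to lower case.
    text = text.lower()
    # 3. Replace each all-digit token with one generic string.
    text = re.sub(r"\b\d+\b", NUM_TOKEN, text)
    # Collapse the extra whitespace left behind by step 1.
    return " ".join(text.split())

print(preprocess("He paid $1,200 in 2015!"))
```

Note that punctuation inside a number splits it into several tokens here, so "1,200" becomes two placeholders; whether the original system behaved this way is not stated.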
6. 1st order features
● If each context is represented as a vector of features, find the contexts with the most values in common
● How many words in each context are the same?
● Contexts that share a larger number of words are placed in the same cluster
7. 1st order example
● i operate a machine
● my surgeon will operate on me today
● he can operate the lathe
● your doctor operated with skill and confidence
● … no matches among the contexts (other than the target word)
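
The lack of first-order matches can be checked directly on the four example contexts. The sketch below treats each context as a set of words and intersects every pair. Listing `operate` and `operated` as target forms is an assumption made for illustration, as are the names `shared_words` and `overlaps`.

```python
from itertools import combinations

contexts = [
    "i operate a machine",
    "my surgeon will operate on me today",
    "he can operate the lathe",
    "your doctor operated with skill and confidence",
]

# Assumed surface forms of the target verb, excluded from the comparison.
TARGET_FORMS = {"operate", "operated"}

def shared_words(a: str, b: str) -> set:
    """Words two contexts have in common, ignoring the target word."""
    return (set(a.split()) & set(b.split())) - TARGET_FORMS

# Overlap for every pair of contexts, keyed by their indices.
overlaps = {
    (i, j): shared_words(contexts[i], contexts[j])
    for i, j in combinations(range(len(contexts)), 2)
}

for pair, words in overlaps.items():
    print(pair, sorted(words))
```

Every pair comes back empty, so first-order features alone give no basis for grouping the machine contexts apart from the medical ones.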
8. 2nd order co-occurrence features
● If each context is represented as a vector of features, find the contexts that have the most friends in common
● Each (content) word in a context is replaced by a vector of co-occurring words
9. 2nd order co-occurrence example
● Machine → part, drill, shop
● Lathe → part, drill, mill
● Surgeon → scalpel, nurse, prescribe
● Doctor → waiting, nurse, prescribe
10. 2nd order co-occurrence example
● i operate a (part, drill, shop)
● my (scalpel, nurse, prescribe) will operate on me today
● he can operate the (part, drill, mill)
● your (waiting, nurse, prescribe) operated with skill and confidence
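
The substitution above can be sketched as code: each content word is replaced by its co-occurring words, and contexts are then compared by the "friends" they share. The co-occurrence lists are taken from the slide. Summing the word vectors into counts and comparing them with cosine similarity is an assumption for this sketch, and the names `cooc`, `second_order_vector`, and `cosine` are hypothetical.

```python
from collections import Counter
from math import sqrt

# Co-occurrence lists from the slide example.
cooc = {
    "machine": ["part", "drill", "shop"],
    "lathe": ["part", "drill", "mill"],
    "surgeon": ["scalpel", "nurse", "prescribe"],
    "doctor": ["waiting", "nurse", "prescribe"],
}

def second_order_vector(context: str) -> Counter:
    """Represent a context by the co-occurring words of the words it contains."""
    vec = Counter()
    for word in context.split():
        vec.update(cooc.get(word, []))  # words without an entry contribute nothing
    return vec

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

machine = second_order_vector("i operate a machine")
surgeon = second_order_vector("my surgeon will operate on me today")
lathe = second_order_vector("he can operate the lathe")
doctor = second_order_vector("your doctor operated with skill and confidence")

print(cosine(machine, lathe), cosine(surgeon, doctor), cosine(machine, surgeon))
```

With these lists, the machine and lathe contexts share two of three friends, as do the surgeon and doctor contexts, while the machine and surgeon contexts share none.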
11. run1
● 2nd order co-occurrences
● Features found within contexts
– Words that occur within 8 positions of target verb 2 or more times
– Target word co-occurrences (tco)
– Stop words retained
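
A rough sketch of how the run1 feature selection could work is shown below. It assumes that "within 8 positions" means up to 8 tokens on either side of the target verb and that the count threshold applies across all contexts; the poster does not spell out either point. The function name `target_cooccurrences` and the toy contexts are hypothetical.

```python
from collections import Counter

def target_cooccurrences(contexts, target_forms, window=8, min_count=2):
    """Return words seen near the target at least min_count times in total."""
    counts = Counter()
    for tokens in contexts:
        for pos, tok in enumerate(tokens):
            if tok in target_forms:
                # Tokens up to `window` positions to the left and right of the target.
                left = tokens[max(0, pos - window):pos]
                right = tokens[pos + 1:pos + window + 1]
                counts.update(left + right)
    # No stop list is applied, matching "stop words retained".
    return {word for word, count in counts.items() if count >= min_count}

toy_contexts = [
    "he can operate the lathe".split(),
    "she will operate the drill".split(),
    "i operate a machine".split(),
]
features = target_cooccurrences(toy_contexts, {"operate"})
print(features)
```

On this toy input the only surviving feature is the stop word "the", which shows what retaining stop words means in practice.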
12. run2
● 2nd order co-occurrences
● Features found in WordNet glosses
– Adjacent words that occur 5 or more times together
– Bigrams (bi)
– Any bigram where both words are stop words is removed
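
The run2 bigram selection can be sketched in the same way. The sketch counts adjacent word pairs, keeps those seen 5 or more times, and drops any pair made up of two stop words. The tiny stop list, the function name `bigram_features`, and the toy gloss text are assumptions for illustration; they are not real WordNet data or the stop list used in the system.

```python
from collections import Counter

# Tiny illustrative stop list (an assumption, not the list used by the system).
STOP_WORDS = {"the", "of", "a", "in", "to"}

def bigram_features(sentences, min_count=5, stop_words=STOP_WORDS):
    """Return adjacent word pairs that are frequent and not made of two stop words."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))  # adjacent pairs only
    return {
        pair
        for pair, count in counts.items()
        if count >= min_count
        and not (pair[0] in stop_words and pair[1] in stop_words)
    }

# Toy stand-in for gloss text.
toy_glosses = [["cut", "of", "the", "cloth"]] * 5 + [["of", "the", "machine"]]
bigrams = bigram_features(toy_glosses)
print(sorted(bigrams))
```

Here ("of", "the") is the most frequent pair but is removed because both words are stop words, while ("the", "machine") falls below the frequency threshold.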
16. Lessons?
● Verbs are (still) hard
– Many methods and previous SemEval tasks geared towards nouns
● External corpus (WordNet) not helpful
● Unigrams surprisingly effective
● Human lexicographer job security is robust
– for now