Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
Mohammad Taher Pilehvar, David Jurgens and Roberto Navigli
ACL 2013

最先端NLP勉強会 (Advanced NLP Study Group) #5 @ Chiba, 2013/08/31
Presenter: Koji Matsuda
(Revised 2013/09/03)

Semantic Textual Similarity (STS)

Measure the degree of semantic equivalence between two sentences.

NOTE: This differs from Textual Entailment (TE) and Paraphrase detection (PARA):
•  TE: STS assumes a symmetric and graded equivalence of the pair
•  PARA: STS needs to incorporate graded semantic similarity
[Agirre+, SemEval-2012]

→ STS is more directly applicable to a number of NLP tasks:
MT, Summarization, Deep QA, etc.

Example

•  Surface-based approach:
   •  labeled DISSIMILAR due to minimal lexical overlap
•  Sense-representation-based approach:
   •  enables considering the similarity between the meanings of the words
      (e.g. fire and terminate)
   •  but it is difficult to incorporate that information,
      due to polysemy and the representation of individual senses

Semantic Similarity at Multiple Levels

Sense ↔ Sense
Word ↔ Word
Text ↔ Text

Semantic Similarity at Multiple Levels

Sense ↔ Sense, Word ↔ Word, Text ↔ Text: each side is mapped to a Semantic Signature,
a unified semantic representation of a lexical item (an arbitrarily-sized piece of text, or a sense).

1.  How to create a Semantic Signature?
2.  How to calculate the similarity of Semantic Signatures?

Overview of Proposed Method

Random Walk over the WordNet Graph
→ Compare Sense-Level Semantic Signatures
   -  Cosine
   -  Weighted Overlap
   -  Top-k Jaccard

Note: figure from slide by authors

Semantic Signatures

•  Multi-seeded random walk over the WordNet graph

Sense / Word / Text → set of senses → seeds (v(0))
→ random walk over the WordNet graph
→ Semantic Signature (a multinomial distribution over senses, i.e. WordNet synsets)

Personalized PageRank

Yellow node: seed node (synset)
Red node size: probability of the synset
Edge: WordNet relation

Note: figure from slide by authors

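To make the signature construction concrete, here is a minimal sketch of a multi-seeded Personalized PageRank over a toy graph. The damping factor, iteration count, toy graph and synset names are illustrative assumptions; the paper runs the walk over the full WordNet graph.

```python
import numpy as np

def semantic_signature(adjacency, seeds, alpha=0.85, n_iter=50):
    """Multi-seeded Personalized PageRank: returns a probability
    distribution over all nodes, concentrated around the seed nodes.

    adjacency: dict node -> list of neighbour nodes (WordNet-style graph)
    seeds:     set of seed nodes (the senses of the input lexical item)
    """
    nodes = sorted(adjacency)
    index = {n: i for i, n in enumerate(nodes)}

    # Column-stochastic transition matrix of the graph.
    M = np.zeros((len(nodes), len(nodes)))
    for n, neighbours in adjacency.items():
        for m in neighbours:
            M[index[m], index[n]] = 1.0 / len(neighbours)

    # Restart distribution: uniform over the seed senses (the v(0) on the slide).
    v0 = np.zeros(len(nodes))
    for s in seeds:
        v0[index[s]] = 1.0 / len(seeds)

    v = v0.copy()
    for _ in range(n_iter):
        v = (1 - alpha) * v0 + alpha * (M @ v)  # walk with prob. alpha, restart otherwise
    return dict(zip(nodes, v))

# Toy graph standing in for a fragment of WordNet (hypothetical synset names).
graph = {
    "fire.v.04": ["terminate.v.01", "remove.v.01"],
    "terminate.v.01": ["fire.v.04", "end.v.01"],
    "remove.v.01": ["fire.v.04"],
    "end.v.01": ["terminate.v.01"],
}
signature = semantic_signature(graph, seeds={"fire.v.04"})
print(sorted(signature.items(), key=lambda kv: -kv[1]))
```
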
Alignment-Based Disambiguation

•  How to extract the "set of senses" (seeds) from a text/word?
   –  We need to solve WSD
•  They propose Alignment-Based WSD
   –  Maximize the sum of similarities between the two texts/words
   –  Can use an arbitrary similarity measure over senses

Alignment-Based Disambiguation (worked example)

[Figures across these slides: aligning the words of "manager fire worker" with "employee terminate work boss".]
-  Word-level alignment: each word is compared with every word of the other text
   (e.g. R(man,emp), R(man,bos), R(man,ter), R(man,wor) for "manager") and the pairing with
   maximum relatedness on the word level is kept (here manager aligned to boss).
-  Sense-level alignment: for an aligned word pair, all sense pairs are compared
   (R(m#1,b#1), R(m#1,b#2), R(m#2,b#1), R(m#2,b#2)) and the sense pair with maximum
   relatedness is selected (here manager#1 with boss#2).
-  Result: the alignments R(man,bos), R(fir,ter), R(fir,wor), R(wor,emp) together with the
   selected senses form the seed set.

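A minimal sketch of the alignment-based disambiguation idea illustrated above: every word keeps the sense that is most related to some sense of some word in the other text. The sense inventory, the relatedness function and the toy scores below are placeholders, not the authors' actual components.

```python
from itertools import product

def alignment_based_wsd(text1, text2, senses_of, relatedness):
    """Each word in text1 is assigned the sense that maximises relatedness
    to some sense of some word in text2.

    text1, text2:  lists of content words
    senses_of(w):  candidate senses of word w (e.g. its WordNet synsets)
    relatedness:   similarity function over two senses (placeholder here)
    """
    disambiguated = {}
    for w1 in text1:
        best = None
        for w2 in text2:
            for s1, s2 in product(senses_of(w1), senses_of(w2)):
                score = relatedness(s1, s2)
                if best is None or score > best[0]:
                    best = (score, s1)   # keep the most favourable interpretation
        if best is not None:
            disambiguated[w1] = best[1]
    return disambiguated

# Hypothetical toy inventory and relatedness scores for the slide's example.
inventory = {
    "manager": ["manager#1", "manager#2"],
    "fire": ["fire#1", "fire#4"],
    "boss": ["boss#1", "boss#2"],
    "terminate": ["terminate#1"],
}
toy_scores = {("manager#1", "boss#2"): 0.9, ("fire#4", "terminate#1"): 0.8}
rel = lambda a, b: toy_scores.get((a, b), toy_scores.get((b, a), 0.1))

print(alignment_based_wsd(["manager", "fire"], ["boss", "terminate"],
                          inventory.__getitem__, rel))
```
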
Semantic Signature Similarity

•  How to calculate the similarity of Semantic Signatures?
   –  Parametric
      •  Cosine
   –  Non-parametric (rank-based)
      •  Weighted Overlap
      •  Top-k Jaccard

[Figure: two signatures over senses a, b, c, d, e are compared.]

Semantic Signature Similarity

•  Weighted Overlap (ADW_WO)

   Sense:      a   b   c   d   e
   Rank (r1):  2   4   1   0   3
   Rank (r2):  4   1   2   5   0

   R_WO = 1 / ((2+4) + (4+1) + (1+2))
   (only senses present in both signatures are counted; maximal when the same sense has the same rank)

•  Top-k Jaccard (ADW_Jac)

   Sense:      a   b   c   d   e
   Rank (r1):  2   4   1   5   3
   Rank (r2):  4   1   2   5   3

   R_Jac = |{a,c,e} ∩ {b,c,e}| / |{a,c,e} ∪ {b,c,e}|   (top-3 senses of each signature)
   (maximal when the top-k sets contain the same senses)
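A small sketch reproducing the toy numbers above. It follows the simplified forms shown on this slide (rank 0 is read as "sense absent from the signature", an assumption taken from the example); the paper's full definitions may differ in normalization.

```python
def weighted_overlap(r1, r2):
    """Slide's simplified weighted overlap: reciprocal of the summed rank pairs,
    restricted to senses with a non-zero rank in both signatures."""
    shared = [s for s in r1 if r1[s] > 0 and r2.get(s, 0) > 0]
    return 1.0 / sum(r1[s] + r2[s] for s in shared)

def top_k_jaccard(r1, r2, k):
    """Jaccard overlap of the k best-ranked senses of each signature."""
    top = lambda r: set(sorted(r, key=r.get)[:k])
    t1, t2 = top(r1), top(r2)
    return len(t1 & t2) / len(t1 | t2)

# The toy rankings from the slide (rank 0 = sense not present).
r1_wo = {"a": 2, "b": 4, "c": 1, "d": 0, "e": 3}
r2_wo = {"a": 4, "b": 1, "c": 2, "d": 5, "e": 0}
print(weighted_overlap(r1_wo, r2_wo))   # 1 / ((2+4) + (4+1) + (1+2))

r1_j = {"a": 2, "b": 4, "c": 1, "d": 5, "e": 3}
r2_j = {"a": 4, "b": 1, "c": 2, "d": 5, "e": 3}
print(top_k_jaccard(r1_j, r2_j, k=3))   # |{a,c,e} ∩ {b,c,e}| / |{a,c,e} ∪ {b,c,e}| = 0.5
```
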
  
Overview of Proposed Method (recap)

Random Walk over the WordNet Graph
→ Compare Sense-Level Semantic Signatures
   -  Cosine
   -  Weighted Overlap
   -  Top-k Jaccard

Note: figure from slide by authors

Experiments

•  Textual Similarity
   –  SemEval-2012 STS task [Agirre+, SemEval-2012]
•  Word Similarity
   –  TOEFL dataset
   –  RG-65 dataset
•  Sense Similarity
   –  Sense coarsening (OntoNotes, Senseval-2)

Textual Similarity

•  SemEval-2012 STS task (Task 6)
•  Model
   –  Regression (Gaussian Process)
   –  Features
      •  Main: ADW_cos, ADW_WO, ADW_Jac (k = 250, 500, 1000, 2500)
      •  String-based: Longest Common Subsequence (Substring), Greedy String Tiling,
         character/word n-gram similarity

Example pairs (graded score 0-5):
   1.  "The bird is bathing in the sink." / "Birdie is washing itself in the water basin."
   2.  "In May 2010, the troops attempted to invade Kabul." / "The US army invaded Kabul on May 7th last year, 2010."
   3.  "John said he is considered a witness but not a suspect." / ""He is not a suspect anymore." John said."
   4.  "They flew out of the nest in groups." / "They flew into the nest together."

Data: 400-750 pairs per set, 5 sets.
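A rough sketch of the modelling setup described above: the ADW and string-based similarities become one feature vector per sentence pair, fed to a Gaussian Process regressor. scikit-learn's GaussianProcessRegressor is used here as a stand-in, and all feature values are made-up placeholders just to keep the sketch runnable.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Each row is one sentence pair; each column is one similarity feature
# (ADW_cos, ADW_WO, ADW_Jac at several k, plus the string-based measures).
# The numbers are invented purely for illustration.
X_train = np.array([
    [0.92, 0.88, 0.90, 0.85, 0.80, 0.78, 0.35, 0.40, 0.55, 0.50],
    [0.40, 0.35, 0.30, 0.28, 0.25, 0.22, 0.60, 0.55, 0.45, 0.42],
    [0.10, 0.08, 0.05, 0.05, 0.04, 0.03, 0.05, 0.04, 0.10, 0.08],
])
y_train = np.array([4.8, 2.5, 0.4])          # gold similarity scores in [0, 5]

model = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
model.fit(X_train, y_train)

x_new = np.array([[0.75, 0.70, 0.72, 0.68, 0.65, 0.60, 0.30, 0.33, 0.50, 0.45]])
print(model.predict(x_new))                  # predicted graded similarity
```
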
Textual Similarity Performance

[Table 2 of the paper: Pearson correlation coefficients.]

Textual Similarity (detail)

Mpar:  MSR Paraphrase Corpus (web news); contains many named entities
Mvid:  MSR Video Paraphrase Corpus
SMTe:  French-to-English SMT output paired with reference translations,
       from the Europarl Corpus [ACL 2007, 2008 SMT Workshop]
SMTn:  same as SMTe, but a news conversation corpus is used
OnWN:  glosses from OntoNotes and WordNet

Textual Similarity (detail)

DW:      without performing any alignment
ADW-MF:  main features only (does not use the string-based features)

•  Alignment is helpful (ADW improves over DW).
•  On the Mpar dataset (which contains many named entities),
   the string-based method is a strong baseline.

Word Similarity

•  TOEFL dataset [Landauer and Dumais, 1997]
   –  Synonym selection task
   –  80 multiple-choice questions
      •  4 choices per question
•  RG-65 dataset [Rubenstein and Goodenough, 1965]
   –  Similarity grading for word pairs
   –  65 word pairs
      •  judged by 51 human subjects
   –  Scale 0-4

Note: figure from slide by authors

Word Similarity (TOEFL)

[Results figure from the slides.]

Word Similarity (RG-65)

[Results figure from the slides.]

Sense Similarity

•  Coarsening the WordNet sense inventory

Note: figure from slide by authors

Sense Coarsening

Onto: OntoNotes [Hovy+, 2006]    SE-2: Senseval-2 sense grouping set [Kilgarriff, 2001]
Binary classification (can two senses be merged or not?), evaluated by F-score.
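A hedged sketch of how the evaluation above could be set up: the merge decision is a binary classification (here thresholding the similarity of two sense signatures, with an assumed threshold), scored with F1 against the gold sense grouping. The threshold and toy decisions are illustrative, not the paper's.

```python
def should_merge(sig_a, sig_b, similarity, threshold=0.5):
    """Binary decision: merge two senses of the same word if their semantic
    signatures are similar enough (the threshold is an assumed value)."""
    return similarity(sig_a, sig_b) >= threshold

def f_score(predictions, gold):
    """F1 over the binary merge / keep-separate decisions."""
    tp = sum(p and g for p, g in zip(predictions, gold))
    fp = sum(p and not g for p, g in zip(predictions, gold))
    fn = sum(g and not p for p, g in zip(predictions, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy evaluation: predicted merge decisions vs. a gold sense grouping (made-up values).
pred = [True, False, True, True]
gold = [True, False, False, True]
print(f_score(pred, gold))   # 0.8
```
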
Conclusions

•  A unified approach for computing semantic similarity at multiple lexical levels
   –  Based on a random walk over the WordNet graph
   –  Alignment-based word sense disambiguation
   –  Similarity measures based on the ranking of senses
•  Achieves state-of-the-art performance in three tasks
   –  Similarity judgment tasks (sense, word, text)

My Comment

•  ☺ I think this method provides a simple but powerful representation of semantics
   for relatively long sentences as well as for individual words or word senses.
   –  ☺ As a result, this method expands the types of STS problems that can be solved.
   –  ☹ But it ignores word order and parse trees, which I think matters when
      representing short phrases or compounds.
•  Actually, this work is simply a combination of Personalized PageRank-based WSD
   [Agirre and Soroa, EACL 2009] and word-level alignment for similarity calculation
   [Corley and Mihalcea, ACL 2005].
•  ☹ Viewed from the perspective of compositional semantics, I think this work makes
   an incorrect assumption:
   –  Letting S(x) be the Semantic Signature of x, they suppose S(xy) ∝ S(x) + S(y)?
      •  e.g. S(red car) ∝ S(red) + S(car)?

Toward STS with Various Clues

[Figure: a map of clue types for STS along two axes (Explicit ↔ Implicit, Concrete ↔ Abstract):
Surface, Syntax, Word Sense, Domain Knowledge, This Work, Compositional Semantics,
Automatic Extension of Lexical Resources, Robust Similarity Measures,
Named Entity Linking to Knowledge Bases.]

Replies to Comments Received / Other Notes

•  Are all links between synsets used? (Prof. Inui)
   –  The original paper on Personalized PageRank-based WSD [Agirre and Soroa, 09] states
      that all relations were used (this paper follows that setup).
   –  However, it may well be true that some links, such as antonymy, should not simply be
      propagated over.
•  Is it a general property that blurring the meaning (propagating it to surrounding synsets)
   improves WSD performance? (Prof. Inui)
   –  In knowledge-based WSD, the incompleteness of the knowledge base (sparseness, low
      coverage) is often a problem, and using soft information to mitigate it is common practice.
•  Is alignment also performed in the word-to-word case? (Matsubara-san)
   –  Yes; in fact the alignment is performed at the sense level (the figure did not make this
      clear enough).
•  Since the alignment takes the maximum (it searches for the most favorable interpretation),
   it can be said to compute something like a lower bound of the similarity.
   –  When polysemy is an issue, it seems it may overestimate the similarity.
•  Because the model defines similarity over pairs of sentences or words, it is difficult to
   use the representation on its own.
   –  One option is to pair an item with the gloss of a WordNet synset.
