Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Induction

Presentation at the AIST'17 conference by Dmitry Ustalov. Authors of the original paper: Dmitry Ustalov, Mikhail Chernoskutov, Chris Biemann, Alexander Panchenko.

Published in: Science

  1. Fighting with Sparsity. Ustalov D.A. et al.
     Fighting with Sparsity of the Synonymy Dictionaries for Automatic Synset Induction
     Dmitry Ustalov, Mikhail Chernoskutov (Ural Federal University)
     Chris Biemann, Alexander Panchenko (Universität Hamburg)
  2. Outline
     • Introduction
     • The Problem
     • The Approaches
     • Evaluation
     • Discussion
     • Conclusion
  3. Introduction
     • Synset induction is an unsupervised task of discovering synsets in a synonymy graph.
     • Notable methods:
       • MaxMax (Hope & Keller, 2013),
       • ECO (Gonçalo-Oliveira & Gomes, 2014),
       • WATSET (Ustalov et al., 2017), the state of the art (SOTA).
     • See the survey in our paper.
  4. The Problem
     • A synonymy graph contains densely connected subgraphs.
     • These subgraphs correspond to the synsets.
     • The synonymy dictionaries are not perfect.
     • Sometimes they have missing edges.
  5. The Intuition
     [Figure: the synonymy graph “As Is” versus “To Be”.]
  6. The Approaches
     • We propose two approaches for reducing graph sparseness by adding potentially pertinent edges:
       • Synonymy Relation Transitivity (A1)
       • Similar Synset Merging (A2)
     • We also evaluate them on two lexical semantic resources for Russian: RuWordNet and YARN.
  7. A1: Synonymy Transitivity
     • Synonymy is an equivalence relation:
       • reflexivity, symmetry, transitivity.
     • We assume that if an edge is missing, the graph still contains several relatively short paths between the synonymous words.
     • This approach is designed to be executed before the synset induction.
  8. A1: Synonymy Transitivity
     • For each vertex, extract its 2nd-order ego network.
       • Compute the set of candidate edges by connecting the disconnected nodes.
       • Compute the number of paths between the nodes in candidate edges.
       • Add an edge iff there exist at least k paths with lengths in [i; j].
     • Then, the augmented graph is passed to synset induction.
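The A1 steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the graph is a dictionary of neighbour sets, paths are counted as simple paths that stay inside the ego network, and the names `ego_network`, `count_paths`, `add_transitive_edges`, `min_len` and `max_len` (standing in for i and j) are hypothetical.

```python
from itertools import combinations


def ego_network(graph, center, radius=2):
    """Return the set of nodes within `radius` hops of `center`."""
    seen = {center}
    frontier = {center}
    for _ in range(radius):
        frontier = {v for u in frontier for v in graph[u]} - seen
        seen |= frontier
    return seen


def count_paths(graph, allowed, source, target, min_len, max_len):
    """Count simple paths from source to target that use between
    min_len and max_len edges and visit only nodes in `allowed`."""
    def walk(node, length, visited):
        if node == target:
            return 1 if length >= min_len else 0
        if length == max_len:
            return 0
        return sum(
            walk(nxt, length + 1, visited | {nxt})
            for nxt in graph[node]
            if nxt in allowed and nxt not in visited
        )
    return walk(source, 0, {source})


def add_transitive_edges(graph, k=2, min_len=2, max_len=3):
    """Add an edge between two disconnected nodes of an ego network
    if at least k sufficiently short paths connect them (assumed reading of A1)."""
    new_edges = set()
    for center in graph:
        ego = ego_network(graph, center)
        for u, v in combinations(sorted(ego), 2):
            if v in graph[u]:
                continue  # already connected, not a candidate edge
            if count_paths(graph, ego, u, v, min_len, max_len) >= k:
                new_edges.add((u, v))
    augmented = {u: set(nbrs) for u, nbrs in graph.items()}
    for u, v in new_edges:
        augmented[u].add(v)
        augmented[v].add(u)
    return augmented, new_edges


# Toy synonymy graph: a 4-cycle a-b-d-c-a with a pendant node e.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
toy_augmented, toy_new_edges = add_transitive_edges(toy_graph, k=2, min_len=2, max_len=2)
print(sorted(toy_new_edges))
```

In the toy graph, only the pairs joined by two length-2 paths receive a new edge; the pendant node `e` has a single path to each other node and stays as sparse as before. The augmented graph would then be handed to the synset induction step.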
  9. A2: Synset Merging
     • A similarity measure can be computed between two vectors.
       • Think of synset embeddings.
     • We assume that if two synsets are really similar, then they can be merged.
     • This approach is designed to be executed after the synset induction.
  10. A2: Synset Merging
     • Obtain synset embeddings using SenseGram (Pelevina et al., 2016).
       • Just average the word vectors in synsets.
     • Identify the closely related synsets using the m-kNN algorithm (Panchenko et al., 2012).
     • Merge the t closely related synsets.
       • The smallest are merged first.
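The A2 steps above can be illustrated with a small, dependency-free Python sketch. It does not use SenseGram or the original m-kNN code: synset embeddings are plain averages of toy word vectors, closeness is cosine similarity, "closely related" is read as mutual k-nearest neighbours, and "the smallest are merged first" is read as ordering candidate pairs by combined synset size. All function names, the parameters `k` and `t`, and the vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def synset_embedding(synset, word_vectors):
    """Average the word vectors of all words in the synset."""
    vectors = [word_vectors[w] for w in sorted(synset)]
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def mutual_knn_pairs(embeddings, k=1):
    """Return index pairs (i, j) where each is among the other's k nearest neighbours."""
    n = len(embeddings)
    neighbours = []
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(embeddings[i], embeddings[j]),
            reverse=True,
        )
        neighbours.append(set(ranked[:k]))
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if j in neighbours[i] and i in neighbours[j]
    ]


def merge_synsets(synsets, word_vectors, k=1, t=1):
    """Merge up to t mutually nearest synset pairs, smallest pairs first."""
    embeddings = [synset_embedding(s, word_vectors) for s in synsets]
    pairs = mutual_knn_pairs(embeddings, k)
    pairs.sort(key=lambda p: len(synsets[p[0]]) + len(synsets[p[1]]))
    merged = [set(s) for s in synsets]
    touched, absorbed = set(), set()
    merges = 0
    for i, j in pairs:
        if merges == t:
            break
        if i in touched or j in touched:
            continue  # each synset takes part in at most one merge
        merged[i] |= merged[j]
        touched.update((i, j))
        absorbed.add(j)
        merges += 1
    return [s for idx, s in enumerate(merged) if idx not in absorbed]


# Toy two-dimensional word vectors (made up for illustration only).
toy_vectors = {
    "car": [1.0, 0.0],
    "auto": [0.9, 0.1],
    "automobile": [0.95, 0.05],
    "dog": [0.0, 1.0],
    "hound": [0.1, 0.9],
}
toy_synsets = [{"car", "auto"}, {"automobile"}, {"dog", "hound"}]
toy_merged = merge_synsets(toy_synsets, toy_vectors, k=1, t=1)
print(toy_merged)
```

Here the two vehicle synsets are each other's nearest neighbour and get merged, while the animal synset has no mutual neighbour and is left alone.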
  11. Evaluation
     • We use WATSET, a soft clustering algorithm for undirected graphs.
     • WATSET shows SOTA results on synset induction.
     Ustalov D., Panchenko A., Biemann C. Watset: Automatic Induction of Synsets from a Graph of Synonyms. In: Proc. ACL 2017.
  12. Evaluation: Measure & Data
     • Measure: paired precision and recall.
     • Gold standard: RuWordNet and YARN.
     • The input graph: Wiktionary + Abramov + UNLDC.
     • Word vectors are from RDT.
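As a reminder of how paired precision and recall are commonly computed, each synset is expanded into the set of unordered word pairs it contains, and the predicted pairs are compared with the gold pairs. The sketch below follows that common definition; the slides do not spell out the exact evaluation script, so the function names and the toy synsets are assumptions.

```python
from itertools import combinations


def synonym_pairs(synsets):
    """Expand synsets into the set of unordered word pairs they contain."""
    return {
        pair
        for synset in synsets
        for pair in combinations(sorted(synset), 2)
    }


def paired_precision_recall(predicted, gold):
    """Precision and recall over synonym pairs of predicted vs. gold synsets."""
    predicted_pairs = synonym_pairs(predicted)
    gold_pairs = synonym_pairs(gold)
    common = predicted_pairs & gold_pairs
    precision = len(common) / len(predicted_pairs) if predicted_pairs else 0.0
    recall = len(common) / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


# Toy example: one over-merged predicted synset against two gold synsets.
toy_predicted = [{"car", "auto", "dog"}]
toy_gold = [{"car", "auto"}, {"dog", "hound"}]
toy_precision, toy_recall = paired_precision_recall(toy_predicted, toy_gold)
print(toy_precision, toy_recall)
```

The predicted synset yields three pairs, of which only (auto, car) is in the gold standard, so precision is 1/3; the gold standard has two pairs, so recall is 1/2. This also shows why adding edges or merging synsets tends to raise recall at the cost of precision.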
  13. Evaluation: Results
     [Figure: results on RuWordNet and YARN for the input graph, synonymy transitivity, and synset merging.]
  14. Evaluation: Results
     • The transitivity approach showed virtually no improvement.
     • The merging approach substantially increased the recall.
     • Both methods trade precision for recall: the gains in recall come with drops in precision.
  15. Discussion
     • Transitivity. No word is a perfect synonym of another. The communities with the new edges become bigger.
     • Merging. Distributional semantic models tend to connect co-hyponyms instead of synonyms.
     • Alternatives. Structural heuristics? Hearst patterns? Anaphora resolution? Crowdsourcing?
  16. Conclusion
     • We fought with sparsity of the synonymy dictionaries using two approaches.
       • Only synset merging won.
     • Synset embeddings are easy to obtain. They also show better results on such a challenging task.
       • Just average the word vectors and compute similarity.
  17. Thank You!
     • Dmitry Ustalov, dmitry.ustalov@gmail.com
     • nlpub.ru/Watset
     • nlpub.ru/RDT
     Join SIGSLAV, an ACL SIG on Slavic languages! sigslav.cs.helsinki.fi
     We acknowledge the support of the Deutsche Forschungsgemeinschaft (DFG) foundation under the “JOIN-T” project, the DAAD, the RFBR under the projects no. 16-37-00203 мол_а and no. 16-37-00354 мол_а, and the RFH under the project no. 16-04-12019. The calculations were carried out using the supercomputer “Uran” at the Krasovskii Institute of Mathematics and Mechanics. We also thank four anonymous reviewers for their helpful comments.
