
word2vec - From theory to practice



My talk at the Stockholm Natural Language Processing Meetup. I explained how word2vec is implemented and how to use it in Python with gensim. When words are represented as points in space, the spatial distance between words describes the similarity between these words. In this talk, I explore how to use this in practice and how to visualize the results (using t-SNE).




1. Hendrik Heuer, Stockholm NLP Meetup: word2vec - From theory to practice
2. About me: Hendrik Heuer, hendrikheuer@gmail.com, http://hen-drik.de, @hen_drik
3. "You shall know a word by the company it keeps" - J. R. Firth, 1957
4. "You shall know a word by the company it keeps" - J. R. Firth, 1957. Quoted after Socher
5. Quoted after Socher
6. Quoted after Socher
7. Vectors are directions in space. Vectors can encode relationships.
8. man is to woman as king is to ?
9. word2vec: by Mikolov, Sutskever, Chen, Corrado and Dean at Google; NIPS 2013; takes a text corpus as input and produces the word vectors as output
10. Sweden: similar words
11. Harvard: similar words
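The "similar words" lists on these slides come from ranking the whole vocabulary by cosine similarity to the query word's vector. A minimal NumPy sketch; the 3-dimensional vectors here are made-up illustrative assumptions (real word2vec embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two word vectors: 1.0 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d "embeddings" (illustrative only, not trained vectors).
sweden = np.array([0.9, 0.1, 0.2])
norway = np.array([0.8, 0.2, 0.1])
banana = np.array([0.1, 0.9, 0.7])

print(cosine_similarity(sweden, norway))  # high: related words
print(cosine_similarity(sweden, banana))  # lower: unrelated words
```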
12. word2vec: word meaning and relationships between words are encoded spatially; the two main learning algorithms in word2vec are continuous bag-of-words and continuous skip-gram
13. Goal
14. continuous bag-of-words: predicts the current word based on the context; the order of words in the history does not influence the projection; faster and more appropriate for larger corpora. Mikolov et al.
15. continuous skip-gram: maximizes classification of a word based on another word in the same sentence; better word vectors for frequent words, but slower to train. Mikolov et al.
16. Why it is awesome: there is a fast open-source implementation; the vectors can be used as features for natural language processing tasks and machine learning algorithms
17. Machine Translation. Quoted after Mikolov
18. Sentiment Analysis (Recursive Neural Tensor Network). Quoted after Socher
19. Image Descriptions. Quoted after Vinyals
20. Using word2vec: Original: http://word2vec.googlecode.com/svn/trunk/ • C++11 version: https://github.com/jdeng/word2vec • Python: http://radimrehurek.com/gensim/models/word2vec.html • Java: https://github.com/ansjsun/word2vec_java • Parallel Java: https://github.com/siegfang/word2vec • CUDA version: https://github.com/whatupbiatch/cuda-word2vec. Quoted after Wang
21. Using it in Python
22. Usage
23. Training a model. Quoted after Řehůřek
24. Training a model with an iterator. Quoted after Řehůřek
25. Doing it in C: download the code (git clone https://github.com/h10r/word2vec-macosx-maverics.git), run 'make' to compile the word2vec tool, then run the demo scripts ./demo-word.sh and ./demo-phrases.sh
27. Testing a model. Quoted after Řehůřek
28. Pipeline: 1. word2vec - find word embeddings; 2. t-SNE - dimensionality reduction; 3. JSON - output
31. scikit-learn should be in MacPorts: py27-scikit-learn @0.15.2 (python, science)
33. word2vec - From theory to practice. Hendrik Heuer, Stockholm NLP Meetup. Discussion: Can anybody here think of ways this might help her or him?
34. Further Reading: • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013. • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013. • Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.
35. Further Reading: • Richard Socher - Deep Learning for NLP (without Magic), http://lxmls.it.pt/2014/?page_id=5 • Wang - Introduction to word2vec and its application to find predominant word senses, http://compling.hss.ntu.edu.sg/courses/hg7017/pdf/word2vec%20and%20its%20application%20to%20wsd.pdf • Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. Exploiting Similarities among Languages for Machine Translation. http://arxiv.org/abs/1309.4168 • Title image by Hans Arp
36. Further Coding: • word2vec: https://code.google.com/p/word2vec/ • word2vec for Mac OS X Mavericks: https://github.com/h10r/word2vec-macosx-maverics • Gensim Python library: https://radimrehurek.com/gensim/index.html • Gensim tutorials: https://radimrehurek.com/gensim/tutorial.html • scikit-learn TSNE: http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
37. Hendrik Heuer, Stockholm NLP Meetup: word2vec - From theory to practice
