2. Overview
• Natural language processing
• Word representations
• Statistical language modeling
• Neural models
• Recurrent neural network models
• Long short-term memory RNN models
3. Natural language processing
• NLP mostly works with text data (but its methods can also be applied to music, bioinformatics, speech, etc.)
• From the perspective of machine learning, natural language consists of variable-length sequences of high-dimensional vectors.
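A minimal sketch of that last point, using a hypothetical toy vocabulary: each word becomes a high-dimensional (here |V|-dimensional) one-hot vector, and a sentence becomes a sequence of such vectors whose length varies per sentence.

```python
# Toy example: a sentence as a variable-length sequence of
# high-dimensional one-hot vectors (vocabulary is hypothetical).
vocab = ["<S>", "a", "cute", "wampimuk", "is", "on", "the", "tree", "."]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a |V|-dimensional one-hot vector for `word`."""
    v = [0] * len(vocab)
    v[index[word]] = 1
    return v

sentence = ["a", "cute", "wampimuk", "is", "on", "the", "tree", "."]
sequence = [one_hot(w) for w in sentence]  # length differs per sentence
```

In a real system the vocabulary has tens of thousands of entries, so these vectors are very high-dimensional and sparse.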
9. Distributional representation
Is there a representation that preserves the similarities of word meanings?
d(v(zebra), v(horse)) < d(v(zebra), v(summer))
"You shall know a word by the company it keeps" – John Rupert Firth
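The inequality above can be checked with cosine distance. A sketch with made-up 3-dimensional embeddings (the values are illustrative assumptions, not learned vectors): the two animals share features, so they lie closer together than zebra and summer.

```python
import math

# Hypothetical toy embeddings; values are invented for illustration.
v = {
    "zebra":  [0.9, 0.8, 0.1],
    "horse":  [0.8, 0.9, 0.2],
    "summer": [0.1, 0.2, 0.9],
}

def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

d_zebra_horse = cosine_distance(v["zebra"], v["horse"])
d_zebra_summer = cosine_distance(v["zebra"], v["summer"])
# d(v(zebra), v(horse)) < d(v(zebra), v(summer))
```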
20. n-gram models
n-th order Markov assumption
bigram model of s = (a, cute, wampimuk, is, on, the, tree, .)
1. How likely does "a" follow "<S>"?
2. How likely does "cute" follow "a"?
3. How likely does "wampimuk" follow "cute"?
4. How likely does "is" follow "wampimuk"?
5. How likely does "on" follow "is"?
6. How likely does "the" follow "on"?
7. How likely does "tree" follow "the"?
8. How likely does "." follow "tree"?
9. How likely does "<S>" follow "."?
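Under the first-order Markov assumption, the sentence probability is the product of exactly these conditional probabilities. A sketch, where the individual bigram probabilities are hypothetical stand-ins for corpus estimates:

```python
# Bigram model: P(s) = prod over positions of P(w_t | w_{t-1}).
# The probabilities below are invented for illustration only.
p = {
    ("<S>", "a"): 0.2, ("a", "cute"): 0.01, ("cute", "wampimuk"): 1e-6,
    ("wampimuk", "is"): 0.5, ("is", "on"): 0.05, ("on", "the"): 0.4,
    ("the", "tree"): 0.01, ("tree", "."): 0.3,
}
sentence = ["<S>", "a", "cute", "wampimuk", "is", "on", "the", "tree", "."]
prob = 1.0
for prev, word in zip(sentence, sentence[1:]):
    prob *= p[(prev, word)]  # multiply the conditional probabilities
```

Note how the rare word "wampimuk" drives the product toward zero, which is why real systems work in log space and smooth their estimates.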
21. n-gram models
n-th order Markov assumption
bigram model of s = (a, cute, wampimuk, is, on, the, tree, .)
The counts are obtained from a training corpus.
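A sketch of the counting step, using a tiny hypothetical corpus: the maximum-likelihood bigram estimate is P(w | prev) = count(prev, w) / count(prev).

```python
from collections import Counter

# Hypothetical two-sentence training corpus, already tokenized.
corpus = [
    ["<S>", "the", "tree", "is", "tall", "."],
    ["<S>", "a", "tree", "is", "on", "the", "hill", "."],
]

history_counts = Counter()
bigram_counts = Counter()
for sent in corpus:
    history_counts.update(sent[:-1])          # every word that acts as a history
    bigram_counts.update(zip(sent, sent[1:])) # adjacent word pairs

def p(word, prev):
    """Maximum-likelihood estimate P(word | prev)."""
    return bigram_counts[(prev, word)] / history_counts[prev]
```

In this toy corpus "the" occurs twice as a history and is followed by "tree" once, so p("tree", "the") is 0.5.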
57. Conclusion
CS224d: Deep Learning for Natural Language Processing
cs224d.stanford.edu
• Neural methods provide us with a powerful set of tools for embedding language.
• They provide better ways of tying language learning to extra-linguistic contexts (images, knowledge bases, cross-lingual data).