Softmax Approximations for Learning Word Embeddings and Language Modeling (Sebastian Ruder)

Outline: Softmax; Softmax-based Approaches (Hierarchical Softmax, Differentiated Softmax, CNN-Softmax); Sampling-based Approaches (Margin-based Hinge Loss, Noise Contrastive Estimation, Negative Sampling); Bibliography
Bibliography I
[Bengio et al., 2003] Bengio, Y., Ducharme, R., Vincent, P., and Janvin, C. (2003). A Neural Probabilistic Language Model. Journal of Machine Learning Research, 3:1137–1155.
[Chen et al., 2015] Chen, W., Grangier, D., and Auli, M. (2015). Strategies for Training Large Vocabulary Neural Language Models.
[Collobert et al., 2011] Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural Language Processing (almost) from Scratch. Journal of Machine Learning Research, 12(Aug):2493–2537.
Bibliography II
[Jozefowicz et al., 2016] Jozefowicz, R., Vinyals, O., Schuster, M., Shazeer, N., and Wu, Y. (2016). Exploring the Limits of Language Modeling.
[Mikolov et al., 2013] Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. NIPS, pages 1–9.
[Mnih and Hinton, 2008] Mnih, A. and Hinton, G. E. (2008). A Scalable Hierarchical Distributed Language Model. Advances in Neural Information Processing Systems, pages 1–8.