15. 2 important papers
1. “Efficient Estimation of Word Representations in Vector Space”
2. “Distributed Representations of Words and Phrases and their Compositionality”
(The earlier “Linguistic Regularities in Continuous Space Word Representations” is also worth a look)
42. Why MAGIC??
● Even if A ≈ a and B ≈ b
○ so that B - A ≈ b - a
● But that does not mean B is the word closest to b - a + A (see the sketch below)
○ because there may also be some ㄅ with ㄅ ≈ B ≈ b
● Not to mention that A ≈ a and B ≈ b say nothing about A - a and B - b pointing in the same direction
○ B - A ≈ b - a does not necessarily mean B - b ≈ A - a
[figure: 2-D sketch of the vectors A, a, B, b]
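Below is a minimal Python sketch of this lookup; the toy emb table and the analogy helper are made-up illustrations, not code from the papers. With ㄅ contrived to land near b - a + A, it shows exactly the failure mode above:

```python
import numpy as np

# Hypothetical 2-D vectors; real ones would come from a trained skip-gram model.
emb = {
    "a": np.array([1.0, 0.0]),
    "A": np.array([1.1, 0.9]),
    "b": np.array([3.0, 0.1]),
    "B": np.array([3.2, 1.0]),
    "ㄅ": np.array([3.1, 1.0]),  # contrived to sit almost exactly at b - a + A
}

def analogy(a, A, b, emb):
    """Rank candidate words by cosine similarity to emb[b] - emb[a] + emb[A]."""
    target = emb[b] - emb[a] + emb[A]
    def cos(v):
        return (v @ target) / (np.linalg.norm(v) * np.linalg.norm(target))
    # As in the papers' evaluation, the query words themselves are excluded.
    cands = {w: cos(v) for w, v in emb.items() if w not in (a, A, b)}
    return sorted(cands, key=cands.get, reverse=True)

print(analogy("a", "A", "b", emb))  # ['ㄅ', 'B'] -- the near-duplicate beats B
```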
55. Evaluation in the papers
● big : biggest = small : ???
● France : Paris = Germany : ???
● Accuracy and training time across
○ vector dimensionality
○ training corpus size
● I have a ???
○ a) apple b) pen c) applepen
○ predicted with the skip-gram network itself
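As a rough sketch of how such accuracy numbers are computed (the four-word question format matches word2vec’s questions-words.txt, but the random matrix W and tiny vocab here are placeholders, not trained vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["France", "Paris", "Germany", "Berlin"]     # stand-in vocabulary
W = rng.normal(size=(len(vocab), 300))               # stand-in for trained vectors
W /= np.linalg.norm(W, axis=1, keepdims=True)        # unit rows -> dot == cosine
idx = {w: i for i, w in enumerate(vocab)}

def accuracy(questions):
    hits = total = 0
    for a, A, b, B in questions:                     # a : A = b : B
        if not all(w in idx for w in (a, A, b, B)):
            continue                                 # skip out-of-vocabulary questions
        target = W[idx[A]] - W[idx[a]] + W[idx[b]]
        sims = W @ target
        for w in (a, A, b):
            sims[idx[w]] = -np.inf                   # never predict a query word
        hits += vocab[int(np.argmax(sims))] == B
        total += 1
    return hits / total if total else 0.0

# With random vectors this is chance level; with real vectors it is the
# number the papers report per vector size / corpus size.
print(accuracy([("France", "Paris", "Germany", "Berlin")]))
```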
64. Negative sampling
● no need to compute the softmax over all V words during back propagation
● instead, treat it as a classification problem
○ word(o) vs. other randomly drawn words
● k: 5~20 for small data; 2~5 for big data (reported in Mikolov’s paper)
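A minimal NumPy sketch of that classification view, one (center, context) pair at a time; W_in, W_out, and counts are made-up stand-ins for a trained model, while k = 5 and the unigram^(3/4) noise distribution follow the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, k = 10_000, 300, 5                  # vocab size, vector size, negatives

W_in  = rng.normal(0, 0.1, (V, d))        # made-up center-word vectors
W_out = rng.normal(0, 0.1, (V, d))        # made-up context-word vectors

counts = rng.integers(1, 1000, V).astype(float)   # made-up word frequencies
noise = counts ** 0.75                    # unigram^(3/4), as in the paper
noise /= noise.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ns_loss(center, context):
    """Classify the true context word against k random words,
    instead of computing a V-way softmax."""
    neg = rng.choice(V, size=k, p=noise)  # the randomly drawn "other" words
    h = W_in[center]
    pos = np.log(sigmoid(W_out[context] @ h))
    negs = np.log(sigmoid(-(W_out[neg] @ h))).sum()
    return -(pos + negs)                  # only k+1 output rows are touched

print(ns_loss(center=42, context=7))
```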