1. Retrofitting Word Vectors to
Semantic Lexicons
Manaal Faruqui, Jesse Dodge, Sujay K. Jauhar,
Chris Dyer, Eduard Hovy, Noah A. Smith
NAACL 2015
Presenter: 高瀬翔
Knowledge Acquisition Study Group, 2015/4/21
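5. Proposed method
• Two goals
– Keep each vector similar to the vector obtained from the corpus (the input)
– Give similar vectors to words that are related in the external knowledge source
• Relations: synonyms, hypernyms/hyponyms, paraphrases
• Objective function
– Minimize the Euclidean distance between the vectors that should be similar
• First term: corpus information (stay close to the input vector)
• Second term: external knowledge (move close to the related words in the external knowledge source)
– E: the set of edges drawn between words that are related in the external knowledge source
– α, β: hyperparameters (α = 1, β = 1 / degree of the word's node)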
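The edge set E and the hyperparameter setting on this slide can be sketched in Python as follows. This is only an illustrative sketch: the function names `build_neighbors` and `edge_weights` and the toy word pairs are assumptions made for the example, not taken from the paper or its released code.

```python
from collections import defaultdict

def build_neighbors(related_pairs):
    """Build an undirected neighbor map from (word, word) pairs of related words."""
    neighbors = defaultdict(set)
    for a, b in related_pairs:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    return dict(neighbors)

def edge_weights(neighbors):
    """alpha_i = 1 for every word; beta_ij = 1 / degree(i) for every edge (i, j)."""
    alpha = {w: 1.0 for w in neighbors}
    beta = {(i, j): 1.0 / len(ns) for i, ns in neighbors.items() for j in ns}
    return alpha, beta

# Toy lexicon (hypothetical): each pair is one edge of E.
pairs = [("happy", "glad"), ("happy", "joyful"), ("glad", "pleased")]
neighbors = build_neighbors(pairs)
alpha, beta = edge_weights(neighbors)
```

With this weighting, a word with many related words spreads a total weight of 1 across them, so the pull toward the lexicon does not grow with the number of neighbors.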
[Cropped figure and caption from the paper: a word graph of related words, with inferred (white) and shaded nodes]
[...] vectors to be retrofitted (and correspond to $V_\Omega$); shaded nodes are labeled with the corresponding vectors in $\hat{Q}$, which are observed. The graph can be interpreted as a Markov random field (Kindermann and Snell, 1980).
The distance between a pair of vectors is defined to be the Euclidean distance. Since we want the inferred word vector to be close to the observed value $\hat{q}_i$ and close to its neighbors $q_j$, $\forall j$ such that $(i, j) \in E$, the objective to be minimized becomes:

$$\Psi(Q) = \sum_{i=1}^{n} \left[ \alpha_i \|q_i - \hat{q}_i\|^2 + \sum_{(i,j) \in E} \beta_{ij} \|q_i - q_j\|^2 \right]$$

where the $\alpha$ and $\beta$ values control the relative strengths of associations (more details in §6.1).
Vectors obtained from the corpus (input)
Vectors after refinement
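To make the two terms of the objective concrete, here is a minimal Python sketch that evaluates it on a two-word toy example. The function names, the toy vectors, and the choice α = 1, β = 1 / degree are assumptions for illustration; vectors are plain Python lists so the snippet needs no external library.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def objective(Q, Q_hat, neighbors, alpha=1.0):
    """Sum over words of: alpha * ||q_i - q_hat_i||^2 + sum_j beta_ij * ||q_i - q_j||^2."""
    total = 0.0
    for i, q_i in Q.items():
        # First term: stay close to the input (corpus) vector.
        total += alpha * sq_dist(q_i, Q_hat[i])
        # Second term: stay close to lexicon neighbors, with beta_ij = 1 / degree(i).
        ns = neighbors.get(i, [])
        for j in ns:
            total += (1.0 / len(ns)) * sq_dist(q_i, Q[j])
    return total

# Toy input vectors and a single lexicon edge (hypothetical data).
Q_hat = {"happy": [1.0, 0.0], "glad": [0.0, 1.0]}
neighbors = {"happy": ["glad"], "glad": ["happy"]}

psi_input = objective(Q_hat, Q_hat, neighbors)     # Q left at the input vectors
midpoint = {"happy": [0.5, 0.5], "glad": [0.5, 0.5]}
psi_mid = objective(midpoint, Q_hat, neighbors)    # both words moved to the midpoint
```

Leaving Q at the input vectors makes the first term zero but pays the full neighbor penalty, while moving both words to the midpoint removes the neighbor penalty at the cost of the first term; the minimizer balances the two.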
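6. Solution method
• Solve by iterative updates
– For each q_i, repeatedly update it to the value that minimizes the objective function
– q_i is initialized with the input vector
• Empirically, after 10 iterations the change in Euclidean distance between the vectors being pulled together falls below 0.01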
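The per-vector minimization can be worked out as a short derivation. This is a sketch under two stated assumptions: the neighbors $q_j$ are held fixed, and only the bracketed term of the objective belonging to word $i$ is differentiated.

$$\frac{\partial}{\partial q_i}\left[\alpha_i \|q_i - \hat{q}_i\|^2 + \sum_{j:(i,j)\in E} \beta_{ij}\|q_i - q_j\|^2\right] = 2\alpha_i (q_i - \hat{q}_i) + 2\sum_{j:(i,j)\in E} \beta_{ij}(q_i - q_j) = 0$$

$$\Rightarrow\quad q_i = \frac{\sum_{j:(i,j)\in E} \beta_{ij}\, q_j + \alpha_i\, \hat{q}_i}{\sum_{j:(i,j)\in E} \beta_{ij} + \alpha_i}$$

With $\alpha_i = 1$ and $\beta_{ij} = 1/\mathrm{degree}(i)$, the weights $\beta_{ij}$ sum to 1, so the update reduces to the average of the input vector and the mean of the neighbor vectors:

$$q_i = \frac{1}{2}\left(\hat{q}_i + \frac{1}{\mathrm{degree}(i)} \sum_{j:(i,j)\in E} q_j\right)$$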
[...] solution can be found by solving a system of linear equations. To do so, we use an efficient iterative updating method (Bengio et al., 2006; Subramanya et al., 2010; Das and Petrov, 2011; Das and Smith, 2011). The vectors in $Q$ are initialized to be equal to the vectors in $\hat{Q}$. We take the first derivative of the objective $\Psi$ with respect to one $q_i$ vector, and by equating it to zero arrive at the following online update:

$$q_i = \frac{\sum_{j:(i,j)\in E} \beta_{ij}\, q_j + \alpha_i\, \hat{q}_i}{\sum_{j:(i,j)\in E} \beta_{ij} + \alpha_i} \tag{1}$$

In practice, running this procedure for 10 iterations converges to changes in Euclidean distance of adjacent vertices of less than $10^{-2}$. The retrofitting approach described above is modular; it can be applied to word vector representations obtained from [...]
Update rule: Eq. (1)
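The online update can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation: the function name `retrofit`, the toy vectors, and the fixed setting α = 1, β = 1 / degree are assumptions, and vectors are plain lists to keep the snippet dependency-free.

```python
def retrofit(Q_hat, neighbors, n_iters=10, alpha=1.0):
    """Repeatedly set each q_i to the weighted average of its input vector and its neighbors."""
    # Initialize Q as a copy of the input vectors so Q_hat is left untouched.
    Q = {w: list(v) for w, v in Q_hat.items()}
    for _ in range(n_iters):
        for i in Q:
            # Only neighbors that actually have a vector can contribute.
            ns = [j for j in neighbors.get(i, []) if j in Q]
            if not ns:
                continue  # words without lexicon neighbors keep their input vector
            beta = 1.0 / len(ns)           # beta_ij = 1 / degree(i)
            denom = alpha + beta * len(ns)  # sum_j beta_ij + alpha_i
            Q[i] = [
                (alpha * Q_hat[i][d] + beta * sum(Q[j][d] for j in ns)) / denom
                for d in range(len(Q_hat[i]))
            ]
    return Q

# Toy input (hypothetical): "happy" and "glad" are linked, "sad" has no neighbors.
Q_hat = {"happy": [1.0, 0.0], "glad": [0.0, 1.0], "sad": [-1.0, 0.0]}
neighbors = {"happy": ["glad"], "glad": ["happy"]}
Q = retrofit(Q_hat, neighbors)
```

In this toy case the two linked words move toward each other while staying anchored to their inputs ("happy" approaches roughly [0.667, 0.333]), and "sad" is returned unchanged. Because each update uses the most recent neighbor values, the result after a fixed number of iterations can depend slightly on the order in which words are visited.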