Deep Semantic Representation Learning
Danushka Bollegala
Associate Professor, University of Liverpool, UK
Does a word itself carry meaning?
No. Its meaning is determined solely by the words that appear around it.

"You shall know a word by the company it keeps" — J. R. Firth, 1957
Image credit: www.odlt.org
Quiz
•X is portable, lets you communicate with others, lets you browse the web, and is convenient. Which of the following is X?
•Dog
•Airplane
•iPhone
•Banana
But is that really true?
•After all, a dictionary defines the meanings of words, doesn't it?
•A dictionary, too, explains the meaning of a word by stating its relations to other words.
•Given a large corpus, a word's semantic representation can be built simply by collecting the words around it, which makes NLP practitioners happy.
•A practical approach to semantic representation.
•It has been applied successfully to many tasks, so as a semantic representation it is (quantitatively) valid.
•Does the meaning of a word depend on the task?
•Which tasks work well, and which do not?
Methods for constructing semantic representations
•Distributional semantic representations
•Represent a word x by the distribution of its co-occurrence frequencies with all the words that appear around it in a corpus.
•High-dimensional, sparse
•The classical approach
•Distributed semantic representations
•Represent the meaning of a word x as a combination/mixture of a small number (10–1000) of dimensions/distributions/clusters.
•Low-dimensional, dense
•Recently popular with the deep learning / representation learning boom
Constructing a distributional semantic representation
•Build a semantic representation for the word "apple" (リンゴ).
•S1 = Apples are red.
•S2 = Apples are delicious.
•S3 = Aomori Prefecture is famous as a producer of apples.
Constructing a distributional semantic representation
•Build a semantic representation for the word "apple".
•S1 = Apples are red.
•S2 = Red apples are delicious.
•S3 = Aomori Prefecture is famous as a producer of apples.

apple = [(red, 2), (delicious, 1), (Aomori, 1), (producer, 1), (famous, 1)]
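A minimal sketch of this counting step (the segmentation of each sentence into content words is illustrative, not part of the slide):

```python
from collections import Counter

def context_counts(target, tokenized_sentences):
    """Count how often each word co-occurs with `target`
    within the same sentence (sentence-level contexts)."""
    counts = Counter()
    for tokens in tokenized_sentences:
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    return counts

# The three example sentences, pre-segmented into content words.
sentences = [
    ["apple", "red"],                           # S1
    ["red", "apple", "delicious"],              # S2
    ["Aomori", "apple", "producer", "famous"],  # S3
]
print(context_counts("apple", sentences))
# Counter({'red': 2, 'delicious': 1, 'Aomori': 1, 'producer': 1, 'famous': 1})
```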
Application: measuring semantic similarity
•We want to measure the semantic similarity between "apple" and "mikan" (mandarin orange).
•First, build the semantic representation of "mikan".
•S4 = Mandarin oranges are orange-colored.
•S5 = Mandarin oranges are delicious.
•S6 = Hyogo Prefecture is famous as a producer of mandarin oranges.
"Apple" and "mikan"

apple = [(red, 2), (delicious, 1), (Aomori, 1), (producer, 1), (famous, 1)]
mikan = [(orange, 1), (delicious, 1), (Hyogo, 1), (producer, 1), (famous, 1)]

Both words share the context words "delicious", "producer", and "famous", so "apple" and "mikan" can be said to be fairly similar in meaning.
For a quantitative comparison, we can view the two representations as sets and measure their overlap:

Jaccard coefficient = |apple ∩ mikan| / |apple ∪ mikan|
|apple ∩ mikan| = |{delicious, producer, famous}| = 3
|apple ∪ mikan| = |{red, delicious, Aomori, producer, famous, orange, Hyogo}| = 7
sim(apple, mikan) = 3/7 ≈ 0.4286
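The same computation as a short sketch (the set-based similarity deliberately ignores the counts, exactly as in the calculation above):

```python
def jaccard(counts_a, counts_b):
    """Jaccard coefficient over the *sets* of context words."""
    a, b = set(counts_a), set(counts_b)
    return len(a & b) / len(a | b)

apple = {"red": 2, "delicious": 1, "Aomori": 1, "producer": 1, "famous": 1}
mikan = {"orange": 1, "delicious": 1, "Hyogo": 1, "producer": 1, "famous": 1}
print(jaccard(apple, mikan))  # 3/7 ≈ 0.4286
```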
Many fine-grained design choices
•What to use as the context:
•the whole sentence (sentence-level co-occurrences)
•the n words before and after (a proximity window)
•words in a dependency relation (dependencies)
•Weight contexts by their distance:
•the farther away a co-occurrence, the more its weight is discounted (see the sketch after this list)
•and so on
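A sketch of a proximity-window counter with distance-based discounting; the window size n and the 1/distance decay are illustrative choices, not the slide's prescription:

```python
from collections import defaultdict

def windowed_cooccurrences(tokens, n=2):
    """Proximity-window contexts: each co-occurring pair within n tokens
    contributes a weight that decays with distance (here 1/distance)."""
    weights = defaultdict(float)
    for i, target in enumerate(tokens):
        for j in range(max(0, i - n), min(len(tokens), i + n + 1)):
            if i != j:
                weights[(target, tokens[j])] += 1.0 / abs(i - j)
    return weights

print(windowed_cooccurrences(["I", "had", "miso-soup", "and", "rice"], n=2))
```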
Approaches to building semantic representations
•Distributional semantic representations
•Represent a word x by the distribution of its co-occurrence frequencies with all the words that appear around it in a corpus.
•High-dimensional, sparse
•The classical approach
•Distributed semantic representations
•Represent the meaning of a word x as a combination/mixture of a small number (10–1000) of dimensions/distributions/clusters.
•Low-dimensional, dense
•Recently popular with the deep learning / representation learning boom
Local vs. distributed representations

Local (clustering-style) representations:
•Clustering, nearest neighbors, RBF SVMs, local non-parametric density estimation & prediction, decision trees, etc.
•Parameters for each distinguishable region
•# distinguishable regions is linear in # parameters
•When deciding the label of a point, only a few neighboring points are involved.

Distributed representations:
•Factor models, PCA, RBMs, neural nets, sparse coding, deep learning, etc.
•Each parameter influences many regions, not just local neighbors
•# distinguishable regions grows almost exponentially with # parameters
•GENERALIZE NON-LOCALLY TO NEVER-SEEN REGIONS
•Three partitions (C1, C2, C3 over the input) define eight regions: 2^n representational capacity.

[Figure: "#2 The need for distributed representations" — clustering vs. multi-clustering partitions of an input space. slide credit: Yoshua Bengio]
The skip-gram model

私はみそ汁とご飯を頂いた ("I had miso soup and rice.")
The skip-gram model

私はみそ汁とご飯を頂いた → morphological analysis → 私 / は / みそ汁 / と / ご飯 / を / 頂いた

Each word is assigned two d-dimensional vectors.
When a word x is the target whose representation is being learned, its vector is called the target vector v(x), shown in red.
A context word c that appears around x is represented by the context vector v(c), shown in blue.
The skip-gram model

私 / は / みそ汁 / と / ご飯 / を / 頂いた    [v(x), v(c)]

For example, consider the problem of predicting whether ご飯 (rice) appears in the context of みそ汁 (miso soup).
The skip-gram model

私 / は / みそ汁 / と / ? / を / 頂いた    [v(x), v(c)]

Let c = ご飯 (rice) and c′ = ケーキ (cake). We want to learn v(x), v(c), and v(c′) that reflect the "meaning" that the combination (x = みそ汁, c = ご飯) is more plausible Japanese than (x = みそ汁, c′ = ケーキ).
The skip-gram model

Proposal 1: define this plausibility as the inner product of the two vectors:
score(x, c) = v(x)^T v(c)
The skip-gram model

Proposal 2: the inner product, however, takes values in (−∞, +∞) and is not normalized, which is inconvenient. Dividing by the scores of all context words c′ turns it into a probability.
Log-bilinear model
•log-bilinear model [Mnih+Hinton ICML'07]

p(c | x) = exp(v(x)^T v(c)) / Σ_{c′∈V} exp(v(x)^T v(c′))

The left-hand side is the probability that c appears around x; the numerator measures how readily x and c co-occur; the denominator sums how readily x co-occurs with every word c′ in the vocabulary V.
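A sketch of this softmax in numpy. The toy vocabulary size, dimensionality, and random initialization are placeholders; a real model would learn these vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 7, 4                              # toy vocabulary size and dimensionality
target_vecs = rng.normal(size=(V, d))    # v(x): one per word (red on the slide)
context_vecs = rng.normal(size=(V, d))   # v(c): one per word (blue on the slide)

def p_context_given_target(x, c):
    """p(c | x) = exp(v(x)^T v(c)) / sum_{c'} exp(v(x)^T v(c'))."""
    scores = context_vecs @ target_vecs[x]   # v(x)^T v(c') for every c'
    scores -= scores.max()                   # subtract max for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[c]

print(p_context_given_target(x=2, c=4))
```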
What is remarkable about this?
•Visualizing the word representation vectors learned by skip-gram in two dimensions shows that
•v(king) − v(man) + v(woman) ≈ v(queen)
[Figure 2: Two-dimensional PCA projection of the 1000-dimensional skip-gram vectors of countries (China, Japan, France, Russia, Germany, Italy, Spain, Greece, Turkey, Poland, Portugal) and their capital cities (Beijing, Tokyo, Paris, Moscow, Berlin, Rome, Madrid, Athens, Ankara, Warsaw, Lisbon). The figure illustrates the ability of the model to automatically organize concepts and learn implicitly the relationships between them, since during training no supervised information about what a capital city means was provided.]
Our research results
Word meaning is not unique
•The same word can express different meanings in different contexts.
•軽いノートPC, a "light(weight)" laptop (positive) vs. 軽い男/女, a "shallow" man/woman (negative)
•We must learn multiple semantic representations for the same word. [Neelakantan+ EMNLP-14]
•We must accurately predict the sense commonly used in a given field (domain).
•Domain adaptation of semantic representations [Bollegala+ ACL-15]
Pivots
•Words that carry a similar meaning across different domains (semantically invariant words)
•e.g. 値段 (price), 形 (shape), 安い (cheap), 高い (expensive); in English: excellent, cheap, digital
•For pivots, we want the representations in the two domains to be close to each other.
•For non-pivot words, we want each domain's representation to be able to predict the pivots in that domain.
•Intuition: the two domains are drawn together via the pivots.
Loss function
•Loss is measured with a ranked hinge loss. [Collobert+Weston ICML'08]
•Using the pivots that appear in a review document d, adjust the representations so that the non-pivots contained in d receive higher prediction scores than non-pivots that do not appear in d.
The notation (c, w) ∈ d denotes the co-occurrence of a pivot c and a non-pivot w in a document d. We learn domain-specific word representations by maximizing the prediction accuracy of the non-pivots w that occur in the local context of a pivot c. The hinge loss L(C_S, W_S) associated with predicting a non-pivot w in a source document d ∈ D_S that co-occurs with pivots c is given by

L(C_S, W_S) = Σ_{d∈D_S} Σ_{(c,w)∈d} Σ_{w*∼p(w)} max(0, 1 − c_S^T w_S + c_S^T w*_S).   (1)

Here, w*_S is the source-domain representation of a non-pivot w* that does not occur in d. The loss function given by Eq. 1 requires that a non-pivot w that co-occurs with a pivot c in the document d is assigned a higher ranking score, as measured by the inner product between c_S and w_S, than a non-pivot w* that does not occur in d. We randomly sample k non-pivots from the set of all …

c_S: the representation of pivot c in the source domain. w_S, w*_S: the source-domain representations of the non-pivots w and w*, with w ∈ d and w* ∉ d.
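A minimal sketch of the per-instance hinge term in Eq. 1; the vectors here are random placeholders rather than learned representations:

```python
import numpy as np

def ranked_hinge_loss(c_S, w_S, w_neg_S):
    """max(0, 1 - c_S.w_S + c_S.w*_S): the observed non-pivot w must
    outscore a sampled non-pivot w* (absent from the document) by a margin."""
    return max(0.0, 1.0 - c_S @ w_S + c_S @ w_neg_S)

rng = np.random.default_rng(1)
c_S, w_S, w_neg_S = rng.normal(size=(3, 5))   # pivot, observed and sampled non-pivots
print(ranked_hinge_loss(c_S, w_S, w_neg_S))
```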
Overall loss function

L(C_S, W_S) = Σ_{d∈D_S} Σ_{(c,w)∈d} Σ_{w*∼p(w)} max(0, 1 − c_S^T w_S + c_S^T w*_S)   (1)

L(C_T, W_T) = Σ_{d∈D_T} Σ_{(c,w)∈d} Σ_{w*∼p(w)} max(0, 1 − c_T^T w_T + c_T^T w*_T)   (2)

Here, w* denotes target-domain non-pivots that do not occur in d, randomly sampled from p(w) following the same procedure as in the source domain. The source and target loss functions given respectively by Eqs. 1 and 2 can be used on their own to independently learn source and target domain word representations. However, by definition, pivots are common to both domains. We use this property to relate the source and target word representations via a pivot regularizer R(C_S, C_T), defined as

R(C_S, C_T) = (1/2) Σ_{i=1}^{K} ||c_S^(i) − c_T^(i)||².   (3)

Here, ||x|| represents the L2 norm of a vector x, and c^(i) is the i-th pivot in a total collection of K pivots. Word representations for non-pivots in the source and target domains are linked via the pivot regularizer because the non-pivots in each domain are predicted using the word representations for the pivots in that domain, which in turn are regularized by Eq. 3. The overall objective function L(C_S, W_S, C_T, W_T) we minimize is the sum of the source and target loss functions, regularized via Eq. 3 with coefficient λ:

L(C_S, W_S) + L(C_T, W_T) + λ R(C_S, C_T).   (4)

Training uses mini-batch stochastic gradient descent with batches of 50 instances; adaptive gradients (AdaGrad) [Duchi+ 2011] schedule the learning rate, and word representations are initialized with random vectors sampled from a zero-mean, unit-variance Gaussian. Although the objective in Eq. 4 is not jointly convex in all representations, it is convex w.r.t. the representation of a particular feature (pivot or non-pivot) when the representations for all the other features are held fixed. In our experiments, training converged in all cases with less than … over the dataset.

The rank-based prediction task is inspired by prior work on representation learning for a single domain (Collobert et al., 2011). However, unlike the deep neural network of Collobert et al., the proposed method uses a comparatively simple single layer to reduce the number of parameters that must be learnt. Similar to the skip-gram model (Mikolov et al., 2013a), the proposed method …
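A sketch of Eqs. 3-4 in the same notation; the loss values and λ below are placeholders:

```python
import numpy as np

def pivot_regularizer(C_S, C_T):
    """R(C_S, C_T) = (1/2) * sum_i ||c_S^(i) - c_T^(i)||^2 ties the two
    domain-specific representations of each of the K pivots together."""
    return 0.5 * np.sum((C_S - C_T) ** 2)

def overall_objective(loss_S, loss_T, C_S, C_T, lam=1.0):
    """Eq. 4: source hinge loss + target hinge loss + lambda * regularizer."""
    return loss_S + loss_T + lam * pivot_regularizer(C_S, C_T)

rng = np.random.default_rng(2)
C_S, C_T = rng.normal(size=(2, 10, 5))   # K=10 pivots, d=5, one matrix per domain
print(overall_objective(3.2, 2.7, C_S, C_T, lam=0.5))
```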
[Figure 1: Accuracies obtained by different methods (NA, GloVe, SFA, SCL, CS, Proposed) for each source→target pair in cross-domain sentiment classification, across the four target domains B, E, D, and K.]

The differences reported in Figure 1 can be directly attributed to the domain adaptation or word-representation learning methods compared: all methods use L2-regularized logistic regression as the binary sentiment classifier, and the regularization coefficients are set to their optimal values on …

Selecting the correct domain-specific semantic representations improves sentiment classification performance!
Representation learning for relations between words
•How can we represent the relation that holds between two words? [Bollegala+ AAAI-15]
•If words can be represented by vectors, then the relation between two words should be representable by a matrix.
•This "relation matrix" can be interpreted as selecting, from the semantic representations of the two words, only the attributes that contribute to the relation between them.
[Figure: a toy 0/1 relation matrix between king and queen, whose rows and columns are the attributes 男 (male), 女 (female), 王 (royal), 水 (water), and 配 (spouse).]
Learning method

[Figure 1: A relational graph between three words — ostrich, bird, and penguin — connected by the lexical patterns "X is a large Y" [0.8], "X is a Y" [0.7], and "both X and Y are flightless" [0.5].]

Automatically extracted ontologies can be represented as relational graphs. Consider the relational graph shown in Figure 1. For example, let us assume that we observed the context ostrich is a large bird that lives in Africa in a corpus. Then, we extract the lexical pattern X is a large Y between ostrich and bird from this context and include it in the relational graph by adding two vertices, one each for ostrich and bird, and an edge from ostrich to bird. Such lexical patterns have been used for related tasks such as measuring semantic similarity between … Likewise, observing both ostrich and penguin are flightless birds and penguin is a bird will result in the relational graph shown in Figure 1.

Learning word representations: given a relational graph as the input, we learn d-dimensional vector representations for each vertex in the graph. The dimensionality d of the vector space is a pre-defined parameter of the method, and by adjusting it one can obtain word representations at different granularities. Let us consider two vertices u and v connected by an edge with label l and weight w. We represent the two words u and v respectively by two vectors x(u), x(v) ∈ R^d, and the label l by a matrix G(l) ∈ R^{d×d}. We model the problem of learning optimal word representations x̂(u) and pattern representations Ĝ(l) as the solution to the following squared-loss minimisation problem:

argmin_{x(u)∈R^d, G(l)∈R^{d×d}} (1/2) Σ_{(u,v,l,w)∈E} (x(u)^T G(l) x(v) − w)².   (1)

The objective function given by Eq. 1 is jointly non-convex in both the word representations x(u) (or alternatively x(v)) and the pattern representations G(l). However, if G(l) is positive semidefinite and one of the two variables is held fixed, …

G(l): relation matrix. (x(u)^T G(l) x(v) − w)²: squared error. x(u), x(v): the words' semantic representation vectors. w: the strength of co-occurrence of u, v, and l.
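A sketch of the bilinear score and squared loss of Eq. 1; the relation matrix and word vectors are random placeholders:

```python
import numpy as np

def squared_loss(x_u, G_l, x_v, w):
    """(1/2) * (x(u)^T G(l) x(v) - w)^2 for one edge (u, v, l, w)."""
    return 0.5 * (x_u @ G_l @ x_v - w) ** 2

rng = np.random.default_rng(3)
d = 4
x_ostrich, x_bird = rng.normal(size=(2, d))
G_is_a = rng.normal(size=(d, d))   # relation matrix for the pattern "X is a Y"
print(squared_loss(x_ostrich, G_is_a, x_bird, w=0.7))
```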
Optimization

argmin_{x(u)∈R^d, G(l)∈R^{d×d}} (1/2) Σ_{(u,v,l,w)∈E} (x(u)^T G(l) x(v) − w)²   (1)

•The objective is not jointly convex in the variables x(u), G(l), and x(v).
•However, if any two of these variables are held fixed, the objective becomes convex in the remaining variable (provided that G(l) is a positive semidefinite matrix).
•The objective can therefore be optimized by taking its partial derivative with respect to each variable in turn and applying stochastic gradient descent (a sketch of one such update follows).

G(l): relation matrix. x(u), x(v): the words' semantic representation vectors. w: the strength of co-occurrence of u, v, and l.
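A sketch of one stochastic gradient step on the squared loss of Eq. 1 for a single edge. The slide's scheme holds two of the three variables fixed per update (which is what makes each subproblem convex); this simplified sketch just evaluates all three gradients at the current point:

```python
import numpy as np

def sgd_step(x_u, G_l, x_v, w, lr=0.01):
    """One SGD update on (1/2)(x_u^T G_l x_v - w)^2 for a single edge."""
    err = x_u @ G_l @ x_v - w             # residual of the bilinear score
    grad_xu = err * (G_l @ x_v)           # d loss / d x(u)
    grad_xv = err * (G_l.T @ x_u)         # d loss / d x(v)
    grad_G = err * np.outer(x_u, x_v)     # d loss / d G(l)
    return x_u - lr * grad_xu, G_l - lr * grad_G, x_v - lr * grad_xv

rng = np.random.default_rng(7)
d = 4
x_u, x_v = rng.normal(size=(2, d))
G_l = rng.normal(size=(d, d))
x_u, G_l, x_v = sgd_step(x_u, G_l, x_v, w=0.8)
```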
Analogy prediction performance

Method | capital-common | capital-world | city-in-state | family (gender) | currency | overall
SVD+LEX | 11.43 | 5.43 | 0 | 9.52 | 0 | 3.84
SVD+POS | 4.57 | 9.06 | 0 | 29.05 | 0 | 6.57
SVD+DEP | 5.88 | 3.02 | 0 | 0 | 0 | 1.11
CBOW | 8.49 | 5.26 | 4.95 | 47.82 | 2.37 | 10.58
skip-gram | 9.15 | 9.34 | 5.97 | 67.98 | 5.29 | 14.86
GloVe | 4.24 | 4.93 | 4.35 | 65.41 | 0 | 11.89
Prop+LEX | 22.87 | 31.42 | 15.83 | 61.19 | 25.0 | 26.61
Prop+POS | 22.55 | 30.82 | 14.98 | 60.48 | 20.0 | 25.35
Prop+DEP | 20.92 | 31.40 | 15.27 | 56.19 | 20.0 | 24.68
Deriving relations from words
•v(king) − v(man) should represent the relation between king and man; otherwise analogy problems could not be solved (relational similarity could not be measured).
•If so, taking the difference between the semantic representation vectors of words linked by a particular relation should yield a representation of that relation. [Bollegala+ IJCAI-15]
|R(p)| = Σ_{(u,v)∈R(p)} f(p, u, v)   (3)

We represent a word x using a vector x ∈ R^d. The dimensionality of the representation, d, is a hyperparameter of the proposed method. Prior work on word representation learning has observed that the difference between the vectors that represent two words closely approximates the semantic relations that exist between those two words. For example, the vector v(king) − v(queen) has been shown to be similar to the vector v(man) − v(woman). We use this property to represent a pattern p by a vector p ∈ R^d as the weighted sum of differences between the two words in all word-pairs (u, v) that co-occur with p:

p = (1/|R(p)|) Σ_{(u,v)∈R(p)} f(p, u, v) (u − v).   (4)

For example, consider Fig. 1, where the two word-pairs …
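A sketch of the pattern vector of Eq. 4 (the weighted differences of word vectors); the PPMI weights f and the embeddings are toy values:

```python
import numpy as np

def pattern_vector(word_pairs, f, emb):
    """p = (1/|R(p)|) * sum_{(u,v)} f(p,u,v) * (emb[u] - emb[v]),
    where f maps each word-pair of this pattern to its PPMI weight."""
    norm = sum(f[pair] for pair in word_pairs)          # |R(p)|
    p = sum(f[(u, v)] * (emb[u] - emb[v]) for (u, v) in word_pairs)
    return p / norm

rng = np.random.default_rng(4)
emb = {w: rng.normal(size=3) for w in ["lion", "cat", "ostrich", "bird"]}
f = {("lion", "cat"): 0.8, ("ostrich", "bird"): 0.5}
print(pattern_vector([("lion", "cat"), ("ostrich", "bird")], f, emb))
```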
Learning the semantic representations

[Figure 1: Computing the similarity between two patterns. The word-pairs (lion, cat) = (x1, x2) and (ostrich, bird) = (x3, x4) co-occur respectively with the two lexical patterns p1 = "large Ys such as Xs" and p2 = "X is a huge Y"; the edges carry weights ±f(p1, x1, x2) and ±f(p2, x3, x4), and the patterns are compared through σ(p1^T p2).]

A relation is represented as the set of lexical patterns it appears with, and the relation between u and v is given by the "subtraction" of their semantic representation vectors.

The strength of association between a word-pair (u, v) and a pattern p is measured using the positive pointwise mutual information (PPMI), f(p, u, v), defined as

f(p, u, v) = max(0, log( g(p, u, v) g(*, *, *) / ( g(p, *, *) g(*, u, v) ) )).   (1)

Here, g(p, u, v) denotes the number of co-occurrences between p and (u, v), and * denotes the summation taken over all words (or patterns) corresponding to the slot variable. We represent a pattern p by the set R(p) of word-pairs (u, v) for which f(p, u, v) > 0. Formally, we define R(p) and its norm |R(p)| as

R(p) = {(u, v) | f(p, u, v) > 0}   (2)
|R(p)| = Σ_{(u,v)∈R(p)} f(p, u, v)   (3)

Assuming that there are no other co-occurrences between word-pairs and patterns in the corpus, the representations of the patterns p1 and p2 in Fig. 1 are given respectively by p1 = x1 − x2 and p2 = x3 − x4. We measure the relational similarity between (x1, x2) and (x3, x4) using the inner product p1^T p2.

We model the problem of learning word representations as a binary classification task, where we learn representations for words such that they can be used to accurately predict whether a given pair of patterns is relationally similar. In the example above, we would learn representations for the four words lion, cat, ostrich, and bird such that the similarity between the two patterns "large Ys such as Xs" and "X is a huge Y" is maximized. An unsupervised method selects relationally similar (positive) and dissimilar (negative) pairs of patterns as training instances to train the binary classifier.

Let t(p1, p2) ∈ {1, 0} denote the target label for two patterns p1, p2, where 1 indicates that p1 and p2 are relationally similar, and 0 otherwise. We compute the prediction loss for a pair of patterns (p1, p2) as the squared loss between the target and the predicted labels:

L(t(p1, p2), p1, p2) = (1/2) (t(p1, p2) − σ(p1^T p2))².   (5)

Different non-linear functions can be used as the prediction function σ(·), such as the logistic sigmoid, hyperbolic tangent, or rectified linear units. In preliminary experiments the hyperbolic tangent,

σ(θ) = tanh(θ) = (exp(θ) − exp(−θ)) / (exp(θ) + exp(−θ)),   (6)

was found to work particularly well among those different non-linearities. The gradients with respect to the pattern representations are given by

∂L/∂p1 = σ′(p1^T p2) (σ(p1^T p2) − t(p1, p2)) p2,   (8)
∂L/∂p2 = σ′(p1^T p2) (σ(p1^T p2) − t(p1, p2)) p1.   (9)

Here, σ′ denotes the first derivative of tanh, given by 1 − σ(θ)². From Eq. 4 we get

∂p1/∂x = (1/|R(p1)|) (h(p1, u = x, v) − h(p1, u, v = x)),   (10)
∂p2/∂x = (1/|R(p2)|) (h(p2, u = x, v) − h(p2, u, v = x)),   (11)

where h(p, u = x, v) = Σ_{(x,v)∈{(u,v) | (u,v)∈R(p), u = x}} f(p, x, v) and h(p, u, v = x) = Σ_{(u,x)∈{(u,v) | (u,v)∈R(p), v = x}} f(p, u, x). Substituting the partial derivatives given by Eqs. 8-11 in Eq. 7 we get

∂L/∂x = δ(p1, p2) [ H(p1, x) Σ_{(u,v)∈R(p2)} f(p2, u, v)(u − v) + H(p2, x) Σ_{(u,v)∈R(p1)} f(p1, u, v)(u − v) ],   (12)

where δ(p1, p2) is defined as …
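A sketch of Eq. 5 with the tanh prediction function and the gradients of Eqs. 8-9; the pattern vectors here are random stand-ins for the sums of word-vector differences from Eq. 4:

```python
import numpy as np

def loss_and_grads(p1, p2, t):
    """Squared loss (1/2)(t - tanh(p1.p2))^2 and its gradients
    w.r.t. the two pattern vectors (Eqs. 5, 8, 9 above)."""
    s = np.tanh(p1 @ p2)          # sigma(p1^T p2)
    dsig = 1.0 - s ** 2           # tanh'(theta) = 1 - tanh(theta)^2
    loss = 0.5 * (t - s) ** 2
    common = dsig * (s - t)       # sigma'(p1^T p2) * (sigma - t)
    return loss, common * p2, common * p1

rng = np.random.default_rng(5)
p1, p2 = rng.normal(size=(2, 4))
print(loss_and_grads(p1, p2, t=1.0))
```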
Analogy prediction performance

Table 1: Word analogy results on benchmark datasets.

Method | sem. | synt. | total | SAT | SemEval
ivLBL CosAdd | 63.60 | 61.80 | 62.60 | 20.85 | 34.63
ivLBL CosMult | 65.20 | 63.00 | 64.00 | 19.78 | 33.42
ivLBL PairDiff | 52.60 | 48.50 | 50.30 | 22.45 | 36.94
skip-gram CosAdd | 31.89 | 67.67 | 51.43 | 29.67 | 40.89
skip-gram CosMult | 33.98 | 69.62 | 53.45 | 28.87 | 38.54
skip-gram PairDiff | 7.20 | 19.73 | 14.05 | 35.29 | 43.99
CBOW CosAdd | 39.75 | 70.11 | 56.33 | 29.41 | 40.31
CBOW CosMult | 38.97 | 70.39 | 56.13 | 28.34 | 38.19
CBOW PairDiff | 5.76 | 13.43 | 9.95 | 33.16 | 42.89
GloVe CosAdd | 86.67 | 82.81 | 84.56 | 27.00 | 40.11
GloVe CosMult | 86.84 | 84.80 | 85.72 | 25.66 | 37.56
GloVe PairDiff | 45.93 | 41.23 | 43.36 | 44.65 | 44.67
Prop CosAdd | 86.70 | 85.35 | 85.97 | 29.41 | 41.86
Prop CosMult | 86.91 | 87.04 | 86.98 | 28.87 | 39.67
Prop PairDiff | 41.85 | 42.86 | 42.40 | 45.99 | 44.88
Corpus vs. dictionary
•Given just a corpus, distributed semantic representations of words (and relations) can be learned.
•However, dictionaries, which humans have built over many years, already define the meanings of words.
•Can we learn more accurate semantic representations by using both? [Bollegala+ AAAI-15]
•In particular, when the corpus is incomplete, a dictionary (ontology) helps.
•私は犬と猫が好きだ. ("I like dogs and cats.")
JointReps
•Predict the words that co-occur within the same sentence in the corpus, and minimize the error (objective function) arising from this prediction.
•Add the semantic relations defined in a dictionary (WordNet) as constraints.
… then extract unigrams from the co-occurrence windows as the corresponding context words. We down-weight distant (and potentially noisy) co-occurrences using the reciprocal 1/l of the distance in tokens l between the two words that co-occur.

A word w_i is assigned two vectors w_i and w̃_i, denoting whether w_i is respectively the target of the prediction (corresponding to the rows of X) or in the context of another word (corresponding to the columns of X). The GloVe objective can then be written as:

J_C = (1/2) Σ_{i∈V} Σ_{j∈V} f(X_ij) (w_i^T w̃_j + b_i + b̃_j − log X_ij)²   (1)

Here, b_i and b̃_j are real-valued scalar bias terms that adjust for the difference between the inner product and the logarithm of the co-occurrence counts. The function f discounts the co-occurrences between frequent words and is given by:

f(t) = (t / t_max)^α if t < t_max, and 1 otherwise,   (2)

with t_max = 100 in our experiments. The objective function defined by (1) encourages the learning of word representations that demonstrate the desirable property that the vector difference between the word embeddings of two words represents the semantic relations that exist between those two words. For example, Mikolov et al. [2013c] observed that the difference between the word embeddings for the words king and man, when added to the word embedding for the word woman, yields a vector similar to that of queen.

Unfortunately, the objective function given by (1) does not capture the semantic relations that exist between w_i and w_j as specified in the lexicon S. Consequently, it considers all co-occurrences equally, and is likely to encounter problems when the co-occurrences are rare. To overcome this problem we propose a regularizer, J_S, by considering the three-way co-occurrence among words w_i, w_j, and a semantic relation R that exists between the target word w_i and one of its context words w_j in the lexicon, as follows:

J_S = (1/2) Σ_{i∈V} Σ_{j∈V} R(i, j) (w_i − w̃_j)²   (3)

Here, R(i, j) is a binary function that returns 1 if the semantic relation R exists between the words w_i and w_j in …
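A sketch of the combined objective J_C + λ J_S (Eqs. 1-3) over dense toy matrices. The excerpt does not state α; 0.75 (the usual GloVe choice) is assumed here, and a real implementation would iterate only over non-zero co-occurrences:

```python
import numpy as np

def joint_objective(W, W_ctx, b, b_ctx, X, R, lam=1.0,
                    t_max=100.0, alpha=0.75):
    """GloVe term J_C (Eq. 1) plus the lexicon regularizer J_S (Eq. 3).
    R[i, j] = 1 iff the chosen WordNet relation links words i and j."""
    J_C = J_S = 0.0
    V = X.shape[0]
    for i in range(V):
        for j in range(V):
            if X[i, j] > 0:
                f = (X[i, j] / t_max) ** alpha if X[i, j] < t_max else 1.0
                err = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(X[i, j])
                J_C += 0.5 * f * err ** 2
            J_S += 0.5 * R[i, j] * np.sum((W[i] - W_ctx[j]) ** 2)
    return J_C + lam * J_S

rng = np.random.default_rng(6)
V, d = 5, 3
W, W_ctx = rng.normal(size=(2, V, d))               # target and context vectors
b, b_ctx = rng.normal(size=(2, V))                  # bias terms
X = rng.integers(0, 5, size=(V, V)).astype(float)   # toy co-occurrence counts
R = np.eye(V)                                       # toy relation indicator
print(joint_objective(W, W_ctx, b, b_ctx, X, R, lam=0.1))
```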
Measuring semantic similarity between words
Table 1: Performance of the proposed method with different semantic relation types.

Method | RG | MC | RW | SCWS | MEN | sem | syn | total | SemEval
corpus only | 0.7523 | 0.6398 | 0.2708 | 0.460 | 0.6933 | 61.49 | 66.00 | 63.95 | 37.98
Synonyms | 0.7866 | 0.7019 | 0.2731 | 0.4705 | 0.7090 | 61.46 | 69.33 | 65.76 | 38.65
Antonyms | 0.7694 | 0.6417 | 0.2730 | 0.4644 | 0.6973 | 61.64 | 66.66 | 64.38 | 38.01
Hypernyms | 0.7759 | 0.6713 | 0.2638 | 0.4554 | 0.6987 | 61.22 | 68.89 | 65.41 | 38.21
Hyponyms | 0.7660 | 0.6324 | 0.2655 | 0.4570 | 0.6972 | 61.38 | 68.28 | 65.15 | 38.30
Member-holonyms | 0.7681 | 0.6321 | 0.2743 | 0.4604 | 0.6952 | 61.69 | 66.36 | 64.24 | 37.95
Member-meronyms | 0.7701 | 0.6223 | 0.2739 | 0.4611 | 0.6963 | 61.61 | 66.31 | 64.17 | 37.98
Part-holonyms | 0.7852 | 0.6841 | 0.2732 | 0.4650 | 0.7007 | 61.44 | 67.34 | 64.66 | 38.07
Part-meronyms | 0.7786 | 0.6691 | 0.2761 | 0.4679 | 0.7005 | 61.66 | 67.11 | 64.63 | 38.29
The Google dataset contains … syntactic analogies (syn) and 8869 semantic analogies (sem). The SemEval dataset contains manually ranked word-pairs describing various semantic relation types, such as defective and agent-goal; in total there are 3218 word-pairs in the SemEval dataset. Given a proportional analogy a : b :: c : d, we compute the cosine similarity between b − a + c and d, where the boldface symbols represent the embeddings of the corresponding words. For the Google dataset we measure the accuracy of predicting the fourth word in each proportional analogy from the entire vocabulary, using the binomial exact test with Clopper-Pearson confidence intervals to test the statistical significance of the reported accuracy values. For SemEval we use the official evaluation tool to compute MaxDiff scores.
Table 2: Comparison against prior work.

Method | RG | MEN | sem | syn
RCM | 0.471 | 0.501 | - | 29.9
R-NET | - | - | 32.64 | 43.46
C-NET | - | - | 37.07 | 40.06
RC-NET | - | - | 34.36 | 44.42
Retro (CBOW) | 0.577 | 0.605 | 36.65 | 52.5
Retro (SG) | 0.745 | 0.657 | 45.29 | 65.65
Retro (corpus only) | 0.786 | 0.673 | 61.11 | 68.14
Proposed (synonyms) | 0.787 | 0.709 | 61.46 | 69.33
Evaluation uses the Spearman correlation between human-assigned similarity scores and the similarity scores produced by the algorithm. A variety of semantic relations can be used as constraints; the synonymy relation is the most effective.
Remaining open problems
•Is predicting word co-occurrences really the optimal task for learning semantic representations?
•We know nothing about the space formed by word representation vectors.
•We do not even know whether vectors are sufficient in the first place.
•How should the meaning of sentences and documents be represented? (compositional semantics)
•How do we handle multilinguality and ambiguity?
御免 (gomen) − sorry + thanks = 有難う (arigatou)

Danushka Bollegala
www.csc.liv.ac.uk/~danushka
danushka.bollegala@liverpool.ac.uk
@Bollegala