Harnessing Deep Neural Networks with Logic Rules
Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, and Eric P. Xing
ACL 2016
Presenter: Sho Takase (Tohoku University)
2016/09/12, 8th Advanced NLP Study Group (第8回最先端NLP勉強会)
Figures and tables in these slides are taken from [Hu+ 16].
Objective
•  Inject general rules and human intuition into neural networks
   –  In sentiment analysis, the positive/negative polarity of a sentence "A but B" agrees with that of B
   –  In named entity recognition, I-ORG cannot follow B-PER
•  Express the rules and intuitions in first-order logic
   –  equal(y_{i-1}, B-PER) → ¬ equal(y_i, I-ORG)
•  Use the logic rules as constraints during training
Method overview
•  p_θ(y|x): output of the model (any neural network, e.g. a CNN)
•  q(y|x): the model's output after imposing the constraints
•  Alternate between computing q(y|x) and learning the parameters (updating θ)
   –  Posterior regularization [Ganchev+ 10] carried out on a neural network
Method overview (figure)
•  Same bullets as above, illustrated with a figure annotating the two alternating steps: updating θ and computing q(y|x).
Learning the parameters
•  What we want to achieve:
   1. Output the correct label on the training examples
   2. Produce outputs similar to the rule-constrained model q(y|x)
•  Objective function
   –  Weight 1 and 2 with π and add them
   –  loss is a loss function (cross entropy here)

   min_θ (1/N) Σ_{n=1}^{N} [ (1 − π) loss(y_n, σ_θ(x_n)) + π loss(q(y|x_n), σ_θ(x_n)) ]

   The first term corresponds to 1, the second to 2; σ_θ(x_n) is the model's prediction (softmax output) for x_n.
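A minimal PyTorch sketch of this objective (the function name, tensor shapes, and use of torch are my own; the slide and paper specify only the formula):

```python
import torch.nn.functional as F

def distillation_loss(logits, hard_labels, teacher_probs, pi):
    """(1 - pi) * loss(y_n, sigma_theta(x_n)) + pi * loss(q(y|x_n), sigma_theta(x_n)).

    logits:        model outputs, shape (batch, num_classes)
    hard_labels:   gold labels y_n, shape (batch,)
    teacher_probs: q(y|x_n) from the rule-constrained teacher, shape (batch, num_classes)
    pi:            imitation weight balancing the two terms
    """
    log_p = F.log_softmax(logits, dim=-1)  # log sigma_theta(x_n)
    # Term 1: cross entropy against the true hard labels y_n.
    hard_loss = F.nll_loss(log_p, hard_labels)
    # Term 2: cross entropy against the teacher's soft predictions q(y|x_n).
    soft_loss = -(teacher_probs * log_p).sum(dim=-1).mean()
    return (1.0 - pi) * hard_loss + pi * soft_loss
```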
Computing q(y|x) (1/2)
•  What we want to achieve:
   1. Satisfy the constraints (logic rules)
   2. Stay close to the model's output p_θ(y|x)
•  Objective function
   –  For each rule (indexed by l) and each of its groundings (indexed by g_l) on (X, Y), we expect E_{q(Y|X)}[r_{l,g_l}(X, Y)] = 1 with confidence λ_l; the closeness of q to p_θ is measured by KL divergence, which we wish to minimize. Combining the two and allowing slackness for the constraints gives:

   min_{q, ξ≥0}  KL(q(Y|X) ‖ p_θ(Y|X)) + C Σ_{l,g_l} ξ_{l,g_l}
   s.t.  λ_l (1 − E_q[r_{l,g_l}(X, Y)]) ≤ ξ_{l,g_l},   g_l = 1, …, G_l,  l = 1, …, L    (3)

   –  The KL term corresponds to 2; the constraints correspond to 1
   –  ξ_{l,g_l} ≥ 0 are slack variables that relax the constraints; C is the regularization parameter
   –  r_{l,g_l}(X, Y) is a continuous truth value in [0, 1]; it equals 1 when grounding g_l satisfies rule r_l
   –  λ_l is the strength of rule r_l (the larger λ, the more the rule must be satisfied)
Computing q(y|x) (2/2)
•  q(y|x) can be solved analytically (via the Lagrangian dual problem)
   –  The same solution technique as posterior regularization [Ganchev+ 10]
   –  Problem (3) is convex and can be seen as projecting p_θ into the rule-constrained subspace; its dual has a closed-form solution
•  The strength of the constraints is determined by C (a constant) and λ_l (a per-rule value)
•  When a rule is not satisfied, q(y|x) becomes small:

   q*(Y|X) ∝ p_θ(Y|X) exp{ − Σ_{l,g_l} C λ_l (1 − r_{l,g_l}(X, Y)) }    (4)

   Intuitively, a strong rule with large λ_l leads to low probability for predictions that fail to meet the constraints.
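A sketch of constructing the teacher distribution per Eq. (4), for a single rule on a classification task (names and shapes are my own, not the paper's code):

```python
import torch

def build_teacher(p_theta, rule_truth, C, lam):
    """q*(y|x) ∝ p_theta(y|x) · exp(-C · lam · (1 - r(x, y))).

    p_theta:    model probabilities, shape (batch, num_classes)
    rule_truth: soft truth value r(x, y) in [0, 1] for every candidate label,
                shape (batch, num_classes)
    C, lam:     regularization strength and rule confidence lambda
    """
    # Down-weight each label in proportion to how badly it violates the rule.
    q = p_theta * torch.exp(-C * lam * (1.0 - rule_truth))
    # Renormalize so that q is a distribution over labels.
    return q / q.sum(dim=-1, keepdim=True)
```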
The rules
•  Expressed in first-order logic
   –  The polarity of a sentence "A but B" agrees with that of B; the conjunction "but" is a strong indicator of a sentiment change, and the sentiment of the clause following "but" generally dominates
•  Converted into continuous values in [0, 1] in the probabilistic soft logic framework [Bach+ 15]
   –  The Boolean logic operators are reformulated as:

   A & B = max{A + B − 1, 0}
   A ∨ B = min{A + B, 1}
   A₁ ∧ ⋯ ∧ A_N = Σ_i A_i / N
   ¬A = 1 − A    (1)

   (& and ∧ are two different approximations of conjunction)
•  The A-but-B rule is written as:

   has-‘A-but-B’-structure(S) ⇒ (1(y = +) ⇒ σ_θ(B)+ ∧ σ_θ(B)+ ⇒ 1(y = +))    (5)
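The operators of Eq. (1) written out directly, a minimal sketch with truth values as floats in [0, 1]:

```python
def soft_and(a, b):
    """Lukasiewicz conjunction, the '&' of Eq. (1)."""
    return max(a + b - 1.0, 0.0)

def soft_or(a, b):
    """Disjunction."""
    return min(a + b, 1.0)

def soft_avg_and(values):
    """The averaging conjunction, the '∧' of Eq. (1)."""
    return sum(values) / len(values)

def soft_not(a):
    """Negation."""
    return 1.0 - a
```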
Computing a rule: an example
•  The polarity of "A but B" agrees with that of B
   –  has-‘A-but-B’-structure(S): sentence S has an A-but-B structure
   –  1(y = +): 1 when the sentence is positive, 0 otherwise
   –  σ_θ(B)+: the probability the model assigns to the B clause being positive
•  Expanding the rule body with the operators of Eq. (1):

   (1(y = +) ⇒ σ_θ(B)+) ∧ (σ_θ(B)+ ⇒ 1(y = +))
   ⇔ (¬1(y = +) ∨ σ_θ(B)+) ∧ (¬σ_θ(B)+ ∨ 1(y = +))
   ⇔ ((1 − 1(y = +)) ∨ σ_θ(B)+) ∧ ((1 − σ_θ(B)+) ∨ 1(y = +))
   ⇔ min{1 − 1(y = +) + σ_θ(B)+, 1} ∧ min{1 − σ_θ(B)+ + 1(y = +), 1}

•  If the sentence is positive: the truth value is (1 + σ_θ(B)+) / 2
•  If the sentence is negative: the truth value is (2 − σ_θ(B)+) / 2
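The same computation as code, a sketch reusing the soft operators above (argument names are mine):

```python
def but_rule_truth(is_positive, p_b_positive):
    """Soft truth value of the Eq. (5) rule body for one candidate label.

    is_positive:  1.0 if the candidate label y is '+', else 0.0
    p_b_positive: sigma_theta(B)+, the model's probability that clause B is positive
    """
    fwd = soft_or(soft_not(is_positive), p_b_positive)   # 1(y=+) ⇒ σ(B)+
    bwd = soft_or(soft_not(p_b_positive), is_positive)   # σ(B)+ ⇒ 1(y=+)
    return soft_avg_and([fwd, bwd])

# As on the slide: but_rule_truth(1.0, s) == (1 + s) / 2
#                  but_rule_truth(0.0, s) == (2 - s) / 2
```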
Experiment overview
•  Verify whether training with the constraints improves performance
•  Experiments on sentiment analysis and named entity recognition
   –  Evaluate the performance of both p and q
Experimental setup (sentiment analysis)
•  Binary positive/negative classification task
•  Datasets:
   –  Stanford Sentiment Treebank (SST2)
   –  Movie Review (MR)
   –  Customer Review (CR)
•  Baseline: a (simple) CNN
   –  The same model as [Kim+ 14]
•  Applied constraint (rule):
   –  The polarity of "A but B" agrees with that of B
   –  Confidence λ = 1
•  The whole training procedure (Algorithm 1 of the paper; a training sketch follows below):

   Algorithm 1: Harnessing NN with Rules
   Input: training data D = {(x_n, y_n)}_{n=1}^N, rule set R = {(R_l, λ_l)}_{l=1}^L,
          parameters: π (imitation parameter), C (regularization strength)
   1: Initialize neural network parameters θ
   2: repeat
   3:   Sample a minibatch (X, Y) ⊂ D
   4:   Construct the teacher network q with Eq. (4)
   5:   Transfer knowledge into p_θ by updating θ with Eq. (2)
   6: until convergence
   Output: distilled student network p_θ and teacher network q

•  Notes from the paper: q performs joint inference (a rule may span multiple examples) and can be expensive, while p is more lightweight and useful when rule evaluation is expensive or impossible at prediction time. Since the teacher is constructed from p_θ, whose predictions are of low quality at the beginning of training, the objective initially favors the true hard labels (small π).
   (Figure 2 of the paper: the CNN architecture for sentence-level sentiment analysis; the sentence representation vector is followed by a fully connected layer with softmax output to produce sentiment predictions.)
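A minimal sketch of the Algorithm 1 loop, tying together the helpers sketched above (distillation_loss, build_teacher); the data-loader interface and the π schedule are my assumptions, heavily simplified from the paper:

```python
import torch

def train(model, loader, optimizer, C=1.0, lam=1.0, epochs=10, alpha=0.95):
    step = 0
    for _ in range(epochs):
        for x, y, rule_truth in loader:       # rule_truth: r(x, y') per label
            logits = model(x)                 # sigma_theta(x)
            p = torch.softmax(logits, dim=-1)
            # Step 4: construct the teacher q with Eq. (4).
            q = build_teacher(p.detach(), rule_truth, C, lam)
            # Favor hard labels early, the teacher later (one possible schedule).
            pi = 1.0 - alpha ** step
            # Step 5: update theta with the distillation objective (Eq. 2).
            loss = distillation_loss(logits, y, q, pi)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
```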
Results (sentiment analysis) (1/3)
•  Improves over the baseline [Kim+ 14]
•  State of the art on MR and CR
•  Comparable to MVCNN (multiple word embeddings + a complex multichannel, multilayer CNN)

   Model                                     SST2   MR         CR
   1  CNN (Kim, 2014)                        87.2   81.3±0.1   84.3±0.2
   2  CNN-Rule-p                             88.8   81.6±0.1   85.0±0.3
   3  CNN-Rule-q                             89.3   81.7±0.1   85.3±0.3
   4  MGNC-CNN (Zhang et al., 2016)          88.4   –          –
   5  MVCNN (Yin and Schutze, 2015)          89.4   –          –
   6  CNN-multichannel (Kim, 2014)           88.1   81.1       85.0
   7  Paragraph-Vec (Le and Mikolov, 2014)   87.8   –          –
   8  CRF-PR (Yang and Cardie, 2014)         –      –          82.7
   9  RNTN (Socher et al., 2013)             85.4   –          –
   10 G-Dropout (Wang and Manning, 2013)     –      79.0       82.1

   Table 1: Accuracy (%) of sentiment classification. Row 1, CNN (Kim, 2014), is the base network, corresponding to the "CNN-non-static" model in (Kim, 2014). Rows 2–3 are the networks enhanced by the framework: CNN-Rule-p is the student network and CNN-Rule-q the teacher network. For MR and CR, average accuracy ± one standard deviation over 10-fold cross validation.
Results (sentiment analysis) (2/3)
•  Is it necessary to compute p_θ(y|x) and q(y|x) alternately?
   –  How do variants perform that optimize only one of the two, or compute one only after optimizing the other (e.g., optimize q(y|x) directly; train the CNN and then compute q(y|x); optimize q(y|x) and then train the CNN)?

   Model                 Accuracy (%)
   1  CNN (Kim, 2014)    87.2
   2  -but-clause        87.3
   3  -l2-reg            87.5
   4  -project           87.9
   5  -opt-project       88.3
   6  -pipeline          87.9
   7  -Rule-p            88.8
   8  -Rule-q            89.3

   Table 2: Performance of different rule integration methods. Computing the two alternately (rows 7–8) performs best.
Results (sentiment analysis) (3/3)
•  Effect of the amount of data, and use of unlabeled data
•  Using unlabeled data improves performance
   –  The constraints let the model exploit unlabeled data effectively

   Data size        5%     10%    30%    100%
   1  CNN           79.9   81.6   83.6   87.2
   2  -Rule-p       81.5   83.2   84.5   88.8
   3  -Rule-q       82.5   83.9   85.6   89.3
   4  -semi-PR      81.5   83.1   84.6   –
   5  -semi-Rule-p  81.7   83.3   84.7   –
   6  -semi-Rule-q  82.7   84.2   85.7   –

   Table 3: Accuracy (%) on SST2 with varying sizes of labeled data and semi-supervised learning. The header row gives the percentage of labeled examples. Rows 1–3 train on x% labeled data only; the "semi-" rows 4–6 train on x% labeled data plus the remaining (100 − x)% as unlabeled data.
Experimental setup (named entity recognition)
•  Recognition of four entity types (PER, ORG, LOC, Misc)
•  Dataset: CoNLL-2003
   –  BIOES tagging scheme (as in [Lample+ 16] and others); each word gets a tag in "X-Y" format, where X ∈ {B, I, E, O, S} (Beginning, Inside, End, Outside, Singleton) and Y is the entity category, and a valid tag sequence has to follow the transition rules of the scheme
•  Baseline: bidirectional LSTM
   –  [Chiu and Nichols, 15] with the character-level CNN removed
•  Applied constraints (rules):
   –  The output tag sequence must be well formed
      •  e.g., equal(y_{i−1}, B-PER) ⇒ ¬ equal(y_i, I-ORG)
      •  Confidence λ = ∞ (a hard constraint, set to prevent any violation; see the sketch below)
   –  Entities at corresponding positions in a list take the same tag type
      •  In "1. Juventus, 2. Barcelona, 3. …", Juventus and Barcelona get tags of the same type: Barcelona must be an organization rather than a location, since its counterpart Juventus is an organization
      •  Confidence λ = 1
•  The list rule is encoded as:

   is-counterpart(X, A) ⇒ 1 − ‖c(e_y) − c(σ_θ(A))‖₂    (7)

   where e_y is the one-hot encoding of y (the class prediction of X) and c(·) collapses the probability mass over the BIOES positions of each entity category.
   (Figure 3 of the paper: the bidirectional LSTM network for NER; the CNN for extracting character representations is omitted.)
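A sketch of one way to encode the hard transition rule as a 0/1 truth value (the tag-string format and helper are my assumptions; with λ = ∞ the teacher assigns near-zero probability to any sequence containing a violation):

```python
def transition_ok(prev_tag, cur_tag):
    """1.0 if cur_tag may follow prev_tag under the BIOES scheme, else 0.0.

    Covers constraints such as equal(y_{i-1}, B-PER) ⇒ ¬ equal(y_i, I-ORG).
    Tags are strings like 'B-PER', 'I-ORG', 'E-LOC', 'S-MISC', or 'O'.
    """
    prev_open = prev_tag[0] in ("B", "I")   # an entity span is still open
    if prev_open:
        # An open span must be continued (I-) or closed (E-) with the same type.
        same_type = cur_tag[2:] == prev_tag[2:]
        return 1.0 if cur_tag[0] in ("I", "E") and same_type else 0.0
    # No open span: only O, B-Y, or S-Y may follow.
    return 1.0 if cur_tag[0] in ("O", "B", "S") else 0.0
```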
Results (named entity recognition)
•  Improves over the baseline
•  Comparable to a method using large external resources [Luo+ 15] and to a neural network with many more parameters [Ma and Hovy, 16]

   Model                                    F1
   1  BLSTM                                 89.55
   2  BLSTM-Rule-trans                      p: 89.80, q: 91.11
   3  BLSTM-Rules                           p: 89.93, q: 91.18
   4  NN-lex (Collobert et al., 2011)       89.59
   5  S-LSTM (Lample et al., 2016)          90.33
   6  BLSTM-lex (Chiu and Nichols, 2015)    90.77
   7  BLSTM-CRF1 (Lample et al., 2016)      90.94
   8  Joint-NER-EL (Luo et al., 2015)       91.20
   9  BLSTM-CRF2 (Ma and Hovy, 2016)        91.21

   Table 4: Performance of NER on CoNLL-2003. Row 2, BLSTM-Rule-trans, imposes only the transition rules (Eq. (6)) on the base BLSTM (no list constraint); row 3, BLSTM-Rules, further incorporates the list rule (Eq. (7)). Both the student model p and the teacher q are reported.
Summary
•  Proposed a method for injecting general rules and human intuition into neural networks
   –  Rules expressed in first-order logic
   –  Converted into continuous values in [0, 1] via probabilistic soft logic
   –  Used as constraints during training
•  Experiments on sentiment analysis and named entity recognition
   –  Introducing the constraints improves performance
   –  Performance on par with more complex networks