Less Grammar, More Features
David Hall, Greg Durrett and Dan Klein @ Berkeley
Presented by 能地 宏 (@nozyh), NII (ACL 2014 reading group @ PFI)
Claim of this paper
‣ To resolve ambiguity in low-level NLP tasks, features drawn from word surface forms are sufficient
Sentiment analysis: better performance than Deep Learning
[Slide shows Socher et al. '13, "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank" (Stanford); its Figure 1 depicts the Recursive Neural Tensor Network predicting 5 sentiment classes, very negative to very positive (– –, –, 0, +, + +), at every node of the parse tree of "This film does n't care about cleverness , wit or any other kind of intelligent humor .", capturing negation and its scope.]
Syntactic parsing: better than the Berkeley parser in many languages
Constituency parsing
‣ Infer the tree structure behind a sentence
- A bottleneck for every higher-level task?
- The goal is disambiguation
The goal is disambiguation
[Slide shows two parse trees for "He eats sushi with chopsticks": on the left the PP "with chopsticks" attaches to the verb phrase (VP → VP PP), on the right to the object noun phrase (NP → NP PP).]
The goal is disambiguation
‣ Both structures are grammatically valid
- The human interpretation is the left one, so the goal is to predict the left structure
[The same two trees for "He eats sushi with chopsticks": verb attachment on the left, noun attachment on the right.]
A naive PCFG performs poorly
Rule probabilities estimated from a treebank:
VP -> V NP   0.2
NP -> NP PP  0.15
VP -> VP PP  0.1
Verb attachment: 0.1 × 0.2 = 0.02; noun attachment: 0.2 × 0.15 = 0.03, so the wrong tree wins
A PCFG is insufficient for disambiguation (F1 score: 72.1); the arithmetic is sketched below
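To make the arithmetic concrete, here is a minimal Python sketch scoring the two competing derivations with the slide's illustrative rule probabilities (toy numbers, not actual treebank estimates):

# Toy PCFG rule probabilities from the slide.
rule_prob = {
    ("VP", ("V", "NP")): 0.2,
    ("NP", ("NP", "PP")): 0.15,
    ("VP", ("VP", "PP")): 0.1,
}

# The correct tree (verb attachment) uses VP -> VP PP and VP -> V NP;
# the wrong tree (noun attachment) uses VP -> V NP and NP -> NP PP.
# All other rules are shared and cancel, so only these products matter.
p_verb_attach = rule_prob[("VP", ("VP", "PP"))] * rule_prob[("VP", ("V", "NP"))]
p_noun_attach = rule_prob[("VP", ("V", "NP"))] * rule_prob[("NP", ("NP", "PP"))]

print(f"verb attachment: {p_verb_attach:.2f}")  # 0.02
print(f"noun attachment: {p_noun_attach:.2f}")  # 0.03 -- the wrong tree wins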
Head lexicalization (Eisner '96; Collins '97)
[Slide shows the verb-attachment tree with each node annotated by its head word, e.g. VP[eat], PP[with], S[eat].]
• Propagates information from the leaf nodes upward (see the sketch below)
• Can capture the (eats, with) relation
Drawbacks:
• The number of rules becomes enormous
• Poor portability to other languages (depends on head-finding information)
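As a rough illustration of how lexicalization propagates leaf information upward, here is a minimal Python sketch; the head_child table is a toy stand-in for real head-finding rules, and the tree encoding is invented for this example:

# Which daughter carries the head, per parent label (toy head rules).
head_child = {"S": 1, "VP": 0, "NP": 0, "PP": 0}

def lexicalize(tree):
    """tree = (label, word) at a leaf, (label, [children]) otherwise.
    Returns (label, head_word, lexicalized_children)."""
    label, rest = tree
    if isinstance(rest, str):                  # leaf: the word heads itself
        return (label, rest, [])
    kids = [lexicalize(c) for c in rest]
    head = kids[head_child.get(label, 0)][1]   # inherit the head daughter's word
    return (label, head, kids)

tree = ("S", [("NP", "He"),
              ("VP", [("VP", [("V", "eats"), ("NP", "sushi")]),
                      ("PP", [("P", "with"), ("NP", "chopsticks")])])])
# Every node now carries a head word: VP[eats], PP[with], S[eats], ...
# so a rule like VP[eats] -> VP[eats] PP[with] can capture (eats, with).
print(lexicalize(tree))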
Latent annotation (state splitting) (Matsuzaki et al. '05; Petrov et al. '06)
[Slide shows the tree with split symbols: S, VP-2, VP-3, PP-1, NP-2, NP-4, V-1, P-1, N-3.]
• Infers a latent state at each node
• The current Berkeley Parser implementation; F1 score: 90.2
Summary of previous methods
‣ Previous methods basically extract global information by enlarging the set of CFG rules
‣ Lexicalization: attach head information to subtrees, e.g. VP[eat] → VP[eat] PP[with]
- Shift-reduce methods also fall into this class (Zhang and Clark '09; Zhu et al. '13)
‣ Attach coarse information to nodes, e.g. VP^S → VP PP^VP or VP-3 → VP-2 PP-1
- Based on linguistic analysis: Klein and Manning '03 (Stanford parser)
- Estimated as latent variables with EM: Petrov et al. '06 (Berkeley parser)
Approach of this work
‣ Can parsing accuracy be raised while keeping annotation to a minimum?
- Is attaching information to nodes really necessary for disambiguation?
‣ Motivation
- Lexicalized parsers need head information, which some languages lack (due to scarce resources)
- The Berkeley parser makes little use of word surface information
- It is weak on morphologically rich languages (requires tuning)
- Experiments show the proposed method is more effective for multilingual parsing
Approach of this work
‣ For most disambiguation, isn't it enough to look at the surface context around the span an anchored rule covers?
[Slide shows the two candidate parses of "He eats sushi with chopsticks", each scored with conjoined surface features:]
[FIRSTWORD=eats × RULE=VP→V PP]      [FIRSTWORD=eats × RULE=VP→V NP]
[LASTWORD=chopsticks × RULE=VP→V PP] [LASTWORD=chopsticks × RULE=VP→V NP]
[SPANLENGTH=5 × RULE=VP→V PP]        [SPANLENGTH=5 × RULE=VP→V NP]
We want a negative weight to be learned for the noun-attachment conjunctions; a sketch follows below
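A minimal sketch of the intended behavior, with hand-set weights standing in for what training should learn (the weight values and the exact feature-string format are assumptions):

def features(rule, span, words):
    # Conjoin surface properties of the span with the rule identity.
    i, j = span
    return [f"FIRSTWORD={words[i]} x RULE={rule}",
            f"LASTWORD={words[j - 1]} x RULE={rule}",
            f"SPANLENGTH={j - i} x RULE={rule}"]

words = ["He", "eats", "sushi", "with", "chopsticks"]
# Hypothetical learned weights: a VP built by VP -> V NP rarely ends in
# an instrument noun like "chopsticks", so that conjunction goes negative.
w = {"LASTWORD=chopsticks x RULE=VP->V NP": -2.0,
     "LASTWORD=chopsticks x RULE=VP->V PP": 1.0}

def score(rule, span):
    return sum(w.get(f, 0.0) for f in features(rule, span, words))

print(score("VP->V NP", (1, 5)))  # -2.0: noun attachment penalized
print(score("VP->V PP", (1, 5)))  #  1.0: verb attachment preferred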
Result Overview
English — Table 3: final Parseval results for the v = 1, h = 0 parser on Section 23 of the Penn Treebank:
            Test ≤ 40   Test all
Berkeley    90.6        90.1
This work   89.9        89.2

Multilingual data — the SPMRL 2013 Shared Task. Table 4: results for the nine treebanks; all values are F-scores for sentences of all lengths, using the version of evalb distributed with the shared task:
                Arabic  Basque  French  German  Hebrew  Hungarian  Korean  Polish  Swedish  Avg
Dev, all lengths
Berkeley        78.24   69.17   79.74   81.74   87.83   83.90      70.97   84.11   74.50    78.91
Berkeley-Rep    78.70   84.33   79.68   82.74   89.55   89.08      82.84   87.12   75.52    83.28
Our work        78.89   83.74   79.40   83.28   88.06   87.44      81.85   91.10   75.95    83.30
Test, all lengths
Berkeley        79.19   70.50   80.38   78.30   86.96   81.62      71.42   79.23   79.18    78.53
Berkeley-Tags   78.66   74.74   79.76   78.28   85.42   85.22      78.56   86.75   80.64    80.89
Our work        78.75   83.39   79.70   78.43   87.18   88.25      80.18   90.66   82.00    83.17

Berkeley-Rep is the best single parser from Björkelund et al. (2013): the Berkeley parser with rare words replaced by feature representations tuned per language (compared only on the development set)
Model: CRF Parsing (Finkel et al. '08)
Formally, the probability of a tree T conditioned on a sentence w is defined as

p(T \mid w) \propto \exp\left( \theta^{\top} \sum_{r \in T} f(r, w) \right)   (1)

where r ranges over the (anchored) rules used in the tree. An anchored rule r is the conjunction of an unanchored grammar rule rule(r) and the start, stop, and split indexes where that rule is anchored, referred to as span(r). The richness of the backbone grammar is reflected in the structure of the trees T, while the features that condition directly on the input enter the equation through the anchoring span(r).
[Slide shows the parse tree for "I eat sushi with chopsticks".]
Marginal probabilities are computed with Inside-Outside
Parameters are optimized with AdaGrad + L2 regularization (online learning; Duchi et al. 2010); a sketch follows below
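A sketch of equation (1) as code, together with one AdaGrad + L2 update. The feature function is the toy one above, the gradient (expected minus gold feature counts, obtained from Inside-Outside) is assumed to be given, and the hyperparameters are placeholders, not the paper's settings:

import math

def tree_score(theta, anchored_rules, words, feats):
    # theta^T * sum_{r in T} f(r, w): sum feature weights over anchored rules.
    return sum(theta.get(f, 0.0)
               for rule, span in anchored_rules
               for f in feats(rule, span, words))

def adagrad_update(theta, hist, grad, eta=0.1, l2=1e-4):
    # grad maps feature -> (expected count - gold count), from Inside-Outside.
    for f, g in grad.items():
        g += l2 * theta.get(f, 0.0)             # L2 regularization term
        hist[f] = hist.get(f, 0.0) + g * g      # accumulated squared gradient
        theta[f] = theta.get(f, 0.0) - eta * g / (math.sqrt(hist[f]) + 1e-8)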
Feature extraction
[Figure 1 of the paper: features computed over the application of the rule VP → VBD NP on the anchored span "averted financial disaster" (positions 5-8).]
Span properties: FIRSTWORD = averted, LASTWORD = disaster, LENGTH = 3, ...
Rule backoffs: RULE = VP → VBD NP, PARENT = VP
Features are conjunctions of the two, e.g. FIRSTWORD = averted × RULE = VP → VBD NP, LASTWORD = disaster × PARENT = VP, ...
Each anchored rule fires a sparse 0/1 indicator vector; its score is the dot product with the weight vector (e.g. weights 10.3, -1.2, 3.2, 0.01, ...)
This score takes the place of the PCFG rule probability and fills the CKY chart (sketched below)
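A minimal sketch of that dot product, reusing the slide's example weight values; the assignment of features to indices is invented here (the paper buckets features via hashing):

# Hypothetical feature-to-index mapping; the weights are the slide's values.
feature_index = {"RULE=VP->VBD NP": 0,
                 "PARENT=VP": 1,
                 "FIRSTWORD=averted x RULE=VP->VBD NP": 2,
                 "LASTWORD=disaster x PARENT=VP": 3}
weights = [10.3, -1.2, 3.2, 0.01]

def rule_score(fired):
    # Dot product of an implicit 0/1 indicator vector with the weights.
    return sum(weights[feature_index[f]] for f in fired if f in feature_index)

fired = ["RULE=VP->VBD NP", "PARENT=VP",
         "FIRSTWORD=averted x RULE=VP->VBD NP",
         "LASTWORD=disaster x PARENT=VP"]
print(rule_score(fired))  # 12.31 -- this anchored rule's CKY chart score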
Which features are effective?
Table 1: results on the Penn Treebank development set (WSJ Section 22, sentences of length ≤ 40) for incrementally growing feature sets; each feature adds benefit over the previous:
Features                                        Section   F1
RULE                                            4         73.0
+ SPAN FIRST WORD + SPAN LAST WORD + LENGTH     4.1       85.0
+ WORD BEFORE SPAN + WORD AFTER SPAN            4.2       89.0
+ WORD BEFORE SPLIT + WORD AFTER SPLIT          4.3       89.7
+ SPAN SHAPE                                    4.4       89.9
The meaning of most features is intuitively clear
Below, concrete examples show which sentences each feature helps with
Word before/after span
[Figure 2 of the paper: "no read messages in his inbox" — is the POS of "read" VBP (heading a VP → VBP NNS) or JJ (inside an NP → JJ NNS)?]
When deciding the rule spanning "read messages", the cue is that "no" is unlikely to immediately precede a VP (feature: VP → no VBP NNS); we want a negative weight to be learned for it
Word before/after split
[Figure 3 of the paper: "has an impact on the market" — split-point features disambiguating a PP attachment via NP → (NP ... impact) PP.]
PP attachment: "impact" is a noun that readily takes a PP, so we want a large weight to be learned for the conjoined indicator feature
This exploits the fact that the head of a phrase tends to sit at one of its two ends
(holds in many languages; the head of a Japanese bunsetsu is at its right end)
Span shape
[Figure 4 of the paper: span shape features on two examples — "( CEO of Enron )" → (XxX) and said , “ Too bad , ” → x,“Xx,”. Parentheticals, quotes, and other punctuation-heavy short constituents benefit from being modeled explicitly.]
Extracts initial capitalization, brackets, and the like
(In English) helps with identifying named entities, matching brackets, etc.; a sketch follows below
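A minimal sketch of a span-shape descriptor that reproduces the slide's two examples; the exact character classes used in the paper may differ (the digit class here is a guess):

def word_shape(w):
    # Reduce a token to a coarse character class.
    if w[0].isupper():
        return "X"
    if w[0].islower():
        return "x"
    if w[0].isdigit():
        return "9"
    return w[0]          # punctuation is kept verbatim

def span_shape(words):
    return "".join(word_shape(w) for w in words)

print(span_shape(["(", "CEO", "of", "Enron", ")"]))            # (XxX)
print(span_shape(["said", ",", "“", "Too", "bad", ",", "”"]))  # x,“Xx,”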
On the meaning of "Less Grammar"
‣ It does not mean linguistics can be discarded and everything solved with machine learning alone
- "Grammar" here means the size of the CFG rule set used
- The paper's claim is that a small grammar suffices if meaningful features are extracted from the surface
- The machine learning employed is simple (CRF + SGD)
‣ Do existing methods really require less linguistic knowledge to design?
- Berkeley parser: splitting by EM in a probabilistic model (fully automatic)
- Shift-reduce: throw in every feature you can think of
Aside: the direction of this research
‣ Looks like the same direction as their EMNLP 2013 coreference paper:
"Easy Victories and Uphill Battles in Coreference Resolution", Greg Durrett and Dan Klein (Berkeley)
- Coreference resolution reaches top accuracy with a discriminative model based only on features extracted from the surface of mentions
(external knowledge such as WordNet is not needed)
- Berkeley coreference is released as a tool and is (supposedly) more accurate than Stanford's
[Example: [Barack Obama]1 met with [David Cameron]2 . [He]1 said ... — surface features such as [with X − . Y] and [with X − Y said] replace Centering-style rules like "with [X] . … [X] said".]
Many NLP analysis tasks can reach high accuracy if features are chosen well from word surface forms
Sentiment analysis
‣ Socher et al. '13 used Mechanical Turk to annotate 5-way sentiment labels on top of tree structures
‣ They showed a neural net beats prior methods (last year's EMNLP)
[The same Figure 1 from Socher et al. '13: the Recursive Neural Tensor Network predicting 5 sentiment classes at every node of the parse tree of "This film does n't care about cleverness , wit or any other kind of intelligent humor ."]
The method of this work applies directly
‣ Given the tree structure, classify each span into the 5 classes
- Fix the structure and run Inside-Outside / CKY
[Figure 5 of the paper: "While “ Gangs ” is never lethargic , it is hindered by its plot ." with span labels 4, 1, 2 and the anchored rule 2 → (4 While...) 1, showing the utility of span features for this task.]
The first word of a span often signals a logical relation, e.g. "but" (or "While" in the figure)
Higher performance than the neural net
Table 5: fine-grained sentiment analysis results on the Stanford Sentiment Treebank of Socher et al. (2013), compared against the numbers printed in Socher et al. (2013) and against the corresponding release (the sentiment component in the latest Stanford CoreNLP at the time of writing):
                                 Root   All Spans
Non-neutral Dev (872 trees)
Stanford CoreNLP current         50.7   80.8
This work                        53.1   80.5
Non-neutral Test (1821 trees)
Stanford CoreNLP current         49.1   80.2
Stanford EMNLP 2013              45.7   80.7
This work                        49.6   80.4
For reference: a different paper at this year's ACL
Nal Kalchbrenner, Edward Grefenstette, Phil Blunsom:
"A Convolutional Neural Network for Modelling Sentences"
A neural net that classifies sentiment without assuming a tree structure
48.5 points on the test set (slightly below Stanford current)
Summary
‣ It was believed that parsing accuracy requires attaching information to nodes and enlarging the rule set
‣ This is parsing with a minimal number of rules
- Little dependence on the language/grammar → highly extensible to many languages
- With small changes to the features, it adapts to other tasks (sentiment)
‣ The parser is publicly available (epic parser): https://github.com/dlwh/epic
‣ Lessons to take away(?)
- Information from word surface forms is (after all) very powerful
- If meaningful features can be extracted, accuracy rivaling complex methods is achievable
