Integrating normalization improves parser performance on social media
Rob van der Goot
@robvanderg
Groningen
www.bitbucket.org/robvanderg/berkeleygraph
July 2017

Abstract
This work explores normalization for parser adaptation. Traditionally, normalization is used as a separate preprocessing step. We show that integrating the normalization model into the parsing algorithm is beneficial. To this end, we combine a normalization model with the parsing as intersection algorithm, so that the parser can leverage multiple normalization candidates instead of a single one, which improves parsing performance on social media. We test this hypothesis by modifying the Berkeley parser: out of the box, it reaches an F1 score of 66.52 on the test set. Our integrated approach performs significantly better, with an F1 score of 67.36, whereas parsing the best normalization sequence results in an F1 score of only 66.94.
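The idea of letting the parser choose among normalization candidates can be illustrated with a toy Viterbi CKY parser over a word lattice, which is one standard way to realize parsing as intersection. This is a minimal sketch, not the modified Berkeley parser: the grammar tables `BINARY` and `LEXICON`, their probabilities, and the choice to multiply each candidate's normalization probability with its lexical probability are hypothetical assumptions for illustration; only the candidate lists are taken from Figure 1.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form. All probabilities are invented
# for illustration; they are not the Berkeley parser's grammar.
BINARY = {  # (left child, right child) -> [(parent, rule probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("JJ", "NNS"): [("NP", 1.0)],
    ("VBG", "NN"): [("VP", 1.0)],
}
LEXICON = {  # word -> [(tag, P(word | tag))]; unlisted words get no analysis
    "new": [("JJ", 1.0)],
    "pix": [("NNS", 0.05)],
    "pics": [("NNS", 0.15)],
    "pictures": [("NNS", 0.8)],
    "coming": [("VBG", 1.0)],
    "tomorrow": [("NN", 1.0)],
}

# Normalization candidates and probabilities per position, copied from Figure 1.
CANDIDATES = [
    [("new", 1.0)],
    [("pix", 0.6), ("pics", 0.3), ("pictures", 0.1)],
    [("comming", 0.3), ("coming", 0.6), ("common", 0.1)],
    [("tomoroe", 0.3), ("tomorrow", 0.5), ("more", 0.2)],
]


def parse_lattice(candidates, start="S"):
    """Viterbi CKY over a linear word lattice; returns (probability, tree) or None."""
    n = len(candidates)
    chart = defaultdict(dict)  # (i, j) -> {label: (best probability, best tree)}
    for i, options in enumerate(candidates):
        for word, norm_prob in options:
            for tag, lex_prob in LEXICON.get(word, []):
                # Assumed scoring: normalization probability times lexical probability.
                score = norm_prob * lex_prob
                if score > chart[(i, i + 1)].get(tag, (0.0, None))[0]:
                    chart[(i, i + 1)][tag] = (score, (tag, word))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for left, (lp, ltree) in chart[(i, k)].items():
                    for right, (rp, rtree) in chart[(k, j)].items():
                        for parent, rule_prob in BINARY.get((left, right), []):
                            score = rule_prob * lp * rp
                            if score > chart[(i, j)].get(parent, (0.0, None))[0]:
                                chart[(i, j)][parent] = (score, (parent, ltree, rtree))
    return chart[(0, n)].get(start)


def leaves(tree):
    """Return the words of a (tag, word) / (label, left, right) tree."""
    if isinstance(tree[1], str):
        return [tree[1]]
    return leaves(tree[1]) + leaves(tree[2])


prob, tree = parse_lattice(CANDIDATES)
print(round(prob, 4), leaves(tree))  # 0.024 ['new', 'pictures', 'coming', 'tomorrow']

# Pipeline baseline: keep only the top-ranked candidate at each position.
one_best = [[(max(opts, key=lambda c: c[1])[0], 1.0)] for opts in CANDIDATES]
prob_1b, tree_1b = parse_lattice(one_best)
print(round(prob_1b, 4), leaves(tree_1b))  # 0.05 ['new', 'pix', 'coming', 'tomorrow']
```

With these invented probabilities, the lattice parse picks `pictures` although the normalization model ranks it last, because the toy grammar strongly prefers it; parsing only the top-ranked sequence is forced to use `pix`. A real parser would also need an unknown-word model instead of silently skipping words missing from `LEXICON`.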
Parser Adaptation for Social Media by Integrating Normalization
www.bitbucket.org/robvanderg/monoise

You may also like
Yehoshua Bar-Hillel, Micha Perles. On formal properties of simple phra...
Jennifer Foster, Özlem Çetinoğlu, Joachim Wagner. #hardtoparse: POS Tagging and pa...
Chen Li and Yang Liu. Joint POS tagging and text normaliz...
Slav Petrov and Dan Klein. Improved inference for unlexicalized...

Worldwide Trends
#ParsingAsIntersection #ACL2017 #normalization #NeuralNetworks #ConstituencyParsing

Rob van der Goot @robvanderg
The output of the Berkeley parser on a noisy sentence and its automatically normalized counterpart. #Interesting
[Figure: Berkeley parser output. The noisy "new pix comming tomoroe" is parsed as one flat NP (new/JJ pix/NN comming/NN tomoroe/NN); the normalized "new pix coming tomorrow" gets the tags new/JJ pix/NNS coming/VBG tomorrow/NN and a nested structure.]

Rob van der Goot Retweeted
Gertjan van Noord @GJ
That is interesting! Maybe we can use the parsing as intersection algorithm to improve even further?

Rob van der Goot @robvanderg
Overview of the model:

Rob van der Goot @robvanderg
@GJ F1 scores on the development data when integrating multiple normalization candidates, normalizing either ALL words or only the UNKnown words:

Rob van der Goot @robvanderg
@GJ, it is! These are the F1 scores of our proposed models and previous work on the development and test sets, trained on the EWT and WSJ and tested on a small Twitter treebank:

Parser             Dev     Test
Stanford parser    66.05   61.95
Berkeley parser    70.85   66.52
Best norm. seq.    72.04   66.94
Integrated norm.   72.77   67.36*
Gold POS tags      74.98   71.80

*#StatisticallySignificant against the Berkeley parser at p < 0.01 and against the best normalization sequence at p < 0.05, using a paired t-test.
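The significance marker in the results table refers to a paired t-test. A minimal sketch of the paired t statistic follows, assuming the paired items are per-sentence scores of two systems on the same sentences (the poster does not say what the paired units are); `paired_t` is a hypothetical helper and the score lists are invented for illustration.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for the differences a - b."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long score lists with at least 2 items")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean / (sd / math.sqrt(n)), n - 1


# Hypothetical per-sentence F1 scores, for illustration only.
integrated = [62, 70, 83, 91, 51]
baseline = [60, 70, 80, 90, 50]
t, df = paired_t(integrated, baseline)
print(round(t, 3), df)  # 2.746 4
```

The statistic is then compared against a t distribution with `df` degrees of freedom; `scipy.stats.ttest_rel` computes the same statistic together with a two-sided p-value.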
Corpus                 Sents    Words/sent   Unk%
WSJ (2-21)             39,832   23.9         4.4
EWT                    16,520   15.3         3.7
Foster et al. (2011)   269      11.1         9.3
Li and Liu (2014)      2,577    15.7         14.1
Table 1: Some basic statistics for our training and development corpora. The percentage of unknown words (Unk%) is calculated against the Aspell dictionary, ignoring capitalization.
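The Words/sent and Unk% columns of Table 1 can be computed along the lines of the following sketch, which assumes pre-tokenized sentences and counts unknown tokens rather than types. `corpus_stats` is a hypothetical helper, and `toy_dictionary` is a tiny stand-in for the Aspell word list; every token, including any punctuation, is checked, which the original statistics may handle differently.

```python
def corpus_stats(sentences, dictionary):
    """Return (sentences, words per sentence, % unknown tokens) for tokenized text."""
    lexicon = {word.lower() for word in dictionary}  # ignore capitalization
    tokens = [tok for sent in sentences for tok in sent]
    unknown = sum(1 for tok in tokens if tok.lower() not in lexicon)
    return len(sentences), len(tokens) / len(sentences), 100.0 * unknown / len(tokens)


# Tiny hypothetical word list standing in for the Aspell dictionary.
toy_dictionary = {"new", "pictures", "are", "coming", "tomorrow"}
toy_corpus = [["new", "pix", "comming", "tomoroe"], ["New", "pictures", "are", "coming"]]
print(corpus_stats(toy_corpus, toy_dictionary))  # (2, 4.0, 37.5)
```

Lowercasing both the dictionary and the tokens is what makes "New" count as known here, mirroring the "ignoring capitalization" note in the caption.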
[Plot: F1 score on the development data (y-axis, 70.5 to 72.5) against the number of normalization candidates used (x-axis, 1 to 9), with one line each for UNK, ALL and VAN.]
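The parsing scores on this poster are F1 scores. Assuming the usual labeled bracket F1 for constituency parsing (the poster does not spell out its evaluation setup), a simplified version can be sketched as follows. `bracket_f1` is a hypothetical helper, the spans are invented, and standard EVALB conventions such as punctuation handling and summing match counts over the whole corpus before computing precision and recall are not reproduced.

```python
from collections import Counter


def bracket_f1(gold, predicted):
    """Labeled bracket F1 (in %) between two lists of (label, start, end) spans."""
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    matched = sum((gold_counts & pred_counts).values())  # multiset intersection
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted constituents for a 4-word sentence.
gold = [("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4)]
predicted = [("S", 0, 4), ("NP", 0, 2), ("NP", 2, 4)]
print(round(bracket_f1(gold, predicted), 2))  # 66.67
```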
Input sentence: new pix comming tomoroe
MoNoise normalization candidates (normalization probability in parentheses):
  new: new (1.0)
  pix: pix (0.6), pics (0.3), pictures (0.1)
  comming: comming (0.3), coming (0.6), common (0.1)
  tomoroe: tomoroe (0.3), tomorrow (0.5), more (0.2)
Berkeley Parser output over the candidates: a tree with the leaves new/JJ pictures/NNS coming/VBG tomorrow/NN
Figure 1: The output of the normalization model for the sentence "new pix comming tomoroe".
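The candidate lists in Figure 1 make the contrast with a pipeline concrete. The sketch below treats the per-word probabilities as independent (an assumption; the normalization model's scores need not combine this way), extracts the best normalization sequence, counts how many normalizations the lists encode, and keeps only the top `k` candidates per word, the quantity varied in the development plot. `top_k`, `best_sequence` and `num_sequences` are hypothetical helper names.

```python
from math import prod

# Candidate lists from Figure 1: (candidate, normalization probability) per input word.
CANDIDATES = [
    [("new", 1.0)],
    [("pix", 0.6), ("pics", 0.3), ("pictures", 0.1)],
    [("comming", 0.3), ("coming", 0.6), ("common", 0.1)],
    [("tomoroe", 0.3), ("tomorrow", 0.5), ("more", 0.2)],
]


def top_k(candidates, k):
    """Keep the k most probable candidates at each position."""
    return [sorted(opts, key=lambda c: c[1], reverse=True)[:k] for opts in candidates]


def best_sequence(candidates):
    """Highest-probability normalization, assuming positions are independent."""
    best = [max(opts, key=lambda c: c[1]) for opts in candidates]
    return [word for word, _ in best], prod(p for _, p in best)


def num_sequences(candidates):
    """Number of distinct normalizations encoded by the candidate lists."""
    return prod(len(opts) for opts in candidates)


words, p = best_sequence(CANDIDATES)
print(words, round(p, 2))                   # ['new', 'pix', 'coming', 'tomorrow'] 0.18
print(num_sequences(CANDIDATES))            # 27
print(num_sequences(top_k(CANDIDATES, 2)))  # 8
```

A pipeline parses only the single best sequence, whereas the integrated approach keeps all encoded sequences available to the parser; pruning with `top_k` trades that flexibility against a smaller search space.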
r.van.der.goot@rug.nl
Rob van der Goot Retweeted
Gertjan van Noord @GJ
But Rob, is this #Significant?