Language-independent Model for Machine Translation Evaluation with
Reinforced Factors
Aaron Li-Feng Han, Derek F. Wong, Lidia S. Chao, Liangye He
Yi Lu, Junwen Xing, and Xiaodong Zeng
Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory
Department of Computer and Information Science
University of Macau, Macau S.A.R., China
hanlifengaaron@gmail.com, {derekfw, lidiasc}@umac.mo
{wutianshui0515,takamachi660,nlp2ct.anson,nlp2ct.samuel}@gmail.com
Abstract
Conventional machine translation evaluation metrics tend to perform well on certain language pairs but weakly on others. Furthermore, some evaluation metrics work only on particular language pairs and are not language-independent. Finally, ignoring linguistic information usually leaves a metric poorly correlated with human judgments, while relying on many linguistic features or external resources makes a metric complicated and difficult to replicate. To address these problems, this work proposes a novel language-independent evaluation metric with enhanced factors and a small amount of optional linguistic information (part-of-speech n-grams). To make the metric perform well on different language pairs, extensive factors are designed to reflect translation quality, and the assigned parameter weights are tunable according to the characteristics of the language pair in focus. Experiments show that this novel evaluation metric yields better performance than several classic evaluation metrics (including BLEU, TER and METEOR) and two state-of-the-art ones, ROSE and MPF.[1]

[1] The final publication is available at http://www.mt-archive.info/
1 Introduction
Machine translation (MT) began as early as the 1950s (Weaver, 1955) and has made great progress since the 1990s due to the development of computers (storage capacity and computational power) and enlarged bilingual corpora (Marino et al., 2006). For example, Och (2003) presented MERT (Minimum Error Rate Training) for log-linear statistical machine translation (SMT) models to achieve better translation quality, Su et al. (2009) used a Thematic Role Templates model to improve translation, and Xiong et al. (2011) employed a maximum-entropy model. Statistical MT (Koehn, 2010) became the mainstream approach in the MT literature. With the widespread development of MT systems, MT evaluation has become more and more important for telling us how well MT systems perform and whether they are making progress. However, MT evaluation is difficult for several reasons: language variability means there is no single correct translation, natural languages are highly ambiguous, and different languages do not always express the same content in the same way (Arnold, 2003).
How to evaluate the quality of each MT system, and what the criteria should be, have become new challenges for MT researchers. The earliest human assessment methods include intelligibility (measuring how understandable a sentence is) and fidelity (measuring how much information the translated sentence retains compared to the original), used by the Automatic Language Processing Advisory Committee (ALPAC) around 1966 (Carroll, 1966), and the later proposed adequacy (similar to fidelity), fluency (whether the sentence is well-formed and fluent) and comprehension (improved intelligibility) used by the Defense Advanced Research Projects Agency (DARPA) of the US (White et al., 1994). Manual evaluation suffers from the main disadvantage that it is time-consuming and thus too expensive to conduct frequently.
The early automatic evaluation metrics include the word error rate WER (Su et al., 1992), the edit distance between the system output and the closest reference translation; the position-independent word error rate PER (Tillmann et al., 1997), a variant of WER that disregards word ordering; BLEU (Papineni et al., 2002), the geometric mean of the n-gram precision of the system output with respect to reference translations; NIST (Doddington, 2002), which adds information weights; and GTM (Turian et al., 2003). Recently, many other methods have been proposed to revise or improve these earlier works.
One category is the lexical similarity based metrics. Metrics of this kind include the edit distance based methods, such as TER (Snover et al., 2006) and the work of Akiba et al. (2001) in addition to WER and PER; the precision based methods such as SIA (Liu and Gildea, 2006) in addition to BLEU and NIST; recall based methods such as ROUGE (Lin and Hovy, 2003); the word order information utilized by Wong and Kit (2008), Isozaki et al. (2010) and Talbot et al. (2011); and combinations of precision and recall such as Meteor-1.3 (Denkowski and Lavie, 2011) (a modified version of Meteor that includes ranking and adequacy versions and overcomes some weaknesses of earlier versions, such as noise in the paraphrase matching, lack of punctuation handling and no discrimination between word types), BLANC (Lita et al., 2005), LEPOR (Han et al., 2012) and PORT (Chen et al., 2012). Another category employs linguistic features. Metrics of this kind include syntactic similarity, such as the part-of-speech information used by ROSE (Song and Cohn, 2011) and MPF (Popovic, 2011) and the phrase information employed by Echizen-ya and Araki (2010) and Han et al. (2013b), and semantic similarity, such as the textual entailment used by Mirkin et al. (2009), synonyms by Chan and Ng (2008) and paraphrases by Snover et al. (2009).
The previously proposed evaluation methods suffer, more or less, from several main weaknesses: they perform well on certain language pairs but weakly on others, which we call the language-bias problem; they consider no linguistic information (unreasonable from the perspective of linguistic analysis) or too many linguistic features (making them difficult to replicate), which we call the extremism problem; or they rely on incomprehensive factors (e.g., BLEU focuses on precision only). To address these problems, a novel automatic evaluation metric is proposed in this paper with enhanced factors, tunable parameters and optional linguistic information (part-of-speech n-grams).
2 Designed Model
2.1 Employed Internal Factors
Firstly, we introduce the internal factors utilized in
the calculation model.
2.1.1 Enhanced Length Penalty
The enhanced length penalty ELP is designed to penalize both longer and shorter system output translations (an enhanced version of the brevity penalty in BLEU):

\[ ELP = \begin{cases} e^{1-r/c} & c < r \\ e^{1-c/r} & c \geq r \end{cases} \tag{1} \]

where c and r are the sentence lengths of the automatic output (candidate) and the reference translation respectively.
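For concreteness, the following minimal Python sketch (illustrative only; the function and variable names are ours, not from the released implementation) computes ELP directly from Equation (1):

    import math

    def enhanced_length_penalty(c, r):
        # Eq. (1): c = candidate (output) length, r = reference length;
        # both assumed positive. Returns 1.0 when c == r, and decays
        # exponentially as the lengths diverge in either direction.
        if c < r:
            return math.exp(1 - r / c)
        return math.exp(1 - c / r)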
2.1.2 N-gram Position Difference Penalty
The n-gram position difference penalty NPosPenal is developed to compare the word order of the output and reference translations:

\[ NPosPenal = e^{-NPD} \tag{2} \]

where NPD is defined as:

\[ NPD = \frac{1}{Length_{output}} \sum_{i=1}^{Length_{output}} |PD_i| \tag{3} \]

where Length_{output} is the length of the system output sentence and PD_i is the position difference value of the i-th output word. Every word in both the output translation and the reference should be aligned only once. When there is no match, PD_i is set to zero by default for that output token.
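A corresponding sketch of Equations (2)-(3), assuming (as in the example of Figures 2 and 3) that each PD_i is the difference between the matched words' position quotients and that unmatched output words contribute zero:

    import math

    def npos_penalty(alignment, out_len, ref_len):
        # alignment: list of (out_pos, ref_pos) pairs, 1-based, one per
        # matched output word; unmatched words contribute PD_i = 0.
        pd_sum = sum(abs(o / out_len - r / ref_len) for o, r in alignment)
        npd = pd_sum / out_len        # Eq. (3)
        return math.exp(-npd)         # Eq. (2)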
Two steps are designed to measure the NPD value. The first step is context-dependent n-gram alignment: the n-gram method is used and given higher priority, which means the surrounding context of the candidate words is considered when selecting matched pairs between the output and reference sentences. The nearest match is accepted as a backup choice to establish the alignment when there are multiple nearby matches or when there are no other matched words surrounding the candidate word pair. The alignment is one-directional, from the output sentence to the reference.

Assume that w_x represents the current word in the output sentence and w_{x+k} the k-th word before it (k < 0) or after it (k > 0). Correspondingly, w^r_y is the word matching w_x in the reference, and w^r_{y+j} has the same meaning as w_{x+k} but in the reference sentence. The variable Distance is the position difference value between the matched words in the output and the reference. The operation process and pseudocode of the context-dependent n-gram word alignment algorithm are shown in Figure 1 (with → denoting the alignment). In the calculation step, each word is labeled with the quotient of its position number divided by the sentence length (the total number of tokens in the sentence).
Consider the example in Figure 2, whose NPD calculation is shown in Figure 3. Each output word is labeled with a position quotient from 1/6 to 6/6 (the word position divided by the sentence length, which is 6). The words in the reference sentence are labeled with the same kind of subscripts.
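Since Figure 1 gives the full pseudocode, the following is only a simplified Python sketch of the idea: each output word is matched to at most one reference word, candidates whose immediate neighbours also match are preferred, and the nearest candidate is the backup choice.

    def align(output, reference):
        # Simplified context-dependent alignment (cf. Figure 1):
        # returns 1-based (out_pos, ref_pos) pairs.
        used, pairs = set(), []
        for i, w in enumerate(output):
            cands = [j for j, rw in enumerate(reference)
                     if rw == w and j not in used]
            if not cands:
                continue  # unmatched output word (PD_i defaults to 0)

            def context(j):  # count matching immediate neighbours
                s = 0
                if i > 0 and j > 0 and output[i - 1] == reference[j - 1]:
                    s += 1
                if (i + 1 < len(output) and j + 1 < len(reference)
                        and output[i + 1] == reference[j + 1]):
                    s += 1
                return s

            best = max(cands, key=lambda j: (context(j), -abs(j - i)))
            used.add(best)
            pairs.append((i + 1, best + 1))
        return pairs

With these pieces, npos_penalty(align(out, ref), len(out), len(ref)) reproduces the NPosPenal factor for a sentence pair.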
2.1.3 Precision and Recall
Precision and recall are two commonly used criteria in the NLP literature. We use HPR to denote the weighted harmonic mean of precision and recall, i.e. Harmonic(αR, βP), where the weights α and β are tunable parameters:

\[ HPR = \frac{(\alpha+\beta)\, Precision \times Recall}{\alpha\, Precision + \beta\, Recall} \tag{4} \]

\[ Precision = \frac{Aligned_{num}}{Length_{output}} \tag{5} \]

\[ Recall = \frac{Aligned_{num}}{Length_{reference}} \tag{6} \]

where Aligned_{num} is the number of successfully matched words appearing in both the translation and the reference.
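A direct transcription of Equations (4)-(6) in sketch form (the zero guard is our addition):

    def hpr(aligned_num, out_len, ref_len, alpha, beta):
        # Eq. (4): weighted harmonic mean of precision and recall.
        if aligned_num == 0:
            return 0.0  # guard: no matches means both P and R are zero
        precision = aligned_num / out_len   # Eq. (5)
        recall = aligned_num / ref_len      # Eq. (6)
        return ((alpha + beta) * precision * recall
                / (alpha * precision + beta * recall))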
2.2 Sentence Level Score
Secondly, we introduce the mathematical harmonic mean to group multiple variables (n variables X_1, X_2, ..., X_n):

\[ Harmonic(X_1, X_2, \ldots, X_n) = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} \tag{7} \]

where n is the number of factors. The weighted harmonic mean for multiple variables is then:

\[ Harmonic(w_{X_1}X_1, w_{X_2}X_2, \ldots, w_{X_n}X_n) = \frac{\sum_{i=1}^{n} w_{X_i}}{\sum_{i=1}^{n} \frac{w_{X_i}}{X_i}} \tag{8} \]
where w_{X_i} is the weight of variable X_i. Finally, the sentence-level score of the developed evaluation metric hLEPOR (Harmonic mean of enhanced Length Penalty, Precision, n-gram Position difference Penalty and Recall) is measured by:

\[ hLEPOR = \frac{\sum_{i=1}^{n} w_i}{\sum_{i=1}^{n} \frac{w_i}{Factor_i}} = \frac{w_{ELP} + w_{NPosPenal} + w_{HPR}}{\frac{w_{ELP}}{ELP} + \frac{w_{NPosPenal}}{NPosPenal} + \frac{w_{HPR}}{HPR}} \tag{9} \]

where ELP, NPosPenal and HPR are the three factors explained in the previous sections, with tunable weights w_{ELP}, w_{NPosPenal} and w_{HPR} respectively.
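Equations (8)-(9) translate directly into a weighted harmonic mean; the default weights below are illustrative only (Table 1 lists the tuned HPR:ELP:NPP ratios per language pair):

    def weighted_harmonic(factors, weights):
        # Eq. (8): weighted harmonic mean of the factor values.
        return sum(weights) / sum(w / x for w, x in zip(weights, factors))

    def hlepor_sentence(elp, npos_penal, hpr_value, weights=(2, 1, 3)):
        # Eq. (9): sentence-level hLEPOR; weights are (w_ELP, w_NPP, w_HPR),
        # here set to the HPR:ELP:NPP = 3:2:1 ratio used for several pairs.
        return weighted_harmonic((elp, npos_penal, hpr_value), weights)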
2.3 System-level Score
The system-level score is the arithmetic mean of the sentence scores:

\[ \overline{hLEPOR} = \frac{1}{SentNum} \sum_{i=1}^{SentNum} hLEPOR_i \tag{10} \]

where \overline{hLEPOR} is the system-level score of hLEPOR, SentNum is the number of sentences in the test document, and hLEPOR_i is the score of the i-th sentence.
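The system-level aggregation of Equation (10) is then a plain arithmetic mean:

    def hlepor_system(sentence_scores):
        # Eq. (10): average the sentence-level hLEPOR scores
        # over all sentences in the test document.
        return sum(sentence_scores) / len(sentence_scores)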
3 Enhanced Version
This section introduces an enhanced version of the developed metric hLEPOR, denoted hLEPOR_E. As discussed by many researchers, language variability means there is no single correct translation, and different languages do not always express the same content in the same way. In addition to the augmented factors of the designed metric hLEPOR, we show that optional linguistic information can be combined into the metric concisely. As an example, we show how part-of-speech (POS) information can be employed. First, we calculate the system-level hLEPOR score on the surface words (hLEPOR_{word}). Then we apply the same algorithm to the corresponding POS sequences of the words (hLEPOR_{POS}). Finally, we combine the two system-level scores with tunable weights w_{hw} and w_{hp} as the final score:

\[ hLEPOR_E = \frac{1}{w_{hw} + w_{hp}} \left( w_{hw}\, hLEPOR_{word} + w_{hp}\, hLEPOR_{POS} \right) \tag{11} \]
We mention POS information because it sometimes serves a function similar to synonym matching; e.g., "there is a big bag" and "there is a large bag" can have the same meaning despite the different surface words "big" and "large" (which share the same POS, adjective). POS information has been shown to be helpful in the ROSE (Song and Cohn, 2011) and MPF (Popovic, 2011) metrics. The POS information could be replaced by any other concise linguistic information in our designed model.
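The combination in Equation (11) in sketch form (the default 1:9 weighting mirrors the DE-EN setting in Table 1 and is illustrative only):

    def hlepor_enhanced(score_word, score_pos, w_hw=1.0, w_hp=9.0):
        # Eq. (11): weighted average of the word-level and POS-level
        # system scores hLEPOR_word and hLEPOR_POS.
        return (w_hw * score_word + w_hp * score_pos) / (w_hw + w_hp)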
4 Evaluating the Evaluation Metric
To assess the reliability of different MT evaluation metrics, the Spearman rank correlation coefficient ρ is commonly used in the annual workshop on statistical machine translation (WMT) of the Association for Computational Linguistics (ACL) (Callison-Burch et al., 2011). When there are no ties, the Spearman rank correlation coefficient is calculated as:

\[ \rho_{\varphi(XY)} = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \tag{12} \]

where d_i is the difference (D-value) between the two corresponding rank variables X = {x_1, x_2, ..., x_n} and Y = {y_1, y_2, ..., y_n} describing the system φ, and n is the number of variables in the system.
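For illustration, Equation (12) in sketch form, where the two rank lists order the same n systems by metric score and by human judgment:

    def spearman_rho(rank_x, rank_y):
        # Eq. (12), valid when there are no ties among the ranks.
        n = len(rank_x)
        d_squared = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
        return 1 - 6 * d_squared / (n * (n ** 2 - 1))

For example, spearman_rho([1, 2, 3], [2, 1, 3]) = 0.5, reflecting one swapped pair of ranks.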
5 Experiments
The experiment corpora are from the ACL special interest group on machine translation, SIGMT (the WMT workshop), and contain eight corpora covering English-to-other (Spanish, Czech, French and German) and other-to-English directions. Many POS tagger tools are available for different languages. We conducted an evaluation with different POS taggers and found that employing POS information increases the correlation with human judgments for some language pairs but has little or no effect on others. The employed POS tagging tools are the Berkeley POS tagger for French, English and German (Petrov et al., 2006), the COMPOST Czech morphology tagger (Collins, 2002) and the TreeTagger for Spanish (Schmid, 1994). To avoid overfitting, the WMT 2008 data (http://www.statmt.org/wmt08/) are used in the development stage for parameter tuning, and the WMT 2011 corpora are used for testing. The tuned parameter values for the different language pairs are shown in Table 1. The abbreviations EN, CZ, DE, ES and FR stand for English, Czech, German, Spanish and French respectively. In the n-gram word (POS) alignment, bigrams are used for all language pairs. To keep the model concise and use as few external resources as possible, the value "N/A" means the POS information of that language pair is not employed, because it has little or no effect on the correlation scores. The labels "(W)" and "(POS)" mark the parameters tuned on words and POS respectively, and "NPP" abbreviates NPosPenal to save table space. The tuned parameter values also show that different language pairs have different characteristics.
The testing results on the WMT 2011 corpora (http://www.statmt.org/wmt11/) are shown in Table 2. The compared language-independent evaluation metrics include the classic metrics BLEU, TER and METEOR and two state-of-the-art metrics, MPF and ROSE. We selected MPF and ROSE because these two metrics also employ POS information, and because MPF yielded the highest correlation with human judgments among all the language-independent metrics evaluated on the eight language pairs in WMT 2011. The numbers of participating automatic MT systems in WMT 2011 are 10, 22, 15 and 17 respectively for English-to-other (CZ, DE, ES and FR) and 8, 20, 15 and 18 respectively for the opposite translation directions. The gold-standard reference data for these corpora consist of 3,003 manually prepared sentences. The automatic MT evaluation metrics are evaluated by their correlation coefficients with the human judgments.

Ratio              | CZ-EN | DE-EN | ES-EN | FR-EN | EN-CZ | EN-DE | EN-ES | EN-FR
HPR:ELP:NPP (W)    | 7:2:1 | 3:2:1 | 7:2:1 | 3:2:1 | 3:2:1 | 1:3:7 | 3:2:1 | 3:2:1
HPR:ELP:NPP (POS)  | N/A   | 3:2:1 | N/A   | 3:2:1 | N/A   | 7:2:1 | N/A   | 3:2:1
α:β (W)            | 1:9   | 9:1   | 1:9   | 9:1   | 9:1   | 9:1   | 9:1   | 9:1
α:β (POS)          | N/A   | 9:1   | N/A   | 9:1   | N/A   | 9:1   | N/A   | 9:1
w_hw : w_hp        | N/A   | 1:9   | N/A   | 9:1   | N/A   | 1:9   | N/A   | 9:1

Table 1: Values of the tuned weight parameters (other-to-English on the left, English-to-other on the right)

Metric    | CZ-EN | DE-EN | ES-EN | FR-EN | EN-CZ | EN-DE | EN-ES | EN-FR | Mean
hLEPOR_E  | 0.93  | 0.86  | 0.88  | 0.92  | 0.56  | 0.82  | 0.85  | 0.83  | 0.83
MPF       | 0.95  | 0.69  | 0.83  | 0.87  | 0.72  | 0.63  | 0.87  | 0.89  | 0.81
ROSE      | 0.88  | 0.59  | 0.92  | 0.86  | 0.65  | 0.41  | 0.90  | 0.86  | 0.76
METEOR    | 0.93  | 0.71  | 0.91  | 0.93  | 0.65  | 0.30  | 0.74  | 0.85  | 0.75
BLEU      | 0.88  | 0.48  | 0.90  | 0.85  | 0.65  | 0.44  | 0.87  | 0.86  | 0.74
TER       | 0.83  | 0.33  | 0.89  | 0.77  | 0.50  | 0.12  | 0.81  | 0.84  | 0.64

Table 2: Correlation coefficients with human judgments (other-to-English on the left, English-to-other on the right)
Several conclusions can be drawn from the results. First, some evaluation metrics perform well on part of the language pairs but poorly on others; e.g., ROSE reaches a 0.92 correlation with human judgments on the Spanish-to-English corpus but drops to 0.41 on English-to-German, and METEOR scores 0.93 on French-to-English but 0.30 on English-to-German. Second, hLEPOR_E generally performs well across the different language pairs, with the exception of English-to-Czech, and achieves the highest mean correlation score, 0.83, over the eight corpora. Third, the recently developed methods (e.g., MPF, with a 0.81 mean score) correlate better with human judgments than the traditional ones (e.g., BLEU, with a 0.74 mean score), indicating progress in the research. Finally, no metric yields high performance on all the language pairs, which shows that there remains large potential for improvement.
6 Conclusion and Perspectives
This work proposes a language-independent model for machine translation evaluation. Considering the different characteristics of different languages, hLEPOR_E has been designed to cover several complementary aspects, spanning word order (context-dependent n-gram alignment), output accuracy (precision), loyalty (recall) and translation length (sentence length). Different weight parameters are assigned to adjust the importance of each factor; for instance, word order may be free in some languages but strictly constrained in others. In practice, the features employed by hLEPOR_E are also the vital ones that people attend to when performing translation. This is the philosophy behind the formulation and the study in this work, and we believe that human translation ideology is the exact direction that MT systems should try to approach. Furthermore, this work shows that different external resources or linguistic information can be integrated into the model easily. As suggested by other works, e.g. Avramidis et al. (2011), POS information is considered in the experiments and shows improvements on certain language pairs.
This paper makes several main contributions compared with our previous work (Han et al., 2013). It combines surface words and linguistic features (instead of relying only on the correspondence of the POS sequences). It measures the system-level hLEPOR score as the arithmetic mean of the sentence-level scores (instead of the harmonic mean of system-level internal factors). It reports the performance of the enhanced method hLEPOR_E on all eight language pairs released on the official WMT web site (instead of only part of the language pairs, as in previous work), and most of the scores improve over previous work on the same language pairs (e.g., the correlation score on German-to-English rises from 0.83 to 0.86, and on French-to-English from 0.74 to 0.92). Other potential linguistic features can easily be incorporated into the flexible model built in this paper.
Several aspects should be addressed in future work. Firstly, more language pairs beyond the European languages will be tested, such as Japanese, Korean and Chinese, and the contribution of linguistic features (e.g., POS tagging) will also be explored on the new language pairs. Secondly, the tuning of the weight parameters to achieve high correlation with human judgments during the development period will be performed automatically. Thirdly, since the use of multiple references helps translation quality measures correlate better with human judgments, a scheme for using multiple references will be designed.
The source code developed for this paper is freely available online for research purposes. The source code of the hLEPOR measuring algorithm is available at https://github.com/aaronlifenghan/aaron-project-hlepor. The final publication is available at http://www.mt-archive.info/.
Acknowledgments

The authors are grateful to the Science and Technology Development Fund of Macau and the Research Committee of the University of Macau for funding support of our research, under reference No. 017/2009/A and RG060/09-10S/CS/FST. The authors also wish to thank the anonymous reviewers for many helpful comments.
References
Akiba, Y., K. Imamura, and E. Sumita. 2001. Using Multiple Edit Distances to Automatically Rank Machine Translation Output. Proceedings of MT Summit VIII, Santiago de Compostela, Spain.

Arnold, D. 2003. Why translation is difficult for computers. In Computers and Translation: A translator's guide, Benjamins Translation Library.

Avramidis, E., Popovic, M., Vilar, D., Burchardt, A. 2011. Evaluate with Confidence Estimation: Machine ranking of translation outputs using grammatical features. Proceedings of ACL-WMT, pages 65-70, Edinburgh, Scotland, UK.

Callison-Burch, C., Koehn, P., Monz, C. and Zaidan, O. F. 2011. Findings of the 2011 Workshop on Statistical Machine Translation. Proceedings of ACL-WMT, pages 22-64, Edinburgh, Scotland, UK.

Carroll, J. B. 1966. An experiment in evaluating the quality of translation. Languages and machines: computers in translation and linguistics, Automatic Language Processing Advisory Committee (ALPAC), Publication 1416, Division of Behavioral Sciences, National Academy of Sciences, National Research Council, pages 67-75.

Chan, Y. S. and Ng, H. T. 2008. MAXSIM: A maximum similarity metric for machine translation evaluation. Proceedings of ACL 2008: HLT, pages 55-62.

Chen, Boxing, Roland Kuhn and Samuel Larkin. 2012. PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning. Proceedings of the 50th ACL, pages 930-939, Jeju, Republic of Korea.

Collins, M. 2002. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. Proceedings of the ACL-02 conference, Volume 10 (EMNLP 02), pages 1-8, Stroudsburg, PA, USA.
Denkowski, M. and Lavie, A. 2011. Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. Proceedings of ACL-WMT, pages 85-91, Edinburgh, Scotland, UK.

Doddington, G. 2002. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. Proceedings of the second international conference on Human Language Technology Research, pages 138-145, San Diego, California, USA.

Echizen-ya, H. and Araki, K. 2010. Automatic evaluation method for machine translation using noun-phrase chunking. Proceedings of ACL 2010, pages 108-117. Association for Computational Linguistics.

Han, Aaron L.-F., Derek F. Wong, and Lidia S. Chao. 2012. LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors. Proceedings of the 24th International Conference of COLING, Posters, pages 441-450, Mumbai, India.

Han, Aaron L.-F., Derek F. Wong, Lidia S. Chao, and Liangye He. 2013. Automatic Machine Translation Evaluation with Part-of-Speech Information. Proceedings of the 16th International Conference on Text, Speech and Dialogue (TSD 2013), LNCS, volume editors: Vaclav Matousek et al. Springer-Verlag Berlin Heidelberg. Plzen, Czech Republic.

Han, Aaron L.-F., Derek F. Wong, Lidia S. Chao, Liangye He, Shuo Li, and Ling Zhu. 2013b. Phrase Mapping for French and English Treebank and the Application in Machine Translation Evaluation. Proceedings of the International Conference of the German Society for Computational Linguistics and Language Technology (GSCL 2013), LNCS, volume editors: Iryna Gurevych, Chris Biemann and Torsten Zesch. Darmstadt, Germany.

Isozaki, H., Hirao, T., Duh, K., Sudoh, K., and Tsukada, H. 2010. Automatic evaluation of translation quality for distant language pairs. Proceedings of the 2010 Conference on EMNLP, pages 944-952, Cambridge, MA.

Koehn, P. 2010. Statistical Machine Translation. Cambridge University Press.

Marino, Jose B., Rafael E. Banchs, Josep M. Crego, Adria de Gispert, Patrik Lambert, Jose A. Fonollosa, and Marta R. Costa-jussa. 2006. N-gram based machine translation. Computational Linguistics, Vol. 32, No. 4, pages 527-549, MIT Press.
Lin, Chin-Yew and E. H. Hovy. 2003. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics. Proceedings of HLT-NAACL 2003, Edmonton, Canada.

Lita, Lucian Vlad, Monica Rogati and Alon Lavie. 2005. BLANC: Learning Evaluation Metrics for MT. Proceedings of HLT/EMNLP, pages 740-747, Vancouver.

Liu, D. and Daniel Gildea. 2006. Stochastic iterative alignment for machine translation evaluation. Proceedings of ACL-06, Sydney.

Mirkin, S., Lucia Specia, Nicola Cancedda, Ido Dagan, Marc Dymetman, and Idan Szpektor. 2009. Source-Language Entailment Modeling for Translating Unknown Terms. Proceedings of ACL-IJCNLP 2009, pages 791-799, Suntec, Singapore.

Och, F. J. 2003. Minimum Error Rate Training for Statistical Machine Translation. Proceedings of ACL 2003, pages 160-167.

Papineni, K., Roukos, S., Ward, T. and Zhu, W. J. 2002. BLEU: a method for automatic evaluation of machine translation. Proceedings of ACL 2002, pages 311-318, Philadelphia, PA, USA.

Petrov, S., Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning accurate, compact, and interpretable tree annotation. Proceedings of the 21st ACL, pages 433-440, Sydney.

Popovic, M. 2011. Morphemes and POS tags for n-gram based evaluation metrics. Proceedings of WMT, pages 104-107, Edinburgh, Scotland, UK.

Schmid, H. 1994. Probabilistic Part-of-Speech Tagging Using Decision Trees. Proceedings of the International Conference on New Methods in Language Processing, Manchester, UK.

Snover, M., Dorr, B., Schwartz, R., Micciulla, L. and Makhoul, J. 2006. A study of translation edit rate with targeted human annotation. Proceedings of AMTA, pages 223-231, Boston, USA.

Snover, Matthew G., Nitin Madnani, Bonnie Dorr, and Richard Schwartz. 2009. TER-Plus: paraphrase, semantic, and alignment enhancements to Translation Edit Rate. Machine Translation, 23: 117-127.
Song, X. and Cohn, T. 2011. Regression and ranking based optimisation for sentence level MT evaluation. Proceedings of WMT, pages 123-129, Edinburgh, Scotland, UK.

Su, Hung-Yu and Chung-Hsien Wu. 2009. Improving Structural Statistical Machine Translation for Sign Language With Small Corpus Using Thematic Role Templates as Translation Memory. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 17, No. 7.

Su, Keh-Yih, Wu Ming-Wen and Chang Jing-Shin. 1992. A New Quantitative Quality Measure for Machine Translation Systems. Proceedings of COLING, pages 433-439, Nantes, France.

Talbot, D., Kazawa, H., Ichikawa, H., Katz-Brown, J., Seno, M. and Och, F. 2011. A Lightweight Evaluation Framework for Machine Translation Reordering. Proceedings of WMT, pages 12-21, Edinburgh, Scotland, UK.

Tillmann, C., Stephan Vogel, Hermann Ney, Arkaitz Zubiaga, and Hassan Sawaf. 1997. Accelerated DP Based Search For Statistical Translation. Proceedings of the 5th European Conference on Speech Communication and Technology.

Turian, J. P., Shen, L. and Melamed, I. D. 2003. Evaluation of machine translation and its evaluation. Proceedings of MT Summit IX, pages 386-393, New Orleans, LA, USA.

Weaver, Warren. 1955. Translation. In William Locke and A. Donald Booth, editors, Machine Translation of Languages: Fourteen Essays, John Wiley and Sons, New York, pages 15-23.

White, J. S., O'Connell, T. A., and O'Mara, F. E. 1994. The ARPA MT evaluation methodologies: Evolution, lessons, and future approaches. Proceedings of AMTA, pages 193-205.

Wong, B. T-M and Kit, C. 2008. Word choice and word position for automatic MT evaluation. Workshop: MetricsMATR of AMTA, short paper, 3 pages, Waikiki, Hawai'i, USA.

Xiong, D., M. Zhang, H. Li. 2011. A Maximum-Entropy Segmentation Model for Statistical Machine Translation. IEEE Transactions on Audio, Speech, and Language Processing, Volume 19, Issue 8, pages 2494-2505.
Figure 1: N-gram word alignment algorithm
Figure 2: Example of n-gram word alignment
Figure 3: Example of NPD calculation
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Lifeng (Aaron) Han
 
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning ModelChinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning ModelLifeng (Aaron) Han
 
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Lifeng (Aaron) Han
 
PubhD talk: MT serving the society
PubhD talk: MT serving the societyPubhD talk: MT serving the society
PubhD talk: MT serving the societyLifeng (Aaron) Han
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Lifeng (Aaron) Han
 

Mehr von Lifeng (Aaron) Han (20)

WMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni ManchesterWMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
 
Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)
 
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
 
Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...
 
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
 
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longerBuild moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
 
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
 
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
 
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel CorporaMultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
A deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationA deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine Translation
 
machine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a surveymachine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a survey
 
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
 
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning ModelChinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
 
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
 
PubhD talk: MT serving the society
PubhD talk: MT serving the societyPubhD talk: MT serving the society
PubhD talk: MT serving the society
 
Thesis-Master-MTE-Aaron
Thesis-Master-MTE-AaronThesis-Master-MTE-Aaron
Thesis-Master-MTE-Aaron
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...
 

Kürzlich hochgeladen

Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 

Kürzlich hochgeladen (20)

Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 

MT SUMMIT13.Language-independent Model for Machine Translation Evaluation with Reinforced Factors

vanced Research Projects Agency (DARPA) of the US (White et al., 1994).
The manual evaluations suffer from the main disadvantage that they are time-consuming and thus too expensive to perform frequently.
The early automatic evaluation metrics include the word error rate WER (Su et al., 1992), i.e. the edit distance between the system output and the closest reference translation; the position-independent word error rate PER (Tillmann et al., 1997), a variant of WER that disregards word ordering; BLEU (Papineni et al., 2002), the geometric mean of the n-gram precision of the system output with respect to reference translations; NIST (Doddington, 2002), which adds information weights; and GTM (Turian et al., 2003).

Recently, many other methods have been proposed that revise or improve on these earlier works. One category is the lexical-similarity-based metrics. This kind includes the edit-distance-based methods, such as TER (Snover et al., 2006) and the work of Akiba et al. (2001) in addition to WER and PER; precision-based methods such as SIA (Liu and Gildea, 2006) in addition to BLEU and NIST; recall-based methods such as ROUGE (Lin and Hovy, 2003); the word order information utilized by Wong and Kit (2008), Isozaki et al. (2010) and Talbot et al. (2011); and combinations of precision and recall, such as Meteor-1.3 (Denkowski and Lavie, 2011), a modified version of Meteor that includes ranking and adequacy versions and overcomes weaknesses of the previous versions (noise in the paraphrase matching, lack of punctuation handling, and lack of discrimination between word types), as well as BLANC (Lita et al., 2005), LEPOR (Han et al., 2012) and PORT (Chen et al., 2012). Another category employs linguistic features. This kind includes syntactic similarity, such as the part-of-speech information used by ROSE (Song and Cohn, 2011) and MPF (Popovic, 2011) and the phrase information employed by Echizen-ya and Araki (2010) and Han et al. (2013b), and semantic similarity, such as the textual entailment used by Mirkin et al. (2009), the synonyms used by Chan and Ng (2008) and the paraphrases used by Snover et al. (2009).

The previously proposed evaluation methods suffer more or less from several main weaknesses: they perform well on certain language pairs but weakly on others, which we call the language-bias problem; they consider either no linguistic information (which is not reasonable from the perspective of linguistic analysis) or too many linguistic features (which makes them difficult to replicate), which we call the extremism problem; and they rely on incomprehensive factors (e.g. BLEU focuses on precision only). To address these problems, a novel automatic evaluation metric is proposed in this paper with enhanced factors, tunable parameters and optional linguistic information (part-of-speech, n-gram).

2 Designed Model

2.1 Employed Internal Factors

Firstly, we introduce the internal factors utilized in the calculation model.

2.1.1 Enhanced Length Penalty

The enhanced length penalty ELP is designed to penalize both longer and shorter system output translations (an enhanced version of the brevity penalty in BLEU):

ELP = \begin{cases} e^{1 - r/c} & : c < r \\ e^{1 - c/r} & : c \ge r \end{cases}    (1)

where c and r are the sentence lengths of the automatic (candidate) output and the reference translation, respectively.
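As an illustration of Eq. (1), the following minimal Python sketch (ours, not code from the paper) computes ELP from the two token counts; the zero-length guard is our own addition:

```python
import math

def enhanced_length_penalty(candidate_len: int, reference_len: int) -> float:
    """Enhanced length penalty (Eq. 1): penalizes system outputs that are
    either shorter or longer than the reference."""
    c, r = candidate_len, reference_len
    if c == 0:
        return 0.0  # guard for empty output; this case is not specified in the paper
    if c < r:
        return math.exp(1.0 - r / c)
    return math.exp(1.0 - c / r)

# A 6-token candidate against an 8-token reference: exp(1 - 8/6) ~= 0.717.
print(enhanced_length_penalty(6, 8))
```

The penalty equals 1 only when the candidate and reference lengths match, and it decays exponentially as the length ratio moves away from 1 in either direction.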
2.1.2 N-gram Position Difference Penalty

The n-gram position difference penalty NPosPenal is developed to compare the word order of the output and the reference translation:

NPosPenal = e^{-NPD}    (2)

where NPD is defined as:

NPD = \frac{1}{Length_{output}} \sum_{i=1}^{Length_{output}} |PD_i|    (3)

where Length_{output} is the length of the system output sentence and PD_i is the position difference value of the i-th output word. Every word from both the output translation and the reference may be aligned only once. When there is no match, PD_i is assigned zero as the default for that output token.
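Given a completed word alignment (the alignment procedure is described next), a small Python sketch (ours) of Eqs. (2) and (3), using the position-quotient labeling illustrated in Figure 3, where each position is divided by its sentence length:

```python
import math

def npos_penal(alignment: dict, out_len: int, ref_len: int) -> float:
    """N-gram position difference penalty (Eqs. 2-3).

    `alignment` maps 1-indexed output positions to 1-indexed reference
    positions; unmatched output words contribute PD_i = 0 by default.
    """
    npd = 0.0
    for i in range(1, out_len + 1):
        if i in alignment:
            # Position difference between the position quotients of the
            # matched words, as in the Figure 3 example.
            npd += abs(i / out_len - alignment[i] / ref_len)
    npd /= out_len
    return math.exp(-npd)

# A 6-word output aligned to a 6-word reference with words 2 and 3 swapped.
print(npos_penal({1: 1, 2: 3, 3: 2, 4: 4, 5: 5, 6: 6}, 6, 6))  # ~0.946
```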
Two steps are designed to measure the NPD value. The first step is the context-dependent n-gram alignment: we use the n-gram method and give it higher priority, meaning that the surrounding context of the candidate words is considered when selecting the matched pairs between the output and the reference sentence. The nearest match is accepted as a backup choice to establish the alignment when there are nearby matches on both sides or when there are no other matched words surrounding the candidate word pair. The alignment is one-directional, from the output sentence to the reference.

Assume that w_x represents the current word in the output sentence and w_{x+k} the k-th word before it (k < 0) or after it (k > 0). Correspondingly, w^r_y is the word matching w_x in the reference, and w^r_{y+j} has a meaning similar to w_{x+k} but in the reference sentence. The variable Distance is the position difference value between the matched words in the output and the reference. The operation process and pseudo-code of the context-dependent n-gram word alignment algorithm are shown in Figure 1 (with → denoting the alignment), and an example is given in Figure 2.

In the calculating step, each word is labeled with the quotient of its position number divided by the sentence length (the total number of tokens in the sentence). Consider the example of Figure 2 for the NPD calculation (Figure 3): each output word is labeled with a position quotient from 1/6 to 6/6 (the word position divided by the sentence length, which is 6), and the words in the reference sentence are labeled with the same kind of subscripts.

2.1.3 Precision and Recall

Precision and recall are two commonly used criteria in the NLP literature. We use HPR to represent the weighted harmonic mean of precision and recall, i.e. Harmonic(αR, βP), where the weights are the tunable parameters α and β:

HPR = \frac{(\alpha + \beta) \, Precision \times Recall}{\alpha \, Precision + \beta \, Recall}    (4)

Precision = \frac{Aligned_{num}}{Length_{output}}    (5)

Recall = \frac{Aligned_{num}}{Length_{reference}}    (6)

where Aligned_{num} is the number of successfully matched words appearing in both the translation and the reference.

2.2 Sentence-level Score

Secondly, we introduce the mathematical harmonic mean to group n variables (X_1, X_2, ..., X_n):

Harmonic(X_1, X_2, \ldots, X_n) = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}    (7)

where n is the number of factors. The weighted harmonic mean for multiple variables is then:

Harmonic(w_{X_1} X_1, w_{X_2} X_2, \ldots, w_{X_n} X_n) = \frac{\sum_{i=1}^{n} w_{X_i}}{\sum_{i=1}^{n} \frac{w_{X_i}}{X_i}}    (8)

where w_{X_i} is the weight of variable X_i. Finally, the sentence-level score of the developed evaluation metric hLEPOR (harmonic mean of the enhanced length penalty, precision, n-gram position difference penalty and recall) is measured as:

hLEPOR = \frac{w_{ELP} + w_{NPosPenal} + w_{HPR}}{\frac{w_{ELP}}{ELP} + \frac{w_{NPosPenal}}{NPosPenal} + \frac{w_{HPR}}{HPR}}    (9)

where ELP, NPosPenal and HPR are the three factors explained in the previous sections, with tunable weights w_{ELP}, w_{NPosPenal} and w_{HPR}, respectively.

2.3 System-level Score

The system-level score is the arithmetic mean of the sentence scores:

\overline{hLEPOR} = \frac{1}{SentNum} \sum_{i=1}^{SentNum} hLEPOR_i    (10)

where \overline{hLEPOR} is the system-level score of hLEPOR, SentNum is the number of sentences in the test document, and hLEPOR_i is the score of the i-th sentence.
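Putting Eqs. (4)-(10) together, a compact Python sketch (ours; the factor values are assumed to be precomputed, e.g. by the functions sketched above, and to be nonzero):

```python
def weighted_harmonic_mean(weighted_factors):
    """Weighted harmonic mean (Eq. 8); `weighted_factors` holds (weight, value) pairs."""
    total_weight = sum(w for w, _ in weighted_factors)
    return total_weight / sum(w / x for w, x in weighted_factors)

def hpr(aligned_num, out_len, ref_len, alpha, beta):
    """Weighted harmonic mean of precision and recall (Eqs. 4-6)."""
    precision = aligned_num / out_len
    recall = aligned_num / ref_len
    return ((alpha + beta) * precision * recall
            / (alpha * precision + beta * recall))

def sentence_hlepor(elp, npos_penal_value, hpr_value, w_elp, w_npp, w_hpr):
    """Sentence-level hLEPOR (Eq. 9)."""
    return weighted_harmonic_mean(
        [(w_elp, elp), (w_npp, npos_penal_value), (w_hpr, hpr_value)])

def system_hlepor(sentence_scores):
    """System-level hLEPOR (Eq. 10): arithmetic mean over the sentences."""
    return sum(sentence_scores) / len(sentence_scores)

# Illustrative (invented) factor values combined with the EN-DE word-level
# ratio HPR:ELP:NPP = 1:3:7 from Table 1.
print(round(sentence_hlepor(0.72, 0.95, 0.60, w_elp=3, w_npp=7, w_hpr=1), 3))
```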
3 Enhanced Version

This section introduces an enhanced version of the developed metric hLEPOR, called hLEPOR_E. As discussed by many researchers, language variability means that there is no single correct translation, and different languages do not always express the same content in the same way. In addition to the augmented factors of the designed metric hLEPOR, we show that optional linguistic information can be combined into this metric concisely. As an example, we show how part-of-speech (POS) information can be employed in the metric. First, we calculate the system-level hLEPOR score on the surface words (hLEPOR_{word}). Then we apply the same algorithms of hLEPOR to the corresponding POS sequences of the words (hLEPOR_{POS}). Finally, we combine the two system-level scores with tunable weights w_{hw} and w_{hp} into the final score:

hLEPOR_E = \frac{1}{w_{hw} + w_{hp}} (w_{hw} \, hLEPOR_{word} + w_{hp} \, hLEPOR_{POS})    (11)

We use the POS information because it sometimes plays a role similar to that of synonyms; e.g. "there is a big bag" and "there is a large bag" can have the same meaning although the surface words "big" and "large" differ (both are adjectives). POS information has proved helpful in the research works on ROSE (Song and Cohn, 2011) and MPF (Popovic, 2011). In our designed model, the POS information could be replaced by any other concise linguistic information.

4 Evaluating the Evaluation Metric

To assess the reliability of different MT evaluation metrics, the Spearman rank correlation coefficient ρ is commonly used to calculate the correlation in the annual Workshop on Statistical Machine Translation (WMT) of the Association for Computational Linguistics (ACL) (Callison-Burch et al., 2011). When there are no ties, the Spearman rank correlation coefficient is calculated as:

\rho_{\varphi(XY)} = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}    (12)

where d_i is the difference (D-value) between the two corresponding rank variables X = {x_1, x_2, ..., x_n} and Y = {y_1, y_2, ..., y_n} describing the system φ, and n is the number of variables in the system.
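A short Python sketch (ours) of Eq. (12) for tie-free rankings, e.g. for correlating a metric's ranking of MT systems with the human ranking:

```python
def spearman_rho(x_ranks, y_ranks):
    """Spearman rank correlation coefficient (Eq. 12), assuming no ties."""
    n = len(x_ranks)
    d_squared = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))

# A metric's ranking vs. the human ranking of five hypothetical MT systems.
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 3, 5, 4]))  # 0.8
```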
5 Experiments

The experimental corpora are from the ACL Special Interest Group on Machine Translation, SIGMT (the WMT workshop), and contain eight corpora covering English-to-other (Spanish, Czech, French and German) and other-to-English translation. Many POS tagger tools are available for different languages. We conducted an evaluation with different POS taggers and found that employing POS information increases the correlation score with human judgments for some language pairs but has little or no effect on others. The employed POS tagging tools are the Berkeley POS tagger for French, English and German (Petrov et al., 2006), the COMPOST Czech morphology tagger (Collins, 2002) and the TreeTagger Spanish tagger (Schmid, 1994). To avoid overfitting, the WMT 2008 data (http://www.statmt.org/wmt08/) are used in the development stage for parameter tuning, and the WMT 2011 corpora (http://www.statmt.org/wmt11/) are used for testing. The tuned parameter values for the different language pairs are shown in Table 1. The abbreviations EN, CZ, DE, ES and FR stand for English, Czech, German, Spanish and French, respectively. In the n-gram word (POS) alignment, bigram is selected for all language pairs. To keep the model concise and use as few external resources as possible, the value "N/A" means that the POS information of that language pair is not employed, since it has little or no effect on the correlation scores. The labels "(W)" and "(POS)" mark the parameters tuned on words and on POS, respectively, and "NPP" abbreviates NPosPenal to save table width. The tuned parameter values also confirm that different language pairs have different characteristics.

The testing results on the WMT 2011 corpora are shown in Table 2. The compared language-independent evaluation metrics include the classic metrics BLEU, TER and METEOR as well as two state-of-the-art metrics, MPF and ROSE. We selected MPF and ROSE because these two metrics also employ POS information, and because MPF yielded the highest correlation score with human judgments among all the language-independent metrics (evaluated on eight language pairs) in WMT 2011. The numbers of participating automatic MT systems in WMT 2011 are 10, 22, 15 and 17 for English-to-other (CZ, DE, ES and FR), respectively, and 8, 20, 15 and 18 for the opposite translation directions. The gold-standard reference data for those corpora consist of 3,003 manually produced sentences. The automatic MT evaluation metrics are evaluated by their correlation coefficient with the human judgments.
Ratio               CZ-EN  DE-EN  ES-EN  FR-EN  EN-CZ  EN-DE  EN-ES  EN-FR
HPR:ELP:NPP (W)     7:2:1  3:2:1  7:2:1  3:2:1  3:2:1  1:3:7  3:2:1  3:2:1
HPR:ELP:NPP (POS)   N/A    3:2:1  N/A    3:2:1  N/A    7:2:1  N/A    3:2:1
α:β (W)             1:9    9:1    1:9    9:1    9:1    9:1    9:1    9:1
α:β (POS)           N/A    9:1    N/A    9:1    N/A    9:1    N/A    9:1
whw:whp             N/A    1:9    N/A    9:1    N/A    1:9    N/A    9:1

Table 1: Values of the tuned weight parameters (CZ-EN through FR-EN are other-to-English; EN-CZ through EN-FR are English-to-other)

Several conclusions can be drawn from the results. First, some evaluation metrics show good performance on part of the language pairs but low performance on others: e.g. ROSE reaches a 0.92 correlation with human judgments on the Spanish-to-English corpus but drops to 0.41 on English-to-German, and METEOR scores 0.93 on French-to-English but 0.30 on English-to-German. Second, hLEPOR_E performs well across the different language pairs, except for English-to-Czech, and achieves the highest mean correlation score, 0.83, over the eight corpora. Third, the recently developed methods (e.g. MPF, with a 0.81 mean score) correlate better with human judgments than the traditional ones (e.g. BLEU, with a 0.74 mean score), indicating progress in the research. Finally, no metric yields high performance on all the language pairs, which shows that there remains large potential for improvement.

6 Conclusion and Perspectives

This work proposes a language-independent model for machine translation evaluation. Considering the different characteristics of different languages, hLEPOR_E has been extensively designed from different aspects, spanning from word order (context-dependent n-gram alignment), output accuracy (precision) and loyalty (recall) to translation length performance (sentence length). Different weight parameters are assigned to adjust the importance of each factor; for instance, word position may be free in some languages but strictly constrained in others. In practice, the features employed by hLEPOR_E are also the vital ones when people translate between languages. This is the philosophy behind the formulation and the study in this work, and we believe that human translation ideology is the exact direction that MT systems should try to approach. Furthermore, this work shows that different external resources or linguistic information can be integrated into the model easily. As suggested by other works, e.g. Avramidis et al. (2011), POS information is considered in the experiments and shows improvements on certain language pairs.

This paper makes several main contributions compared with our previous work (Han et al., 2013). It combines the use of surface words and linguistic features (instead of relying on the consilience of the POS sequences only). It measures the system-level hLEPOR score as the arithmetic mean of the sentence-level scores (instead of the harmonic mean of system-level internal factors). It reports the performance of the enhanced method hLEPOR_E on all eight language pairs released on the official WMT web site (instead of only part of the language pairs, as in previous work), and most of the scores improve over previous work on the same language pairs (e.g. the correlation score on German-English increases from 0.83 to 0.86, and the correlation score on French-English increases from 0.74 to 0.92).
Other potential linguistic features can easily be employed in the flexible model built in this paper.

Several aspects should also be addressed in future work. Firstly, more language pairs beyond the European languages will be tested, such as Japanese, Korean and Chinese, and the performance of linguistic features (e.g. POS tagging) will be explored on the new language pairs as well. Secondly, the tuning of the weight parameters to achieve high correlation with human judgments during the development period will be performed automatically. Thirdly, since the use of multiple references helps the usual translation quality measures correlate with human judgments, a scheme for using multiple references will be designed.
Metrics     CZ-EN  DE-EN  ES-EN  FR-EN  EN-CZ  EN-DE  EN-ES  EN-FR  Mean
hLEPOR_E    0.93   0.86   0.88   0.92   0.56   0.82   0.85   0.83   0.83
MPF         0.95   0.69   0.83   0.87   0.72   0.63   0.87   0.89   0.81
ROSE        0.88   0.59   0.92   0.86   0.65   0.41   0.90   0.86   0.76
METEOR      0.93   0.71   0.91   0.93   0.65   0.30   0.74   0.85   0.75
BLEU        0.88   0.48   0.90   0.85   0.65   0.44   0.87   0.86   0.74
TER         0.83   0.33   0.89   0.77   0.50   0.12   0.81   0.84   0.64

Table 2: Correlation coefficients with human judgments (CZ-EN through FR-EN are other-to-English; EN-CZ through EN-FR are English-to-other)

The source code developed for this paper can be freely downloaded for research purposes; it is open source online. The source code of the hLEPOR measuring algorithm is available at https://github.com/aaronlifenghan/aaron-project-hlepor. The final publication is available at http://www.mt-archive.info/.

Acknowledgments. The authors are grateful to the Science and Technology Development Fund of Macau and the Research Committee of the University of Macau for funding our research under reference Nos. 017/2009/A and RG060/09-10S/CS/FST. The authors also wish to thank the anonymous reviewers for many helpful comments.

References

Akiba, Y., K. Imamura, and E. Sumita. 2001. Using Multiple Edit Distances to Automatically Rank Machine Translation Output. Proceedings of MT Summit VIII, Santiago de Compostela, Spain.

Arnold, D. 2003. Why translation is difficult for computers. In Computers and Translation: A translator's guide, Benjamins Translation Library.

Avramidis, E., Popovic, M., Vilar, D., and Burchardt, A. 2011. Evaluate with Confidence Estimation: Machine ranking of translation outputs using grammatical features. Proceedings of ACL-WMT, pages 65-70, Edinburgh, Scotland, UK.

Callison-Burch, C., Koehn, P., Monz, C., and Zaidan, O. F. 2011. Findings of the 2011 Workshop on Statistical Machine Translation. Proceedings of ACL-WMT, pages 22-64, Edinburgh, Scotland, UK.

Carroll, J. B. 1966. An experiment in evaluating the quality of translation. Languages and machines: computers in translation and linguistics, Automatic Language Processing Advisory Committee (ALPAC), Publication 1416, Division of Behavioral Sciences, National Academy of Sciences, National Research Council, pages 67-75.

Chan, Y. S. and Ng, H. T. 2008. MAXSIM: A maximum similarity metric for machine translation evaluation. Proceedings of ACL 2008: HLT, pages 55-62.

Chen, Boxing, Roland Kuhn, and Samuel Larkin. 2012. PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning. Proceedings of the 50th ACL, pages 930-939, Jeju, Republic of Korea.

Collins, M. 2002. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. Proceedings of the ACL-02 conference, Volume 10 (EMNLP 02), pages 1-8, Stroudsburg, PA, USA.

Denkowski, M. and Lavie, A. 2011. Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. Proceedings of ACL-WMT, pages 85-91, Edinburgh, Scotland, UK.

Doddington, G. 2002. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. Proceedings of the second international conference on Human Language Technology Research, pages 138-145, San Diego, California, USA.

Echizen-ya, H. and Araki, K. 2010. Automatic evaluation method for machine translation using noun-phrase chunking. Proceedings of ACL 2010, pages 108-117. Association for Computational Linguistics.
Han, Aaron L.-F., Derek F. Wong, and Lidia S. Chao. 2012. LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors. Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Posters, pages 441-450, Mumbai, India.

Han, Aaron L.-F., Derek F. Wong, Lidia S. Chao, and Liangye He. 2013. Automatic Machine Translation Evaluation with Part-of-Speech Information. Proceedings of the 16th International Conference on Text, Speech and Dialogue (TSD 2013), LNCS, Volume Editors: Vaclav Matousek et al., Springer-Verlag Berlin Heidelberg, Plzen, Czech Republic.
Han, Aaron L.-F., Derek F. Wong, Lidia S. Chao, Liangye He, Shuo Li, and Ling Zhu. 2013b. Phrase Mapping for French and English Treebank and the Application in Machine Translation Evaluation. Proceedings of the International Conference of the German Society for Computational Linguistics and Language Technology (GSCL 2013), LNCS, Volume Editors: Iryna Gurevych, Chris Biemann and Torsten Zesch, Darmstadt, Germany.

Isozaki, H., Hirao, T., Duh, K., Sudoh, K., and Tsukada, H. 2010. Automatic evaluation of translation quality for distant language pairs. Proceedings of the 2010 Conference on EMNLP, pages 944-952, Cambridge, MA.

Koehn, P. 2010. Statistical Machine Translation. Cambridge University Press.

Marino, Jose B., Rafael E. Banchs, Josep M. Crego, Adria de Gispert, Patrik Lambert, Jose A. Fonollosa, and Marta R. Costa-jussa. 2006. N-gram based machine translation. Computational Linguistics, Vol. 32, No. 4, pages 527-549, MIT Press.

Lin, Chin-Yew and E. H. Hovy. 2003. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics. Proceedings of HLT-NAACL 2003, Edmonton, Canada.

Lita, Lucian Vlad, Monica Rogati, and Alon Lavie. 2005. BLANC: Learning Evaluation Metrics for MT. Proceedings of HLT/EMNLP, pages 740-747, Vancouver.

Liu, D. and Daniel Gildea. 2006. Stochastic iterative alignment for machine translation evaluation. Proceedings of ACL-06, Sydney.

Mirkin, S., Lucia Specia, Nicola Cancedda, Ido Dagan, Marc Dymetman, and Idan Szpektor. 2009. Source-Language Entailment Modeling for Translating Unknown Terms. Proceedings of ACL-IJCNLP 2009, pages 791-799, Suntec, Singapore.

Och, F. J. 2003. Minimum Error Rate Training for Statistical Machine Translation. Proceedings of ACL 2003, pages 160-167.

Papineni, K., Roukos, S., Ward, T., and Zhu, W. J. 2002. BLEU: a method for automatic evaluation of machine translation. Proceedings of ACL 2002, pages 311-318, Philadelphia, PA, USA.

Petrov, S., Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning accurate, compact, and interpretable tree annotation. Proceedings of the 21st ACL, pages 433-440, Sydney.

Popovic, M. 2011. Morphemes and POS tags for n-gram based evaluation metrics. Proceedings of WMT, pages 104-107, Edinburgh, Scotland, UK.

Schmid, H. 1994. Probabilistic Part-of-Speech Tagging Using Decision Trees. Proceedings of the International Conference on New Methods in Language Processing, Manchester, UK.

Snover, M., Dorr, B., Schwartz, R., Micciulla, L., and Makhoul, J. 2006. A study of translation edit rate with targeted human annotation. Proceedings of AMTA, pages 223-231, Boston, USA.

Snover, Matthew G., Nitin Madnani, Bonnie Dorr, and Richard Schwartz. 2009. TER-Plus: paraphrase, semantic, and alignment enhancements to Translation Edit Rate. Machine Translation, 23: 117-127.

Song, X. and Cohn, T. 2011. Regression and ranking based optimisation for sentence level MT evaluation. Proceedings of WMT, pages 123-129, Edinburgh, Scotland, UK.

Su, Hung-Yu and Chung-Hsien Wu. 2009. Improving Structural Statistical Machine Translation for Sign Language With Small Corpus Using Thematic Role Templates as Translation Memory. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 17, No. 7.

Su, Keh-Yih, Wu Ming-Wen, and Chang Jing-Shin. 1992. A New Quantitative Quality Measure for Machine Translation Systems. Proceedings of COLING, pages 433-439, Nantes, France.
Talbot, D., Kazawa, H., Ichikawa, H., Katz-Brown, J., Seno, M., and Och, F. 2011. A Lightweight Evaluation Framework for Machine Translation Reordering. Proceedings of WMT, pages 12-21, Edinburgh, Scotland, UK.

Tillmann, C., Stephan Vogel, Hermann Ney, Arkaitz Zubiaga, and Hassan Sawaf. 1997. Accelerated DP Based Search For Statistical Translation. Proceedings of the 5th European Conference on Speech Communication and Technology.

Turian, J. P., Shen, L., and Melamed, I. D. 2003. Evaluation of machine translation and its evaluation. Proceedings of MT Summit IX, pages 386-393, New Orleans, LA, USA.

Weaver, Warren. 1955. Translation. In William Locke and A. Donald Booth, editors, Machine Translation of Languages: Fourteen Essays, John Wiley and Sons, New York, pages 15-23.

White, J. S., O'Connell, T. A., and O'Mara, F. E. 1994. The ARPA MT evaluation methodologies: Evolution, lessons, and future approaches. Proceedings of AMTA, pages 193-205.
Wong, B. T-M and Kit, C. 2008. Word choice and word position for automatic MT evaluation. Workshop: MetricsMATR of AMTA, short paper, 3 pages, Waikiki, Hawaii, USA.

Xiong, D., M. Zhang, and H. Li. 2011. A Maximum-Entropy Segmentation Model for Statistical Machine Translation. IEEE Transactions on Audio, Speech, and Language Processing, Volume 19, Issue 8, pages 2494-2505.
[Figure 1: N-gram word alignment algorithm]
[Figure 2: Example of n-gram word alignment]
[Figure 3: Example of NPD calculation]