Translation studies: Simplification and Explicitation Universals
Claudiu Mihăilă
Faculty of Computer Science,
”Al.I. Cuza” University of Iași,
16, General Berthelot Street,
700483 Iași, Romania
claudiu.mihaila@info.uaic.ro
Abstract. The characteristics exhibited by translated texts compared
to non-translated texts have always been of great interest in Translation
Studies. Two universals, namely simplification and explicitation, are re-
viewed in this report, presenting some of the studies that have been
undertaken for their confirmation or, on the contrary, their disconfirma-
tion. We describe the corpora, the methods, and the results, and analyse
the conclusions of several important research papers.
Key words: translationese, translation studies, translation universal,
corpus linguistics
1 Introduction
The idea that translation studies should search for regularities and general laws is not
new; Gideon Toury is the best-known advocate for general laws of translation
[1]. He proposed this as a fundamental task of descriptive translation studies
due to the fact that translated language is believed to manifest certain universal
features, as a consequence of the translation process. Translations exhibit their
own specific lexico-grammatical and syntactic characteristics [2–4]. These ”fin-
gerprints” that the translation process leaves behind were first described by
Gellerstam and named generically translationese [5].
More recently, it has been stated that there are common characteristics which
all translations share, regardless of the source and the target languages [6].
Although mostly intuitively, Mona Baker defines several such universal laws.
Additionally, she observes the power that resides in electronic corpora and
automatic natural language processing systems, in comparison to the manual
contrastive studies undertaken by previous scholars on small-scale collections
of texts. She includes in her list of universals, amongst others, simplification,
explicitation, normalisation, and convergence.
However, the issue of the existence of translation universals remains highly
controversial. While some scientists report that they have found sufficient proof
that such translation laws exist [7], others consider that it is not possible to even
hypothesise on universals since we are not able to capture all translations from
all languages and from all times [8].
The field of translation universals has thus been a real subject of debate in
translation studies over the last fifteen years, bringing together different
perspectives on the language of translation. Perhaps the main reason to investigate these
hypotheses is to raise awareness among translators about the conscious or uncon-
scious effects over translated texts, and the relationship between language and
culture [7]. Bringing unconscious tendencies to light will emphasise translators’
decisions and strategies, and hence should pave the way to more accurate trans-
lations, with ”more desired effects and fewer unwanted ones” [9].
The fundamental aim of this line of research is to model a language-indepen-
dent learning system, able to distinguish between translated and non-translated
texts. Such a system would be widely applicable to
other languages, thus enhancing the possibilities of studying these universals.
Furthermore, it becomes feasible to determine which characteristics
influence translated language the most.
From a practical perspective, a system that automatically identifies transla-
tionese (improved or not by the inclusion of specific features of the considered
universals) may be of great help in the self-assessment of professional translators,
or in the assessment of their training process. Moreover, an automatic
translationese identifier may significantly improve other NLP applications. For instance,
such a system may be integrated in a statistical machine translation framework
in order to identify translation direction [10]. Another possible application is its
use in multilingual plagiarism detection, a topic that has been tackled more
intensively in recent years.
The report is structured as follows: section 2 contains brief descriptions of
translation universals, whilst in section 3 we review the related work in this
domain, focussing on simplification and explicitation. Finally, conclusions are
drawn in section 4.
2 Translation universals
The universals have attracted considerable attention from translation experts, but
their formulation and initial explanation have been based on intuition and
introspection, with subsequent corpus research limited to comparatively small
corpora, literary or newswire texts, and semi-manual analysis. Moreover, previous
research has not provided sufficient guidance as to which features would
allow these universals to be regarded as valid [11].
Various so-called translation universals, described as universal tendencies of the
translation process, laws of translation, or norms of translation, have been suggested
in the literature [12, 13, 6, 7].
Toury proposed two laws of translation: the law of standardisation and the
law of interference [13]. Baker defined four possible translation universals [6,
14]. These four universals, namely simplification, explicitation, convergence, and
normalisation, have been the most intensively studied
in recent years. The simplification universal is described as the tendency
of translators to produce simpler and easier-to-follow texts, whilst explicitation
refers to introducing overt information into the translation that is implicit in
the source language [6]. Convergence states that the translations become more
similar to one another than the non-translated texts are, and normalisation
represents the conscious or unconscious rendering of idiosyncratic text features
in order to make them conform to the typical textual characteristics of the target
language.
Laviosa continued this line of research by proposing features for simplification
in a corpus-based study [7]. Despite some evidence of the existence of such a
phenomenon, there is still a remarkable challenge in defining the features which
characterise the simplification universal.
3 Related work
A number of papers report experiments on the universals, although without
reaching clear-cut conclusions. Nevertheless, it seems
that these problematic claims require an investigation strategy divided into two
consecutive stages: first, the investigation of the proposed translation tendencies, and
afterwards the investigation of the universality factor.
On the one hand, the claims themselves, without considering the universality
aspect, require adequate practical support in order to be validated as true or
false. On the other hand, the universality characteristic is a matter of discussion,
as the coverage implied by this term is too wide given the limited evidence provided
for different languages. The condition needed for the universality aspect to be
widely accepted is to be validated for all languages, or at least for all language
families.
In what follows, we review the current status of two of the hypothesised
universals, simplification and explicitation, going through some of the most
prominent research undertaken in the field.
3.1 Simplification
Recently, a corpus-based approach which tests the statistical significance of
features proposed to investigate the simplification universal has been exploited
for Spanish [11, 15].
In [11], Corpas tries to verify the validity of the simplification universal
on a Spanish comparable corpus of medical and technical, translated and non-
translated texts produced by both professional and semi-professional translators.
Simplification seems to be validated for the lexical richness feature. Despite
this, it is contradicted in terms of complex sentences, sentence length, depth of
syntactical trees, information load, and ambiguity.
Nonetheless, in [15], the authors use the same corpora as in [11] and perform a
deeper analysis, exploiting other features as well. The experiments revealed that
the translated texts contain a lower level of lexical richness and density, a lower
number of discourse markers, and less simple and significantly shorter sentences.
However, the simplification traits are more visible only on the technical texts,
and to a lesser degree on the professionally translated medical texts.
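The surface features discussed above can be illustrated with a minimal sketch. It assumes lexical richness is approximated by the type/token ratio and lexical density by the share of non-function words; the exact definitions used in [11, 15] may differ. The function name and the tiny stop list are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop list standing in for a real inventory
# of grammatical (function) words; a real study would rely on a POS tagger.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "it", "that"}


def simplification_features(text):
    """Return three surface indicators often linked to simplification."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokens = re.findall(r"\w+", text.lower())
    if not tokens or not sentences:
        raise ValueError("text must contain at least one token")
    lexical = [t for t in tokens if t not in FUNCTION_WORDS]
    return {
        # Type/token ratio: distinct word forms divided by running words.
        "lexical_richness": len(set(tokens)) / len(tokens),
        # Share of tokens that are not in the function-word list.
        "lexical_density": len(lexical) / len(tokens),
        # Average number of tokens per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
    }


features = simplification_features("The cat sat. The cat ran.")
```

Note that the type/token ratio is sensitive to text length, so values are only comparable across texts of similar size.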
Furthermore, Ilisei et al. develop a supervised learning system that is able to
distinguish between translated and non-translated texts, in some cases with very
high accuracy, also for the Spanish language [16, 17]. They use three
comparable corpora, of which two are related to the medical domain, and one
contains technical texts, and extract 21 language-independent features for their
learning system to exploit.
Table 1 includes the accuracies of various trained classifiers tested in [17].
The BayesNet, Simple Logistic, SVM, and Meta-classifier reach a remarkable
accuracy of 97.62% on technical texts, with the SVM result statistically significantly
better than the one obtained without simplification features.
Table 1. Classification accuracy results on medical and technical test datasets with
regard to simplification features (SF) [17].
                     Including SF           Excluding SF
Classifier           Medical   Technical    Medical   Technical
Baseline (ZeroR)     64.71%    66.67%       64.71%    66.67%
Naive Bayes          71.57%    95.24%       71.57%    80.95%
BayesNet             73.53%    97.62%       71.57%    92.86%
Jrip                 79.42%    95.24%       72.55%    92.86%
Decision Tree        77.45%    92.86%       75.49%    95.24%
Simple Logistic      77.45%    97.62%       79.41%    83.33%
SVM                  75.49%    97.62%       74.51%    69.05%
Meta-classifier      82.35%    97.62%       78.43%    92.86%
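To make the baseline row of Table 1 easier to read, the following sketch shows what a ZeroR-style baseline does: it ignores all features and always predicts the majority class of the training data. The function name and the toy labels are hypothetical and are not the data used in [17].

```python
from collections import Counter


def zero_r_accuracy(train_labels, test_labels):
    """Predict the majority training class for every test item."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority)
    return majority, hits / len(test_labels)


# Toy labels for illustration only (assumption, not the corpora from [17]).
train = ["original"] * 4 + ["translated"] * 2
test = ["original"] * 2 + ["translated"] * 1
majority_class, accuracy = zero_r_accuracy(train, test)
```

Any classifier worth reporting should beat this figure, which is why the accuracies in Table 1 are best read relative to the baseline row rather than in isolation.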
Aiming to determine the most salient features that lead to
these results, Ilisei et al. analyse the outputs of various classifiers, such
as Decision Tree and Jrip, and use attribute evaluators, such as Chi-Square
and Information Gain. They conclude that lexical richness influences the
classification the most, closely followed by sentence length, proportions of pronouns,
conjunctions, grammatical words, and lexical words; other features also influence
the classification, but to a lesser extent. Both lexical richness and sentence
length are features considered to be indicative of the simplification hypothesis,
widely discussed and studied in the past decade. Sentence length is a characte-
ristic which posed a certain difficulty in its interpretation in the study undertaken
in [15]. The most influential features identified with these evaluators concur with
the first-level attributes from the intuitive output of the Decision Tree and Jrip
classifiers [18].
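As an illustration of how an attribute evaluator such as Information Gain ranks a feature, the sketch below computes the gain obtained by splitting one numeric feature at a fixed threshold. This is a simplified, assumed formulation (a single binary split), not the exact procedure applied in [17]; all names and values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting a numeric feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    # Weighted entropy of the two partitions (empty partitions are skipped).
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder


# Hypothetical lexical richness values for two translated (T) and
# two original (O) texts.
richness = [0.30, 0.35, 0.60, 0.70]
classes = ["T", "T", "O", "O"]
gain = information_gain(richness, classes, threshold=0.5)
```

A feature whose split separates the two classes perfectly yields the maximum gain of one bit for a balanced binary problem, whereas an uninformative feature yields a gain close to zero.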
A different perspective on this research topic is taken by Baroni and
Bernardini, who report a machine learning approach to the task of classifying
Italian texts as translated or original [19]. Several features have been employed
in the feature vector, including unigrams, bigrams, trigrams, word forms,
lemmas, and part-of-speech tags. They are thus able to show that shallow
data representations can be sufficient to automatically distinguish professional
translations from non-translated texts with an accuracy above the chance level,
and hypothesise that this representation captures the distinguishing features of
translationese. Additionally, the system’s classification quality seems to be much
higher than that of human judges when faced with the same task. However, it is
to be explicitly noted that in this study the feature vector is highly dependent
on the language the system works on.
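The kind of shallow representation mentioned above can be sketched as n-gram counts over a token sequence, where the sequence may hold word forms, lemmas, or part-of-speech tags. This is a minimal illustration under that assumption, not the feature extraction actually used in [19]; the function name and the tag sequence are hypothetical.

```python
from collections import Counter


def ngram_counts(sequence, n_max=3):
    """Count all n-grams of length 1..n_max in a sequence of symbols."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(sequence) - n + 1):
            counts[tuple(sequence[i:i + n])] += 1
    return counts


# Hypothetical part-of-speech sequence for a short sentence.
pos_tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
features = ngram_counts(pos_tags)
```

Because such counts are tied to the vocabulary or tag set of one language, a feature vector built this way is language-dependent, which is the limitation noted above.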
The simplification universal is known to be a controversial claim, with
different studies bringing evidence both for and against it. In particular, it has been
contested by studies on collocations [20], lexical use [21], and syntax [22].
For instance, Jantunen does not manage to establish clear and consistent
evidence of a universal untypical lexical-grammatical patterning when operating
on a subset of the Corpus of Translated Finnish (CTF) [22]. He tests the hypo-
thesis on three near-synonym degree modifiers, hyvin, kovin, and oikein, all
roughly meaning very, including a quantitative and qualitative analysis to pro-
vide a comprehensive description. He uses the Three Phase Comparative Ana-
lysis (TPCA) on three corpora, one of original Finnish (CNF), one of texts
translated from various Indo-European and Finno-Ugric languages (MuCTF),
and one translated from English (MoCTF). As described in Table 2, the author
shows that the modifiers are almost twice as frequent in the translations (after
normalisation per 100 000 tokens), and that this depends on the source language:
the difference is not statistically significant for English, but it is for the MuCTF,
according to a χ2 test at the 0.05 level of significance.
Table 2. Frequencies of hyvin, kovin, and oikein in the CNF, MuCTF and MoCTF
corpora [22].
Modifier CNF MuCTF MoCTF
hyvin 36 66 70
kovin 18 39 38
oikein 12 15 20
Total 66 120 128
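The normalisation per 100 000 tokens used for Table 2 can be sketched as follows. The totals are those reported in Table 2; the raw count and corpus size in the usage line are hypothetical, since the corpus sizes are not given here.

```python
def per_100k(raw_count, corpus_tokens):
    """Normalise a raw frequency to occurrences per 100 000 tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count * 100_000 / corpus_tokens


# Normalised totals as reported in Table 2.
TABLE_2_TOTALS = {"CNF": 66, "MuCTF": 120, "MoCTF": 128}

# Ratio of each translated corpus to the original Finnish corpus (CNF).
ratios = {name: total / TABLE_2_TOTALS["CNF"]
          for name, total in TABLE_2_TOTALS.items() if name != "CNF"}

# Hypothetical example: 50 occurrences in a 250 000-token corpus.
example = per_100k(50, 250_000)
```

The resulting ratios, roughly 1.8 for MuCTF and 1.9 for MoCTF, correspond to the observation that the modifiers are almost twice as frequent in the translations.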
Jantunen then extracts the top-ranked collocations for each of the three
modifiers from each of the three corpora. In the case of hyvin, the collocations
match to an extremely small degree. However, in the case of the other two
modifiers, the collocations overlap to a high degree, therefore making it rather
difficult to draw conclusions. Furthermore, the colligation analysis for hyvin
shows no difference between original Finnish texts and texts translated into Finnish. The
conclusion that Jantunen reports is that translations tend to exhibit untypical
lexical combinations, due to the source language, and that grammatical
combinations tend to be similar in translations and original texts, although the influence
of the source languages cannot be excluded.
3.2 Explicitation
Even though the surge of interest in translation universals happened in the last two
decades, pointers towards the law of explicitation have existed since the middle of
the twentieth century. Vinay performed a comparative study of French and
English in 1958, and defined explicitation as:
”the process of introducing information into the target language which is
present only implicitly in the source language, but which can be derived
from the context or the situation” [23]
Furthermore, Blum-Kulka notices the tendency of translations to be more
explicit compared to the source texts, regardless of the language-specific expli-
citness [12]. Later, Baker defines the explicitation universal as the tendency to
”spell things out rather than leave them implicit” [14].
Two categories of explicitation are described by Pym: the obligatory one,
forced by the language specificity, and the voluntary one, in which the translator
adds optional information to the text to avoid misinterpretations [24].
Vanderauwera proposes the following list of explicitation repertoires: expansion of
condensed passage; addition of modifiers, qualifiers and conjunctions to achieve
greater transparency; and addition of extra information and insertion of expla-
nations, amongst many others [25].
Another study, which exploits the Translational English Corpus (TEC), indi-
cates a significant use of the optional that with the verbs say and tell in trans-
lated texts compared to a British National Corpus (BNC) comparable sub-corpus
[26]. Tables 3 and 4 contain the results of the analysis, given
both as absolute counts and as percentages. It is immediately clear that the
that-connective is proportionally far more frequent in TEC than in BNC. By contrast, the
zero-connective is more frequent for all forms of both verbs in the BNC corpus.
These differences have been proven to be statistically significant. Furthermore,
the results of the say and tell study were consistent with findings by Burnett
who reviewed use of the verbs suggest, admit, claim, think, believe, hope and
know in both TEC and BNC [27].
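A significance check of this kind can be illustrated with a Pearson χ2 test on a 2×2 contingency table, here applied to the say counts reported in Table 3. The cited studies do not necessarily use this exact test; the sketch only shows how such counts can be compared against the critical value of 3.841 (one degree of freedom, 0.05 level). The function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Critical value of the chi-square distribution, 1 degree of freedom, p = 0.05.
CRITICAL_0_05_DF1 = 3.841

# Rows: BNC, TEC; columns: that, zero (counts for say from Table 3).
say_statistic = chi_square_2x2(712, 2289, 775, 768)
say_significant = say_statistic > CRITICAL_0_05_DF1
```

For these counts the statistic lies far above the critical value, which agrees with the reported significance of the difference.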
A similar study investigating the verb promise found the same pattern be-
tween translated and non-translated English [28]. Table 5 shows that although
the number of occurrences of ’promise’ followed by that or zero connective is very
close in the two corpora (131 in the TEC and 135 in the BNC), the distributions
are almost directly inverse.
Also, the explicitation universal is investigated in simultaneous interpreting,
and Gumul concludes that, to a certain extent, explicitation appears to be
dependent on the direction of interpreting [29].
Table 3. Distribution of say + that/zero in the BNC and TEC [26].
Connective   BNC              TEC
that         712 (23.72%)     775 (50.22%)
zero         2289 (76.28%)    768 (49.78%)
Total        3001             1543
Table 4. Distribution of tell + that/zero in the BNC and TEC [26].
Connective   BNC              TEC
that         997 (41.45%)     719 (62.74%)
zero         1408 (58.55%)    427 (37.26%)
Total        2405             1146
Table 5. Distribution of promise + that/zero in the BNC and TEC [28].
Connective   BNC              TEC
that         46 (34.1%)       89 (67.9%)
zero         89 (65.9%)       42 (32.1%)
Total        135              131
In contrast to simplification, the explicitation universal is perhaps the least
controversial hypothesis according to the conclusions of several studies. However,
the study of English into Korean translation described by Cheong contradicts
this claim [30].
Cheong clearly distinguishes between two reverse operations, explicitation
and implicitation, and notes that implicitation has been neglected in the study
of translation universals. Therefore, by using an English-Korean corpus, the author tries
to determine which of the two phenomena is the dominant one, to test whether
the direction of the translation has any effect on them, and to identify the factors
that influence the phenomena. After applying four different measurement units
and a set of newly devised variables, the author concludes that both explicitation
and implicitation are present in the target text, and that the direction of the
translation influences the behaviour of texts regarding the two phenomena, even
in cases where the identical language pair is involved [30].
Although no studies have yet been performed in the case of the Romanian
language, it is possible for explicitation and implicitation to manifest themselves
in Romanian translations too. For instance, the flexible Romanian grammar
allows zero anaphora to exist with a relatively high frequency, of 0.32 zero
pronominal anaphors per sentence [31]. Therefore, when translating into Roma-
nian from a language with a very low degree of zero pronouns, such as English,
French, or German, the explicit information in the source text may become
encoded implicitly in some other word in the target text, without it being
demanded by grammar rules. Thus, the identification of zero pronouns in Roma-
nian [32] might prove itself a valuable characteristic of implicitation. On the
other hand, when translating between language pairs both of which have a high
degree of zero anaphora (e.g., Spanish, Portuguese, Korean, or Chinese), both
explicitation and implicitation might occur, in order to avoid ambiguities or to
create a more natural text.
4 Conclusions
This report briefly presents some of the results that have been obtained in the
field of translation studies, more specifically on the simplification and explicita-
tion universals. We have described various methodologies of study, and presented
the conclusions of the authors regarding the validity of the two universals.
Although intensely studied in the last two decades, simplification is not
yet completely and clearly confirmed as a universal. Although many
studies, undertaken in different ways, support it, there are also studies which
contradict it. It is still a difficult task to extract the characteristics of this phenomenon.
Nonetheless, efforts are continuously being made on different language pairs, and
promising results have started to appear in the past few years.
In the case of explicitation, things seem to be clearer than with simplification.
It occurs quite often in many translations, mostly in order to avoid misinterpre-
tations in the target text. However, there are cases when explicitation appears
combined with its reverse function, implicitation, making it rather complicated
to analyse the data and draw conclusions. Nevertheless, most studies confirm
this hypothesis, making it one of the most plausible universals.
A successful validation of translation universals could be of great help in many
other NLP tasks which rely on translations. For instance, statistical machine
translation could be improved by automatically determining the direction of
translation, and multilingual plagiarism detection may benefit too. Moreover,
human translators would become more conscious of the way they translate, and
such universals could help them to self-assess their work. However, given the
number of disconfirming experiments, the name translation
universal may not be the most felicitous one; one could rename it to, for example,
translation trend.
References
1. Toury, G.: In search of a theory of translation. The Porter Institute for Poetics
and Semiotics, Tel Aviv (1980)
2. Borin, L., Prütz, K.: Through a glass darkly: Part-of-speech distribution in original
and translated text. In Daelemans, W., Sima’an, K., Veenstra, J., Zavrel, J., eds.:
Computational Linguistics in the Netherlands 2000. (2001) pp. 30–44
3. Hansen, S.: The Nature of Translated Text - An Interdisciplinary Methodology for
the Investigation of the Specific Properties of Translations. Saarland University,
Saarbrücken (2003)
4. Teich, E.: Cross-Linguistic Variation in System and Text. Mouton de Gruyter,
Berlin (2003)
5. Gellerstam, M.: Translationese in Swedish novels translated from English. In
Wollin, L., Lindquist, H., eds.: Translation studies in Scandinavia. CWK Gleerup
(1986) pp. 88–95
6. Baker, M.: Corpus linguistics and translation studies: Implications and
applications. In Baker, M., Francis, G., Tognini-Bonelli, E., eds.: Text and
Technology: In Honour of John Sinclair. John Benjamins, Amsterdam -
Philadelphia (1993)
7. Laviosa, S.: Corpus-based Translation Studies. Theory, Findings, Applications.
Rodopi, Amsterdam - New York (2002)
8. Tymoczko, M.: Computerised corpora and translation studies. Meta 43(4) (1998)
pp. 652–659
9. Chesterman, A.: A causal model for translation studies. In Olohan, M., ed.:
Intercultural Faultlines. Research Models in Translation Studies I: Textual and
Cognitive Aspects. St. Jerome, Manchester (2000)
10. Goutte, C., Kurokawa, D., Isabelle, P.: Improving SMT by learning translation
direction. In: EAMT 2009 workshop ”Statistical Multilingual Analysis for Retrieval
and Translation”. (2009)
11. Corpas Pastor, G.: Investigar con corpus en traducción: los retos de un nuevo
paradigma. Peter Lang, Berlin & New York (2008)
12. Blum-Kulka, S.: Shifts of cohesion and coherence in translation. In House, J.,
Blum-Kulka, S., eds.: Interlingual and Intercultural Communication. Discourse and
Cognition in Translation and Second Language Acquisition. Narr (1986) pp. 17–35
13. Toury, G.: Descriptive Translation Studies and Beyond. John Benjamins,
Amsterdam (1995)
14. Baker, M.: Corpus-based translation studies: The challenges that lie ahead.
In Somers, H., ed.: Terminology, LSP and Translation: Studies in Language
Engineering in Honour of Juan C. Sager. John Benjamins, Amsterdam -
Philadelphia (1996)
15. Corpas Pastor, G., Mitkov, R., Afzal, N., Pekar, V.: Translation universals: Do
they exist? A corpus-based NLP study of convergence and simplification. In:
Proceedings of the AMTA. (2008)
16. Ilisei, I., Inkpen, D., Corpas Pastor, G., Mitkov, R.: Towards simplification: A
supervised learning approach. In: Proceedings of Machine Translation 25 Years
On. (November 2009)
17. Ilisei, I., Inkpen, D., Corpas Pastor, G., Mitkov, R.: Identification of translationese:
A machine learning approach. In Gelbukh, A., ed.: Proceedings of the 11th Inter-
national Conference on Computational Linguistics and Intelligent Text Processing
(CICLing). (2010) pp. 503–511
18. Quinlan, J.R.: Induction of decision trees. Machine Learning 1(1) (1986) pp.
81–106
19. Baroni, M., Bernardini, S.: A New Approach to the Study of Translationese:
Machine-learning the Difference between Original and Translated Text. Lit
Linguist Computing 21(3) (2006) pp. 259–274
20. Mauranen, A.: Strange strings in translated language: A study on corpora. In
Olohan, M., ed.: Intercultural Faultlines. Research Models in Translation Studies
I: Textual and Cognitive Aspects. St. Jerome, Manchester (2000) pp. 119–141
21. Jantunen, J.H.: Synonymity and lexical simplification in translations: A corpus
based approach. Across Languages and Cultures 2(1) (2001) pp. 97–112
22. Jantunen, J.H.: Untypical patterns in translations: Issues on corpus methodology
and synonymity. In Mauranen, A., Kujamaki, P., eds.: Translation Universals : Do
They Exist? Volume 48. John Benjamins (2004) pp. 101–126
23. Vinay, D.: Stylistique Comparee du Français et de l’Anglais. Didier (1958)
24. Pym, A.: Explaining explicitation. In Karoly, K., Fóris, A., eds.: New Trends
in Translation Studies. In Honour of Kinga Klaudy. Akadémiai Kiadó, Budapest
(2005) pp. 29–34
25. Vanderauwera, R.: Dutch novels translated into English: the transformation of a
”Minority” literature. Rodopi, Amsterdam (1985)
26. Olohan, M., Baker, M.: Reporting ’that’ in translated English: Evidence for
subconscious processes of explicitation? Across Languages and Cultures 1(2)
(2000) pp. 141–158
27. Burnett, S.: A corpus-based study of translational English. Master’s thesis,
University of Manchester (1999)
28. Olohan, M.: Spelling out the optionals in translation: A corpus study. In: UCREL
Technical Papers. Volume 13. (2001) pp. 423–432
29. Gumul, E.: Explicitation in simultaneous interpreting: A strategy or a byproduct
of language mediation? Across Languages and Cultures 7(2) (2006) pp. 171–190
30. Cheong, H.J.: Target text contraction in English-into-Korean translations: A
contradiction of presumed translation universals? Meta 51(2) (2006) pp. 343–367
31. Mihăilă, C., Ilisei, I., Inkpen, D.: Romanian Zero Pronoun Distribution: A
Comparative Study. In: Proceedings of the 7th International Conference on
Language Resources and Evaluation (LREC). (2010)
32. Mihăilă, C., Ilisei, I., Inkpen, D.: To Be or Not to Be a Zero Pronoun: A Machine
Learning Approach for Romanian. In: Proceedings of the Processing ROmanian in
Multilingual, Interoperational and Scalable Environments Workshop (PROMISE).
(2010)