1. Human Translation &
Translation Workflow
Prof. Gloria Corpas Pastor
Dr. Jorge Leiva Rojo
Dr. Míriam Seghiri Domínguez
Universidad de Málaga
Birmingham, 13th November 2013
5. MAIN TRAINING EVENTS AND CONFERENCES (WP7)
Scientific and technological training
Complementary skills training
Scientific and technological workshop
Business showcases
6. TUTORIAL ON HUMAN TRANSLATION AND TRANSLATION WORKFLOW
Relevant to all research sub-programmes (* WP1 & WP5)
Introduce the most common translation workflow to
researchers
• to learn how translators currently work
• to design new translation technologies
• to cover confidence and quality estimation in HTWs
7. LIST OF CONTENTS
Market studies (e.g. industry, quality, technology,
language service providers)
The translation workflow (e.g. certification, project
management, agents, emerging trends)
Training translators using corpora (compilation
protocol, analysis, translation strategies, etc.)
11. 1. Introduction
25,000 companies in the world (Translation Bureau, 2012)
1,500 translation companies in Europe; average turnover
300,000 € in 2005 (EUATC, 2005).
Translation and interpreting (+ software & website
localization) sector’s assumed value: 5.7 billion € in 2008;
9.1 billion € estimated in 2013 (European Commission,
2009).
Highest growth rate of all industries in Europe.
World-wide annual growth: 5.13% (DePalma et al., 2013).
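As a rough check on the sector values quoted above, the 2008 and 2013 figures imply a compound annual growth rate. A minimal sketch, assuming a constant yearly rate over the five-year span; the function name is illustrative and not taken from any of the cited studies.

```python
def compound_annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """Return the constant yearly rate that turns start_value into end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Sector value in billion EUR: 5.7 (2008) and 9.1 (estimated for 2013).
implied_rate = compound_annual_growth_rate(5.7, 9.1, years=2013 - 2008)
print(f"Implied annual growth 2008-2013: {implied_rate:.1%}")
```

With the figures quoted on this slide the implied rate comes to roughly 9.8% per year.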
12. 1. Introduction
700 participants (LSP) (European Commission, 2009):
43% freelancers or sole proprietors;
36% 1-10 employees;
21% 10+ employees.
Growth of big companies is quicker than growth of the
rest of the language market (Boucau, 2009).
Supply exceeds demand; number of well-qualified
linguists is too small to cover the growing demand.
13. 1. Introduction
Six hyper-languages of the web (English, French, Italian,
German, Spanish, Japanese) and Chinese to undergo a
major growth (cf. Common Sense Advisory, 2011).
Prices dependent on exchange rates, not influenced by
inflation (cf. Goddard, 2013)
Prices relatively stable 2004-2008.
Market is very competitive. 80% of providers charge less
than 0.15 $ / word (Translation Bureau, 2012).
14. 1. Introduction
Average per-word rate for the 30 most commonly used
languages on the web fell 34.71%: from 0.205 US$ (2010) to
0.134 US$ (2012).
Global supply, advances in technology, economic issues
and more aggressive buyers have conspired to drive prices
down since 2008; the situation remains unchanged.
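The fall in the average per-word rate can be recomputed from the two rates quoted on this slide. A minimal sketch; the helper name is illustrative. Note that the rates as quoted are rounded, so the result (about 34.6%) is close to, but not identical with, the reported 34.71%, which was presumably computed from unrounded rates.

```python
def percentage_change(old: float, new: float) -> float:
    """Return the relative change from old to new, in percent."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100


rate_2010 = 0.205  # US$ per word, as quoted
rate_2012 = 0.134  # US$ per word, as quoted
change = percentage_change(rate_2010, rate_2012)
print(f"Change in average per-word rate 2010-2012: {change:.2f}%")
```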
15. 1. Introduction
Domain and technological skills should be better
addressed (European Commission, 2009).
“Use of technology by LSPs is sporadic” (Translation
Bureau, 2012).
Technology requires an investment to build and maintain
infrastructure and a significant repository of data in order
for the tools to be effective; this is difficult for small
enterprises, the bulk of businesses within the industry.
16. 1. Introduction
Decrease in resistance to MT (Systran #1; Google #2)
(European Commission, 2009).
MT does not produce a sufficient level of quality; output
has to be reviewed by qualified translators. As a result, MT
is not widely adopted (large volume translations)
(Translation Bureau, 2012).
HAMT (human-aided machine translation) is growing in usage.
A 2009 study indicated that HAMT doubled translation output
and was 45% cheaper (Translation Bureau, 2012).
18. “[A translation brief is a] definition of the
communicative purpose for which the translation
is needed. The ideal brief provides explicit or
implicit information about the intended text
function(s), the target-text addressee(s), the
medium over which it will be transmitted, the
prospective place and time and, if necessary,
motive of production or reception of the text”
(Nord, 1997).
20. 2. The translation workflow
- ISO 9001:2008, ISO 17100 (mid 2014 [Rosam, 2013])
- EN 15038:2006 (EU)
- ASTM (USA)
- GB/T 19363 (China)
- CAN/CGSB-131.10 (Canada)
- To define translation’s basic terms and concepts.
- To establish the basics for the client-translation service
provider relationship to meet market needs.
- To determine the implementation of the translation
process.
21. 2. The translation workflow
EN 15038 needs amendments.
“The standard although well intended does neither indicate
nor reflect the quality of the output of an LSP. Due to
downward pressures and trends in pricing, many translation
agencies need to operate with limited budgets in order to stay
competitive. As a result, if low cost and low quality translation
work is performed, the mere fact that such work is revised
does not guarantee high quality” (European Commission, 2009).
25. 3. Emerging trends
CAT tool suppliers to deal with newer media and new
crowd-based supply chains (DePalma et al., 2013).
Users want different forms of content translated:
emails, blogs, tweets.
Slight decrease in turnover due to the economic
downturn among small enterprises with turnovers below
50,000 € (European Commission, 2009).
26. “Post-Editing is the process by which language
professionals edit machine translation outputs to
create human-quality translations” (Marcu,
2013).
29. Table of contents
1. Introduction
2. Corpora in Translation Training
3. Guidelines for Corpus Creation
3.1. Design Criteria
3.2. Compilation Protocol
4. Using Corpora to Translate
5. Using the corpus to translate
6. Corollary
31. The inclusion of documentation as a core subject in the
curriculum of Translation and Interpretation degrees clearly
underlines its importance to translators.
Training in this discipline is considered essential for a translator
given that only sufficient and conscientious work on
documentation will allow an adequate translation of a specialised
text.
32. The sources of information that may be utilised by the translator
are extremely varied, ranging from an oral consultation with an
expert to a search using specialised glossaries and dictionaries.
However, in the field of translation perhaps the most relevant
documentation activity today involves the use of the Internet
and, closely related to this, the compilation and management of
virtual corpora.
33. Here, we shall present a systematic methodology for corpus
compilation based on electronic resources available on the
Internet.
The methodology will be illustrated through the example of the
creation of a virtual corpus of Telecommunications
consisting of:
1 subcorpus in English
1 subcorpus in Spanish
35. Telecommunications, why?
Telecommunication is now the world’s largest industry [and] the
world’s fastest-changing industry from any measure of change
you can name: technology, players, applications and users. In one
decade, this industry is going from totally-closed,
government-controlled, highly regulated, monopolistic, bureaucratic,
plodding thing to an exploding free-for-all (Newton, 1994: 1).
37. What is a corpus?
corpus, pl. corpora, from the Latin word corpus, i.e. “body”
A collection of texts assumed to be representative of a given
language, dialect, or other subset of a language, to be used for
linguistic analysis (Francis, 1982)
38. Characteristics of corpora
• collections of text
• naturally-occurring / authentic text
• representative of a given language
• collected according to specific criteria
• stored in machine-readable format
• used for linguistic analysis
39. Different types of corpora
By what criteria can corpora be distinguished/classified?
• language
• size
• purpose
40. The advantages of using corpora in translation have been shown by
various studies (cf. Laviosa, 1998; Bowker, 2002; Bowker and Pearson,
2002; Zanettin et al. 2003).
Advantages: their objectivity, their reusability and multiple usage.
They are user-friendly and allow access to and management of huge
quantities of information in almost no time.
41. Translators turn to the Internet in search of solutions to information
and documentation problems because they are not only translating
between languages but also between discourse communities and
cultures.
The compilation of corpora and the Internet appear to be two of the
most important documentation resources in the practice and
research of specialised translation.
Corpora for a particular speciality are not available for consultation
on the Internet.
Translators have no alternative other than to compile their own
virtual corpora for the specific translation that has been
commissioned in each case.
42. In order for a collection of texts to be considered
a corpus in the strict sense of the term, it must
meet:
a set of clear design criteria and
a specific compilation protocol
so that the collection may be deemed
representative of the field of specialisation or
the particular type of document that is being
translated.
44. Professional Competences
- Translating
- Linguistic and textual
- Research, information acquisition & processing
- Cultural
- Technical
The knowledge of how to compile and
use corpora is an essential part of
modern translational competence
(Varantola, 2003)
46. The extract comes from a brochure from the company DVEO:
<http://www.dveo.com/broadcast-systems/TDMB-and-DAB-modulator.shtml>.
47. The objective is to create a specialized corpus
on Telecommunications in English and
Spanish compiled exclusively from
resources available on the Internet.
Restricted to texts that have been drawn up in
the UK and Spain.
It will include original documents
(comparable corpus) and complete,
documented texts.
48. CORPUS DESIGN
Text type: brochures, research articles,
Language/s: English (subcorpus 1) & Spanish (subcorpus 2)
Diatopic restrictions: United Kingdom & Spain
Original or translations: Comparable (original)
Complete text or partial: complete
Documented: yes
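The design criteria above can be written down as a small data structure and used to screen candidate texts. A minimal sketch, assuming the class and field names shown here (they are illustrative, not part of the original methodology); the values are the ones listed on this slide.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Subcorpus:
    language: str
    country: str  # diatopic restriction


@dataclass(frozen=True)
class CorpusDesign:
    domain: str
    text_types: tuple[str, ...]
    subcorpora: tuple[Subcorpus, ...]
    originals_only: bool = True  # comparable corpus: no translations
    complete_texts: bool = True
    documented: bool = True      # recorded as a flag; not checked in accepts()

    def accepts(self, language: str, country: str, text_type: str,
                is_translation: bool, is_complete: bool) -> bool:
        """Check one candidate text against the design criteria."""
        if Subcorpus(language, country) not in self.subcorpora:
            return False
        if text_type not in self.text_types:
            return False
        if self.originals_only and is_translation:
            return False
        if self.complete_texts and not is_complete:
            return False
        return True


telecom_design = CorpusDesign(
    domain="Telecommunications",
    text_types=("brochure", "research article"),
    subcorpora=(Subcorpus("English", "United Kingdom"),
                Subcorpus("Spanish", "Spain")),
)

# An original, complete UK brochure passes; a US brochure does not.
print(telecom_design.accepts("English", "United Kingdom", "brochure", False, True))
print(telecom_design.accepts("English", "United States", "brochure", False, True))
```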
50. The Compilation Protocol
consists of 4 steps
(Seghiri, 2011):
I. Locating and accessing resources
II. Downloading data
III. Text formatting
IV. Data storage
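The four steps can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the protocol's reference implementation: step I is represented by a list of URLs that have already been located, the `fetch` callable stands in for whatever download tool is used, and the file-naming scheme and `<source>` header line are hypothetical choices for keeping each text documented.

```python
import re
import tempfile
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable, Iterable, List


class _TextExtractor(HTMLParser):
    """Collects the text of an HTML page, skipping scripts and styles."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def format_text(html: str) -> str:
    """Step III: reduce downloaded HTML to plain text with normalised spacing."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def compile_subcorpus(urls: Iterable[str], fetch: Callable[[str], str],
                      target_dir: Path, prefix: str) -> List[Path]:
    """Run steps II-IV for URLs that were located in step I."""
    target_dir.mkdir(parents=True, exist_ok=True)
    stored = []
    for index, url in enumerate(urls, start=1):
        html = fetch(url)                               # Step II: downloading data
        text = format_text(html)                        # Step III: text formatting
        path = target_dir / f"{prefix}{index:03d}.txt"  # Step IV: data storage
        path.write_text(f"<source>{url}</source>\n{text}\n", encoding="utf-8")
        stored.append(path)
    return stored


# Offline demo: an in-memory "web" with one invented page and a hypothetical URL.
sample_pages = {
    "https://example.org/brochure.html":
        "<html><head><style>p{}</style></head>"
        "<body><h1>DAB  modulator</h1>\n<p>Specs</p></body></html>",
}
demo_dir = Path(tempfile.mkdtemp()) / "EN"
stored_files = compile_subcorpus(sample_pages, sample_pages.__getitem__, demo_dir, "EN")
print(stored_files[0].read_text(encoding="utf-8"))
```

Passing the download function in as a parameter keeps the sketch runnable without network access; in practice it could be replaced by any function that returns the page source for a URL.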
52. The main sources of information to compile our corpus have
been:
institutional searches, carried out on the web sites of
international organisations and institutions (International
Telecommunication Union, Telefonica, etc.)
key word searches using a search engine
(www.google.com, www.yahoo.co.uk, etc.)
65. COMPARABLE CONCORDANCERS:
AntConc 3.2. is a non-commercial freely downloadable
concordancer for Windows, Mac and Linux. This versatile
software features several tools, which display lists of words and
keywords (Word List, Keyword List), list, sort and search for lexical
bundles (Collocates), generate lines in KWIC format
(Concordance), indicate the position of the keyword within a given
corpus (Concordance Plot), and allow the user to have access to the
whole source file or corpus (File View).
http://www.antlab.sci.waseda.ac.jp/antconc_index.html
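To make the Word List and Concordance (KWIC) functions concrete, here is a minimal sketch of both in plain Python. It does not reproduce how AntConc works internally; the function names, the context width and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple


def word_list(text: str) -> List[Tuple[str, int]]:
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(re.findall(r"\w+", text.lower())).most_common()


def kwic(text: str, keyword: str, width: int = 30) -> List[str]:
    """Return one keyword-in-context line per whole-word match (case-insensitive)."""
    text = " ".join(text.split())  # collapse line breaks so each hit fits on one line
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Invented sample text, standing in for a corpus file.
sample = "The modulator accepts ASI input. Each modulator supports DAB and T-DMB."

for line in kwic(sample, "modulator"):
    print(line)
print(word_list(sample)[:3])
```

Right-aligning the left context puts every keyword in the same column, which is what makes recurring patterns on either side easy to spot.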
67. Another monolingual concordancer, for Windows only, is the
Multilingual Corpus Toolkit, which supports many European and
Asian languages.
http://personalpages.manchester.ac.uk/staff/scott.piao/research/DownLoad/download.htm
Freeware concordancers for Mac are Conc 1.7/1.8 and Concorder
1.0.
Conc: http://www.sil.org/computing/conc/conc.html
Concorder: http://mac.softpedia.com/get/WordProcessing/Concorder.shtml
68. PARALLEL CONCORDANCERS:
A bilingual or multilingual concordancer is a program for
parallel corpora, i.e. corpora of source texts and their translations
into other languages. As a rule, this kind of software requires input
aligned at sentence level. Most bi-/multilingual concordancers are
commercial. A well-known example is ParaConc 0.9, the
multilingual version of MonoConc Pro. It can analyse up to four
languages in parallel (one source text corpus and up to three target
corpora).
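The idea of searching sentence-aligned input can be illustrated with a few lines of Python. This is a generic sketch, not a description of ParaConc: the function name is hypothetical, the English-Spanish sentence pairs are invented, and alignment is assumed to have been done beforehand.

```python
import re
from typing import Iterable, List, Tuple


def parallel_concordance(aligned_pairs: Iterable[Tuple[str, str]],
                         term: str) -> List[Tuple[str, str]]:
    """Return the aligned (source, target) pairs whose source contains term."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in aligned_pairs if pattern.search(src)]


# Invented sentence-aligned sample (source text and its translation).
aligned = [
    ("The modulator accepts two inputs.", "El modulador admite dos entradas."),
    ("Power consumption is low.", "El consumo de energía es bajo."),
    ("Each modulator is tested.", "Cada modulador se somete a pruebas."),
]

hits = parallel_concordance(aligned, "modulator")
for src, tgt in hits:
    print(f"{src}\n    {tgt}")
```

Each hit shows the source sentence with its aligned translation, so candidate equivalents for the search term can be read off the target side.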
72. Comparable corpora are particularly useful for meeting
translators’ information needs.
Representative Corpora: finding information on
terminology, phraseology, concepts, cultural issues and
text discourse for direct and inverse translation.
73. Corpora:
instant access to authentic language and real usage
syntagmatic patterns and translation equivalents unavailable in
other resources or technologies
guidance to style, text-structuring devices and conventions in
both SL and TL
useful for the translation of any kind of text type, language/s
and in any direction