The document introduces the EXPERT ITN project, which aims to train young researchers on improving data-driven machine translation through empirical approaches. The project will support researchers during their training and research, with the goal of producing future leaders in the field. It describes the objectives to improve existing corpus-based translation tools by considering user needs, collecting data, incorporating linguistic processing, and developing hybrid approaches. The project consists of 12 individual research projects across 6 work packages and is led by an academic consortium with involvement from private sector partners.
3. What are Marie Curie ITN actions?
  Initial Training Networks (ITN):
  Offer early-stage researchers the opportunity to improve their research skills, join established research teams and enhance their career prospects
  Are run by consortia made up of universities, research centres and companies
  Recruit researchers who are in the first five years of their career for initial training, either studying for a research-level degree (PhD or equivalent) or doing initial post-doctoral research
4. EXPERT: EXPloiting Empirical appRoaches to Translation
  Proposes the creation of an Initial Training Network to train young researchers on ways to improve current data-driven MT technologies (TM, SMT and EBMT)
  Supports young researchers of the network during the whole research and development cycle, providing guidance and training in core and complementary skills, and evaluating the resulting technologies
  Aims for these young researchers to become future leaders in this area
5. EXPERT
  Argues that there is no clear boundary between fully automatic and semi-automatic translation, and that both are tools that can help human translators
 Aims to:
 improve existing corpus-based TM and MT technologies
 create hybrid technologies
 exploit the strengths of the existing technologies and address
their main limitations
 consider the needs of the users when proposing new
technologies
6. Training objectives
 EXPERT has five main Training Objectives:
 Training through research based on the set of sub-programmes
 Creating a large and diverse research community focused on a
common goal.
 Exploiting intersectoral and transnational mobility via
secondments and shorter visits to both industrial and academic
partners.
 Local training in core research and complementary skills within
both academic and industrial environments.
 Network-wide training in core research areas and
complementary skills.
7. Objectives of the project
Topic | State-of-the-art and limitations | EXPERT solutions
User perspective | MT systems force the users to change their working style. | Consider the real needs of translators, involving them in the development of technologies, and providing training to prepare them with new skills.
Data collection and preparation | Existing TM, EBMT and SMT approaches have particular data constraints. | Investigate how data repositories can be built automatically in a way that makes them useful to multiple corpus-based approaches to translation.
8. Objectives of the project (2)
Topic | State-of-the-art and limitations | EXPERT solutions
Improve matching and retrieval with linguistic processing | Lack of linguistic processing constrains the retrieval of previous translations. | Investigate matching algorithms which rely on lexical, syntactic and semantic variations of texts, including the use of automatically acquired domain ontologies and terminology databases.
Hybrid approaches for translation | Hybrid corpus-based solutions consider each approach individually as a tool, not fully exploiting integration possibilities. | Fully integrate corpus-based approaches to improve translation quality and minimize translation effort and cost.
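As a rough illustration of the matching and retrieval limitation named on this slide, the sketch below shows a purely surface-level fuzzy lookup in a translation memory. It is not taken from EXPERT: the toy memory, the function name `best_match` and the 0.7 threshold are all hypothetical, and the similarity score comes from Python's standard `difflib.SequenceMatcher`, which compares characters only. A matcher of this kind knows nothing about lexical, syntactic or semantic variation, which is the gap the slide says linguistic processing should address.

```python
from difflib import SequenceMatcher

# Hypothetical toy translation memory: source segment -> stored translation.
TM = {
    "Press the red button to start the machine.":
        "Pulse el botón rojo para arrancar la máquina.",
    "Close the cover before use.":
        "Cierre la tapa antes de usar.",
}


def best_match(segment, memory, threshold=0.7):
    """Return (score, source, target) for the closest stored segment.

    Returns None when no stored segment reaches the (assumed) threshold.
    The score is a character-level similarity ratio between 0.0 and 1.0.
    """
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None


# A near-identical sentence is retrieved as a fuzzy match.
hit = best_match("Press the green button to start the machine.", TM)
if hit is not None:
    score, source, target = hit
    print(f"{score:.2f} | {source} | {target}")
```

Because the comparison is character-based, a paraphrase with the same meaning but different wording would score poorly here, even though its stored translation might still be reusable.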
9. Objectives of the project (3)
Topic | State-of-the-art and limitations | EXPERT solutions
Human translator in the loop: informing users and learning from user feedback | In interactive workflows where humans post-edit/complete system translations, translators are not informed about the quality of the translations. The translators’ choice is at best saved for future use. | Generate confidence and quality estimation mechanisms to allow these choices to be based on the quality of the TM/MT output. Make use of translators’ feedback as produced at translation time to improve the system on the fly.
10. Work packages
WP1: Management (UoW)
WP7: Training (UvA)
WP8: Dissemination (Pangeanic)
WP2: User perspective (UMA)
WP3: Data collection (Translated)
WP4: Language technology, domain ontologies and terminologies (USAAR)
WP5: Learning from and informing translators (USFD)
WP6: Hybrid corpus-based approaches (DCU)
11. Projects
ESR1 | Investigation of translators’ requirements from translation technologies | UMA | WP2
ESR2 | Investigation of an ideal translation workflow for hybrid translation approaches | USAAR | WP2
ESR3 | Collection and preparation of multilingual data for multiple corpus-based approaches to translation | UMA | WP3
ESR4 | Use of language technology to improve matching & retrieval in translation memories | UoW | WP4
12. Projects (2)
ESR5 | Use of terminologies and ontologies to improve corpus-based approaches to translation | USAAR | WP4
ESR6 | Learning from human feedback on the quality of the translations | USFD | WP5
ESR7 | Estimating the confidence of corpus-based approaches to translation and the quality of the translated texts | USFD | WP5
ESR8 | Investigation of how each individual corpus-based translation approach (TM, EBMT and SMT) can benefit from each other | DCU | WP6
13. Projects (3)
ESR9 | Investigation of the ideal infrastructure for computer-aided translation: pipeline with NLP tools for pre/post-processing, SMT, EBMT and TM techniques (a hybrid CAT tool) | DCU | WP6
ESR10 | Exploiting hierarchical alignments for linguistically-informed SMT models to meet the hybrid approaches that aim at compositional translation | UvA | WP6
ESR11 | Exploiting hierarchical alignments for a semantically-enriched SMT system that offers an extension to existing TMs to allow incremental, recursive partial match of the input using hierarchical constructions containing variables | UvA | WP6
ESR12 | Investigation of methodologies to evaluate the improved SMT, EBMT and TM prototypes and new hybrid computer-aided translation technology proposed in EXPERT | UoW | WP6
14. Projects (4)
ER1 | Investigation of automatic methods for collection & preparation of multilingual data | Translated | WP3
ER2 | Implementation and evaluation (including user aspects) of the improved SMT, EBMT and TM prototypes proposed in EXPERT | Hermes | WP6
ER3 | Implementation and evaluation of the new hybrid computer-aided translation technology proposed in EXPERT | Pangeanic | WP6
15. Consortium
 Academic partners:
 University of Wolverhampton, UK – coordinator
 Universidad de Malaga, Spain
 University of Sheffield, UK
 Universitaet des Saarlandes, Germany
  Dublin City University, Ireland
  Universiteit van Amsterdam, Netherlands
 Private sector:
 Pangeanic, Spain
 Translated SRL, Italy
 Hermes, Spain
 Associated partners:
 Celer Soluciones S.L., Spain
 Wordfast, France