META-NET has received funding from the EU to support several language technology projects, including CRACKER, T4ME, CESAR, METANET4U, and META-NORD. It brings together over 60 research centers across 34 countries to build infrastructure for sharing language resources and tools. The goal is to improve the visibility, documentation, identification, availability, and interoperability of language resources in order to support both academic and commercial language technology research and development across Europe.
Language Resources for Multilingual Europe
1. META-NET has received funding from the EU’s Horizon 2020 research and innovation programme through the contract CRACKER
(grant agreement no.: 645357). Formerly co-funded by FP7 and ICT PSP through the contracts T4ME (grant agreement no.: 249119),
CESAR (grant agreement no.: 271022), METANET4U (grant agreement no.: 270893) and META-NORD (grant agreement no.: 270899).
Language Resources for Multilingual Europe
Georg Rehm
META-NET Network Manager – CRACKER Coordinator
DFKI, Germany
georg.rehm@dfki.de
LT Innovate Summit – LR Dialogue Workshop, Panel “Language Resource Supply”
Brussels, Belgium, June 25, 2015
2. META-NET and META
• 60 research centres in 34 countries (via four EU-funded projects: T4ME, CESAR, METANET4U, META-NORD)
• Multilingual Europe Technology Alliance, 794 members in 68 countries
http://www.meta-net.eu/members
3. http://www.meta-net.eu
• Pan-European infrastructure, bringing together providers and consumers of language data, tools and services.
• LRs are documented, uploaded, stored, catalogued, downloaded, shared – to improve visibility, documentation, identification, availability, interoperability.
• Caters for datasets, tools, services for LT research and development (both academic and commercial); META-SHARE includes repository software, a metadata model, licensing kit, statistics.
• 29 distributed repositories maintained by 37 organisations in 25 countries.
• 2.500+ resources (corpora: 49%, lexical: 38%, tools/services: 12%), covering ca. 100 languages.
• 7.000+ downloads in total; ca. 70% of all LRs have been downloaded.
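The percentages above can be turned into rough absolute numbers. The following is a minimal sketch, assuming the lower bound of 2,500 resources as the total (the slide says "2.500+", so the real figures are somewhat higher); the variable and function names are illustrative only and do not come from META-SHARE.

```python
# Figures quoted on the slide. "2.500+" is a lower bound, so every
# derived number below is only a rough lower-bound estimate.
TOTAL_RESOURCES = 2500
SHARES_PERCENT = {"corpora": 49, "lexical": 38, "tools/services": 12}
DOWNLOADED_PERCENT = 70  # "ca. 70% of all LRs have been downloaded"


def estimate_counts(total, shares_percent):
    """Return the approximate number of resources per category."""
    return {name: total * pct // 100 for name, pct in shares_percent.items()}


counts = estimate_counts(TOTAL_RESOURCES, SHARES_PERCENT)
# Percentage not covered by the three categories listed on the slide.
unaccounted = 100 - sum(SHARES_PERCENT.values())
downloaded = TOTAL_RESOURCES * DOWNLOADED_PERCENT // 100

for name, n in counts.items():
    print(f"{name}: ~{n}")
print(f"share not covered by the three categories: ~{unaccounted}%")
print(f"resources downloaded at least once: ~{downloaded}")
```

The three shares add up to 99%, so about 1% of the catalogue is either in another category or lost to rounding; the slide does not say which.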
4.
5. MT
excellent: –
good: English
moderate: French, Spanish
fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh
Resources
excellent: –
good: English
moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
weak/no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
[Figure: level of support (weak/none, fragmentary, moderate, good, excellent) for each of the languages listed above]
Languages with names in red have little or no MT support
Language White Paper Series
Europe’s Languages in the Digital Age (2011/2012)
Summary: “At Least 21 European Languages
in Danger of Digital Extinction!”
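As a quick cross-check of the headline figure, the MT support tiers from this slide can be counted. This is a minimal sketch; the assignment of languages to tiers is a reading of the flattened slide text, and the dictionary name `MT_SUPPORT` is purely illustrative.

```python
# MT support tiers as read from the slide (assumption: the flattened
# text pairs each label with the language list shown here).
MT_SUPPORT = {
    "excellent": [],
    "good": ["English"],
    "moderate": ["French", "Spanish"],
    "fragmentary": [
        "Catalan", "Dutch", "German", "Hungarian",
        "Italian", "Polish", "Romanian",
    ],
    "weak or no support": [
        "Basque", "Bulgarian", "Croatian", "Czech", "Danish", "Estonian",
        "Finnish", "Galician", "Greek", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Norwegian", "Portuguese", "Serbian",
        "Slovak", "Slovene", "Swedish", "Welsh",
    ],
}

# Number of languages in each tier and across all tiers.
tier_sizes = {tier: len(langs) for tier, langs in MT_SUPPORT.items()}
total_languages = sum(tier_sizes.values())

for tier, size in tier_sizes.items():
    print(f"{tier}: {size}")
print(f"languages covered: {total_languages}")
```

Under this reading, the "weak or no support" tier for MT holds 21 of the 31 languages, the same number that appears in the summary headline. Whether the headline was derived from the MT tier alone is not stated on the slide.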
6. http://www.cracker-project.eu • http://www.meta-net.eu
LR-Related Activities
[Timeline figure, 2015-2017 (M1, M12, M24, M36): Kick-off meeting for all ICT-17 projects; translate5; WMT 2016 and 2017; IWSLT 2015, 2016 and 2017; QT Marathon 2015 and 2016; Roadmap for European MT Research; Survey on the State of HQMT in Industry and LSPs; SRIA (initial version, update, final); milestones labelled version 1 and version 2]
• Production of resources (e.g., for WMT
2016 and 2017, IWSLT 2015-2017)
• Tools for resources (quality control,
evaluations; towards the idea of a smart
workbench for translators)
• Strategies and roadmaps for resources
(SRIA, Roadmap for European MT
Research)
• Exchange and sharing facility for
resources (META-SHARE)
Maintenance of Operations and Outreach
• Provide services, adapt them to evolving user requirements and licensing landscape
  - adapt, streamline and extend the metadata schema;
  - adapt licensing toolkit to new international licensing setups;
  - streamline and simplify operations for repository providers and data depositors.
• Technical support and bug fixing
7. http://www.cracker-project.eu • http://www.meta-net.eu
• Federation of projects – core seed:
the group of H2020-ICT17 projects.
• Multi-lateral Memorandum of Understanding,
ca. 20 projects in total (including FP7 and
H2020-ICT15), to be approached in two
phases (first phase almost completed).
• Selected areas of collaboration: data
management and repositories (including
Data Management Plan), tools and
technologies; shared tasks and evaluations.
• http://www.cracking-the-language-barrier.eu
will be launched soon.
8. MT Use Cases and Language Resources
• “Usability” is an unusual generic dimension for the evaluation of a resource.
• Reason: the majority of LRs can be used in many different research or application scenarios.
• More relevant dimensions: quality, availability, coverage, maturity, sustainability, adaptability, size, format, license, language, style etc. – depending on the use case.
• When talking about LRs for MT, it’s important to be specific in terms of the respective use case.
• Reason: the use case puts specific requirements on the type of LR and relevant dimensions.
Table columns: Scenario; MT Use Case; Maturity of Technology; Human Involvement; Relevance of Quality; Methods; LR Requirements

Scenario: Inbound Translation (written texts)
  MT Use Case: Gist translation, provide an idea of a text’s contents
  Maturity of Technology: Deployed (Google Translate), research ongoing
  Human Involvement: –
  Relevance of Quality: Quality of MT secondary
  Methods: Statistical MT
  LR Requirements: Very large aligned data sets (the more data, the better)

Scenario: Outbound Translation (written texts)
  MT Use Case: Production quality, for publication
  Maturity of Technology: Research on HQMT has started, no POCs yet
  Human Involvement: –
  Relevance of Quality: Quality of MT extremely important, ideally HQ
  Methods: New approach needed, SMT, RBMT, hybrid systems (needs quality estimation methods)
  LR Requirements: Deeply annotated data sets with quality information (also needs more research)

Scenario: Outbound Translation (written texts)
  MT Use Case: Production quality, for publication
  Maturity of Technology: Deployed, usable via LSPs
  Human Involvement: Post-editing
  Relevance of Quality: Quality of initial MT step important but secondary
  Methods: MT, followed by post-editing, ideally with smart translation workbenches (CAT)
  LR Requirements: Translation memories and term databases (large coverage, high quality etc.)

Scenario: Speech to Speech Translation
  MT Use Case: Enable face-to-face conversations
  Maturity of Technology: Research ongoing but POCs exist (Skype)
  Human Involvement: –
  Relevance of Quality: Quality of MT secondary
  Methods: Recognition and generation of spoken language; statistical MT etc.
  LR Requirements: Several additional technologies and LR types needed (such as very large speech databases)
9. META-NET SRA LR Roadmap
• Infrastructure – maintain and extend sharing facility; promote documentation through metadata; intensify cooperation
• Coverage, Quality, Adequacy – increase number of LRs for all European languages to address application needs; promote evaluation and validation to improve LR quality constantly
• Acquisition – define best practices for LR production; automate production; distributed production (crowd-sourcing, social media, gamification etc.); bridge acquisition methods with LOD, big data
• Openness – elaborate simple and harmonised licensing solutions; promote openness and sharing of LRs
• Interoperability – promote and encourage use of standards
FLaReNet is a project funded under the eContentplus programme, grant agreement ECP-2007-LANG-617001.
eContentplus is a multiannual Community programme to make digital content in Europe more accessible, usable
and exploitable.
The Strategic Language Resource Agenda
Nicoletta Calzolari, Valeria Quochi, Claudia Soria
CNR - Istituto di Linguistica Computazionale “A. Zampolli”, Italy
with the contribution of
Núria Bel, University Pompeu Fabra, Spain
Gerhard Budin, Universität Wien, Austria
Khalid Choukri, ELDA, France
Joseph Mariani, LIMSI/IMMI-CNRS, France
Monica Monachini, CNR-ILC, Italy
Jan Odijk, Universiteit Utrecht, Netherlands
Stelios Piperidis, ILSP/”Athena” R.C., Greece
10. We need an LT Masterplan
• In 2015, LT is simply everywhere: search, interactive assistants (phones, cars, appliances), big data, social media analytics, etc. The potential is huge!
• Europe needs to follow a Language Technology Masterplan. Resources are only one piece of the puzzle; the plan also needs to reflect technologies, tools, research, innovation, platforms, infrastructures, services, language policy making, the language communities, flagship initiatives (CEF, DSM), etc.
• Europe is only starting to recognise the potential of LT.
• LT will be a key ingredient of our future IT – with or without Europe.
• Europe has a unique opportunity for a strategic investment into our future growth.
DECLARATION OF COMMON INTERESTS
We, the undersigned, declare here, at the Riga Summit on the Multilingual Digital Single
Market, encouraged by the letter Vice President Andrus Ansip sent to its participants, that we
stand united in our goal and interest to:
- support multilingualism in Europe by employing language technology in business,
society and governance, to create a truly Multilingual Digital Single Market,
- exchange and share information in our efforts to promote our goals and interests at
local, national and European levels,
- raise awareness in society at large using channels available to our associations,
alliances and societies.
In the near future, we foresee the establishment of a Memorandum of Understanding among
our organisations towards a “Coalition for a Multilingual Europe”, to better serve our
members address the language barrier challenges towards establishing a truly integrated
Multilingual Digital Single Market.
Riga, 29. April 2015
Signed by (in alphabetical order):
BDVA: Laure Le Bars
CITIA: Steve Renals
CLARIN: Steven Krauwer
EFNIL: Sabine Kirchmeier-Andersen, Tamás Váradi
ELEN: Davyth Hicks, Claudia Soria
ELRA: Nicoletta Calzolari, Khalid Choukri
GALA: Laura Brandon, Robert E. Etches, Sergey Gladkov
LT Innovate: Jochen Hummel, Philippe Wacker
META-NET: Jan Hajic, Josef van Genabith, Georg Rehm, Andrejs Vasiljevs
NPLD: Meirion Prys Jones
TAUS: Jaap van der Meer
W3C: Richard Ishida, Felix Sasaki
For any questions, please contact Georg.Rehm@dfki.de.
Strategic Agenda for the
Multilingual Digital Single Market
Technologies for Overcoming Language Barriers towards
a truly integrated European Online Market
DRAFT
Version 0.5 – April 22, 2015
The key ingredients are in place: the communities are ready and several strategic research agendas have been prepared, e.g.:
META-NET SRA, MDSM SRIA, Riga Summit Declaration
11. Translingual Cloud

• Enable multilingual communication through web scale platform (also: Multilingual Digital Single Market)
• Software engineering project; “one size fits all” approach; low risk of failure; increased security and data protection
• Web service (including APIs) that makes use of SMT methods and large data sets

• Web service platform for LT/MT research and innovation (hybrid research, continuous development and operations)
• Enable the testing of new methods and avant-garde approaches with very large numbers of users
• European research and innovation platform for novel LT/MT ideas and specialised services (e.g., genres, styles, registers etc.)

• Web service platform for human translators and LSPs
• Enable hand-in-hand operations of MT and human translation; enable high-quality human translation
• Establish a sustainable technological link between human and machine (e.g., via human-generated and human-annotated data sets)