META-NET has received funding from the EU’s Horizon 2020 research and innovation programme through the contract CRACKER (grant agreement no.: 645357). Formerly co-funded by FP7 and ICT PSP through the contracts T4ME (grant agreement no.: 249119), CESAR (grant agreement no.: 271022), METANET4U (grant agreement no.: 270893) and META-NORD (grant agreement no.: 270899).
Language Resources for Multilingual Europe
Georg Rehm
META-NET Network Manager – CRACKER Coordinator
DFKI, Germany
georg.rehm@dfki.de
LT Innovate Summit – LR Dialogue Workshop, Panel “Language Resource Supply”
Brussels, Belgium, June 25, 2015
META-NET and META
•  META-NET: 60 research centres in 34 countries (via four EU-funded projects: T4ME, CESAR, METANET4U, META-NORD)
•  META: Multilingual Europe Technology Alliance, 794 members in 68 countries
http://www.meta-net.eu/members
http://www.meta-net.eu
META-SHARE
•  Pan-European infrastructure, bringing together providers and consumers of language data, tools and services.
•  LRs are documented, uploaded, stored, catalogued, downloaded, shared – to improve visibility, documentation, identification, availability, interoperability.
•  Caters for datasets, tools, services for LT research and development (both academic and commercial); META-SHARE includes repository software, a metadata model, licensing kit, statistics.
•  29 distributed repositories maintained by 37 organisations in 25 countries.
•  2,500+ resources (corpora: 49%, lexical: 38%, tools/services: 12%), covering ca. 100 languages.
•  7,000+ downloads in total; ca. 70% of all LRs have been downloaded.
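The percentages on this slide can be turned into rough absolute numbers. The following is a minimal Python sketch, assuming the lower bounds of exactly 2,500 resources and 7,000 downloads; the variable and function names are illustrative only and are not part of META-SHARE.

```python
# Figures quoted on the slide; "2,500+" and "7,000+" are lower bounds,
# so every derived number is only a rough estimate.
TOTAL_RESOURCES = 2500
TOTAL_DOWNLOADS = 7000
SHARES_PERCENT = {"corpora": 49, "lexical": 38, "tools/services": 12}
DOWNLOADED_PERCENT = 70  # "ca. 70%" on the slide


def approximate_counts(total, shares_percent):
    """Integer estimate of the number of resources per type."""
    return {kind: total * pct // 100 for kind, pct in shares_percent.items()}


counts = approximate_counts(TOTAL_RESOURCES, SHARES_PERCENT)
# Share not covered by the three listed resource types.
unlisted_percent = 100 - sum(SHARES_PERCENT.values())
# Resources that have been downloaded at least once.
downloaded_resources = TOTAL_RESOURCES * DOWNLOADED_PERCENT // 100

print(counts)
print(unlisted_percent)
print(downloaded_resources)
```

Under these assumptions the three listed types account for 99% of the catalogue, so about 1% of the resources fall outside them.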
MT
Excellent: (none)
Good: English
Moderate: French, Spanish
Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
Weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh

Resources
Excellent: (none)
Good: English
Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
Weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
[Chart: level of support (Weak/none, Fragmentary, Moderate, Good, Excellent) for 31 European languages, ordered from Welsh to English]
Languages with names in red
have little or no MT support
Language White Paper Series
Europe’s Languages in the Digital Age (2011/2012)
Summary: “At Least 21 European Languages
in Danger of Digital Extinction!”
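The headline figure of 21 languages can be checked against the MT support levels listed on this slide. The following is a minimal Python sketch, assuming the assignment of languages to levels as read from the slide text; the dictionary and function names are illustrative and not part of any META-NET tool.

```python
# MT support levels as read from the slide (assumption: the grouping
# below reflects the original table layout).
MT_SUPPORT = {
    "excellent": [],
    "good": ["English"],
    "moderate": ["French", "Spanish"],
    "fragmentary": [
        "Catalan", "Dutch", "German", "Hungarian",
        "Italian", "Polish", "Romanian",
    ],
    "weak or no support": [
        "Basque", "Bulgarian", "Croatian", "Czech", "Danish", "Estonian",
        "Finnish", "Galician", "Greek", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Norwegian", "Portuguese", "Serbian",
        "Slovak", "Slovene", "Swedish", "Welsh",
    ],
}


def count_by_level(levels):
    """Number of languages in each support level."""
    return {level: len(languages) for level, languages in levels.items()}


level_counts = count_by_level(MT_SUPPORT)
total_languages = sum(level_counts.values())

print(level_counts["weak or no support"], "of", total_languages)
```

With this grouping, 21 of the 31 languages fall into the lowest MT support level, which matches the figure in the summary.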
http://www.cracker-project.eu • http://www.meta-net.eu
LR-Related Activities
[Timeline figure, 2015 to 2017 (project months M1, M12, M24, M36): kick-off meeting for all ICT-17 projects; translate5; WMT 2016 and 2017; IWSLT 2015, 2016 and 2017; QT Marathon 2015 and 2016; Roadmap for European MT Research; Survey on the State of HQMT in Industry and LSPs; SRIA (initial version, update, final); further labels: version 1, version 2]
•  Production of resources (e.g., for WMT
2016 and 2017, IWSLT 2015-2017)
•  Tools for resources (quality control,
evaluations; towards the idea of a smart
workbench for translators)
•  Strategies and roadmaps for resources
(SRIA, Roadmap for European MT
Research)
•  Exchange and sharing facility for
resources (META-SHARE)
Maintenance of Operations and Outreach
•  Provide services, adapt them to evolving user requirements and licensing landscape:
   •  adapt, streamline and extend the metadata schema;
   •  adapt licensing toolkit to new international licensing setups;
   •  streamline and simplify operations for repository providers and data depositors.
•  Technical support and bug fixing
•  Federation of projects – core seed: the group of H2020-ICT17 projects.
•  Multi-lateral Memorandum of Understanding, ca. 20 projects in total (including FP7 and H2020-ICT15), to be approached in two phases (first phase almost completed).
•  Selected areas of collaboration: data management and repositories (including Data Management Plan), tools and technologies; shared tasks and evaluations.
•  http://www.cracking-the-language-barrier.eu will be launched soon.
MT Use Cases and Language Resources
•  “Usability” is an unusual generic dimension for the evaluation of a resource.
•  Reason: the majority of LRs can be used in many different research or application scenarios.
•  More relevant dimensions: quality, availability, coverage, maturity, sustainability, adaptability, size, format, license, language, style etc. – depending on the use case.
•  When talking about LRs for MT, it’s important to be specific in terms of the respective use case.
•  Reason: the use case puts specific requirements on the type of LR and relevant dimensions.
Scenario: Inbound Translation (written texts)
   MT Use Case: Gist translation, provide an idea of a text’s contents
   Maturity of Technology: Deployed (Google Translate), research ongoing
   Human Involvement: –
   Relevance of Quality: Quality of MT secondary
   Methods: Statistical MT
   LR Requirements: Very large aligned data sets (the more data, the better)

Scenario: Outbound Translation (written texts)
   MT Use Case: Production quality, for publication
   Maturity of Technology: Research on HQMT has started, no POCs yet
   Human Involvement: –
   Relevance of Quality: Quality of MT extremely important, ideally HQ
   Methods: New approach needed, SMT, RBMT, hybrid systems (needs quality estimation methods)
   LR Requirements: Deeply annotated data sets with quality information (also needs more research)

Scenario: Outbound Translation (written texts)
   MT Use Case: Production quality, for publication
   Maturity of Technology: Deployed, usable via LSPs
   Human Involvement: Post-editing
   Relevance of Quality: Quality of initial MT step important but secondary
   Methods: MT, followed by post-editing, ideally with smart translation workbenches (CAT)
   LR Requirements: Translation memories and term databases (large coverage, high quality etc.)

Scenario: Speech to Speech Translation
   MT Use Case: Enable face-to-face conversations
   Maturity of Technology: Research ongoing but POCs exist (Skype)
   Human Involvement: –
   Relevance of Quality: Quality of MT secondary
   Methods: Recognition and generation of spoken language; statistical MT etc.
   LR Requirements: Several additional technologies and LR types needed (such as very large speech databases)
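The point that LR requirements depend on the use case can be illustrated by treating the table as a small lookup structure. The following is a minimal Python sketch; the class `MTUseCase`, the tuple `USE_CASES` and the function `find_use_cases` are hypothetical names introduced only for this illustration, and the row values are copied from the table on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MTUseCase:
    scenario: str
    use_case: str
    maturity: str
    human_involvement: str  # "none" where the slide shows a dash
    quality_relevance: str
    methods: str
    lr_requirements: str


USE_CASES = (
    MTUseCase(
        "Inbound Translation (written texts)",
        "Gist translation, provide an idea of a text's contents",
        "Deployed (Google Translate), research ongoing",
        "none",
        "Quality of MT secondary",
        "Statistical MT",
        "Very large aligned data sets (the more data, the better)",
    ),
    MTUseCase(
        "Outbound Translation (written texts)",
        "Production quality, for publication",
        "Research on HQMT has started, no POCs yet",
        "none",
        "Quality of MT extremely important, ideally HQ",
        "New approach needed, SMT, RBMT, hybrid systems",
        "Deeply annotated data sets with quality information",
    ),
    MTUseCase(
        "Outbound Translation (written texts)",
        "Production quality, for publication",
        "Deployed, usable via LSPs",
        "post-editing",
        "Quality of initial MT step important but secondary",
        "MT, followed by post-editing",
        "Translation memories and term databases",
    ),
    MTUseCase(
        "Speech to Speech Translation",
        "Enable face-to-face conversations",
        "Research ongoing but POCs exist (Skype)",
        "none",
        "Quality of MT secondary",
        "Recognition and generation of spoken language; statistical MT",
        "Several additional technologies and LR types needed",
    ),
)


def find_use_cases(scenario, human_involvement=None):
    """Return all rows for a scenario, optionally filtered by human involvement."""
    return [
        row for row in USE_CASES
        if row.scenario == scenario
        and (human_involvement is None or row.human_involvement == human_involvement)
    ]


for row in find_use_cases("Outbound Translation (written texts)"):
    print(row.maturity, "->", row.lr_requirements)
```

The same scenario (outbound translation) yields two different sets of LR requirements, depending on whether post-editing is part of the workflow.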
META-NET SRA LR Roadmap
•  Infrastructure – maintain and extend sharing facility; promote documentation through metadata; intensify cooperation
•  Coverage, Quality, Adequacy – increase number of LRs for all European languages to address application needs; promote evaluation and validation to improve LR quality constantly
•  Acquisition – define best practices for LR production; automate production; distributed production (crowd-sourcing, social media, gamification etc.); bridge acquisition methods with LOD, big data
•  Openness – elaborate simple and harmonised licensing solutions; promote openness and sharing of LRs
•  Interoperability – promote and encourage use of standards
FLaReNet is a project funded under the eContentplus programme, grant agreement ECP-2007-LANG-617001.
eContentplus is a multiannual Community programme to make digital content in Europe more accessible, usable
and exploitable.
The Strategic Language Resource Agenda
Nicoletta Calzolari, Valeria Quochi, Claudia Soria
CNR - Istituto di Linguistica Computazionale “A. Zampolli”, Italy
with the contribution of
Núria Bel, University Pompeu Fabra, Spain
Gerhard Budin, Universität Wien, Austria
Khalid Choukri, ELDA, France
Joseph Mariani, LIMSI/IMMI-CNRS, France
Monica Monachini, CNR-ILC, Italy
Jan Odijk, Universiteit Utrecht, Netherlands
Stelios Piperidis, ILSP/”Athena” R.C., Greece
We need an LT Masterplan
•  In 2015, LT is simply everywhere: search, interactive assistants (phones, cars, appliances), big data, social media analytics, etc. The potential is huge!
•  Europe needs to follow a Language Technology Masterplan. Resources are only one piece of the puzzle; the masterplan also needs to reflect technologies, tools, research, innovation, platforms, infrastructures, services, language policy making, the language communities, flagship initiatives (CEF, DSM), etc.
•  Europe is only starting to recognise the potential of LT.
•  LT will be a key ingredient of our future IT – with or without Europe.
•  Europe has a unique opportunity for a strategic investment into our future growth.
DECLARATION OF COMMON INTERESTS
We, the undersigned, declare here, at the Riga Summit on the Multilingual Digital Single
Market, encouraged by the letter Vice President Andrus Ansip sent to its participants, that we
stand united in our goal and interest to:
- support multilingualism in Europe by employing language technology in business,
society and governance, to create a truly Multilingual Digital Single Market,
- exchange and share information in our efforts to promote our goals and interests at
local, national and European levels,
- raise awareness in society at large using channels available to our associations,
alliances and societies.
In the near future, we foresee the establishment of a Memorandum of Understanding among
our organisations towards a “Coalition for a Multilingual Europe”, to better serve our
members address the language barrier challenges towards establishing a truly integrated
Multilingual Digital Single Market.
Riga, 29. April 2015
Signed by (in alphabetical order):
BDVA: Laure Le Bars
CITIA: Steve Renals
CLARIN: Steven Krauwer
EFNIL: Sabine Kirchmeier-Andersen, Tamás Váradi
ELEN: Davyth Hicks, Claudia Soria
ELRA: Nicoletta Calzolari, Khalid Choukri
GALA: Laura Brandon, Robert E. Etches, Sergey Gladkov
LT Innovate: Jochen Hummel, Philippe Wacker
META-NET: Jan Hajic, Josef van Genabith, Georg Rehm, Andrejs Vasiljevs
NPLD: Meirion Prys Jones
TAUS: Jaap van der Meer
W3C: Richard Ishida, Felix Sasaki
For any questions, please contact Georg.Rehm@dfki.de.
Strategic Agenda for the
Multilingual Digital Single Market
Technologies for Overcoming Language Barriers towards
a truly integrated European Online Market
DRAFT
Version 0.5 – April 22, 2015
The key ingredients are in place: the communities are ready, several strategic research agendas were prepared, e.g.:
META-NET SRA
MDSM SRIA
Riga Summit Declaration
Translingual Cloud

•  Enable multilingual communication through web scale platform (also: Multilingual Digital Single Market)
•  Software engineering project; “one size fits all” approach; low risk of failure; increased security and data protection
•  Web service (including APIs) that makes use of SMT methods and large data sets

•  Web service platform for LT/MT research and innovation (hybrid research, continuous development and operations)
•  Enable the testing of new methods and avant-garde approaches with very large amounts of users
•  European research and innovation platform for novel LT/MT ideas and specialised services (e.g., genres, styles, registers etc.)

•  Web service platform for human translators and LSPs
•  Enable hand-in-hand operations of MT and human translation; enable high-quality human translation
•  Establish a sustainable technological link between human and machine (e.g., via human-generated and human-annotated data sets)
Thank you!
http://www.meta-net.eu
http://www.facebook.com/META.Alliance
12

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

META-NET: Towards a Strategic Research Agenda for Multilingual Europe
META-NET: Towards a Strategic Research Agenda for Multilingual EuropeMETA-NET: Towards a Strategic Research Agenda for Multilingual Europe
META-NET: Towards a Strategic Research Agenda for Multilingual Europe
 
The META-NET Strategic Research Agenda and Linked Open Data
The META-NET Strategic Research Agenda and Linked Open DataThe META-NET Strategic Research Agenda and Linked Open Data
The META-NET Strategic Research Agenda and Linked Open Data
 
Promoting the Use of Basque via Language Technology
Promoting the Use of Basque via Language TechnologyPromoting the Use of Basque via Language Technology
Promoting the Use of Basque via Language Technology
 
The META-NET Language White Paper Series
The META-NET Language White Paper SeriesThe META-NET Language White Paper Series
The META-NET Language White Paper Series
 
Celtic language technologies in the digital age
Celtic language technologies in the digital ageCeltic language technologies in the digital age
Celtic language technologies in the digital age
 
Language Technology for Multilingual Europe
Language Technology for Multilingual EuropeLanguage Technology for Multilingual Europe
Language Technology for Multilingual Europe
 
Language Technologies for Multilingual Europe - Towards a Human Language Proj...
Language Technologies for Multilingual Europe - Towards a Human Language Proj...Language Technologies for Multilingual Europe - Towards a Human Language Proj...
Language Technologies for Multilingual Europe - Towards a Human Language Proj...
 
Computational Morphology and the META-NET Strategic Research Agenda for Multi...
Computational Morphology and the META-NET Strategic Research Agenda for Multi...Computational Morphology and the META-NET Strategic Research Agenda for Multi...
Computational Morphology and the META-NET Strategic Research Agenda for Multi...
 
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
 
Quaero Technology Catalog
Quaero Technology CatalogQuaero Technology Catalog
Quaero Technology Catalog
 
Towards a Human Language Project for Multilingual Europe: AI and Interpretation
Towards a Human Language Project for Multilingual Europe: AI and InterpretationTowards a Human Language Project for Multilingual Europe: AI and Interpretation
Towards a Human Language Project for Multilingual Europe: AI and Interpretation
 
AI and Conference Interpretation – From Smart Assistants for the Human Interp...
AI and Conference Interpretation – From Smart Assistants for the Human Interp...AI and Conference Interpretation – From Smart Assistants for the Human Interp...
AI and Conference Interpretation – From Smart Assistants for the Human Interp...
 
SETAF 2015 Sponsorship Prospectus
SETAF 2015  Sponsorship ProspectusSETAF 2015  Sponsorship Prospectus
SETAF 2015 Sponsorship Prospectus
 
Introducing parthenos powerpoint presentation december 2015 updated
Introducing parthenos powerpoint presentation december 2015 updatedIntroducing parthenos powerpoint presentation december 2015 updated
Introducing parthenos powerpoint presentation december 2015 updated
 
Overview of the Sustainability Plans of the ICT-29b) Projects
Overview of the Sustainability Plans of the ICT-29b) ProjectsOverview of the Sustainability Plans of the ICT-29b) Projects
Overview of the Sustainability Plans of the ICT-29b) Projects
 
FP7-ICT Programme
FP7-ICT ProgrammeFP7-ICT Programme
FP7-ICT Programme
 
Road2 germany xl zwolle 08062017 oth
Road2 germany xl zwolle 08062017 othRoad2 germany xl zwolle 08062017 oth
Road2 germany xl zwolle 08062017 oth
 
CMS Research Presentation
CMS Research PresentationCMS Research Presentation
CMS Research Presentation
 
Freme general-overview-version-june-2015
Freme general-overview-version-june-2015Freme general-overview-version-june-2015
Freme general-overview-version-june-2015
 
deftcon 2015 - Nino Vincenzo Verde - European Antitrust Forensic IT Tools
deftcon 2015 - Nino Vincenzo Verde - European Antitrust Forensic IT Toolsdeftcon 2015 - Nino Vincenzo Verde - European Antitrust Forensic IT Tools
deftcon 2015 - Nino Vincenzo Verde - European Antitrust Forensic IT Tools
 

Ähnlich wie Language Resources for Multilingual Europe

Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013
MediaMixerCommunity
 

Ähnlich wie Language Resources for Multilingual Europe (20)

META-NET and META-SHARE: An Overview
META-NET and META-SHARE: An OverviewMETA-NET and META-SHARE: An Overview
META-NET and META-SHARE: An Overview
 
Cracking the Language Barrier for a Multilingual Europe
Cracking the Language Barrier for a Multilingual EuropeCracking the Language Barrier for a Multilingual Europe
Cracking the Language Barrier for a Multilingual Europe
 
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...
 
TAUS MT Showcase, MT@EC for European public administrations and online servic...
TAUS MT Showcase, MT@EC for European public administrations and online servic...TAUS MT Showcase, MT@EC for European public administrations and online servic...
TAUS MT Showcase, MT@EC for European public administrations and online servic...
 
The META-NET Strategic Research Agenda for Multilingual Europe 2020
The META-NET Strategic Research Agenda for Multilingual Europe 2020The META-NET Strategic Research Agenda for Multilingual Europe 2020
The META-NET Strategic Research Agenda for Multilingual Europe 2020
 
Language technology market and components taxonomy
Language technology market and components taxonomyLanguage technology market and components taxonomy
Language technology market and components taxonomy
 
Sala+ Presentation Cali Cartagena Octubre 2008 V0.0
Sala+ Presentation Cali Cartagena Octubre 2008 V0.0Sala+ Presentation Cali Cartagena Octubre 2008 V0.0
Sala+ Presentation Cali Cartagena Octubre 2008 V0.0
 
MLi - Project presentation
MLi - Project presentationMLi - Project presentation
MLi - Project presentation
 
AI for Translation Technologies and Multilingual Europe
AI for Translation Technologies and Multilingual EuropeAI for Translation Technologies and Multilingual Europe
AI for Translation Technologies and Multilingual Europe
 
Thinking the archives of 2020: Opportunitiws, priorities, Issues
Thinking the archives of 2020: Opportunitiws, priorities, IssuesThinking the archives of 2020: Opportunitiws, priorities, Issues
Thinking the archives of 2020: Opportunitiws, priorities, Issues
 
RNP Cloud Infrastructure model, services and challenges
RNP Cloud Infrastructure model, services and challengesRNP Cloud Infrastructure model, services and challenges
RNP Cloud Infrastructure model, services and challenges
 
AP_CV_photo0131
AP_CV_photo0131AP_CV_photo0131
AP_CV_photo0131
 
Refinement of Digitised Newspapers
Refinement of Digitised NewspapersRefinement of Digitised Newspapers
Refinement of Digitised Newspapers
 
packed-preforma@lleida2015
packed-preforma@lleida2015packed-preforma@lleida2015
packed-preforma@lleida2015
 
Digitizing European Industry
Digitizing European IndustryDigitizing European Industry
Digitizing European Industry
 
Bne impact co_c
Bne impact co_cBne impact co_c
Bne impact co_c
 
Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013
 
Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708Summer school bz_fp7research_20100708
Summer school bz_fp7research_20100708
 
ION Durban - What's Happening at the IETF?
ION Durban - What's Happening at the IETF?ION Durban - What's Happening at the IETF?
ION Durban - What's Happening at the IETF?
 
Spanish Language Technology Plan. David Pérez Fernández, Cabinet of State Sec...
Spanish Language Technology Plan. David Pérez Fernández, Cabinet of State Sec...Spanish Language Technology Plan. David Pérez Fernández, Cabinet of State Sec...
Spanish Language Technology Plan. David Pérez Fernández, Cabinet of State Sec...
 

Mehr von Georg Rehm

Mehr von Georg Rehm (18)

Observations on Annotations – From Computational Linguistics and the World Wi...
Observations on Annotations – From Computational Linguistics and the World Wi...Observations on Annotations – From Computational Linguistics and the World Wi...
Observations on Annotations – From Computational Linguistics and the World Wi...
 
The Preparation, Impact and Future of the META-NET White Paper Series “Europe...
The Preparation, Impact and Future of the META-NET White Paper Series “Europe...The Preparation, Impact and Future of the META-NET White Paper Series “Europe...
The Preparation, Impact and Future of the META-NET White Paper Series “Europe...
 
Künstliche Intelligenz beim Dolmetschen und Übersetzen
Künstliche Intelligenz beim Dolmetschen und ÜbersetzenKünstliche Intelligenz beim Dolmetschen und Übersetzen
Künstliche Intelligenz beim Dolmetschen und Übersetzen
 
Herausforderungen und Lösungen für die europäische Sprachtechnologie- Forschu...
Herausforderungen und Lösungen für die europäische Sprachtechnologie- Forschu...Herausforderungen und Lösungen für die europäische Sprachtechnologie- Forschu...
Herausforderungen und Lösungen für die europäische Sprachtechnologie- Forschu...
 
European Language Technologies – Past, Present and Future
European Language Technologies – Past, Present and FutureEuropean Language Technologies – Past, Present and Future
European Language Technologies – Past, Present and Future
 
KI, Sprachtechnologie und Digital Humanities: Ein (unvollständiger) Überblick
KI, Sprachtechnologie und Digital Humanities: Ein (unvollständiger) ÜberblickKI, Sprachtechnologie und Digital Humanities: Ein (unvollständiger) Überblick
KI, Sprachtechnologie und Digital Humanities: Ein (unvollständiger) Überblick
 
Kuratieren im Zeitalter der KI
Kuratieren im Zeitalter der KIKuratieren im Zeitalter der KI
Kuratieren im Zeitalter der KI
 
Artificial Intelligence for the Film Industry
Artificial Intelligence for the Film IndustryArtificial Intelligence for the Film Industry
Artificial Intelligence for the Film Industry
 
KI für die Kundenkommunikation
KI für die KundenkommunikationKI für die Kundenkommunikation
KI für die Kundenkommunikation
 
Transformieren, Manipulieren, Kuratieren: Technologien für die Wissensarbeit ...
Transformieren, Manipulieren, Kuratieren: Technologien für die Wissensarbeit ...Transformieren, Manipulieren, Kuratieren: Technologien für die Wissensarbeit ...
Transformieren, Manipulieren, Kuratieren: Technologien für die Wissensarbeit ...
 
Digitale Kuratierungstechnologien: Anwendungsfälle in Digitalen Bibliotheken
Digitale Kuratierungstechnologien: Anwendungsfälle in Digitalen BibliothekenDigitale Kuratierungstechnologien: Anwendungsfälle in Digitalen Bibliotheken
Digitale Kuratierungstechnologien: Anwendungsfälle in Digitalen Bibliotheken
 
EPUB, quo vadis? Publishing im W3C
EPUB, quo vadis? Publishing im W3CEPUB, quo vadis? Publishing im W3C
EPUB, quo vadis? Publishing im W3C
 
Digitale Kuratierungstechnologien für verschiedene Branchen und Anwendungssze...
Digitale Kuratierungstechnologien für verschiedene Branchen und Anwendungssze...Digitale Kuratierungstechnologien für verschiedene Branchen und Anwendungssze...
Digitale Kuratierungstechnologien für verschiedene Branchen und Anwendungssze...
 
Generische Kuratierungstechnologien für spezifische Anwendungsfälle: Hintergr...
Generische Kuratierungstechnologien für spezifische Anwendungsfälle: Hintergr...Generische Kuratierungstechnologien für spezifische Anwendungsfälle: Hintergr...
Generische Kuratierungstechnologien für spezifische Anwendungsfälle: Hintergr...
 
Curation Technologies for Multilingual Europe
Curation Technologies for Multilingual EuropeCuration Technologies for Multilingual Europe
Curation Technologies for Multilingual Europe
 
Web Annotations – A Game Changer for Language Technology?
Web Annotations – A Game Changer for Language Technology?Web Annotations – A Game Changer for Language Technology?
Web Annotations – A Game Changer for Language Technology?
 
Globale Standards im Web of Things
Globale Standards im Web of ThingsGlobale Standards im Web of Things
Globale Standards im Web of Things
 
W3C/DFKI Automotive Workshop
W3C/DFKI Automotive WorkshopW3C/DFKI Automotive Workshop
W3C/DFKI Automotive Workshop
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Language Resources for Multilingual Europe

  • 1. META-NET has received funding from the EU’s Horizon 2020 research and innovation programme through the contract CRACKER
 (grant agreement no.: 645357). Formerly co-funded by FP7 and ICT PSP through the contracts T4ME (grant agreement no.: 249119), CESAR (grant agreement no.: 271022), METANET4U (grant agreement no.: 270893) and META-NORD (grant agreement no.: 270899). Language Resources for 
 Multilingual Europe Georg Rehm META-NET Network Manager – CRACKER Coordinator DFKI, Germany georg.rehm@dfki.de LT Innovate Summit – LR Dialogue Workshop, Panel “Language Resource Supply” Brussels, Belgium, June 25, 2015
  • 2. META-NET and META
    - META-NET: 60 research centres in 34 countries (via four EU-funded projects: T4ME, CESAR, METANET4U, META-NORD)
    - META: Multilingual Europe Technology Alliance, 794 members in 68 countries, http://www.meta-net.eu/members
  • 3. http://www.meta-net.eu
    - Pan-European infrastructure, bringing together providers and consumers of language data, tools and services.
    - Language resources (LRs) are documented, uploaded, stored, catalogued, downloaded and shared to improve visibility, documentation, identification, availability and interoperability.
    - Caters for datasets, tools and services for language technology (LT) research and development (both academic and commercial); META-SHARE includes repository software, a metadata model, a licensing kit and statistics.
    - 29 distributed repositories maintained by 37 organisations in 25 countries.
    - 2,500+ resources (corpora: 49%, lexical: 38%, tools/services: 12%), covering ca. 100 languages.
    - 7,000+ downloads in total; ca. 70% of all LRs have been downloaded.
  • 4.
  • 5. Language White Paper Series: Europe’s Languages in the Digital Age (2011/2012)
 Summary: “At Least 21 European Languages in Danger of Digital Extinction!”
 MT, level of support:
    - excellent: none
    - good: English
    - moderate: French, Spanish
    - fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
    - weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh
 Resources, level of support:
    - excellent: none
    - good: English
    - moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
    - fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
    - weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
 [Figure: level of support (weak/none, fragmentary, moderate, good, excellent) per language, from Welsh to English. Languages with names in red have little or no MT support.]
  • 6. LR-Related Activities (http://www.cracker-project.eu, http://www.meta-net.eu)
 [Timeline figure, 2015 to 2017 (M1, M12, M24, M36): kick-off meeting for all ICT-17 projects; translate5; WMT 2016, WMT 2017; IWSLT 2015, IWSLT 2016, IWSLT 2017; QT Marathon 2015, QT Marathon 2016; Roadmap for European MT Research; Survey on the State of HQMT in Industry and LSPs; SRIA (initial version, update, final); labels “version 1” and “version 2”.]
    - Production of resources (e.g., for WMT 2016 and 2017, IWSLT 2015-2017)
    - Tools for resources (quality control, evaluations; towards the idea of a smart workbench for translators)
    - Strategies and roadmaps for resources (SRIA, Roadmap for European MT Research)
    - Exchange and sharing facility for resources (META-SHARE)
 Maintenance of Operations and Outreach
    - Provide services and adapt them to evolving user requirements and the licensing landscape:
       - adapt, streamline and extend the metadata schema;
       - adapt the licensing toolkit to new international licensing setups;
       - streamline and simplify operations for repository providers and data depositors.
    - Technical support and bug fixing
  • 7. Federation of projects
    - Core seed: the group of H2020-ICT17 projects.
    - Multi-lateral Memorandum of Understanding, ca. 20 projects in total (including FP7 and H2020-ICT15), to be approached in two phases (first phase almost completed).
    - Selected areas of collaboration: data management and repositories (including Data Management Plan); tools and technologies; shared tasks and evaluations.
    - http://www.cracking-the-language-barrier.eu will be launched soon.
  • 8. MT Use Cases and Language Resources
    - “Usability” is an unusual generic dimension for the evaluation of a resource.
    - Reason: the majority of LRs can be used in many different research or application scenarios.
    - More relevant dimensions: quality, availability, coverage, maturity, sustainability, adaptability, size, format, license, language, style etc., depending on the use case.
    - When talking about LRs for MT, it’s important to be specific in terms of the respective use case.
    - Reason: the use case puts specific requirements on the type of LR and relevant dimensions.
 Table: scenarios, MT use cases and LR requirements
    Scenario: Inbound Translation (written texts)
       MT use case: Gist translation, provide an idea of a text’s contents
       Maturity of technology: Deployed (Google Translate), research ongoing
       Human involvement: none
       Relevance of quality: Quality of MT secondary
       Methods: Statistical MT
       LR requirements: Very large aligned data sets (the more data, the better)
    Scenario: Outbound Translation (written texts)
       MT use case: Production quality, for publication
       Maturity of technology: Research on HQMT has started, no POCs yet
       Human involvement: none
       Relevance of quality: Quality of MT extremely important, ideally HQ
       Methods: New approach needed, SMT, RBMT, hybrid systems (needs quality estimation methods)
       LR requirements: Deeply annotated data sets with quality information (also needs more research)
    Scenario: Outbound Translation (written texts)
       MT use case: Production quality, for publication
       Maturity of technology: Deployed, usable via LSPs
       Human involvement: Post-editing
       Relevance of quality: Quality of initial MT step important but secondary
       Methods: MT, followed by post-editing, ideally with smart translation workbenches (CAT)
       LR requirements: Translation memories and term databases (large coverage, high quality etc.)
    Scenario: Speech to Speech Translation
       MT use case: Enable face-to-face conversations
       Maturity of technology: Research ongoing but POCs exist (Skype)
       Human involvement: none
       Relevance of quality: Quality of MT secondary
       Methods: Recognition and generation of spoken language; statistical MT etc.
       LR requirements: Several additional technologies and LR types needed (such as very large speech databases)
  • 9. META-NET SRA LR Roadmap
    - Infrastructure: maintain and extend sharing facility; promote documentation through metadata; intensify cooperation
    - Coverage, Quality, Adequacy: increase number of LRs for all European languages to address application needs; promote evaluation and validation to improve LR quality constantly
    - Acquisition: define best practices for LR production; automate production; distributed production (crowd-sourcing, social media, gamification etc.); bridge acquisition methods with LOD, big data
    - Openness: elaborate simple and harmonised licensing solutions; promote openness and sharing of LRs
    - Interoperability: promote and encourage use of standards
 [Cover image: The Strategic Language Resource Agenda. Nicoletta Calzolari, Valeria Quochi, Claudia Soria (CNR - Istituto di Linguistica Computazionale “A. Zampolli”, Italy), with the contribution of Núria Bel (University Pompeu Fabra, Spain), Gerhard Budin (Universität Wien, Austria), Khalid Choukri (ELDA, France), Joseph Mariani (LIMSI/IMMI-CNRS, France), Monica Monachini (CNR-ILC, Italy), Jan Odijk (Universiteit Utrecht, Netherlands), Stelios Piperidis (ILSP/“Athena” R.C., Greece). FLaReNet is a project funded under the eContentplus programme, grant agreement ECP-2007-LANG-617001. eContentplus is a multiannual Community programme to make digital content in Europe more accessible, usable and exploitable.]
  • 10. We need an LT Masterplan
    - In 2015, LT is simply everywhere: search, interactive assistants (phones, cars, appliances), big data, social media analytics, etc. The potential is huge!
    - Europe needs to follow a Language Technology Masterplan. Resources are only one piece of the puzzle; the masterplan also needs to reflect technologies, tools, research, innovation, platforms, infrastructures, services, language policy making, the language communities, flagship initiatives (CEF, DSM), etc.
    - Europe is only starting to recognise the potential of LT.
    - LT will be a key ingredient of our future IT, with or without Europe.
    - Europe has a unique opportunity for a strategic investment into our future growth.
 [Image: Riga Summit Declaration]
 DECLARATION OF COMMON INTERESTS
 We, the undersigned, declare here, at the Riga Summit on the Multilingual Digital Single Market, encouraged by the letter Vice President Andrus Ansip sent to its participants, that we stand united in our goal and interest to:
    - support multilingualism in Europe by employing language technology in business, society and governance, to create a truly Multilingual Digital Single Market,
    - exchange and share information in our efforts to promote our goals and interests at local, national and European levels,
    - raise awareness in society at large using channels available to our associations, alliances and societies.
 In the near future, we foresee the establishment of a Memorandum of Understanding among our organisations towards a “Coalition for a Multilingual Europe”, to better serve our members address the language barrier challenges towards establishing a truly integrated Multilingual Digital Single Market.
 Riga, 29 April 2015
 Signed by (in alphabetical order):
    - BDVA: Laure Le Bars
    - CITIA: Steve Renals
    - CLARIN: Steven Krauwer
    - EFNIL: Sabine Kirchmeier-Andersen, Tamás Váradi
    - ELEN: Davyth Hicks, Claudia Soria
    - ELRA: Nicoletta Calzolari, Khalid Choukri
    - GALA: Laura Brandon, Robert E. Etches, Sergey Gladkov
    - LT Innovate: Jochen Hummel, Philippe Wacker
    - META-NET: Jan Hajic, Josef van Genabith, Georg Rehm, Andrejs Vasiljevs
    - NPLD: Meirion Prys Jones
    - TAUS: Jaap van der Meer
    - W3C: Richard Ishida, Felix Sasaki
 For any questions, please contact Georg.Rehm@dfki.de.
 [Image: cover of the draft “Strategic Agenda for the Multilingual Digital Single Market: Technologies for Overcoming Language Barriers towards a truly integrated European Online Market”, Version 0.5, April 22, 2015]
 The key ingredients are in place: the communities are ready and several strategic research agendas have been prepared, e.g., the META-NET SRA, the MDSM SRIA and the Riga Summit Declaration.
  • 11. Translingual Cloud
    - Web service (including APIs) that makes use of SMT methods and large data sets
       - Enable multilingual communication through a web-scale platform (also: Multilingual Digital Single Market)
       - Software engineering project; “one size fits all” approach; low risk of failure; increased security and data protection
    - Web service platform for LT/MT research and innovation (hybrid research, continuous development and operations)
       - Enable the testing of new methods and avant-garde approaches with very large numbers of users
       - European research and innovation platform for novel LT/MT ideas and specialised services (e.g., genres, styles, registers etc.)
    - Web service platform for human translators and LSPs
       - Enable hand-in-hand operations of MT and human translation; enable high-quality human translation
       - Establish a sustainable technological link between human and machine (e.g., via human-generated and human-annotated data sets)