Language Resources for Multilingual Europe
Georg Rehm. Language Resources for Multilingual Europe. Presented at LT Innovate Summit – LR Dialogue Workshop, Panel “Language Resource Supply”, Brussels, Belgium, June 25, 2015.


  1. META-NET has received funding from the EU’s Horizon 2020 research and innovation programme through the contract CRACKER (grant agreement no.: 645357). Formerly co-funded by FP7 and ICT PSP through the contracts T4ME (grant agreement no.: 249119), CESAR (grant agreement no.: 271022), METANET4U (grant agreement no.: 270893) and META-NORD (grant agreement no.: 270899).
     Language Resources for Multilingual Europe
     Georg Rehm, META-NET Network Manager – CRACKER Coordinator, DFKI, Germany (georg.rehm@dfki.de)
     LT Innovate Summit – LR Dialogue Workshop, Panel “Language Resource Supply”, Brussels, Belgium, June 25, 2015
  2. META-NET and META
     • META-NET: 60 research centres in 34 countries (via four EU-funded projects: T4ME, CESAR, METANET4U, META-NORD)
     • META: Multilingual Europe Technology Alliance, 794 members in 68 countries, http://www.meta-net.eu/members
  3. http://www.meta-net.eu
     • Pan-European infrastructure, bringing together providers and consumers of language data, tools and services.
     • LRs are documented, uploaded, stored, catalogued, downloaded, shared – to improve visibility, documentation, identification, availability, interoperability.
     • Caters for datasets, tools, services for LT research and development (both academic and commercial); META-SHARE includes repository software, a metadata model, licensing kit, statistics.
     • 29 distributed repositories maintained by 37 organisations in 25 countries.
     • 2,500+ resources (corpora: 49%, lexical: 38%, tools/services: 12%), covering ca. 100 languages.
     • 7,000+ downloads in total; ca. 70% of all LRs have been downloaded.
  4. Language White Paper Series: Europe’s Languages in the Digital Age (2011/2012)
     Summary: “At Least 21 European Languages in Danger of Digital Extinction!”
     MT (level of support):
       • Excellent: (no language)
       • Good: English
       • Moderate: French, Spanish
       • Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
       • Weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh
     Resources (level of support):
       • Excellent: (no language)
       • Good: English
       • Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
       • Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
       • Weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
     [Chart: level of support (weak/none, fragmentary, moderate, good, excellent) per language. Languages in the order shown: Welsh, Maltese, Lithuanian, Latvian, Icelandic, Irish, Croatian, Serbian, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English. Languages with names in red have little or no MT support.]
  5. http://www.cracker-project.eu • http://www.meta-net.eu
     LR-Related Activities
     [Timeline, 2015 to 2017 (M1, M12, M24, M36): kick-off meeting for all ICT-17 projects; translate5; WMT 2016, WMT 2017; IWSLT 2015, IWSLT 2016, IWSLT 2017; QT Marathon 2015, QT Marathon 2016; Roadmap for European MT Research; Survey on the State of HQMT in Industry and LSPs; SRIA (initial version), SRIA (update), SRIA (final); version 1, version 2.]
     • Production of resources (e.g., for WMT 2016 and 2017, IWSLT 2015-2017)
     • Tools for resources (quality control, evaluations; towards the idea of a smart workbench for translators)
     • Strategies and roadmaps for resources (SRIA, Roadmap for European MT Research)
     • Exchange and sharing facility for resources (META-SHARE)
     Maintenance of Operations and Outreach
     • Provide services, adapt them to evolving user requirements and licensing landscape:
       • adapt, streamline and extend the metadata schema;
       • adapt licensing toolkit to new international licensing setups;
       • streamline and simplify operations for repository providers and data depositors.
     • Technical support and bug fixing
  6. • Federation of projects – core seed: the group of H2020-ICT17 projects.
     • Multi-lateral Memorandum of Understanding, ca. 20 projects in total (including FP7 and H2020-ICT15), to be approached in two phases (first phase almost completed).
     • Selected areas of collaboration: data management and repositories (including Data Management Plan), tools and technologies; shared tasks and evaluations.
     • http://www.cracking-the-language-barrier.eu will be launched soon.
  7. MT Use Cases and Language Resources
     • “Usability” is an unusual generic dimension for the evaluation of a resource.
     • Reason: the majority of LRs can be used in many different research or application scenarios.
     • More relevant dimensions: quality, availability, coverage, maturity, sustainability, adaptability, size, format, license, language, style etc. – depending on the use case.
     • When talking about LRs for MT, it’s important to be specific in terms of the respective use case.
     • Reason: the use case puts specific requirements on the type of LR and relevant dimensions.
     Table: scenarios by MT use case, maturity of technology, human involvement, relevance of quality, methods and LR requirements.
     • Scenario: Inbound Translation (written texts)
       • MT Use Case: Gist translation, provide an idea of a text’s contents
       • Maturity of Technology: Deployed (Google Translate), research ongoing
       • Human Involvement: –
       • Relevance of Quality: Quality of MT secondary
       • Methods: Statistical MT
       • LR Requirements: Very large aligned data sets (the more data, the better)
     • Scenario: Outbound Translation (written texts)
       • MT Use Case: Production quality, for publication
       • Maturity of Technology: Research on HQMT has started, no POCs yet
       • Human Involvement: –
       • Relevance of Quality: Quality of MT extremely important, ideally HQ
       • Methods: New approach needed, SMT, RBMT, hybrid systems (needs quality estimation methods)
       • LR Requirements: Deeply annotated data sets with quality information (also needs more research)
     • Scenario: Outbound Translation (written texts)
       • MT Use Case: Production quality, for publication
       • Maturity of Technology: Deployed, usable via LSPs
       • Human Involvement: Post-editing
       • Relevance of Quality: Quality of initial MT step important but secondary
       • Methods: MT, followed by post-editing, ideally with smart translation workbenches (CAT)
       • LR Requirements: Translation memories and term databases (large coverage, high quality etc.)
     • Scenario: Speech to Speech Translation
       • MT Use Case: Enable face-to-face conversations
       • Maturity of Technology: Research ongoing but POCs exist (Skype)
       • Human Involvement: –
       • Relevance of Quality: Quality of MT secondary
       • Methods: Recognition and generation of spoken language; statistical MT etc.
       • LR Requirements: Several additional technologies and LR types needed (such as very large speech databases)
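The scenario table on slide 7 can be read as a lookup from MT use case to LR requirements. The following is only a minimal sketch of that reading: the class name `MTUseCase`, the field names and the helper `find_use_cases` are hypothetical and not part of META-NET or META-SHARE; the row values are copied from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MTUseCase:
    """One row of the slide's scenario table (field names are illustrative)."""
    scenario: str
    use_case: str
    human_involvement: str  # "none" stands for the "–" cells on the slide
    lr_requirements: str


# Rows copied from the slide; the data structure itself is an assumption.
USE_CASES = [
    MTUseCase(
        "Inbound Translation (written texts)",
        "Gist translation",
        "none",
        "Very large aligned data sets",
    ),
    MTUseCase(
        "Outbound Translation (written texts)",
        "Production quality, for publication",
        "none",
        "Deeply annotated data sets with quality information",
    ),
    MTUseCase(
        "Outbound Translation (written texts)",
        "Production quality, for publication",
        "post-editing",
        "Translation memories and term databases",
    ),
    MTUseCase(
        "Speech to Speech Translation",
        "Enable face-to-face conversations",
        "none",
        "Several additional technologies and LR types needed",
    ),
]


def find_use_cases(scenario: str) -> list[MTUseCase]:
    """Return every table row whose scenario matches exactly."""
    return [row for row in USE_CASES if row.scenario == scenario]


if __name__ == "__main__":
    # The same scenario can map to different LR requirements,
    # depending on whether a human post-editor is involved.
    for row in find_use_cases("Outbound Translation (written texts)"):
        print(f"{row.human_involvement}: {row.lr_requirements}")
```

The two outbound rows illustrate the slide's point: the same scenario calls for different resources once the use case (with or without post-editing) is made specific.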
  8. META-NET SRA LR Roadmap
     • Infrastructure – maintain and extend sharing facility; promote documentation through metadata; intensify cooperation
     • Coverage, Quality, Adequacy – increase number of LRs for all European languages to address application needs; promote evaluation and validation to improve LR quality constantly
     • Acquisition – define best practices for LR production; automate production; distributed production (crowd-sourcing, social media, gamification etc.); bridge acquisition methods with LOD, big data
     • Openness – elaborate simple and harmonised licensing solutions; promote openness and sharing of LRs
     • Interoperability – promote and encourage use of standards
     [Document cover shown: The Strategic Language Resource Agenda. Nicoletta Calzolari, Valeria Quochi, Claudia Soria (CNR - Istituto di Linguistica Computazionale “A. Zampolli”, Italy), with the contribution of Núria Bel (University Pompeu Fabra, Spain), Gerhard Budin (Universität Wien, Austria), Khalid Choukri (ELDA, France), Joseph Mariani (LIMSI/IMMI-CNRS, France), Monica Monachini (CNR-ILC, Italy), Jan Odijk (Universiteit Utrecht, Netherlands), Stelios Piperidis (ILSP/“Athena” R.C., Greece). FLaReNet is a project funded under the eContentplus programme, grant agreement ECP-2007-LANG-617001. eContentplus is a multiannual Community programme to make digital content in Europe more accessible, usable and exploitable.]
  9. We need an LT Masterplan
     • In 2015, LT is simply everywhere: search, interactive assistants (phones, cars, appliances), big data, social media analytics, etc. The potential is huge!
     • Europe needs to follow a Language Technology Masterplan. Resources are only one piece of the puzzle; the plan also needs to reflect technologies, tools, research, innovation, platforms, infrastructures, services, language policy making, the language communities, flagship initiatives (CEF, DSM), etc.
     • Europe is only starting to recognise the potential of LT.
     • LT will be a key ingredient of our future IT, with or without Europe.
     • Europe has a unique opportunity for a strategic investment into our future growth.
     The key ingredients are in place: the communities are ready, and several strategic research agendas have been prepared, e.g., the META-NET SRA, the MDSM SRIA and the Riga Summit Declaration.
     [Document cover shown: Strategic Agenda for the Multilingual Digital Single Market. Technologies for Overcoming Language Barriers towards a truly integrated European Online Market. Draft, Version 0.5, April 22, 2015.]
     [Document shown: Riga Summit Declaration, reproduced below.]
     DECLARATION OF COMMON INTERESTS
     We, the undersigned, declare here, at the Riga Summit on the Multilingual Digital Single Market, encouraged by the letter Vice President Andrus Ansip sent to its participants, that we stand united in our goal and interest to:
     - support multilingualism in Europe by employing language technology in business, society and governance, to create a truly Multilingual Digital Single Market,
     - exchange and share information in our efforts to promote our goals and interests at local, national and European levels,
     - raise awareness in society at large using channels available to our associations, alliances and societies.
     In the near future, we foresee the establishment of a Memorandum of Understanding among our organisations towards a “Coalition for a Multilingual Europe”, to better serve our members address the language barrier challenges towards establishing a truly integrated Multilingual Digital Single Market.
     Riga, 29 April 2015
     Signed by (in alphabetical order):
     • BDVA: Laure Le Bars
     • CITIA: Steve Renals
     • CLARIN: Steven Krauwer
     • EFNIL: Sabine Kirchmeier-Andersen, Tamás Váradi
     • ELEN: Davyth Hicks, Claudia Soria
     • ELRA: Nicoletta Calzolari, Khalid Choukri
     • GALA: Laura Brandon, Robert E. Etches, Sergey Gladkov
     • LT Innovate: Jochen Hummel, Philippe Wacker
     • META-NET: Jan Hajic, Josef van Genabith, Georg Rehm, Andrejs Vasiljevs
     • NPLD: Meirion Prys Jones
     • TAUS: Jaap van der Meer
     • W3C: Richard Ishida, Felix Sasaki
     For any questions, please contact Georg.Rehm@dfki.de.
  10. Translingual Cloud
     • Web service (including APIs) that makes use of SMT methods and large data sets
       • Enable multilingual communication through web scale platform (also: Multilingual Digital Single Market)
       • Software engineering project; “one size fits all” approach; low risk of failure; increased security and data protection
     • Web service platform for LT/MT research and innovation (hybrid research, continuous development and operations)
       • Enable the testing of new methods and avantgarde approaches with very large amounts of users
       • European research and innovation platform for novel LT/MT ideas and specialised services (e.g., genres, styles, registers etc.)
     • Web service platform for human translators and LSPs
       • Enable hand-in-hand operations of MT and human translation; enable high-quality human translation
       • Establish a sustainable technological link between human and machine (e.g., via human-generated and human-annotated data sets)
  11. Thank you!
     http://www.meta-net.eu
     http://www.facebook.com/META.Alliance