Georg Rehm
georg.rehm@dfki.de
DFKI GmbH, Language Technology Lab – Berlin, Germany
META-NET, General Secretary
AI for Translation Technologies
and Multilingual Europe
Outline
• Artificial Intelligence
• Technology Support for Multilingual Europe
• European MT Research – Results from QT21
• Connecting Europe Facility – Automated Translation
• Towards the Human Language Project
• Conclusions
EP DG Trad Conference (16/17 Oct. 2017) – AI for Translation Technologies
Artificial Intelligence
Data Intelligence
Current breakthroughs based on Machine Learning (Deep Learning)
Also still in use: symbolic, rule-based methods and systems
Artificial Intelligence
• Huge data sets + powerful algorithms + extremely fast hardware
• Self-driving cars, robots, image recognition, machine translation
• Enormous potential for disruptions in all sectors and areas
Technology Support for
Multilingual Europe
• Multilingualism is at the very heart of the European idea
• 24 EU languages – all languages have the same status
• Dozens of regional and minority languages as well as
languages of immigrants and trade partners
• Economic challenges:
– If the DSM is not multilingual, there will be 20+ isolated markets
– Language barriers are market barriers
• Social and public challenges:
– Empower all citizens to use their mother tongues
– Enable cross-border, cross-lingual, cross-cultural communication
– Provide multilingual digital public services
– Restore trust in media (fake news debate, filter bubble issue etc.)
• 60 research centres in 34 countries (founded in 2010)
– Chair of Executive Board: Jan Hajic (CUNI)
– Dep.: J. van Genabith (DFKI), A. Vasiljevs (Tilde)
– General Secretary: Georg Rehm (DFKI)
• Multilingual Europe Technology Alliance: 826 members in 67 countries
(published in 2013)
(31 volumes; published in 2012)
T4ME (META-NET), CESAR, METANET4U, META-NORD
q Basque
q Bulgarian*
q Catalan
q Croatian*
q Czech*
q Danish*
q Dutch*
q English*
q Estonian*
q Finnish*
q French*
q Galician
q German*
q Greek*
q Hungarian*
q Icelandic
q Irish*
q Italian*
q Latvian*
q Lithuanian*
q Maltese*
q Norwegian
q Polish*
q Portuguese*
q Romanian*
q Serbian
q Slovak*
q Slovene*
q Spanish*
q Swedish*
q Welsh
* Official EU languagehttp://www.meta-net.eu/whitepapers
MT
– Excellent: none
– Good: English
– Moderate: French, Spanish
– Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
– Weak or no support through LT: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh
Speech
– Excellent: none
– Good: English
– Moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
– Fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
– Weak or no support through LT: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh
Text Analytics
– Excellent: none
– Good: English
– Moderate: Dutch, French, German, Italian, Spanish
– Fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
– Weak or no support through LT: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
Resources
– Excellent: none
– Good: English
– Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
– Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
– Weak or no support through LT: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
[Chart: level of support (weak/none, fragmentary, moderate, good, excellent) for 31 European languages, from Welsh to English.]
Languages with names in red have little or no MT support
Source: META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg,
New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors)
Important: even current state of the art
technologies are far from being perfect!
Important: 20+ European languages are
severely under-supported and face the
danger of digital extinction.
[Chart: language technology support (weak/no support, fragmentary, moderate, good, excellent) plotted against millions of native speakers worldwide (0 to 400). Languages shown: Yiddish, Welsh, Vlax Romani, Turkish, Scots, Romany, Occitan, Maltese, Macedonian, Luxembourgish, Lithuanian, Limburgish, Latvian, Icelandic, Friulian, Frisian, Breton, Bosnian, Asturian, Albanian, Irish, Croatian, Serbian, Hebrew, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English.]
Source: Georg Rehm, Hans Uszkoreit, Ido Dagan, Vartkes Goetcherian, Mehmet Ugur Dogan, Coskun Mermer, Tamás Váradi, Sabine Kirchmeier-Andersen,
Gerhard Stickel, Meirion Prys Jones, Stefan Oeter, and Sigve Gramstad. An Update and Extension of the META-NET Study “Europe's Languages in the
Digital Age”. In Proceedings of the Workshop on Collaboration and Computing for Under-Resourced Languages in the Linked Open Data Era (CCURL 2014),
Reykjavik, Iceland, May 2014.
European Machine Translation
Research – Results from QT21
Research & Innovation Action 2015-18
Coordinator: Josef van Genabith (DFKI)
• Morphologically rich languages (De, Cz, Lv)
• Under-resourced languages (Lv, Ro)
• Quality Assessment: MQM – DQF
• Learning from human feedback (APE)
• Evaluation framework: WMT
– Event series to present and discuss results from MT evaluations
– Procedures: Automatic scoring (BLEU etc.) and human
judgements (large number of human annotators)
• Shared tasks (newspaper translation, quality estimation,
metrics and automatic post-editing)
QT21 is Improving Automatic Translation
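The automatic scoring mentioned above (BLEU etc.) can be illustrated with a small sketch. This is a simplified, sentence-level, single-reference BLEU on whitespace tokens without smoothing; it is an illustration only and not the scoring setup used at WMT or in QT21, which relies on corpus-level scores and standard tooling. The function names `ngrams` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified single-reference BLEU on whitespace tokens, no smoothing."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: hypotheses shorter than the reference are penalised.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    print(sentence_bleu("the cat sat on the mat today", "the cat sat on the mat"))
```

Scores like this only measure n-gram overlap with a reference translation, which is one reason the evaluation campaigns pair them with human judgements.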
Human Judgement Rankings (QT21 vs. Best Online; 2015, 2016, 2017)
First: 64 53 1 3
First + Second: 66 65 3 6
WMT Newspaper Translation Task
• En ó Cz
• En ó De
• En ó Fr
• En ó Cz
• En ó De
• En ó Ro
• En ó Cz
• En ó De
• En ó Lv
[Chart: QT21 improvement in the last 12 months vs. online systems, for En → De, De → En, En → Cz and Cz → En. Series: QT21-WMT-2016, Online WMT-2017, QT21-WMT-2017. Vertical axis: 0 to 40.]
WMT 2016 System on WMT 2017 Data
• Data sets are the fuel for neural networks
• QT21’s neural technologies define the state of the art
• Ranked #1 in more than 80% of all tasks at WMT 2017
• Also predominantly ranked #1 at WMT 2016
• QT21 keeps commercial systems at a distance
• Huge improvements on morphologically rich languages
• MQM as a standard for quality evaluation
Selected Results
Connecting Europe Facility:
Automated Translation (CEF AT)
Connecting Europe
• EU flagship goal: Establishing the Digital Single Market
• Overcoming existing barriers
– by creating an environment for digital services to flourish
– by providing cross-border infrastructures and services.
• Sectorial CEF Digital Service Infrastructures (DSIs)
– ODR: Citizens need to solve disputes online across borders
– BRIS: Citizens and business partners need legal certainty when doing business cross-border
– eHealth: Citizens need to have online access to their patient summary when abroad
– EESSI: Citizens need to get to enjoy their social security seamlessly and online when abroad
– This also includes: eProcurement, Open Data, e-Justice, Cyber Security, Safer Internet …
• Technological CEF building blocks can be used by the
different DSIs (e.g., eInvoicing, eSignature etc.)
• Most important in this context: CEF eTranslation
– Why? To help European and national public administrations
exchange information across language barriers
– How? By providing MT capabilities that will enable digital
services (in particular all DSIs) to be multilingual.
• CEF eTranslation builds on MT@EC
• Guarantees confidentiality and security of translated data
• è ELRC contract
Connecting Europe
Coordinator: Josef van Genabith (DFKI)
European Language
Resource Coordination
• Collect: Language resources
• Identify: Needs of public services
• Engage: With the public sector in the identification of language resources
• Help: With any technical or legal issues
• Act: Observatory for language resources across Europe
What has been achieved?
[Chart: LR contributions by type (bi-/multilingual corpora, terminologies, monolingual corpora). Vertical axis: 0 to 160. Status: April 2017]
• 225 language resources collected
• More than 2 billion words in all EU official
languages, Norwegian and Icelandic
• Over 450,000 terms
• More than 2 million translation units
• More than 91 resources to be used by you!
ELRC for you
• ELRC-SHARE Repository
– Access to, sharing and contribution of LRs
– Access to tools and services catalogue (forthcoming)
– http://www.lr-coordination.eu/resources
• ELRC Technical and Legal Helpdesk
– Support for potential data donors (phone, email)
– http://www.lr-coordination.eu/helpdesk
• ELRC On-site assistance
– http://www.lr-coordination.eu/services
Current Developments
• Multilingual Europe: our languages enjoy equal status yet digital
extinction of the majority of EU languages is a very severe danger.
• Language Technology Research and Innovation in Europe:
World class research results (e.g., in QT21), strong SME base,
thousands of LSPs; fragmentation; need for coordination.
• Big need for high-quality Language Technologies: translation,
personal assistants, multilingual DSM etc. (example: CEF).
• AI: Important breakthroughs and massive investments in R&D and
applications (mostly in US, Asia) – huge opportunity for Europe!
• The European Language Challenge cannot be abandoned or
outsourced!
Ø Need for Language Technology made in Europe for Europe!
Towards the
Human Language Project
• STOA Workshop in European Parliament (January 2017):
“Language equality in the digital age – towards a Human Language Project”
• Human Language Project vision suggested in several presentations
• STOA Study, published in March 2017, recommends setting up the HLP
➢ http://www.stoa.europarl.europa.eu/stoa/cms/home/workshops/language
STUDY
EPRS | European Parliamentary Research Service
Scientific Foresight Unit (STOA)
PE 581.621
Science and Technology Options Assessment
• Goal: Deep Natural Language Understanding by 2030
• AI for Next Generation Language Technology
• Large-scale EU funding programme for basic and
applied research as well as innovation (10-15 years)
• New breakthroughs for research, industry and society
to foster a multitude of innovations.
Artificial Intelligence
including cognition, perception, vision,
cross-modal, cross-platform, cross-culture, IoT etc.
Machine Learning
Language Technology
Knowledge Technology
Human Language Project
• All official European and many additional languages
• Broad coverage, HQ, high precision – across modalities,
across platforms, across cultures
• Collaboration between EU, EC, EP, Member States,
research, industry, other stakeholders.
• Basic and applied research, innovation, commercialisation
• Policy change towards “LT-enabled multilingualism”
• HQMT – overcome quality (and language) barriers, written
and spoken, collaborate with human translators
• Resources and technologies for all European languages
Human Language Project
http://www.cracker-project.eu • http://www.meta-net.eu
Version 1.0 of the SRIA
• Strategic Agenda: “Language Technologies for Multilingual
Europe – Towards a Human Language Project”
• Key recommendation: set up Human Language Project
• Also: establish Multilingual Digital Single Market
• Informed by “LT for Multilingual Europe” survey
• Takes into account: CEF AT, DSM, NGI
• To be presented at META-FORUM 2017 (13/14 Nov. 2017)
Summary & Conclusions
• AI is disrupting all industries – including translation.
• But: perfect machine translation is still far away.
• Not only are tools for gist translation getting better and
better, so are tools for human translators!
• Translators can expect to make use of a vastly improved
(adaptive) tool landscape in the next couple of years.
• We are collaborating with human translators to better
understand how translation processes work.
• The goal of the Human Language Project is to move
Europe into the pole position in this field.
13/14 November 2017
Brussels, Belgium – http://www.meta-forum.eu
Register now! Participation is free of charge.
Thank you!
Many thanks to Josef van Genabith, Christian Dugast,
Andrea Lösch (all DFKI) and to Maria Giagkou (ILSP).
Dr. Georg Rehm
DFKI Berlin
georg.rehm@dfki.de
http://de.linkedin.com/in/georgrehm
https://www.slideshare.net/georgrehm
Human Language Project
• Truly Multilingual Europe
• European Economy (MDSM)
• Attractive jobs for high potentials
• Education and young researchers
• Massive boost for research
• Foster innovation and new companies

AI for Translation Technologies and Multilingual Europe

  • 1. Georg Rehm georg.rehm@dfki.de DFKI GmbH, Language Technology Lab – Berlin, Germany META-NET, General Secretary AI for Translation Technologies and Multilingual Europe
  • 2. Outline • Artificial Intelligence • Technology Support for Multilingual Europe • European MT Research – Results from QT21 • Connecting Europe Facility – Automated Translation • Towards the Human Language Project • Conclusions 2EP DG Trad Conference (16/17 Oct. 2017) – AI for Translation Technologies
  • 3. Artificial Intelligence 3EP DG Trad Conference (16/17 Oct. 2017) – AI for Translation Technologies
  • 4. EP DG Trad Conference (16/17 Oct. 2017) – AI for Translation Technologies 4
  • 5. EP DG Trad Conference (16/17 Oct. 2017) – AI for Translation Technologies 5
  • 6. EP DG Trad Conference (16/17 Oct. 2017) – AI for Translation Technologies 6
  • 7. EP DG Trad Conference (16/17 Oct. 2017) – AI for Translation Technologies 7
  • 8. EP DG Trad Conference (16/17 Oct. 2017) – AI for Translation Technologies 8 Data Intelligence Current breakthroughs based on Machine Learning (Deep Learning) Also still in use: symbolic, rule-based methods and systems Artificial Intelligence • Huge data sets + powerful algorithms + extremely fast hardware • Self-driving cars, robots, image recognition, machine translation • Enormous potential for disruptions in all sectors and areas
  • 9. Technology Support for Multilingual Europe 9EP DG Trad Conference (16/17 Oct. 2017) – AI for Translation Technologies
  • 10. • Multilingualism is at the very heart of the European idea • 24 EU languages – all languages have the same status • Dozens of regional and minority languages as well as languages of immigrants and trade partners • Economic challenges: – If the DSM is not multilingual, there will be 20+ isolated markets – Language barriers are market barriers • Social and public challenges: – Empower all citizens to use their mother tongues – Enable cross-border, cross-lingual, cross-cultural communication – Provide multilingual digital public services – Restore trust in media (fake news debate, filter bubble issue etc.)
  • 11. q 60 research centres in 34 countries (founded in 2010) Chair of Executive Board: Jan Hajic (CUNI) Dep.: J. van Genabith (DFKI), A. Vasiljevs (Tilde) General Secretary: Georg Rehm (DFKI) q Multilingual Europe Technology Alliance. 826 members in 67 countries (published in 2013) (31 volumes; published in 2012) T4ME (META-NET) CESAR METANET4UMETA-NORDMultilingual Europe Technology AllianceNET
  • 12. q Basque q Bulgarian* q Catalan q Croatian* q Czech* q Danish* q Dutch* q English* q Estonian* q Finnish* q French* q Galician q German* q Greek* q Hungarian* q Icelandic q Irish* q Italian* q Latvian* q Lithuanian* q Maltese* q Norwegian q Polish* q Portuguese* q Romanian* q Serbian q Slovak* q Slovene* q Spanish* q Swedish* q Welsh * Official EU languagehttp://www.meta-net.eu/whitepapers
  • 13. Language technology support by area (levels: excellent, good, moderate, fragmentary, weak or no support through LT)
    – MT: excellent: none; good: English; moderate: French, Spanish; fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian; weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh
    – Speech: excellent: none; good: English; moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish; fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish; weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh
    – Text Analytics: excellent: none; good: English; moderate: Dutch, French, German, Italian, Spanish; fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish; weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
    – Resources: excellent: none; good: English; moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish; fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene; weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
  • 14. Level of support per language (chart; levels: weak/none, fragmentary, moderate, good, excellent). Languages in chart order: Welsh, Maltese, Lithuanian, Latvian, Icelandic, Irish, Croatian, Serbian, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English. Languages with names in red have little or no MT support. Source: META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors) • Important: even current state-of-the-art technologies are far from being perfect! • Important: 20+ European languages are severely under-supported and face the danger of digital extinction.
  • 15. Language technology support (excellent, good, moderate, fragmentary, weak/no support) and millions of native speakers worldwide (chart; axis 0–400). Languages in chart order: Yiddish, Welsh, Vlax Romani, Turkish, Scots, Romany, Occitan, Maltese, Macedonian, Luxembourgish, Lithuanian, Limburgish, Latvian, Icelandic, Friulian, Frisian, Breton, Bosnian, Asturian, Albanian, Irish, Croatian, Serbian, Hebrew, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English. Source: Georg Rehm, Hans Uszkoreit, Ido Dagan, Vartkes Goetcherian, Mehmet Ugur Dogan, Coskun Mermer, Tamás Váradi, Sabine Kirchmeier-Andersen, Gerhard Stickel, Meirion Prys Jones, Stefan Oeter, and Sigve Gramstad. An Update and Extension of the META-NET Study “Europe's Languages in the Digital Age”. In Proceedings of the Workshop on Collaboration and Computing for Under-Resourced Languages in the Linked Open Data Era (CCURL 2014), Reykjavik, Iceland, May 2014.
  • 16. European Machine Translation Research – Results from QT21
  • 17. Research & Innovation Action 2015-18. Coordinator: Josef van Genabith (DFKI)
  • 18. QT21 is Improving Automatic Translation • Morphologically rich languages (De, Cz, Lv) • Under-resourced languages (Lv, Ro) • Quality Assessment: MQM – DQF • Learning from human feedback (APE) • Evaluation framework: WMT – Event series to present and discuss results from MT evaluations – Procedures: Automatic scoring (BLEU etc.) and human judgements (large number of human annotators) • Shared tasks (newspaper translation, quality estimation, metrics and automatic post-editing)
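The automatic scoring mentioned on slide 18 can be illustrated with a minimal sketch of a BLEU-style score. This is a simplified, sentence-level, single-reference variant with whitespace tokenisation and no smoothing; it is not the WMT evaluation tooling, and the function names `ngrams` and `sentence_bleu` are hypothetical, chosen only for this illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) multiplied by a brevity penalty.
    Assumption: one reference, whitespace tokens, no smoothing."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or overlap == 0:
            # Without smoothing, one zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalise candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to its reference, for example `sentence_bleu("the cat sat on the mat", "the cat sat on the mat")`, scores 1.0, and a candidate sharing no words with the reference scores 0.0. Shared-task evaluations typically report corpus-level scores alongside human judgements, so this sketch only conveys the idea behind the metric.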
  • 19. WMT Newspaper Translation Task – Human Judgement Rankings (chart: QT21 vs. Best Online, ranked First and First + Second, 2015–2017; values shown: First 64, 53, 1, 3; First + Second 66, 65, 3, 6) • Language pairs 2015: En ↔ Cz, En ↔ De, En ↔ Fr • 2016: En ↔ Cz, En ↔ De, En ↔ Ro • 2017: En ↔ Cz, En ↔ De, En ↔ Lv
  • 20. QT21 improvement in the last 12 months vs. online systems: WMT 2016 System on WMT 2017 Data (bar chart, axis 0–40; language directions: En -> De, De -> En, En -> Cz, Cz -> En; series: QT21-WMT-2016, Online WMT-2017, QT21-WMT-2017)
  • 21. Selected Results • Data sets are the fuel for neural networks • QT21’s neural technologies define the state of the art • Ranked #1 in more than 80% of all tasks at WMT 2017 • Also predominantly ranked #1 at WMT 2016 • QT21 keeps commercial systems at a distance • Huge improvements on morphologically rich languages • MQM as a standard for quality evaluation
  • 22. Connecting Europe Facility: Automated Translation (CEF AT)
  • 23. Connecting Europe • EU flagship goal: Establishing the Digital Single Market • Overcoming existing barriers – by creating an environment for digital services to flourish – by providing cross-border infrastructures and services. • Sectorial CEF Digital Service Infrastructures (DSIs) – ODR: Citizens need to solve disputes online across borders – BRIS: Citizens and business partners need legal certainty when doing business cross-border – eHealth: Citizens need to have online access to their patient summary when abroad – EESSI: Citizens need to enjoy their social security seamlessly and online when abroad • This also includes: eProcurement, Open Data, e-Justice, Cyber Security, Safer Internet, …
  • 24. Connecting Europe • Technological CEF building blocks can be used by the different DSIs (e.g., eInvoicing, eSignature etc.) • Most important in this context: CEF eTranslation – Why? To help European and national public administrations exchange information across language barriers – How? By providing MT capabilities that will enable digital services (in particular all DSIs) to be multilingual. • CEF eTranslation builds on MT@EC • Guarantees confidentiality and security of translated data • → ELRC contract • Coordinator: Josef van Genabith (DFKI)
  • 25. European Language Resource Coordination • Collect: language resources • Identify: needs of public services • Engage: with the public sector in the identification of Language Resources • Help: with any technical or legal issues • Act: as observatory for language resources across Europe
  • 26. What has been achieved? (Status: April 2017) • Chart: LR contributions by type (Bi-/Multilingual Corpora, Terminologies, Monolingual Corpora; axis 0–160) • 225 language resources collected • More than 2 billion words in all EU official languages, Norwegian and Icelandic • Over 450,000 terms • More than 2 million translation units • More than 91 resources to be used by you!
  • 27. ELRC for you • ELRC-SHARE Repository – Access to, sharing and contribution of LRs – Access to tools and services catalogue (forthcoming) – http://www.lr-coordination.eu/resources • ELRC Technical and Legal Helpdesk – Support for potential data donors (phone, email) – http://www.lr-coordination.eu/helpdesk • ELRC On-site assistance – http://www.lr-coordination.eu/services
  • 28. Current Developments
  • 29. Current Developments • Multilingual Europe: our languages enjoy equal status, yet digital extinction of the majority of EU languages is a very severe danger. • Language Technology Research and Innovation in Europe: World class research results (e.g., in QT21), strong SME base, thousands of LSPs; fragmentation; need for coordination. • Big need for high-quality Language Technologies: translation, personal assistants, multilingual DSM etc. (example: CEF). • AI: Important breakthroughs and massive investments in R&D and applications (mostly in US, Asia) – huge opportunity for Europe! • The European Language Challenge cannot be abandoned or outsourced! → Need for Language Technology made in Europe for Europe!
  • 30. Towards the Human Language Project
  • 31. • STOA Workshop in European Parliament (January 2017): “Language equality in the digital age – towards a Human Language Project” • Human Language Project vision suggested in several presentations • STOA Study, published in March 2017, recommends setting up the HLP → http://www.stoa.europarl.europa.eu/stoa/cms/home/workshops/language • Study cover: EPRS | European Parliamentary Research Service, Scientific Foresight Unit (STOA), PE 581.621, Science and Technology Options Assessment
  • 32. Human Language Project • Goal: Deep Natural Language Understanding by 2030 • AI for Next Generation Language Technology • Large-scale EU funding programme for basic and applied research as well as innovation (10-15 years) • New breakthroughs for research, industry and society to foster a multitude of innovations. • Diagram: Artificial Intelligence (including cognition, perception, vision, cross-modal, cross-platform, cross-culture, IoT etc.), Machine Learning, Language Technology, Knowledge Technology
  • 33. Human Language Project • All official European and many additional languages • Broad coverage, HQ, high precision – across modalities, across platforms, across cultures • Collaboration between EU, EC, EP, Member States, research, industry, other stakeholders. • Basic and applied research, innovation, commercialisation • Policy change towards “LT-enabled multilingualism” • HQMT – overcome quality (and language) barriers, written and spoken, collaborate with human translators • Resources and technologies for all European languages
  • 34. Version 1.0 of the SRIA • Strategic Agenda: “Language Technologies for Multilingual Europe – Towards a Human Language Project” • Key recommendation: set up Human Language Project • Also: establish Multilingual Digital Single Market • Informed by “LT for Multilingual Europe” survey • Takes into account: CEF AT, DSM, NGI • To be presented at META-FORUM 2017 (13/14 Nov. 2017) • http://www.cracker-project.eu • http://www.meta-net.eu
  • 35. Summary & Conclusions • AI is disrupting all industries – including translation. • But: perfect machine translation is still far away. • Not only are tools for gist translation getting better and better, so are tools for human translators! • Translators can expect to make use of a vastly improved (adaptive) tool landscape in the next couple of years. • We are collaborating with human translators to better understand how translation processes work. • The goal of the Human Language Project is to move Europe into the pole position in this field.
  • 36. 13/14 November 2017 Brussels, Belgium – http://www.meta-forum.eu Register now! Participation is free of charge.
  • 37. Thank you! Many thanks to Josef van Genabith, Christian Dugast, Andrea Lösch (all DFKI) and to Maria Giagkou (ILSP). Dr. Georg Rehm, DFKI Berlin • georg.rehm@dfki.de • http://de.linkedin.com/in/georgrehm • https://www.slideshare.net/georgrehm • Human Language Project: Truly Multilingual Europe; European Economy (MDSM); Attractive jobs for high potentials; Education and young researchers; Massive boost for research; Foster innovation and new companies