Language-Independent NLP with Deep Learning
Raphael Villedieu, Roman Teucher
19/20.06.2019
www.deecoob.com
Data - Information - Insight
Overview
1. Use Case at deecoob
2. Current mono-lingual approaches
3. International Use Case
4. Transformer Models
5. Multi-language capabilities
1. Use Case at deecoob
2. Current mono-lingual approaches
● pre-labeled text corpus
● Tf-Idf as text features
● Naïve Bayes as classifier
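A minimal sketch of the pipeline listed on this slide, written in plain Python so that it has no dependencies. The class name `TfidfNaiveBayes`, the whitespace tokenizer, the smoothing constant `alpha`, and the four toy texts are assumptions made for illustration; the slides do not show deecoob's actual implementation or data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lower-casing and whitespace splitting stand in for real preprocessing.
    return text.lower().split()


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over tf-idf weighted word counts (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant

    def _tfidf(self, tokens):
        # Term frequency times inverse document frequency; unknown words are ignored.
        counts = Counter(tok for tok in tokens if tok in self.idf)
        return {tok: n * self.idf[tok] for tok, n in counts.items()}

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        n_docs = len(docs)
        doc_freq = Counter(tok for doc in docs for tok in set(doc))
        # Smoothed idf: words that occur in fewer documents get a larger weight.
        self.idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
                    for tok, df in doc_freq.items()}
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / n_docs) for c in self.classes}
        # Sum the tf-idf weight of every word per class.
        weight = {c: defaultdict(float) for c in self.classes}
        for doc, label in zip(docs, labels):
            for tok, w in self._tfidf(doc).items():
                weight[label][tok] += w
        vocab_size = len(self.idf)
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(weight[c].values()) + self.alpha * vocab_size
            self.log_likelihood[c] = {
                tok: math.log((weight[c][tok] + self.alpha) / total) for tok in self.idf
            }
        return self

    def predict(self, text):
        feats = self._tfidf(tokenize(text))
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_likelihood[c][tok] for tok, w in feats.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Hypothetical pre-labeled corpus: "event" vs. "other".
train_texts = [
    "live concert tonight with the band",
    "band plays live music on stage",
    "quarterly tax report due friday",
    "tax office opening hours",
]
train_labels = ["event", "event", "other", "other"]

model = TfidfNaiveBayes().fit(train_texts, train_labels)
print(model.predict("live music concert"))
print(model.predict("tax report"))
```

A model of this kind only scores words it saw during training, which is presumably why the slides describe the approach as mono-lingual: text in an unseen language contributes no features at all.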
3. International Use Case
● Music event detection worldwide
○ different languages
○ texts containing more than one language
○ different character sets (umlauts, accents, Cyrillic, Hebrew)
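To make the character-set point concrete, the following sketch reports which writing systems occur in a text by reading the first word of each letter's Unicode name. The function name `scripts_in` and the sample strings are assumptions for illustration and are not taken from the talk.

```python
import unicodedata


def scripts_in(text):
    """Return the set of script names (e.g. LATIN, CYRILLIC) used by the letters in text."""
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # Unicode character names start with the script, e.g. "CYRILLIC SMALL LETTER DE".
                found.add(name.split()[0])
    return found


print(scripts_in("Konzert in München"))   # umlauts are still Latin letters
print(scripts_in("Концерт in Berlin"))    # one text, two scripts
print(scripts_in("הופעה חיה"))
```

A text that mixes scripts, as in the second example, is exactly the case where a single per-language preprocessing step stops being sufficient.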
3. International Use Case: Naïve Approach
3. International Use Case: BERT Approach
4. Transformer Models
● use a deep neural network encoder-decoder
○ can process multiple languages at once
○ learns a language-independent model
● input the entire sequence at once
● make heavy use of attention
4. Transformer - Google BERT
Encoder-Decoder Stack
http://jalammar.github.io/illustrated-transformer/
4. Transformer - Google BERT
Model token dependencies through multi-head self-attention
https://mchromiak.github.io/articles/2017/Sep/12/Transformer-Attention-is-all-you-need/
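The idea of multi-head self-attention can be sketched in a few lines of plain Python. This is a deliberately simplified version: queries, keys, and values are the token vectors themselves, and each head simply works on its own slice of the vector, whereas a real Transformer applies learned projection matrices before and after this step. The function names and the three toy token vectors are assumptions for illustration.

```python
import math


def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned weights)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Similarity of this token to every token in the sequence, itself included.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Each output vector is a weighted average of all token vectors.
        outputs.append([sum(w_j * v[i] for w_j, v in zip(w, x)) for i in range(d)])
    return outputs, weights


def multi_head_self_attention(x, num_heads):
    """Run self-attention separately on equal slices of each vector, then concatenate."""
    d = len(x[0])
    assert d % num_heads == 0, "vector size must be divisible by the number of heads"
    size = d // num_heads
    heads = []
    for h in range(num_heads):
        chunk = [vec[h * size:(h + 1) * size] for vec in x]
        out, _ = self_attention(chunk)
        heads.append(out)
    return [sum((heads[h][t] for h in range(num_heads)), []) for t in range(len(x))]


# Three hypothetical 4-dimensional token vectors.
tokens = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
_, attn = self_attention(tokens)
mixed = multi_head_self_attention(tokens, num_heads=2)
print(attn[0])   # how strongly token 0 attends to tokens 0, 1 and 2
print(mixed[0])
```

Every token attends to every other token in one step, which is what allows the whole sequence to be fed in at once rather than word by word.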
4. Transformer - Google BERT
Step 1: masked-word training
● randomly mask 15% of the input words
○ masked words “do not see themselves” in training, so the model has to predict them from their context
● pre-trained on Wikipedia text in 104 languages
● model size: 12 layers, hidden size 768, 12 attention heads, 110M parameters
https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html
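The masking step can be sketched as follows. The function name `mask_tokens`, the fixed random seed, and the sample sentence are assumptions for illustration. The sketch always substitutes the `[MASK]` symbol for a selected word; the published BERT procedure is somewhat more involved and does not always do so.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide roughly mask_rate of the tokens and remember what was hidden.

    Assumes a non-empty token list; at least one token is always masked.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for p in positions:
        targets[p] = tokens[p]   # what the model should predict at this position
        masked[p] = MASK         # what the model actually sees
    return masked, targets


sentence = ("the band plays a live concert in the old town hall "
            "tonight at eight sharp and entry is free for").split()
masked, targets = mask_tokens(sentence)
print(" ".join(masked))
print(targets)
```

Training then consists of asking the model to recover the entries in `targets` from the surrounding, unmasked words.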
4. Transformer - Google BERT
Step 2: two-sentence training (next-sentence prediction)
https://www.kdnuggets.com/2018/12/bert-sota-nlp-model-explained.html
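A rough sketch of how training pairs for the two-sentence task could be built: about half of the pairs contain the sentence that really follows, the other half a different sentence, and the model learns to tell them apart. The function name `make_sentence_pairs`, the seed, and the four sample sentences are assumptions; here the negative example is drawn from the same short text, which is a simplification.

```python
import random


def make_sentence_pairs(sentences, seed=0):
    """Build (first, second, is_next) triples; needs at least three sentences."""
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # Positive example: the sentence that actually follows.
            pairs.append((sentences[i], sentences[i + 1], True))
        else:
            # Negative example: any sentence other than the current one and its successor.
            candidates = [j for j in range(len(sentences)) if j not in (i, i + 1)]
            pairs.append((sentences[i], sentences[rng.choice(candidates)], False))
    return pairs


document = [
    "The band arrives at noon.",
    "Sound check starts at four.",
    "Doors open at seven.",
    "The concert begins at eight.",
]
pairs = make_sentence_pairs(document)
for first, second, is_next in pairs:
    print(is_next, "|", first, "->", second)
```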
5. Multi-language capabilities
Results
deecoob Technology GmbH
+49 (0) 351 410 470
www.deecoob.com
info@deecoob.com
