SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Downloaden Sie, um offline zu lesen
Erlangen
Artificial Intelligence &
Machine Learning Meetup
presents
Dr. Jonas Rende and Thomas Stadelmann, DATEV eG
NLP@DATEV
Setting up a domain specific language model
1. Introduction to NLP („gentle“): Basic Idea of a language model
2. The Game Changer: Transfer Learning
3. Proof of Concept (PoC): Goals and key insights
4. Go hard or go home: Technical deep dive
Agenda
19.11.2020 Page 2AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
Natural Language Processing: Definition
19.11.2020 Page 3
AI (especially
Deep Learning)
Computer
science
Information
processingLinguistics
NLP
Algorithmic processing and
analysis of natural language
AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
The (very very) Big Picture
19.11.2020 Page 4AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
Corpus DATEV-LM
➢ Textual basis for a
linguistic model (language
model)
➢ Public german text data
➢ Public domain data
➢ DATEV-own domain
data
➢ Spoken language is evolving over time and
can not be fully defined by formal criteria in
contrast to a programming language
➢ Data driven approach: learning a language
by observing real world examples (corpus)
➢ Statistical language models: (SOTA: Deep
Learning):
Downstream
Tasks
➢ Tasks („head“) we want
to solve with the
corresponding
language model e.g.
problem classification
Probabilistic Language Models
19.11.2020 Page 5AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
Outcome for further processing
− Vector representation of words
➢ Bag of Words
➢ Embeddings
➢ Contextualized Embeddings
𝑃 𝑤 = 𝑃(𝑤1, 𝑤2, … , 𝑤 𝑚−1 , 𝑤 𝑚)
Mathematical
Let 𝑤 be a sequence of
words with the length 𝑚
Definition (Page 105, Neural Network
Methods in Natural Language Processing,
2017)
Language modeling is
the task of assigning a
probability to sentences in a
language. […] Besides assigning
a probability to each sequence
of words, the language models
also assigns a probability for the
likelihood of a given word (or
a sequence of words) to follow a
sequence of words
𝑃(𝑤 𝑚|𝑤1, 𝑤2, … , 𝑤 𝑚−2 , 𝑤 𝑚−1)
The Game Changer: Transfer Learning
19.11.2020 Page 6AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
Source: https://ruder.io/transfer-learning/index.html#whatistransferlearning
“Transfer learning is a means to extract knowledge from a
source setting and apply it to a different target setting.”
19.11.2020 Page 7AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
Source: https://ruder.io/state-of-transfer-learning-in-nlp/ and deepset.ai
Source Task / Domain Target Task / Domain
Model A Model B
Knowledge
▪ Less labels required
▪ Re-usable for multiple problems
▪ Solve problems in new domains
▪ Faster development
▪ Performance
Why transfer learning?
Transfer Learning in NLP
19.11.2020 Page 8AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
Source: Ruder (2019): A taxonomy for transfer learning in NLP
▪ Pretrain (vector) representations on a
large unlabelled text corpus with a
suitable method (e.g. word2vec, BERT, …)
▪ Unlabelled text corpus examples: Public
german text data (e.g. Wikipedia), Public
domain data (e.g. german legal), DATEV-
own domain data
Source: https://ruder.io/state-of-transfer-learning-in-nlp/
Extract pretrained model (e.g.,
representations and parameters)
Knowledge
Source
data
▪ Adapt your extracted (vector)
representations to a supervised learning
target task, using your labelled data
▪ Examples for tasks: Classification,
sentiment analysis, named entity
recognition, intent recognition, neural
search, question answering, text
summariziation, …
Target data
False Friend ☺
Transfer Learning is changing Research
19.11.2020 Page 9AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
Source: Taken from deepset.ai
Embeddings (Word2Vec, Glove)
19.11.2020 Page 10AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
Most famous example: King – Man + Woman = Queen
− Word Embeddings are real-valued dense vector representations (1𝑥𝑑) for words you can
perform mathematical operations on
− e.g. bat
− Based on co-occurence statistics which allows to capture semantic similarities
Source: https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
0.2 0,5 0,1 0,0 3,1 1,2 2,3 0,5
Contextualized Embeddings (e.g. Transformer)
19.11.2020 Page 11AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
Embeddings are not context sensitive
The bat hits the baseball bat
The bat is sleeping in the cave bat
Models build up on the transformer architecture (e.g. BERT) are solving this problem
The bat hits the baseball bat
The bat is sleeping in the cave bat
Currently we are focusing on BERT
0.2 0,5 0,1 0,0 3,1 1,2 2,3 0,5
0.2 0,5 0,1 0,0 3,1 1,2 2,3 0,5
0.2 0,5 0,1 0,0 3,1 1,2 2,3 0,5
0.7 0,0 4,9 1,3 2,3 8,7 2,4 1,3
Domain knowledge is the key to success
19.11.2020 Page 12AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
Why domain knowledge?
− Publicly available language models are
pretrained on public text sources e.g. on
the entire articles on Wikipedia
− Freely available language models show
weaknesses if the documents contain a lot
of domain knowledge e.g. tax or legal
terms
How to take domain knowledge into account?
− Benchmark: GermanBERT by deepset.ai (no domain
knowledge)
− Language Modell Adaption: An existing Model is enriched
with domain knowledge (up to 10% of vocabulary at BERT)
− From Scratch: The Model ist trained from scratch so that the
proportion of domain knowledge can vary between 0% and
100%
We started with LM adaption. Currently, we evaluate training from scratch
Proof of concept:
research questions for language model domain adaptation
19.11.2020 Page 13AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
➢ Does incorporating our domain data improve the results of our
downstream tasks?
➢ How to set up a domain specific corpus containing multiple
subdomains?
➢ How mature are current open-source frameworks for NLP
transfer learning?
Hardware and Software stack
19.11.2020 Page 14AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
PyTorch
CUDA
GPU
Transformers
Apex
FARM
− https://github.com/deepset-ai/FARM
− https://github.com/huggingface/transformers
− https://github.com/pytorch/pytorch
− https://github.com/NVIDIA/apex
− We use GermanBERT as initial model which has
10% sentence piece tokens non-allocated
− We insert the top 3000 most important
sentence pieces of our corpus
− And train on…
− 1,2 GB of raw domain specific text
Evaluation
19.11.2020 Page 15AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
− We evaluate on 2 domain specific classification use cases
− Each task represents a subdomain:
▪ Service & Support (dataset: n ~ 1100)
▪ Legal & Professional (dataset: n ~ 350)
− Competing Benchmarks are
▪ GermanBERT (non-adapted)
▪ A self-trained fasttext model
− High variance in the evaluation runs: at least 25 runs
− Data Augmentation
▪ 4-Fold Backtranslation with fairseq‘s WMT-19 Model
Proof of concept: Key insights
19.11.2020 Page 16AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
➢ Does incorporating our
domain data improve the
results of our downstream
tasks?
Details
− Magnitude of improvement on par with previous results in literature (~2% p.
f1-macro improvement) for one use case (n~1100) and vastly exceeding on
the other (n~350)
− BERT-based approaches perform way better than fasttext
− Corpus composition is essential (especially if you have subdomains)
− There is not one language model to solve all tasks
− MLM Head (much) more important than NSP Head
f1-macro scores of evaluation tasks
adapted baseline adapted baseline
service & support legal & professional
Proof of concept: Key insights
19.11.2020 Page 17AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
➢ How to set up a domain specific corpus?
Answers
− Don‘t create one big file, keep it modular
− Split corpus by subdomains, document type, etc.
− Updating/Versioning of corpus needed as completely new topics
may arise (e.g. corona)
− Common preprocessing logic using adapters for different data
sources
− Sentence splitting is one of the hardest but not the most
important preprocessing step
Proof of concept: Key insights
19.11.2020 Page 18AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
➢ How mature are current open-source
frameworks for NLP transfer learning?
Answers
− All frameworks seem to be pretty stable (e.g. almost no breaking
changes or rough edges)
− But keep in mind: There is a trade-off between rapid progress and
stability …and progress is very fast
− Biggest pain are still varying environments (e.g. linux vs. windows)
− if it‘s broken, you can fix it
Why Deep Learning?
19.11.2020 Page 19AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
− Universal Approximation
Theorem (Hornik 1991)
− Availability of vast amounts of
data
− Availability of compute power
(GPUs)
− Obtain good results with ease in
comparision to excessive feature
engineering
− SOTA-Results in most ML-
disciplines https://www.slideshare.net/ExtractConf (Slide 30) by Andrew Ng
Neural Network Building Blocks
19.11.2020 Page 20AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
http://matrixmultiplication.xyz/
Operating principle of neural
networks visualised
http://neuralnetworksanddeeplearning.com/chap4.html
https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6?gi=b4754d64054
https://hackernoon.com/gradient-descent-aynk-7cbe95a778da
https://www.asimovinstitute.org/neural-network-zoo/
NLP Network Architecture
19.11.2020 Page 21AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
− Key idea: Using the same information multiple times by applying different
transformations
− Context: locality, attention
A brief history of Deep Learning Network Architectures:
− Convolutional Neural Networks (CNNs) (Krizhevsky 2012)
− Recurrent Neural Networks (RNNs) (Mikolov 2010)
− ResNet (He 2016), LSTMs (Merity 2017), Transformers (Vaswani 2017), ELMo
(Peters 2018), ULMFit (Howard 2018), BERT (Devlin 2018), …
BERT and its origin
19.11.2020 Page 22AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
BERT Architecture
19.11.2020 Page 23AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
http://jalammar.github.io/illustrated-bert/
Context in NLP: Attention
19.11.2020 Page 24AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
− Semantic meaning of words
is context dependent
− Calculate scores that depict
the corresponding influence
of surrounding words
− Incorporate the meaning of
surrounding words according
to this score
Created with bertviz: https://github.com/jessevig/bertviz.git
Self-Attention
19.11.2020 Page 25AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
http://jalammar.github.io/illustrated-transformer/
− 3 Transformations (Matrices)
− Key: does the Query word’s
meaning depend on me?
− Query: does the Key word
influence my meaning?
− Value: word’s raw meaning
References (1/3)
◼ Hornik, Kurt. 1991.
Approximation capabilities of multilayer feedforward networks. In: Neural Networks 4. Jg., Nr. 2, S. 251-257.
http://www.vision.jhu.edu/teaching/learning/deeplearning18/assets/Hornik-91.pdf
◼ Merity, Stephen; Keskar, Nitish Shirish; Socher, Richard. 2017.
Regularizing and optimizing LSTM language models. arXiv preprint.
https://arxiv.org/abs/1708.02182
◼ Devlin, Jacob; Chang, Ming-Wie; Lee, Kenton; Toutanova, Kristina. 2018.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
https://arxiv.org/abs/1810.04805
◼ Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz;
Polosukhin, Illia. 2017. Attention Is All You Need.
https://arxiv.org/abs/1706.03762
◼ Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. 2018.
Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
https://arxiv.org/abs/1802.05365
References (2/3)
◼ He, K., Zhang, X., Ren, S., & Sun, J. 2016.
Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and
pattern recognition (pp. 770-778).
https://arxiv.org/abs/1512.03385
◼ Howard, Jeremy; Ruder, Sebastian. 2018.
Universal Language Model Fine-tuning for Text Classification.
https://arxiv.org/abs/1801.06146
◼ Krizhevsky, A., Sutskever, I., & Hinton, G. E. 2012.
Imagenet classification with deep convolutional neural networks. In Advances in neural information
processing systems (pp. 1097-1105).
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
◼ Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. 2010.
Recurrent neural network based language model. In Eleventh annual conference of the international speech
communication association.
http://www.fit.vutbr.cz/research/groups/speech/servite/2010/rnnlm_mikolov.pdf
References (3/3)
◼ Allamar, Jay
Visualizing NLP Blog
http://jalammar.github.io/
◼ Ruder, Sebastian. 2019.
Neural Transfer Learning for Natural Language Processing.
https://ruder.io/thesis/
◼ Ruder, Sebastian.
NLP Transfer Learning Blog.
https://ruder.io/
◼ Stanford NLP with Deep Learning Kurs
◼ fast.ai: Practical Deep Learning for Coders.
◼ fast.ai: NLP Kurs
© DATEV eG, all rights reserved
Dr. Jonas Rende
Thomas Stadelmann
DATEV eG
Jonas.rende@datev.de
Thomas.Stadelmann@datev.de
To learn more about the meetup, click the Link
https://www.meetup.com/Erlangen-Artificial-Intelligence-Machine-Learning-Meetup
Erlangen
Artificial Intelligence &
Machine Learning Meetup
presents

Weitere ähnliche Inhalte

Ähnlich wie NLP@DATEV: Setting up a domain specific language model, Dr. Jonas Rende & Thomas Stadelmann, DATEV eG

Natural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsNatural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsShreyas Suresh Rao
 
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...Deep Learning Italia
 
Release webinar: Sansa and Ontario
Release webinar: Sansa and OntarioRelease webinar: Sansa and Ontario
Release webinar: Sansa and OntarioBigData_Europe
 
Nlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_finalNlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_finalJeffrey Shomaker
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
 
Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingSeth Grimes
 
GATE, HLT and Machine Learning, Sheffield, July 2003
GATE, HLT and Machine Learning, Sheffield, July 2003GATE, HLT and Machine Learning, Sheffield, July 2003
GATE, HLT and Machine Learning, Sheffield, July 2003butest
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Fwdays
 
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...TAUS - The Language Data Network
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Deep Learning Italia
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010ivan provalov
 
Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...Seth Grimes
 
Notes on data-intensive processing with Hadoop Mapreduce
Notes on data-intensive processing with Hadoop MapreduceNotes on data-intensive processing with Hadoop Mapreduce
Notes on data-intensive processing with Hadoop MapreduceEvert Lammerts
 
Sasaki webtechcon2010
Sasaki webtechcon2010Sasaki webtechcon2010
Sasaki webtechcon2010Felix Sasaki
 
ETL using Big Data Talend
ETL using Big Data Talend  ETL using Big Data Talend
ETL using Big Data Talend Edureka!
 
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...datascienceiqss
 

Ähnlich wie NLP@DATEV: Setting up a domain specific language model, Dr. Jonas Rende & Thomas Stadelmann, DATEV eG (20)

Natural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsNatural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application Trends
 
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
 
Release webinar: Sansa and Ontario
Release webinar: Sansa and OntarioRelease webinar: Sansa and Ontario
Release webinar: Sansa and Ontario
 
Nlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_finalNlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_final
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language Processing
 
The NLP Muppets revolution!
The NLP Muppets revolution!The NLP Muppets revolution!
The NLP Muppets revolution!
 
GATE, HLT and Machine Learning, Sheffield, July 2003
GATE, HLT and Machine Learning, Sheffield, July 2003GATE, HLT and Machine Learning, Sheffield, July 2003
GATE, HLT and Machine Learning, Sheffield, July 2003
 
Ontopia Code Camp
Ontopia Code CampOntopia Code Camp
Ontopia Code Camp
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"
 
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
 
Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...
 
Notes on data-intensive processing with Hadoop Mapreduce
Notes on data-intensive processing with Hadoop MapreduceNotes on data-intensive processing with Hadoop Mapreduce
Notes on data-intensive processing with Hadoop Mapreduce
 
Hadoop.mapreduce
Hadoop.mapreduceHadoop.mapreduce
Hadoop.mapreduce
 
Sasaki webtechcon2010
Sasaki webtechcon2010Sasaki webtechcon2010
Sasaki webtechcon2010
 
ETL using Big Data Talend
ETL using Big Data Talend  ETL using Big Data Talend
ETL using Big Data Talend
 
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
 

Mehr von Erlangen Artificial Intelligence & Machine Learning Meetup (7)

Knowledge Graphs, Daria Stepanova, Bosch Center for Artificial Intelligence
Knowledge Graphs, Daria Stepanova, Bosch Center for Artificial IntelligenceKnowledge Graphs, Daria Stepanova, Bosch Center for Artificial Intelligence
Knowledge Graphs, Daria Stepanova, Bosch Center for Artificial Intelligence
 
AI applications in education, Pascal Zoleko, Flexudy
AI applications in education, Pascal Zoleko, FlexudyAI applications in education, Pascal Zoleko, Flexudy
AI applications in education, Pascal Zoleko, Flexudy
 
Learning global pooling operators in deep neural networks for image retrieval...
Learning global pooling operators in deep neural networks for image retrieval...Learning global pooling operators in deep neural networks for image retrieval...
Learning global pooling operators in deep neural networks for image retrieval...
 
XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...
XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...
XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...
 
Ai for cultural history
Ai for cultural historyAi for cultural history
Ai for cultural history
 
Machine Learning Operations & Azure
Machine Learning Operations & AzureMachine Learning Operations & Azure
Machine Learning Operations & Azure
 
Best practices for structuring Machine Learning code
Best practices for structuring Machine Learning codeBest practices for structuring Machine Learning code
Best practices for structuring Machine Learning code
 

Kürzlich hochgeladen

怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制vexqp
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........EfruzAsilolu
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制vexqp
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 

Kürzlich hochgeladen (20)

怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 

NLP@DATEV: Setting up a domain specific language model, Dr. Jonas Rende & Thomas Stadelmann, DATEV eG

  • 2. Dr. Jonas Rende and Thomas Stadelmann, DATEV eG NLP@DATEV Setting up a domain specific language model
  • 3. 1. Introduction to NLP („gentle“): Basic Idea of a language model 2. The Game Changer: Transfer Learning 3. Proof of Concept (PoC): Goals and key insights 4. Go hard or go home: Technical deep dive Agenda 19.11.2020 Page 2AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
  • 4. Natural Language Processing: Definition 19.11.2020 Page 3 AI (especially Deep Learning) Computer science Information processingLinguistics NLP Algorithmic processing and analysis of natural language AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann
  • 5. The (very very) Big Picture 19.11.2020 Page 4AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann Corpus DATEV-LM ➢ Textual basis for a linguistic model (language model) ➢ Public german text data ➢ Public domain data ➢ DATEV-own domain data ➢ Spoken language is evolving over time and can not be fully defined by formal criteria in contrast to a programming language ➢ Data driven approach: learning a language by observing real world examples (corpus) ➢ Statistical language models: (SOTA: Deep Learning): Downstream Tasks ➢ Tasks („head“) we want to solve with the corresponding language model e.g. problem classification
  • 6. Probabilistic Language Models 19.11.2020 Page 5AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann Outcome for further processing − Vector representation of words ➢ Bag of Words ➢ Embeddings ➢ Contextualized Embeddings 𝑃 𝑤 = 𝑃(𝑤1, 𝑤2, … , 𝑤 𝑚−1 , 𝑤 𝑚) Mathematical Let 𝑤 be a sequence of words with the length 𝑚 Definition (Page 105, Neural Network Methods in Natural Language Processing, 2017) Language modeling is the task of assigning a probability to sentences in a language. […] Besides assigning a probability to each sequence of words, the language models also assigns a probability for the likelihood of a given word (or a sequence of words) to follow a sequence of words 𝑃(𝑤 𝑚|𝑤1, 𝑤2, … , 𝑤 𝑚−2 , 𝑤 𝑚−1)
  • 7. The Game Changer: Transfer Learning 19.11.2020 Page 6AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann Source: https://ruder.io/transfer-learning/index.html#whatistransferlearning
  • 8. “Transfer learning is a means to extract knowledge from a source setting and apply it to a different target setting.” 19.11.2020 Page 7AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann Source: https://ruder.io/state-of-transfer-learning-in-nlp/ and deepset.ai Source Task / Domain Target Task / Domain Model A Model B Knowledge ▪ Less labels required ▪ Re-usable for multiple problems ▪ Solve problems in new domains ▪ Faster development ▪ Performance Why transfer learning?
  • 9. Transfer Learning in NLP 19.11.2020 Page 8AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann Source: Ruder (2019): A taxonomy for transfer learning in NLP ▪ Pretrain (vector) representations on a large unlabelled text corpus with a suitable method (e.g. word2vec, BERT, …) ▪ Unlabelled text corpus examples: Public german text data (e.g. Wikipedia), Public domain data (e.g. german legal), DATEV- own domain data Source: https://ruder.io/state-of-transfer-learning-in-nlp/ Extract pretrained model (e.g., representations and parameters) Knowledge Source data ▪ Adapt your extracted (vector) representations to a supervised learning target task, using your labelled data ▪ Examples for tasks: Classification, sentiment analysis, named entity recognition, intent recognition, neural search, question answering, text summariziation, … Target data False Friend ☺
  • 10. Transfer Learning is changing Research 19.11.2020 Page 9AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann Source: Taken from deepset.ai
  • 11. Embeddings (Word2Vec, Glove) 19.11.2020 Page 10AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann Most famous example: King – Man + Woman = Queen − Word Embeddings are real-valued dense vector representations (1𝑥𝑑) for words you can perform mathematical operations on − e.g. bat − Based on co-occurence statistics which allows to capture semantic similarities Source: https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/ 0.2 0,5 0,1 0,0 3,1 1,2 2,3 0,5
  • 12. Contextualized Embeddings (e.g. Transformer) 19.11.2020 Page 11AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann Embeddings are not context sensitive The bat hits the baseball bat The bat is sleeping in the cave bat Models build up on the transformer architecture (e.g. BERT) are solving this problem The bat hits the baseball bat The bat is sleeping in the cave bat Currently we are focusing on BERT 0.2 0,5 0,1 0,0 3,1 1,2 2,3 0,5 0.2 0,5 0,1 0,0 3,1 1,2 2,3 0,5 0.2 0,5 0,1 0,0 3,1 1,2 2,3 0,5 0.7 0,0 4,9 1,3 2,3 8,7 2,4 1,3
  • 13. Domain knowledge is the key to success 19.11.2020 Page 12AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann Why domain knowledge? − Publicly available language models are pretrained on public text sources e.g. on the entire articles on Wikipedia − Freely available language models show weaknesses if the documents contain a lot of domain knowledge e.g. tax or legal terms How to take domain knowledge into account? − Benchmark: GermanBERT by deepset.ai (no domain knowledge) − Language Modell Adaption: An existing Model is enriched with domain knowledge (up to 10% of vocabulary at BERT) − From Scratch: The Model ist trained from scratch so that the proportion of domain knowledge can vary between 0% and 100% We started with LM adaption. Currently, we evaluate training from scratch
  • 14. Proof of concept: research questions for language model domain adaptation 19.11.2020 Page 13AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann ➢ Does incorporating our domain data improve the results of our downstream tasks? ➢ How to set up a domain specific corpus containing multiple subdomains? ➢ How mature are current open-source frameworks for NLP transfer learning?
  • 15. Hardware and Software stack 19.11.2020 Page 14AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann PyTorch CUDA GPU Transformers Apex FARM − https://github.com/deepset-ai/FARM − https://github.com/huggingface/transformers − https://github.com/pytorch/pytorch − https://github.com/NVIDIA/apex − We use GermanBERT as initial model which has 10% sentence piece tokens non-allocated − We insert the top 3000 most important sentence pieces of our corpus − And train on… − 1,2 GB of raw domain specific text
  • 16. Evaluation 19.11.2020 Page 15AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann − We evaluate on 2 domain specific classification use cases − Each task represents a subdomain: ▪ Service & Support (dataset: n ~ 1100) ▪ Legal & Professional (dataset: n ~ 350) − Competing Benchmarks are ▪ GermanBERT (non-adapted) ▪ A self-trained fasttext model − High variance in the evaluation runs: at least 25 runs − Data Augmentation ▪ 4-Fold Backtranslation with fairseq‘s WMT-19 Model
  • 17. Proof of concept: Key insights 19.11.2020 Page 16AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann ➢ Does incorporating our domain data improve the results of our downstream tasks? Details − Magnitude of improvement on par with previous results in literature (~2% p. f1-macro improvement) for one use case (n~1100) and vastly exceeding on the other (n~350) − BERT-based approaches perform way better than fasttext − Corpus composition is essential (especially if you have subdomains) − There is not one language model to solve all tasks − MLM Head (much) more important than NSP Head f1-macro scores of evaluation tasks adapted baseline adapted baseline service & support legal & professional
  • 18. Proof of concept: Key insights 19.11.2020 Page 17AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann ➢ How to set up a domain specific corpus? Answers − Don‘t create one big file, keep it modular − Split corpus by subdomains, document type, etc. − Updating/Versioning of corpus needed as completely new topics may arise (e.g. corona) − Common preprocessing logic using adapters for different data sources − Sentence splitting is one of the hardest but not the most important preprocessing step
  • 19. Proof of concept: Key insights 19.11.2020 Page 18AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann ➢ How mature are current open-source frameworks for NLP transfer learning? Answers − All frameworks seem to be pretty stable (e.g. almost no breaking changes or rough edges) − But keep in mind: There is a trade-off between rapid progress and stability …and progress is very fast − Biggest pain are still varying environments (e.g. linux vs. windows) − if it‘s broken, you can fix it
  • 20. Why Deep Learning? 19.11.2020 Page 19AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann − Universal Approximation Theorem (Hornik 1991) − Availability of vast amounts of data − Availability of compute power (GPUs) − Obtain good results with ease in comparision to excessive feature engineering − SOTA-Results in most ML- disciplines https://www.slideshare.net/ExtractConf (Slide 30) by Andrew Ng
  • 21. Neural Network Building Blocks 19.11.2020 Page 20AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann http://matrixmultiplication.xyz/ Operating principle of neural networks visualised http://neuralnetworksanddeeplearning.com/chap4.html https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6?gi=b4754d64054 https://hackernoon.com/gradient-descent-aynk-7cbe95a778da https://www.asimovinstitute.org/neural-network-zoo/
  • 22. NLP Network Architecture 19.11.2020 Page 21AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann − Key idea: Using the same information multiple times by applying different transformations − Context: locality, attention A brief history of Deep Learning Network Architectures: − Convolutional Neural Networks (CNNs) (Krizhevsky 2012) − Recurrent Neural Networks (RNNs) (Mikolov 2010) − ResNet (He 2016), LSTMs (Merity 2017), Transformers (Vaswani 2017), ELMo (Peters 2018), ULMFit (Howard 2018), BERT (Devlin 2018), …
  • 23. BERT and its origin 19.11.2020 Page 22AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
  • 24. BERT Architecture 19.11.2020 Page 23AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann http://jalammar.github.io/illustrated-bert/
  • 25. Context in NLP: Attention 19.11.2020 Page 24AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann − Semantic meaning of words is context dependent − Calculate scores that depict the corresponding influence of surrounding words − Incorporate the meaning of surrounding words according to this score Created with bertviz: https://github.com/jessevig/bertviz.git
  • 26. Self-Attention 19.11.2020 Page 25AI Meetup Erlangen – Dr. Jonas Rende & Thomas Stadelmann http://jalammar.github.io/illustrated-transformer/ − 3 Transformations (Matrices) − Key: does the Query word’s meaning depend on me? − Query: does the Key word influence my meaning? − Value: word’s raw meaning
  • 27. References (1/3) ◼ Hornik, Kurt. 1991. Approximation capabilities of multilayer feedforward networks. In: Neural Networks 4. Jg., Nr. 2, S. 251-257. http://www.vision.jhu.edu/teaching/learning/deeplearning18/assets/Hornik-91.pdf ◼ Merity, Stephen; Keskar, Nitish Shirish; Socher, Richard. 2017. Regularizing and optimizing LSTM language models. arXiv preprint. https://arxiv.org/abs/1708.02182 ◼ Devlin, Jacob; Chang, Ming-Wie; Lee, Kenton; Toutanova, Kristina. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805 ◼ Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia. 2017. Attention Is All You Need. https://arxiv.org/abs/1706.03762 ◼ Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365. https://arxiv.org/abs/1802.05365
  • 28. References (2/3) ◼ He, K., Zhang, X., Ren, S., & Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). https://arxiv.org/abs/1512.03385 ◼ Howard, Jeremy; Ruder, Sebastian. 2018. Universal Language Model Fine-tuning for Text Classification. https://arxiv.org/abs/1801.06146 ◼ Krizhevsky, A., Sutskever, I., & Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105). http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf ◼ Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. 2010. Recurrent neural network based language model. In Eleventh annual conference of the international speech communication association. http://www.fit.vutbr.cz/research/groups/speech/servite/2010/rnnlm_mikolov.pdf
  • 29. References (3/3) ◼ Allamar, Jay Visualizing NLP Blog http://jalammar.github.io/ ◼ Ruder, Sebastian. 2019. Neural Transfer Learning for Natural Language Processing. https://ruder.io/thesis/ ◼ Ruder, Sebastian. NLP Transfer Learning Blog. https://ruder.io/ ◼ Stanford NLP with Deep Learning Kurs ◼ fast.ai: Practical Deep Learning for Coders. ◼ fast.ai: NLP Kurs
  • 30. © DATEV eG, all rights reserved Dr. Jonas Rende Thomas Stadelmann DATEV eG Jonas.rende@datev.de Thomas.Stadelmann@datev.de
  • 31. To learn more about the meetup, click the Link https://www.meetup.com/Erlangen-Artificial-Intelligence-Machine-Learning-Meetup Erlangen Artificial Intelligence & Machine Learning Meetup presents