Explaining Character-Aware Neural Networks for Word-Level Prediction:
Do They Discover Linguistic Rules?
Fréderic Godin, Kris Demuynck, Joni Dambre, Wesley De Neve and Thomas Demeester
Department of Electronics and Information Systems
Ghent University, Belgium
Introduction
Fréderic Godin - Explaining Character-Aware Neural Networks for Word-Level Prediction
Example: Rule-based tagger for PoS tagging
Brill (1994)’s transformation-based error-driven tagger
Template: Change the most-likely tag X to Y if the last (1, 2, 3, 4) characters of the word are x
Rule: Change the tag common noun to plural common noun if the word has suffix -s
Easily interpretable
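The template and rule above can be illustrated with a minimal Python sketch. The class `SuffixRule` and the tag names `NN` and `NNS` are assumptions made for illustration; they are not taken from Brill's tagger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixRule:
    """Change tag `from_tag` to `to_tag` if the word ends with `suffix`."""
    from_tag: str
    to_tag: str
    suffix: str  # 1 to 4 characters, as in the template

    def apply(self, word: str, tag: str) -> str:
        # The rule fires only when both the current tag and the suffix match.
        if tag == self.from_tag and word.endswith(self.suffix):
            return self.to_tag
        return tag


# Hypothetical tag names: NN = common noun, NNS = plural common noun.
plural_rule = SuffixRule(from_tag="NN", to_tag="NNS", suffix="s")

print(plural_rule.apply("rules", "NN"))  # NNS
print(plural_rule.apply("rule", "NN"))   # NN
```

Reading the rule is enough to explain any tag it produces, which is the kind of interpretability the slide refers to.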
Interpretability in NLP used to be easy
Rule-based/Tree-based models
Shallow statistical models (e.g., logistic regression, CRF)
Very transparent: follow the trace
Essentially: weight + feature
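As a rough illustration of "weight + feature", the sketch below explains a logistic regression prediction by listing the contribution of each feature. The feature names, weights and bias are invented for this example.

```python
import math

# Hypothetical features and weights for the class "plural".
weights = {"suffix=s": 2.0, "suffix=a": -0.5, "is_capitalized": -1.0}
bias = -0.5


def explain(features):
    """Return per-feature contributions (weight * value) and the probability."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    logit = bias + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return contributions, probability


contributions, probability = explain(
    {"suffix=s": 1.0, "suffix=a": 0.0, "is_capitalized": 0.0}
)
for name, value in contributions.items():
    print(f"{name}: {value:+.2f}")
print(f"P(plural) = {probability:.3f}")
```

Each contribution can be read off directly, so "following the trace" amounts to inspecting one sum.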
Current NLP interpretability...
Our proposed method
We present contextual decomposition (CD) for CNNs
- Extends CD for LSTMs (Murdoch et al. 2018)
- White box approach to interpretability
We trace back morphological tagging decisions to the character level
- Which characters are important?
- Do the patterns match linguistically known ones?
- Do CNNs and BiLSTMs differ?
Contextual decomposition for CNNs
Contextual decomposition
Idea: every output value can be “decomposed” into
- Relevant contributions originating from the input we are interested in (e.g., some characters)
- Irrelevant contributions originating from all the other inputs (e.g., all the other characters in a word)
[Figure: a CNN maps the word “economicas” to the label “plural”; the output is split into a relevant and an irrelevant contribution.]
Contextual decomposition for CNNs
Three main components of CNN
- Convolution
- Activation function
- Max-over-time pooling
Classification layer
[Figure: CNN architecture: characters “^ e c o n o m i c a s $” → CNN filters → max over time → FC → Gender = feminine]
Contextual decomposition for CNNs: Convolution
Output of single convolutional filter at timestep t:
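The output is split into a relevant and an irrelevant part, where:
n = filter size
S = indexes of relevant inputs
W_i = i-th column of filter W
Example: for “^ e c o n o m i c a s $”, the filter covers indexes 8, 9, 10, 11; index 9 is relevant and indexes 8, 10, 11 are irrelevant.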
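The split of a single filter output can be sketched in plain Python. The slide's equation did not survive extraction, so the indexing convention (column W[i] is applied to input x[t + i], counting from 0), the separate bias term, and the toy numbers are all assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def decompose_convolution(x, W, b, t, relevant):
    """Split the output of one filter at timestep t.

    x        : list of input vectors, one per character
    W        : list of n filter columns; W[i] is applied to x[t + i]
    b        : scalar bias
    relevant : set S of input indexes we are interested in
    Returns (relevant part, irrelevant part, bias).
    """
    beta = 0.0   # contribution of inputs whose index is in S
    gamma = 0.0  # contribution of all other inputs under the filter
    for i, column in enumerate(W):
        term = dot(column, x[t + i])
        if (t + i) in relevant:
            beta += term
        else:
            gamma += term
    return beta, gamma, b


# Toy 2-dimensional "embeddings" for the 12 symbols of ^economicas$.
x = [[float(k), 1.0] for k in range(12)]
W = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-1.0, 0.0]]  # filter size n = 4
b = 0.1

# The filter covers indexes 8..11; only index 9 (the "a") is relevant.
beta, gamma, bias = decompose_convolution(x, W, b, t=8, relevant={9})
print(beta, gamma, bias)
```

By construction, the three returned values add up to the ordinary filter output, so nothing is lost by the split.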
Contextual decomposition for CNNs: Activation func.
Goal: Linearize activation function to be able to split output.
Linearization formula:
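The formula itself is not recoverable from the slide text. One way to linearize an activation, assumed here, is the averaging scheme used in contextual decomposition for LSTMs (Murdoch et al. 2018): the share of each term is the change it causes in the activation, averaged over all orders in which the terms can be added. The function name `linearize` and the example values are illustrative.

```python
from itertools import permutations


def relu(v):
    return max(0.0, v)


def linearize(f, terms):
    """Split f(sum(terms)) - f(0) into one share per term.

    Each share is the change in f caused by adding that term, averaged over
    all orderings of the terms.
    """
    shares = [0.0] * len(terms)
    orderings = list(permutations(range(len(terms))))
    for ordering in orderings:
        running = 0.0
        for k in ordering:
            shares[k] += f(running + terms[k]) - f(running)
            running += terms[k]
    return [s / len(orderings) for s in shares]


# Relevant part, irrelevant part and bias of one filter output (toy values).
terms = [5.0, -2.0, 0.1]
shares = linearize(relu, terms)
print(shares, sum(shares), relu(sum(terms)))
```

Because the differences telescope within every ordering, the shares always sum to the activation of the full input (minus f(0), which is zero for ReLU), which is what makes the output splittable.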
Contextual decomposition for CNNs: Max pooling
Max-over-time pooling:
Determine t first and just copy that split:
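A minimal sketch of this step, assuming the relevant and irrelevant parts of one filter are already available per timestep (the function name and the numbers are illustrative):

```python
def max_over_time(relevant, irrelevant):
    """Pool a decomposed feature map of one filter.

    relevant[t] + irrelevant[t] is the filter's full activation at timestep t.
    The winning timestep is chosen on the full activation; its split is copied.
    """
    totals = [r + g for r, g in zip(relevant, irrelevant)]
    t_max = max(range(len(totals)), key=totals.__getitem__)
    return t_max, relevant[t_max], irrelevant[t_max]


relevant = [0.0, 0.2, 4.0, 1.0]
irrelevant = [1.0, 0.5, -0.9, 2.5]
print(max_over_time(relevant, irrelevant))  # (3, 1.0, 2.5)
```

Note that timestep 2 has the largest relevant part but is not selected: the maximum is taken over the full activation, exactly as in the undecomposed network.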
Contextual decomposition of classification layer
Probability of a certain class:
We simplify:
Relevant contribution to class j
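The slide's equations are missing from the extracted text. The sketch below assumes a softmax over a fully connected layer and takes the relevant contribution to class j to be the dot product of that class's weights with the relevant part of the pooled features, ignoring the softmax normalization. All names and numbers are illustrative.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, pooled):
    """Logits of the fully connected layer: W[j] . pooled + b[j]."""
    return [sum(w * p for w, p in zip(row, pooled)) + bj for row, bj in zip(W, b)]


def relevant_contribution(W, beta, j):
    """Score that the relevant parts of the pooled features add to class j."""
    return sum(w * p for w, p in zip(W[j], beta))


# Two pooled filters, each split into a relevant and an irrelevant part.
beta = [1.0, 0.5]
gamma = [2.5, -0.5]
pooled = [r + g for r, g in zip(beta, gamma)]

W = [[1.0, 2.0], [-1.0, 0.5]]  # one row of weights per class
b = [0.0, 0.0]

logits = class_scores(W, b, pooled)
print(softmax(logits))
print(relevant_contribution(W, beta, j=0))  # 2.0 of the 3.5 logit for class 0
```

Working on the logit rather than the probability keeps the contribution additive: the relevant and irrelevant scores plus the bias add up to the class logit.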
Experiments
Task
Morphological tagging: predict morphological labels for a word (gender, tense, singular/plural, ...)
Example: economicas → lemma=económico, gender=feminine, number=plural
For a subset of words, we have manual segmentations and annotations
Datasets
Universal Dependencies 1.4:
- Finnish, Spanish and Swedish
- Select all unique words and their morphological labels
Manual annotations and segmentations of 300 test set words
Architectures: CNN vs BiLSTM
[Figure: both architectures applied to “^ e c o n o m i c a s $”. CNN: characters → CNN filters → max over time → FC → Gender = feminine. BiLSTM: characters → BiLSTM → FC → Gender = feminine.]
Do the NN patterns follow manual segmentations?
All = every possible combination of characters
Cons = all consecutive character n-grams
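The two candidate sets can be enumerated as follows. This is a sketch under assumptions: candidates are represented as tuples of character positions, the empty set is excluded, and whether the boundary markers ^ and $ count as characters is left to the caller.

```python
from itertools import combinations


def consecutive_ngrams(length):
    """Index tuples of all consecutive character n-grams ("Cons")."""
    return [tuple(range(start, start + n))
            for n in range(1, length + 1)
            for start in range(length - n + 1)]


def all_combinations(length):
    """Index tuples of every non-empty combination of characters ("All")."""
    return [combo
            for n in range(1, length + 1)
            for combo in combinations(range(length), n)]


word = "kronor"
cons = consecutive_ngrams(len(word))
every = all_combinations(len(word))
print(len(cons), len(every))  # 21 63
```

"Cons" grows quadratically with word length while "All" grows exponentially, so restricting the search to consecutive n-grams is much cheaper for long words.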
Visualizing contributions: 1 character
[Figure: contribution of each single character of the Spanish word “^ g r a t u i t a $”; label: Gender=feminine]
Visualizing contributions: 2 characters (Swedish)
[Figure: contributions of character pairs for the Swedish word “^ k r o n o r $”, one character-by-character grid each for CNN and BiLSTM; label: number=plural]
Most important patterns per language: Spanish
Linguistic rules for feminine gender:
- Feminine adjectives often end with “a”
- Nouns ending with “dad” or “ión” are often feminine
Found patterns:
- “a” is a very important pattern
- “dad” and “sió” are important trigrams
Most important patterns per language: Swedish
Linguistic rules for plural form:
- 5 suffixes: or, ar, (e)r, n, and no ending
- “na” is the definite article in plural forms
Found patterns:
- “or” and “ar”
- But also “na” and “rn”
Interactions/compositions of patterns
How do positive and negative patterns interact?
Consider the Spanish verb “gusta”
- Gender=Not Applicable (NA)
- We know that the suffix “a” is an indicator for gender=feminine
Consider the most positive/negative set of characters per class:
The stem provides counterevidence for gender=feminine
Conclusion
Summary
We introduced a white box approach to understanding CNNs
We showed that:
- BiLSTMs and CNNs sometimes choose different patterns
- The learned patterns coincide with our linguistic knowledge
- Sometimes other plausible patterns are used
Questions?
Fréderic Godin
Ph.D. Researcher Deep Learning and NLP
IDLab
E frederic.godin@ugent.be
@frederic_godin
www.fredericgodin.com
idlab.technology / idlab.ugent.be
