The brain’s guide to dealing with context in
language understanding
Ted Willke, Javier Turek, and Vy Vo
Intel Labs
November 8th, 2019
Alex Huth and Shailee Jain
UT-Austin
Natural Language Understanding
A form of natural language processing that deals with machine reading comprehension.
Example:
“The problem to be solved is: Tom has twice as
many fish as Mary has guppies. If Mary has 3
guppies, what is the number of fish Tom has?”
(D.G. Bobrow, 1964)
A 1960s example
An input text is parsed by NLP into canonical sentences with mark-up, and NLU then produces the answer:

Input Text:
"The problem to be solved is: If the number of customers Tom gets is twice the square of 20 percent of the number of advertisements he runs, and the number of advertisements he runs is 45, what is the number of customers Tom gets?"

NLP (Lisp example), canonical sentences with mark-up:
"The number (of/op) customers Tom (gets/verb) is 2 (times/op 1) the (square/op 1) of 20 (percent/op 2) (of/op) the number (of/op) advertisements (he/pro) runs (period/dlm) The number (of/op) advertisements (he/pro) runs is 45 (period/dlm) (what/qword) is the number (of/op) customers Tom (gets/verb) (qmark/DLM)"

NLU answer:
"The number of customers Tom gets is 162"
NLU derives meaning from the lexicon, grammar, and context. E.g., what is the meaning of "(he/pro) runs" here?
(D.G. Bobrow, 1964)
Applications of NLU
Super-valuable stuff!
• Machine translation (Google Translate)
• Question answering (The Stanford Question Answering Dataset 2.0)
• Machine reasoning (Aristo, Allen AI), even visual! (Zhu et al., 2015)
The importance of context in language understanding
• Retaining information about narratives is key to effective comprehension.
• This information must be:
  - Represented
  - Organized
  - Effectively applied
https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/Economic_inequality.html
The brain is great at this. What can it teach us?
Key questions for this talk
How does the brain organize and represent narratives?
What can deep learning models tell us about the brain?
Are the more effective ones more brain-like?
How well do deep learning models deal with narrative context?
The brain’s organization
In order to understand language, the human brain explicitly represents information at a hierarchy of different timescales across different brain areas.
• Early stages: auditory processing in milliseconds to words at sub-second timescales
• Later stages: derive meaning by combining information across minutes and hours
Representations at long timescales have been shown to exist in separate brain areas, but little is known about their structure and format.
(Lerner et al., 2011)
Key questions for this talk
How does the brain organize and represent narratives?
How well do deep learning models deal with narrative context?
What can deep learning models tell us about the brain?
Are the more effective ones more brain-like?
A look at recent state-of-the-art models
Recurrent Neural Networks
Temporal Convolutional Networks
Transformer Networks
Evaluating the performance of these models
• Sequence modeling: given an input sequence $x_0, \ldots, x_T$ and desired corresponding outputs (predictions) $y_0, \ldots, y_T$, we wish to learn a function $\hat{y}_0, \ldots, \hat{y}_T = f(x_0, \ldots, x_T)$, where $\hat{y}_t$ depends only on past inputs $x_0, \ldots, x_t$ (causal).
• Use as a proxy to study the performance of backbone models for NLU, e.g., predicting the next character or word.
• Sequence modeling applied to language is language modeling.
• Self-supervised, the basis for many other NLP tasks, and it exploits context for prediction.
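To make the causal, self-supervised setup concrete, here is a minimal sketch (ours, not from the talk): in language modeling the targets are simply the inputs shifted by one step, so $y_t$ depends only on $x_0, \ldots, x_t$.

```python
import numpy as np

# Language modeling as causal sequence modeling: the self-supervised
# targets are the inputs shifted by one step.
text = "the problem to be solved is"
vocab = sorted(set(text))
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

ids = np.array([char_to_ix[ch] for ch in text])
x = ids[:-1]   # inputs  x_0 .. x_{T-1}
y = ids[1:]    # targets y_t = x_{t+1}: "predict the next character"

# Any causal model f may use x_0..x_t (and nothing later) to predict y_t.
print(list(zip(x[:5], y[:5])))
```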
Example sequence modeling tasks
• Add: Add two numbers that are marked in a long sequence, and output the sum after a delay
• Copy: Copy a short sequence that appears much earlier in a long sequence (see the data sketch after this list)
• Classify (MNIST): Given a sequence of pixel values from MNIST (784×1), predict the corresponding digit (0-9)
• Predict word (LAMBADA): Given a dataset of 10K passages from novels, with an average context of 4.6 sentences, predict the last word of a target sentence
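Here is a hypothetical generator for the copy task above, just to make the input/output format concrete (conventions loosely follow Bai et al., 2018; exact formats vary):

```python
import numpy as np

# A short payload appears early in a long, mostly-blank input, and the
# model must reproduce it at the end, after the delay.
def make_copy_example(seq_len=10, delay=100, n_symbols=8, rng=np.random):
    payload = rng.randint(1, n_symbols + 1, size=seq_len)  # symbols 1..8
    blank, marker = 0, n_symbols + 1
    x = np.concatenate([payload,
                        np.full(delay, blank),
                        [marker],                  # "start recalling" cue
                        np.full(seq_len - 1, blank)])
    y = np.concatenate([np.full(delay + seq_len, blank), payload])
    return x, y

x, y = make_copy_example()
print(x[:12], "...", y[-12:])
```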
A look at recent state-of-the-art models
Recurrent Neural Networks
Temporal Convolutional Networks
Transformer Networks
Using recurrence to solve the problem
Can process a sequence of vectors $x_t$ by applying a recurrence formula at each time step:

$$h_t = f_W(h_{t-1}, x_t)$$

where $h_t$ is the new state, $h_{t-1}$ is the old state, $x_t$ is the input vector at time $t$, and $f_W$ is some function with parameters $W$. The same function and parameters are used at every time step!
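As a minimal sketch (assuming the vanilla tanh cell; the parameter names are ours), the recurrence looks like this in numpy:

```python
import numpy as np

# Vanilla RNN step: h_t = f_W(h_{t-1}, x_t), with f_W a tanh of a
# learned linear map. Wh, Wx, b together play the role of W.
def rnn_step(h_prev, x_t, Wh, Wx, b):
    return np.tanh(Wh @ h_prev + Wx @ x_t + b)

n, m, T = 4, 3, 5                       # state dim, input dim, steps
rng = np.random.default_rng(0)
Wh, Wx, b = rng.normal(size=(n, n)), rng.normal(size=(n, m)), np.zeros(n)

h = np.zeros(n)
for x_t in rng.normal(size=(T, m)):     # same W at every time step
    h = rnn_step(h, x_t, Wh, Wx, b)
print(h)
```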
Example: Character-level language model

Predicting the next character… Vocabulary: [h,e,l,o]. Training sequence: "hello".
(Example adapted from Stanford's excellent CS231n course. Thank you Fei-Fei Li, Justin Johnson, and Serena Yeung!)
Example: Character-level language model, sampling

Vocabulary: [h,e,l,o]. At test time, sample characters one at a time and feed each back to the model, as in the sketch below.
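A minimal sketch of that sampling loop (the weights here are random and untrained, so the output is gibberish; a trained model would continue "h" with "ello"):

```python
import numpy as np

# Sample one character from the softmax over the RNN's output,
# then feed it back in as the next input.
vocab = ['h', 'e', 'l', 'o']
n = len(vocab)
rng = np.random.default_rng(1)
Wh = rng.normal(size=(8, 8))
Wx = rng.normal(size=(8, n))
Wy = rng.normal(size=(n, 8))

def one_hot(i):
    v = np.zeros(n)
    v[i] = 1.0
    return v

h, ix, out = np.zeros(8), 0, ['h']            # seed with 'h'
for _ in range(10):
    h = np.tanh(Wh @ h + Wx @ one_hot(ix))
    logits = Wy @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()                               # softmax
    ix = rng.choice(n, p=p)                    # sample the next character
    out.append(vocab[ix])
print(''.join(out))
```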
Dealing with longer timescales
• Learning long-term dependencies is difficult
  - Vanishing and exploding gradient problem: repeated matrix multiplies shrink gradients when the singular value is < 1 and blow them up when it is > 1
  - Smaller weight given to long-term interactions
  - Little training success for sequences > 10-20 in length
• Solution: Gated RNNs
  - Control over timescale of integration of feedback
  - Eliminates repeated matrix multiplies
One possible solution: LSTM
• Long Short-Term Memory
  - Provides uninterrupted gradient flow
  - Solves the problem at the expense of more parameters
• As revolutionary for sequential processing as CNNs were for spatial processing
  - Toy problems: long sequence recall, long-distance interactions (math), classification and ordering of widely-separated symbols, noisy inputs, etc.
  - Real applications: neural machine translation, text-to-speech, music and handwriting generation
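A minimal sketch of one LSTM step (the standard formulation, not code from the talk). The gates control what enters, persists in, and leaves the cell state, which is what gives the uninterrupted, additive gradient path:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One LSTM step: gates i (input), f (forget), o (output), candidate g.
def lstm_step(x, h, c, W, b):
    z = W @ np.concatenate([h, x]) + b      # one big matmul for all gates
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g                       # additive cell-state update
    h = o * np.tanh(c)
    return h, c

n, m = 4, 3                                 # state dim, input dim
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4 * n, n + m)), np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
h, c = lstm_step(rng.normal(size=m), h, c, W, b)
print(h, c)
```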
Multilayer RNNs

Stack cells in depth as well as unrolling them in time:

$$h_t^l = \tanh\left( W^l \begin{pmatrix} h_{t-1}^l \\ h_t^{l-1} \end{pmatrix} \right), \qquad h \in \mathbb{R}^n, \quad W^l \in \mathbb{R}^{n \times 2n}$$
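A sketch of that stacked update (toy dimensions are ours): at layer $l$, the cell sees its own previous state (the time direction) concatenated with the current state of the layer below (the depth direction):

```python
import numpy as np

n, L, T = 4, 3, 6                     # state dim, layers, time steps
rng = np.random.default_rng(0)
W = rng.normal(size=(L, n, 2 * n))    # one [n x 2n] matrix per layer
h = np.zeros((L, n))                  # h[l] is layer l's current state

x_seq = rng.normal(size=(T, n))       # inputs act as "layer below" at l=0
for x in x_seq:
    below = x
    for l in range(L):                # depth: feed each layer's output up
        h[l] = np.tanh(W[l] @ np.concatenate([h[l], below]))
        below = h[l]
print(h)
```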
Writing Shakespeare
Multi-layer RNN: 3 layers with 512 hidden nodes, unrolled in depth and time.
[Samples of generated text at successive stages: at first, then training further… and further… and further….]
(Andrej Karpathy's blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
After a few hours of training: [sample of Shakespeare-like generated text]
(Andrej Karpathy's blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
The Stacks Project: open-source textbook on algebraic geometry
• LaTeX source!
• 455,910 lines of code
Can RNNs learn complex syntactic structures?
(Andrej Karpathy's blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
Algebraic Geometry (LaTeX)
[Samples of generated LaTeX] It generates nearly compilable LaTeX! But too long-term a dependency, and it never closes!
(Andrej Karpathy's blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
Code generation?
• Linux source, concatenated into a giant file (474 MB of C)
• 10-million-parameter RNN
The generated code has:
• Comments here and there
• Proper syntax for strings and pointers
• Correctly learned use of brackets
• But it often uses undefined variables, and declares variables it never uses!
• It stays within scope, but is vacuous: another problem with long-term dependencies.
(Andrej Karpathy's blog: The Unreasonable Effectiveness of Recurrent Neural Networks)
A look at recent state-of-the-art models
Recurrent Neural Networks
Temporal Convolutional Networks
Transformer Networks
Temporal Convolutional Neural Networks
(Bai et al., 2018)
TCN = 1D FCN + causal convolution
Benefits:
• Parallelism!
• Flexible receptive field size
• Stable gradients
• Low memory for training
• Variable input lengths
Details:
• Uses dilated convolutions for an exponential receptive field vs. depth (see the sketch below)
• Effective history of a layer is $(k-1)d$, where $k$ is the kernel size and the dilation $d = \mathcal{O}(2^i)$ at layer $i$
• Uses residuals, ReLUs, and weight normalization
• Spatial dropout
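A sketch (our assumptions, not code from Bai et al.) of the TCN building block, a dilated causal 1D convolution. Left-padding by $(k-1)d$ keeps the output causal, and doubling $d$ per layer grows the receptive field exponentially with depth:

```python
import numpy as np

# Dilated causal 1D convolution: y_t sees only x_{t-(k-1)d} .. x_t.
def causal_conv1d(x, w, d):
    k = len(w)
    padded = np.concatenate([np.zeros((k - 1) * d), x])  # causal left pad
    return np.array([sum(w[j] * padded[t + j * d] for j in range(k))
                     for t in range(len(x))])

x = np.zeros(16)
x[0] = 1.0                            # unit impulse
w = np.ones(2)                        # kernel size k = 2
y = x
for i in range(3):                    # dilations d = 1, 2, 4
    y = causal_conv1d(y, w, d=2 ** i)
print(np.nonzero(y)[0])               # x[0] influences outputs 0..7:
                                      # receptive field = 1 + sum (k-1)d = 8
```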
TCNs versus LSTMs
(Bai et al., 2018)
Copy memory task (last 10 elements evaluated): the 'unlimited memory' of LSTMs is quite limited compared to the expansive receptive field of the generic TCN.
A look at recent state-of-the-art models
Recurrent Neural Networks
Temporal Convolutional Networks
Transformer Networks
Transformer Networks
(Vaswani et al., 2017)
Relies entirely on attention to compute representations!
Details:
• Encoder-decoder structure and auto-regressive model
• Multi-headed self-attention mechanisms
• FC feed-forward networks applied to each position separately and identically
• Input and output embeddings used
• No recurrence and no convolution, so must inject positional encodings
Benefits:
• Low computational complexity
• Highly-parallelizable computation
• Low 'path length' for long-term dependencies

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$
• Encoder has self-attention at each layer
• Decoder attends to all positions in the input sequence
• Decoder also has self-attention, masked for causality (sketched below)
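A minimal numpy sketch of the attention formula above, with an optional causal mask like the decoder's:

```python
import numpy as np

# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
def attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if mask is not None:                           # mask out future keys
        scores = np.where(mask, scores, -1e9)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)          # softmax over keys
    return w @ V

n, d_k = 5, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d_k)) for _ in range(3))
causal = np.tril(np.ones((n, n), dtype=bool))      # position t sees <= t
print(attention(Q, K, V, mask=causal).shape)       # (5, 8)
```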
Why self-attention?
(Vaswani et al., 2017)
$n$ is the sequence length, $d$ is the representation dimension, $k$ is the kernel size for convolutions, and $r$ is the neighborhood size in restricted attention.
[Figure annotations: "longer path lengths"; "more ops" when $d > n$]
It's not only the length of context that matters, but also the ease with which it can be accessed.
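For reference (our addition), the comparison being shown here is Table 1 of Vaswani et al. (2017):

Layer type                   | Complexity per layer | Sequential ops | Max path length
Self-attention               | O(n² · d)            | O(1)           | O(1)
Recurrent                    | O(n · d²)            | O(n)           | O(n)
Convolutional                | O(k · n · d²)        | O(1)           | O(log_k n)
Self-attention (restricted)  | O(r · n · d)         | O(1)           | O(n/r)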
Transformers vs TCNs
Machine translation (Vaswani et al., 2017): even with a relatively limited context (e.g., 128), Transformers beat Google's TCN for NMT and FAIR's TCN with attention.
WikiText-103 word-level sequence modeling (Dai et al., 2019): with a segment-level recurrence mechanism, Transformer-XL is freed of fixed context lengths and it soars.
Transformer-XL
(Dai et al., 2019)
Continued gains in performance at context lengths of 1000+.
Total hallucination! (but nice generalization)
Key questions for this talk
How does the brain organize and represent narratives?
How well do deep learning models deal with narrative context?
What can deep learning models tell us about the brain?
Are the more effective ones more brain-like?
Are deep neural networks organized by timescale?
[Figure: neural networks completing "The boy went out to fly an _____" with "airplane", set against short, intermediate, and long timescales; does their layered organization match the brain's?]
The methodology
[Figure: a story drives both neural models (layer activations) and listeners' neural activations.]
Goal: Determine how well NN layer activations predict fMRI data (regression).
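An illustrative sketch of such an encoding-model regression (not the authors' pipeline; the data here are synthetic stand-ins for layer activations and voxel responses):

```python
import numpy as np
from sklearn.linear_model import Ridge

n_train, n_test, n_feat, n_vox = 300, 100, 64, 1000
rng = np.random.default_rng(0)
# Stand-ins: one row per fMRI time point; X = layer activations,
# Y = voxel responses generated from a hidden linear map plus noise.
X = rng.normal(size=(n_train + n_test, n_feat))
B = rng.normal(size=(n_feat, n_vox))
Y = X @ B + rng.normal(size=(n_train + n_test, n_vox))

# Fit a ridge regression per voxel, then score held-out predictions.
model = Ridge(alpha=10.0).fit(X[:n_train], Y[:n_train])
pred = model.predict(X[n_train:])
r = [np.corrcoef(pred[:, v], Y[n_train:, v])[0, 1] for v in range(n_vox)]
print(f"median voxelwise r = {np.median(r):.2f}")
```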
Predicting brain activity with encoding models
Eickenberg et al., NeuroImage 2017
Kell et al., Neuron 2018
Relative predictive power of models
[Figure panels: LSTM vs Embedding (Jain et al., 2018); Transformer vs Embedding (Jain et al., unpublished)]
Layer-specific correlations for LSTM
(Jain et al., 2018)
[Flatmap figure: a low-level speech processing region and a higher semantic region; white = no layer preference]
Open questions
Why do LSTMs perform so poorly?
Not all that predictive.
Not exhibiting layer-specific correlations.
Do TCNs and Transformers exhibit multi-timescale characteristics?
Layer-specific correlations for Transformer
[Figure: layer preference across cortex, from early to late layers] (Jain et al., unpublished)
Yes!
Layer-specific correlations for Transformer
(Jain et al., unpublished) TCNs look similar.
Encoding model performance for Transformer
• Averaged across 3 subjects
• Contextual models from all layers outperform embedding
• Increasing context length (to a point) helps all layers
• Long context representations are still missing information!
TCNs exhibit similar characteristics but do not seem to learn the same representations.
(Jain et al., unpublished)
Summary and Challenges
• The brain's language pathway is organized into a multi-timescale hierarchy, making it very effective at utilizing context
• Language models are catching up, with Transformer-XL in the lead
• TCNs and Transformers indeed have explicit multi-timescale hierarchies
  - Last layers have lower predictive performance. Why?
  - How to get more out of context at longer timescales?
  - Lack of clear timescales in RNNs should lead to a revisiting of their depth characteristics (e.g., see Turek et al. 2019, https://arxiv.org/abs/1909.00021)
• More study needed on representations
  - What specific information is captured in representations across the cortex?
  - Are the same representations found across deep learning architectures?
Thank you!
ted.willke@intel.com
NeurIPS Workshop on
Context and Compositionality in Biological and Artificial Neural Systems
Saturday, December 14th, 2019
https://context-composition.github.io/
