1. A presentation on
End-To-End Memory Networks (MemN2N)
Slides: 26
Time: 15 minutes
IE 594 Data Science 2
University of Illinois at Chicago, February 2017
Under the guidance of,
Prof. Dr. Ashkan Sharabiani
By,
Ashish Menkudale
2. The kitchen is north of the hallway.
The bathroom is west of the bedroom.
The den is east of the hallway.
The office is south of the bedroom.
How do you go from den to kitchen?
3. The kitchen is north of the hallway.
The bathroom is west of the bedroom.
The den is east of the hallway.
The office is south of the bedroom.
How do you go from den to kitchen?
(Map: Kitchen is north of Hallway; Den is east of Hallway; Bathroom is west of Bedroom; Office is south of Bedroom.)
Answer: West, then North.
Brian is a frog.
Lily is grey.
Brian is yellow.
Julius is green.
Greg is a frog.
What color is Greg?
Brian is a frog.
Lily is grey.
Brian is yellow.
Julius is green.
Greg is a frog.
What color is Greg?
Yellow.
7. Memory Networks
(Sidebar: Warren Sturgis McCulloch, 1898-1969, and Walter Pitts, 1923-1969: a computational model for neural networks.)
• Memory network with a large external memory, required for low-level tasks like object recognition.
• Writes everything to the memory, but reads only the relevant information.
• Attempts to add a long-term memory component to make the model more like artificial intelligence.
• Two types:
• Strongly supervised memory network: hard addressing.
• Weakly supervised memory network: soft addressing.
• Hard addressing: max of the inner product between the internal state and the memory contents.
Example: Mary is in garden. John is in office. Bob is in kitchen. Q: Where is John?
8. Memory Vectors
Example: constructing memory vectors with bag of words (BoW).
Embed each word, then sum the embedding vectors:
"Sam drops apple" → V(Sam) + V(drops) + V(apple) = m
Example: temporal structure. Use special words for time and include them in the bag of words:
1. Sam moved to garden.
2. Sam moved to kitchen.
3. Sam drops apple.
V(Sam) + V(drops) + V(apple) + V(time) = m, where V(time) is the time embedding for the sentence's time stamp.
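The bag-of-words construction above can be sketched in a few lines of NumPy. This is a minimal illustration: the toy vocabulary, embedding size, and random embedding matrices below are stand-ins, not values from the slides.

```python
import numpy as np

# Toy vocabulary and embedding size (illustrative values only).
vocab = {"Sam": 0, "moved": 1, "to": 2, "garden": 3, "kitchen": 4, "drops": 5, "apple": 6}
d = 8
rng = np.random.default_rng(0)
V = rng.normal(size=(len(vocab), d))   # word embedding matrix
T = rng.normal(size=(10, d))           # time embeddings, one per time stamp

def memory_vector(sentence, t):
    """BoW memory vector: sum the word embeddings, then add the time embedding."""
    m = sum(V[vocab[w]] for w in sentence.split())
    return m + T[t]

m = memory_vector("Sam drops apple", t=2)   # third sentence, 0-indexed time stamp
```

Because the word embeddings are simply summed, the sentence vector is order-invariant; only the added time embedding distinguishes when a fact was stated.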
9. Memory Networks
(Diagram: the input sentences "Bob is in kitchen. Mary is in garden. John is in office." and the question "Where is John?" are each embedded. The memory controller takes the max of the inner products between the internal state vector and the embedded memories, selecting "John is in office"; a decoder then produces the output: Office.)
10. Issues with Memory Networks
• Requires explicit supervision of attention during training:
we need to say which memory the model should use.
• We need a model that requires supervision only at the output:
no supervision of attention required.
• Only feasible for simple tasks,
which severely limits the model's applications.
11. End-To-End Memory Networks
• Soft-attention version of the memory network.
• Flexible read-only memory.
• Multiple memory lookups (hops).
• Can consider multiple memories before deciding on the output.
• More reasoning power.
• End-to-end training.
• Only needs the final output for training.
• Simple back-propagation.
Authors: Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus.
12. MemN2N architecture
(Diagram: a controller module, e.g. an RNN, receives the input and exchanges state with a memory module. Inside the memory module, a dot product between the state and the memory content, followed by a softmax, gives attention weights; a weighted sum of the memory content is added back to the state through a linear layer and a Tanh/ReLU nonlinearity. The final output is compared with the target by the loss function.)
13. MemN2N in action: single memory lookup
(Diagram: each sentence {xi} is embedded with matrix A into a memory vector and with matrix C into an output vector. The question q is embedded with matrix B into the internal state u. The inner product of u with each memory vector, followed by a softmax, gives a probability over the memories; the probability-weighted sum of the output vectors, multiplied by the output matrix W and passed through a final softmax, gives the predicted answer.)
Example: Mary is in garden. John is in office. Bob is in kitchen. Where is John? → Office.
Training: estimate the embedding matrices A, B and C and the output matrix W.
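The single-hop lookup can be sketched with NumPy. The dimensions and random matrices below are illustrative stand-ins for the learned embeddings A, B, C and the output matrix W; in a real model they would be trained by back-propagation.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, d = 20, 8
A = rng.normal(size=(vocab_size, d))   # input memory embedding
B = rng.normal(size=(vocab_size, d))   # question embedding
C = rng.normal(size=(vocab_size, d))   # output memory embedding
W = rng.normal(size=(d, vocab_size))   # final output matrix

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def single_hop(story_bows, question_bow):
    m = story_bows @ A            # memory vectors m_i
    c = story_bows @ C            # output vectors c_i
    u = question_bow @ B          # internal state u
    p = softmax(m @ u)            # attention over memories
    o = p @ c                     # weighted sum of output vectors
    return softmax((o + u) @ W)   # predicted answer distribution

story = rng.integers(0, 2, size=(3, vocab_size)).astype(float)  # BoW rows, one per sentence
q = rng.integers(0, 2, size=vocab_size).astype(float)           # BoW of the question
a_hat = single_hop(story, q)
```

Because the attention is a softmax rather than a hard max, every step is differentiable, which is what makes end-to-end training possible.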
15. Components
I (Input): no conversion; keep the original text x.
G (Generalization): stores I(x) in the next available memory slot.
O (Output): loops over all memories.
Find the best match of mi with x.
Find the best match of mj with (mi, x).
Can be extended to multiple hops.
R (Response): ranks all words in the dictionary given o and returns the best single word.
In fact, an RNN can be used here to produce better full-sentence responses.
16. Weight Tying
Weight tying indicates how the embedding matrices applied to the input and output components are shared across layers.
Two methods:
Adjacent:
Similar to stacked layers.
The output embeddings of one layer are the input embeddings of the next layer.
Layer-wise:
The input embeddings remain the same for every layer in the architecture.
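The two tying schemes amount to different ways of building the per-hop embedding lists. A minimal sketch, assuming toy dimensions and random matrices in place of learned ones (`hops`, `vocab_size`, and `d` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
vocab_size, d, hops = 20, 8, 3

# Adjacent tying: the output embedding of hop k becomes the input embedding of hop k+1.
C_adj = [rng.normal(size=(vocab_size, d)) for _ in range(hops)]
A_adj = [rng.normal(size=(vocab_size, d))] + C_adj[:-1]   # A[k+1] = C[k]

# Layer-wise tying: the same input and output embeddings are reused at every hop.
A_shared = rng.normal(size=(vocab_size, d))
C_shared = rng.normal(size=(vocab_size, d))
A_lw = [A_shared] * hops
C_lw = [C_shared] * hops
```

Adjacent tying behaves like a stack of distinct layers, while layer-wise tying makes the hops resemble the repeated application of a single recurrent step.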
17. Scoring function
Questions and answers are mapped to the story using word embeddings.
Word embedding: maps words into a low-dimensional vector space, with the advantage that we can
calculate distances between word vectors.
This allows us to compute a similarity score between sentences and find the maximum
correlation between them.
Example: match('Where is football?', 'John picked up the football').
qᵀUᵀUd: the default word-embedding scoring model used in memory networks.
q – question.
U – matrix by which the word embeddings are obtained.
d – answer.
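The qᵀUᵀUd score is just the inner product of the two embedded bag-of-words vectors. A small sketch, where `U` and the BoW vectors are random stand-ins for learned values:

```python
import numpy as np

rng = np.random.default_rng(3)
vocab_size, d = 30, 8
U = rng.normal(size=(d, vocab_size))   # embedding matrix: maps BoW vectors into d dimensions

def score(q_bow, d_bow):
    """Match score s(q, d) = q^T U^T U d = (Uq) . (Ud)."""
    return (U @ q_bow) @ (U @ d_bow)

q = rng.integers(0, 2, size=vocab_size).astype(float)   # BoW vector of the question
a = rng.integers(0, 2, size=vocab_size).astype(float)   # BoW vector of the candidate answer
s = score(q, a)
```

Note that the score is symmetric in its two arguments, since both sides are projected through the same matrix U before the inner product.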
18. Model Selection
Model selection determines how the story, question, and answer vectors are modeled for
word embedding.
Two possible approaches:
Bag of words model:
Considers each word in a sentence.
Embeds each word and sums the resulting vectors.
Does not take the context of each word into account.
Position encoding:
Considers the position/context of words within a sentence.
Takes the preceding and following words into account.
Maps the sentence to a low-dimensional vector space.
Model refining:
Addition of noise.
Increasing the training dataset.
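Position encoding replaces the plain sum of word embeddings with a position-weighted sum. The weighting below follows the MemN2N paper's formula l_kj = (1 - j/J) - (k/d)(1 - 2j/J); the sentence length, embedding size, and random embeddings are toy values:

```python
import numpy as np

def position_encoding(J, d):
    """l[j, k] = (1 - j/J) - (k/d) * (1 - 2j/J), with j, k 1-indexed as in the paper."""
    j = (np.arange(J) + 1)[:, None]    # word positions 1..J
    k = (np.arange(d) + 1)[None, :]    # embedding dimensions 1..d
    return (1 - j / J) - (k / d) * (1 - 2 * j / J)

rng = np.random.default_rng(4)
J, d = 5, 8
word_embeddings = rng.normal(size=(J, d))   # embedded words of one sentence
l = position_encoding(J, d)
m = (l * word_embeddings).sum(axis=0)       # position-weighted memory vector
```

Unlike plain BoW, the element-wise weights differ by word position, so reordering the words of a sentence now changes its memory vector.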
19. Decisions for Configuration
• Number of hops
• Number of epochs
• Embedding size
• Training dataset
• Validation dataset
• Model selection
• Weight tying
20. RNN viewpoint of MemN2N
Plain RNN: inputs are fed to the RNN one by one, in order. The RNN has only one
chance to look at a given input symbol.
Memory network: all inputs are placed in the memory, and an addressing signal lets the model
decide which part it reads next.
21. Advantages of MemN2N over RNN
• More generic input format:
• Any set of vectors can be the input.
• Each vector can be:
o a BoW of symbols (including location)
o an image feature + its position
• Locations can be 1D, 2D, …
• Variable size.
• Out-of-order access to the input data.
• Less distracted by unimportant inputs.
• Longer-term memorization.
• No vanishing or exploding gradient problems.
22. bAbI Project: Task Categories
Training dataset: 1,000 questions for each task. Testing dataset: 1,000 questions for each task.
25. References
1. GitHub project archive: https://github.com/vinhkhuc/MemN2N-babi-python
2. MSRI workshop slides: https://www.msri.org/workshops/796/schedules/20462/documents/2704/assets/24734
3. Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes: https://arxiv.org/pdf/1607.00036.pdf
4. bAbI answers: https://arxiv.org/pdf/1502.05698.pdf
5. Memory Networks, Microsoft Research talk: https://www.youtube.com/watch?v=ZwvWY9Yy76Q&t=1s
6. Memory Networks (Jenil Shah): https://www.youtube.com/watch?v=BN7Kp0JD04o
7. N-gram vs. SVM vs. generative models: http://stackoverflow.com/questions/20315897/n-grams-vs-other-classifiers-in-text-categorization
8. End-To-End Memory Networks, Facebook AI (NIPS 2015): https://papers.nips.cc/paper/5846-end-to-end-memory-networks.pdf
9. Towards AI-complete question answering: a set of prerequisite toy tasks: https://arxiv.org/pdf/1502.05698.pdf