Question Image Co-attention by Low-Rank Bilinear
Model for Visual Question Answering
Jitendra Kumar Kushwaha
IIST, Thiruvananthapuram
Project Guide:
Dr. Sumitra S.
Associate Professor
Dept. of Mathematics, IIST
May 30, 2019
Jitendra Kumar Kushwaha (IIST) May 30, 2019 1 / 31
Overview
1 Introduction
2 Applications
3 Image Feature Extraction
4 Question Modeling
5 Joint Representation
Bilinear Model
Low-Rank Bilinear Model with Hadamard Product
6 Co-attention Mechanism
7 Attended visual and question feature
8 Answer Prediction
9 Results
10 Conclusion and Future Work
11 References
Introduction
Objective
The goal of this thesis is to develop a model that takes both language
and visual inputs and builds a joint understanding of them.
The model takes an image and a natural language question about the
image as input and produces a natural language answer as output.
Motivation
With neural network models, we often cannot tell whether a prediction
is sensible or a random guess. Incorporating an attention mechanism
gives an estimate of what the model has learned to focus on.
Instead of considering only image attention, the co-attention mechanism
considers both image and question attention.
In the co-attention mechanism, the image guides the question attention
and vice versa.
Applications
Aid visually-impaired users.
Summarize the visual data for analysts.
VQA in medical domain.
Connection of vision and language.
Image Feature Extraction
The image model uses a CNN to obtain a representation of the image.
The CNN extracts the image feature map $V$ from the raw image $I$.
The image feature is $V = \{v_1, v_2, \dots, v_N\}$, where $v_n$ is the
feature vector at spatial location $n$.
Pretrained Models
The feature $V = \mathrm{CNN}_{vgg}(I)$ is taken from the last convolutional
layer, which retains the spatial information of the original image.
Each image is rescaled to 3 × 448 × 448. The output of the last
convolutional layer of VGG-19 has dimension 512 × 14 × 14.
Alternatively, ResNet-152 is used, whose output has dimension
2048 × 14 × 14.
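The link between the 2048 × 14 × 14 feature map and the set of N location vectors can be sketched as follows. This is only an illustration: the array is filled with random numbers in place of real CNN activations, and the row-major ordering of the 14 × 14 grid is an assumption, not something the slides specify.

```python
import numpy as np

# Stand-in for the CNN output of one image (random values, not real features).
channels, height, width = 2048, 14, 14  # ResNet-152 shape quoted above
feature_map = np.random.default_rng(0).standard_normal((channels, height, width))

# Flatten the 14 x 14 grid into N = 196 locations; row n of V is the vector v_n.
# Location n corresponds to grid cell (n // width, n % width) under this ordering.
V = feature_map.reshape(channels, height * width).T

print(V.shape)
```

With this layout, N = 14 × 14 = 196 and each $v_n$ has 2048 entries (512 for VGG-19).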
Question Modeling
The question is represented at three levels:
word-level
phrase-level
sentence-level
Each word in the question is first converted into a 1-hot encoded
vector whose length is the vocabulary size and whose entries are
binary (0 or 1).
The 1-hot encoded vectors are then embedded into a vector space to get:
\[ Q^w = \{q^w_1, q^w_2, \dots, q^w_T\} \]
The embedded word vectors represent the word-level feature of the
question.
Every word in the question has a corresponding vector that represents
it.
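A minimal sketch of the 1-hot encoding and the embedding step is given below. The toy vocabulary, the example question, the embedding size and the random embedding matrix `E` are all assumptions made for illustration; in a trained model `E` would be learned.

```python
import numpy as np

# Toy vocabulary and question (hypothetical, for illustration only).
vocab = ["what", "color", "is", "the", "cat"]
question = ["what", "is", "the", "color"]

word_to_index = {word: i for i, word in enumerate(vocab)}
indices = [word_to_index[word] for word in question]

# One 1-hot row per word: length |vocab|, a single 1 at the word's index.
one_hot = np.eye(len(vocab))[indices]

# Random stand-in for a learned embedding matrix of shape (|vocab|, embed_dim).
embed_dim = 4
E = np.random.default_rng(0).standard_normal((len(vocab), embed_dim))

# Multiplying a 1-hot vector by E selects the matching row of E.
Q_w = one_hot @ E

print(one_hot.shape, Q_w.shape)
```

Because each 1-hot row selects exactly one row of `E`, the product is equivalent to a table lookup `E[indices]`, which is how embedding layers are usually implemented.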
Phrase-level Feature
To compute the phrase-level feature, a 1-D CNN is applied to the
word-level feature vectors with three filters: unigram, bigram, and
trigram.
A 1-D CNN works in the same way as a 2-D CNN, except that the filters
slide along one axis only (here, the word positions).
To obtain the phrase-level features, max-pooling is applied across the
three filter outputs at each word location:
\[ q^p_t = \max\left(\hat{q}^p_{1,t}, \hat{q}^p_{2,t}, \hat{q}^p_{3,t}\right), \quad t \in \{1, 2, \dots, T\} \]
By grouping neighbouring words, these three filters capture the meaning
of phrases; the pooled outputs are known as phrase-level features.
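The phrase-level computation can be sketched in NumPy as below. The zero-padding scheme (chosen so that every filter returns one vector per word), the random filter weights and the sizes `T` and `d` are assumptions; the slides do not specify them, and no non-linearity is applied here.

```python
import numpy as np

def conv1d_same(Q_w, W):
    """Slide a filter with window size s over the word axis.

    Q_w has shape (T, d); W has shape (s, d, d_out). The input is
    zero-padded so the output keeps one vector per word position.
    """
    s = W.shape[0]
    T = Q_w.shape[0]
    left = (s - 1) // 2
    right = s - 1 - left
    padded = np.pad(Q_w, ((left, right), (0, 0)))
    out = np.zeros((T, W.shape[2]))
    for t in range(T):
        window = padded[t:t + s]                  # (s, d) slice of words
        out[t] = np.einsum("sd,sde->e", window, W)
    return out

rng = np.random.default_rng(0)
T, d = 6, 8                                       # hypothetical sizes
Q_w = rng.standard_normal((T, d))                 # word-level features

# One filter per window size: unigram (1), bigram (2), trigram (3).
filters = [rng.standard_normal((s, d, d)) for s in (1, 2, 3)]
outputs = [conv1d_same(Q_w, W) for W in filters]

# Element-wise max across the three filter outputs at each word position.
Q_p = np.max(np.stack(outputs), axis=0)

print(Q_p.shape)
```

The result keeps the question length T, so each word position t still has its own phrase-level vector $q^p_t$.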
Question Modeling
The LSTM encodes the sequence of phrase-level features $q^p_t$ into the
sentence-level feature.
As at the other levels, every word position in the question has a
corresponding vector.
Joint Representation
Bilinear Model
The bilinear model provides a rich joint representation of two distinct
input features.
It uses a quadratic expansion of the linear transformation that
considers every pair of features:
\[ C_i = \sum_{j=1}^{N} \sum_{k=1}^{M} w_{ijk} x_j y_k = X^T W_i Y \]
The joint embedding $C$ captures the semantic concepts of both input
features ($X$ and $Y$).
A joint embedding vector of size $L$ requires $L \times (N \times M)$
weight parameters.
The weights form a third-order tensor, whose size makes the model
computationally expensive and limits its applicability.
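The double sum and the matrix form of the bilinear model can be checked numerically with a small sketch. The sizes `N`, `M`, `L` and the random inputs are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L = 5, 4, 3                      # hypothetical sizes
X = rng.standard_normal(N)
Y = rng.standard_normal(M)
W = rng.standard_normal((L, N, M))     # third-order weight tensor, one W_i per output

# C_i = sum_j sum_k w_ijk * x_j * y_k, for all i at once.
C = np.einsum("ijk,j,k->i", W, X, Y)

# The same quantity written as X^T W_i Y, one output element at a time.
C_matrix_form = np.array([X @ W[i] @ Y for i in range(L)])

print(np.allclose(C, C_matrix_form), W.size)
```

`W.size` equals L × N × M, which is the parameter count quoted above; it grows quickly when N and M are in the thousands, as they are for CNN and LSTM features.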
Low-Rank Bilinear Model with Hadamard Product
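The low-rank bilinear method restricts the rank of each weight matrix
$W_i$, which reduces the number of parameters and acts as a regularizer:
\[ C_i = X^T W_i Y = X^T U_i V_i^T Y = \mathbb{1}^T (U_i^T X \circ V_i^T Y) \]
Here $\circ$ denotes the Hadamard (element-wise) product and
$\mathbb{1} \in \mathbb{R}^d$ is a vector of ones.
Two third-order tensors are still needed to produce a feature vector
whose elements are $\{C_i\}$.
The order of the weight tensors is reduced by one by replacing
$\mathbb{1}$ with $\mathbb{P} \in \mathbb{R}^{d \times c}$.
$U \in \mathbb{R}^{N \times d}$ and $V \in \mathbb{R}^{M \times d}$ are
redefined to get the joint embedding feature vector $C \in \mathbb{R}^c$:
\[ C = \mathbb{P}^T (U^T X \circ V^T Y) \]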
This restricts the rank of $W_i$ to be at most $d \le \min(N, M)$.
The mechanism factors the three-dimensional weight tensor of the
bilinear model into three two-dimensional weight matrices.
This forces the weight tensor to be low-rank.
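The following sketch checks numerically that the Hadamard-product form computes the same output as a full bilinear model whose weight matrices are built from the three factors. Sizes and random values are arbitrary; the explicit tensor `W_full` is constructed only for verification and would never be materialized in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, d, c = 6, 5, 3, 4                # hypothetical sizes, with d <= min(N, M)
X = rng.standard_normal(N)
Y = rng.standard_normal(M)
U = rng.standard_normal((N, d))
V = rng.standard_normal((M, d))
P = rng.standard_normal((d, c))

# Low-rank bilinear pooling: C = P^T (U^T X o V^T Y), with o the Hadamard product.
C = P.T @ ((U.T @ X) * (V.T @ Y))

# Equivalent full weight tensor: W_full[i] = U diag(P[:, i]) V^T.
W_full = np.einsum("nk,kc,mk->cnm", U, P, V)
C_full = np.einsum("cnm,n,m->c", W_full, X, Y)

ranks = [int(np.linalg.matrix_rank(W_full[i])) for i in range(c)]
low_rank_params = U.size + V.size + P.size   # N*d + M*d + d*c
full_params = c * N * M

print(np.allclose(C, C_full), ranks, low_rank_params, full_params)
```

Each implied matrix has rank at most d, and the three factor matrices hold N·d + M·d + d·c parameters instead of c·N·M.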
Co-attention Mechanism
The attention mechanism produces a spatial map that highlights the
image regions relevant to answering the question.
Earlier attention models [Huijuan, 2016][Jin-Hwa Kim, 2017] focus on
identifying where to look, that is, visual attention.
This model treats the problem of identifying which words to listen to,
that is, question attention, as equally important.
Visual Attention
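The visual attention distribution is used to compute the attended
visual features.
\[ \alpha_v = \mathrm{softmax}\left( P_{\alpha_v}^T \left( \sigma\left({W^v_q}^T Q\right) \circ \sigma\left({W^v_v}^T V\right) \right) \right) \]
where $P_{\alpha_v} \in \mathbb{R}^{d \times N}$, $\sigma$ is the hyperbolic
tangent function, $W^v_q \in \mathbb{R}^{T \times 1}$,
$W^v_v \in \mathbb{R}^{N \times 1}$ and $\alpha_v \in \mathbb{R}^N$.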
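A shape-level sketch of the visual attention formula is given below. The slides do not state the layout of Q and V; the sketch assumes Q is stored as a T × d matrix and V as an N × d matrix, which makes the stated weight shapes consistent. All values are random stand-ins and the sizes are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
T, N, d = 7, 196, 16                   # hypothetical sizes
Q = rng.standard_normal((T, d))        # assumed layout: one row per word
V = rng.standard_normal((N, d))        # assumed layout: one row per image location

W_q = rng.standard_normal((T, 1))      # W^v_q in the formula
W_v = rng.standard_normal((N, 1))      # W^v_v in the formula
P = rng.standard_normal((d, N))        # P_{alpha_v} in the formula

q_proj = np.tanh(W_q.T @ Q)            # shape (1, d)
v_proj = np.tanh(W_v.T @ V)            # shape (1, d)
joint = (q_proj * v_proj).ravel()      # Hadamard product, shape (d,)

alpha_v = softmax(P.T @ joint)         # one weight per image location

print(alpha_v.shape, alpha_v.sum())
```

The output is a probability distribution over the N image locations; the question attention has the same form with the roles of the two inputs exchanged and a projection onto T word positions.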
Question Attention
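The question attention distribution is used to compute the attended
question features.
\[ \alpha_q = \mathrm{softmax}\left( P_{\alpha_q}^T \left( \sigma\left({W^q_v}^T V\right) \circ \sigma\left({W^q_q}^T Q\right) \right) \right) \]
where $P_{\alpha_q} \in \mathbb{R}^{d \times T}$, $\sigma$ is the hyperbolic
tangent function, $W^q_q \in \mathbb{R}^{T \times 1}$,
$W^q_v \in \mathbb{R}^{N \times 1}$ and $\alpha_q \in \mathbb{R}^T$.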
Attended visual and question feature
The attended question feature is a linear combination of the question
feature vectors, weighted by the question attention.
The attended visual feature is a linear combination of the visual
feature vectors of the spatial regions, weighted by the visual attention.
\[ \hat{V} = \sum_{n=1}^{N} \alpha_{v_n} V_n, \qquad \hat{Q} = \sum_{t=1}^{T} \alpha_{q_t} Q_t \]
This gives a fine-grained representation of the image and the question.
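The two weighted sums reduce to a vector-matrix product when the feature vectors are stacked as rows. The sketch below uses random features and random normalized weights in place of real attention outputs; the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, d = 4, 6, 3                      # hypothetical sizes
Q = rng.standard_normal((T, d))        # row t is the question vector Q_t
V = rng.standard_normal((N, d))        # row n is the visual vector V_n

# Stand-ins for the attention distributions (non-negative, summing to 1).
alpha_q = rng.random(T)
alpha_q /= alpha_q.sum()
alpha_v = rng.random(N)
alpha_v /= alpha_v.sum()

# Attended features: attention-weighted sums of the rows.
Q_hat = alpha_q @ Q
V_hat = alpha_v @ V

# The same sums written out term by term.
Q_hat_loop = sum(alpha_q[t] * Q[t] for t in range(T))
V_hat_loop = sum(alpha_v[n] * V[n] for n in range(N))

print(np.allclose(Q_hat, Q_hat_loop), np.allclose(V_hat, V_hat_loop))
```

If all attention mass sits on a single word or location, the attended feature is exactly that row, which is what makes the attention weights interpretable.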
Answer Prediction
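The VQA task is treated as a classification task.
Answer prediction is based on the co-attended question and visual
features.
\[ p(a \mid V, Q; \Theta) = \mathrm{softmax}\left( P^T \left( \sigma(W_q^T \hat{Q}) \circ \sigma(W_v^T \hat{V}) \right) \right) \]
\[ \hat{a} = \arg\max_{a \in \Omega} \, p(a \mid V, Q; \Theta) \]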
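The classification step can be sketched as follows. The candidate answer list, the feature sizes and the weight shapes (`W_q` of size d_q × d, `W_v` of size d_v × d, `P` of size d × |Ω|) are assumptions chosen so that the formula type-checks; the weights are random rather than trained.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
answers = ["yes", "no", "2", "red"]    # hypothetical candidate set Omega
d_q, d_v, d = 5, 6, 8                  # hypothetical feature and embedding sizes

Q_hat = rng.standard_normal(d_q)       # attended question feature
V_hat = rng.standard_normal(d_v)       # attended visual feature
W_q = rng.standard_normal((d_q, d))
W_v = rng.standard_normal((d_v, d))
P = rng.standard_normal((d, len(answers)))

# Low-rank bilinear fusion followed by a softmax over the candidate answers.
joint = np.tanh(W_q.T @ Q_hat) * np.tanh(W_v.T @ V_hat)
probs = softmax(P.T @ joint)

# The predicted answer is the candidate with the highest probability.
predicted = answers[int(np.argmax(probs))]

print(predicted, probs.round(3))
```

The fusion reuses the low-rank bilinear form, with tanh applied to each projection before the Hadamard product.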
Experimental Setup
The size of the joint embedding of the visual and question features is d,
which equals the rank d in the low-rank bilinear model. The set of
candidate answers is Ω, and its size is |Ω|. The decay rate and the
dropout rate are denoted α and p, respectively.
The RMSProp optimizer is used with a base learning rate of 4e−4, a
decay rate α = 0.90 and a correction factor ε = 1e−8. The batch size
is set to 100.
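For reference, one textbook RMSProp update with the hyperparameters listed above looks like the sketch below. This is the standard form of the rule, not necessarily the exact variant used in the thesis; deep learning frameworks differ in details such as where ε is added. The quadratic objective is a made-up example.

```python
import numpy as np

def rmsprop_step(theta, grad, sq_avg, lr=4e-4, alpha=0.90, eps=1e-8):
    """One RMSProp update: scale the gradient by a running RMS of past gradients."""
    sq_avg = alpha * sq_avg + (1.0 - alpha) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(sq_avg) + eps)
    return theta, sq_avg

# Toy objective f(theta) = sum(theta^2), whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
sq_avg = np.zeros_like(theta)
grad = 2.0 * theta

theta_new, sq_avg_new = rmsprop_step(theta, grad, sq_avg)

print(theta_new, sq_avg_new)
```

Because the gradient is divided by its running root mean square, both parameters move by the same amount on the first step even though their gradients differ in size.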
Question Image Co-Attention with LBM Model
Dataset
The VQA v2.0 dataset is the largest dataset for the VQA task.
It comprises 248,349 questions for training, 121,512 questions for
validation and 244,302 questions for testing.
On the basis of answer type, the questions are divided into three
categories:
yes/no (binary)
number (number of objects)
other (answers of one or more words)
Each question has 10 human-annotated free-response answers.
Evaluation Metric
The accuracy of a predicted answer a is evaluated as follows:
\[ \mathrm{Accuracy}(a) = \min\left( \frac{\mathrm{Count}(a)}{3},\ 1 \right) \]
where Count(a) is the number of human (Amazon Mechanical Turk)
annotated answers that match the predicted answer a.
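The metric is simple to express in code. In the sketch below, matching is done by case-insensitive string comparison after trimming whitespace, which is an assumption; the slides do not describe how answers are normalized before comparison. The ten human answers are invented for the example.

```python
def vqa_accuracy(predicted, human_answers):
    """Return min(Count(a) / 3, 1) for a predicted answer a."""
    target = predicted.strip().lower()
    count = sum(1 for ans in human_answers if ans.strip().lower() == target)
    return min(count / 3, 1.0)

# Ten invented annotations for one question.
human_answers = ["red"] * 2 + ["dark red"] * 5 + ["maroon"] * 3

print(vqa_accuracy("dark red", human_answers))  # 5 matches
print(vqa_accuracy("red", human_answers))       # 2 matches
print(vqa_accuracy("blue", human_answers))      # 0 matches
```

An answer given by at least three annotators receives full credit, while an answer given by only two annotators receives 2/3.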
Results
Table: Assessment of Architecture on the VQA dataset
MODEL                 ALL     YES/NO   NUMBER   OTHERS
With W-Att, ResNet    62.23   82.28    39.06    42.13
With P-Att, ResNet    67.07   84.50    40.37    43.71
With S-Att, ResNet    65.20   80.50    37.62    43.20
With P-Att, VGG       63.79   82.73    37.92    53.46
Table: Result on VQA v2.0 dataset and comparison with other models
MODEL                       ALL     YES/NO   NUMBER   OTHERS
SMem [Huijuan, 2016]        58.24   80.80    37.53    46.32
SAN                         58.85   79.11    36.41    46.42
QRU [R. Li, 2016]           60.72   82.29    37.02    47.67
HieCoAtt [J. Lu, 2016]      62.06   79.95    38.22    51.95
MCB [Akira Fukui, 2016]     65.40   82.30    37.20    57.40
MLB [Jin-Hwa Kim, 2017]     65.84   83.84    37.87    56.76
With P-Att, ResNet          67.07   84.50    40.37    43.71
Conclusion and Future Work
Conclusion
In this thesis work, a VQA model has been proposed that uses a
co-attention mechanism with a low-rank bilinear model (LBM).
The LBM gives a richer joint representation for determining semantic
objects and concepts.
The co-attention mechanism explores the natural symmetry between
image and question.
In the experiments, the proposed model reaches an overall accuracy of
67.07 on VQA v2.0, compared with 65.84 for the state-of-the-art MLB
model [Jin-Hwa Kim, 2017].
Future Work
A VQA model can be developed to handle images and questions that
require spatial reasoning.
Multiple low-rank bilinear models (LBMs) can be applied to enhance the
representational power of the co-attention mechanism.
Attention at the word-embedding module may capture the informative and
semantic concepts of the question words.
References
Huijuan Xu and Kate Saenko. Ask, attend and answer: Exploring question-guided
spatial attention for visual question answering. In ECCV, pages 451-466. Springer,
2016.
R. Li and J. Jia. Visual question answering with question representation update
(QRU). In NIPS, 2016, pp. 4655-4663.
J. Lu, J. Yang, D. Batra, and D. Parikh. Hierarchical question-image co-attention
for visual question answering. In NIPS, 2016, pp. 289-297.
Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and
Marcus Rohrbach. Multimodal Compact Bilinear Pooling for Visual Question
Answering and Visual Grounding. In Conference on Empirical Methods in Natural
Language Processing, 2016.
Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, and
Byoung-Tak Zhang. Hadamard Product for Low-Rank Bilinear Pooling. In ICLR, 2017.
Opper, M., and Winther, O. (1999). A Bayesian approach to online learning.
On-Line Learning in Neural Networks. Cambridge University Press.
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 

Presentation jitendra

  • 1. Question Image Co-attention by Low-Rank Bilinear Model for Visual Question Answering Jitendra Kumar Kushwaha IIST, Thiruvananthapuram Project Guide: Dr. Sumitra S. Associate Professor Dept. of Mathematics, IIST May 30, 2019 Jitendra Kumar Kushwaha (IIST) May 30, 2019 1 / 31
  • 2. Overview: 1 Introduction; 2 Applications; 3 Image Feature Extraction; 4 Question Modeling; 5 Joint Representation (Bilinear Model; Low-Rank Bilinear Model with Hadamard Product); 6 Co-attention Mechanism; 7 Attended Visual and Question Features; 8 Answer Prediction; 9 Results; 10 Conclusion and Future Work; 11 References
  • 3. Introduction: Objective. The goal of this thesis is to develop a model that jointly understands language and visual inputs. The model takes as input an image and a natural-language question about the image and produces a natural-language answer as output.
  • 4. Introduction: Motivation. With neural network models, we often cannot tell whether the model is making a sensible prediction or a random guess. Incorporating an attention mechanism gives an estimate of what the model has learned. Instead of considering only image attention, the co-attention mechanism considers both image and question attention. In the co-attention mechanism, the image guides the question attention and vice versa.
  • 5. Applications: Aiding visually impaired users. Summarizing visual data for analysts. VQA in the medical domain. Connecting vision and language.
  • 6. Image Feature Extraction. The image model uses a CNN to obtain a representation of the image. The CNN extracts the image feature map $V$ from the raw image $I$. The image feature is $V = \{v_1, v_2, \ldots, v_N\}$, where $v_n$ is the feature vector at spatial location $n$.
  • 7. Image Feature Extraction: Pretrained Models. The feature $V = \mathrm{CNN}_{vgg}(I)$ is taken from the last convolution layer, which retains the spatial information of the original image. For an image rescaled to $3 \times 448 \times 448$, the visual feature $V$ is the output of the last convolution layer of the VGG-19 network and has dimension $512 \times 14 \times 14$. Alternatively, ResNet-152 is used, whose output has dimension $2048 \times 14 \times 14$.
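For attention over spatial locations, a $C \times H \times W$ feature map can be read as $N = H \times W$ location vectors of length $C$ (196 vectors for a $14 \times 14$ map). The following is a minimal pure-Python sketch of that reshaping, assuming the map is a nested list indexed as `fmap[channel][row][col]`; the function name and the toy values are hypothetical.

```python
def to_location_vectors(fmap):
    """Turn a C x H x W feature map into H*W vectors of length C."""
    channels = len(fmap)
    height = len(fmap[0])
    width = len(fmap[0][0])
    # One vector per spatial location, collecting that location across channels.
    return [
        [fmap[c][h][w] for c in range(channels)]
        for h in range(height)
        for w in range(width)
    ]

# Toy 2 x 2 x 2 map standing in for the 512 x 14 x 14 VGG-19 output.
toy_map = [
    [[1, 2], [3, 4]],  # channel 0
    [[5, 6], [7, 8]],  # channel 1
]
V = to_location_vectors(toy_map)
```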
  • 8. Question Modeling. The question is represented at three levels: word level, phrase level, and sentence level. Each word in the question is converted into a one-hot encoded vector whose length equals the vocabulary size and whose entries are binary (0 or 1).
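As an illustration of the one-hot step, here is a small sketch. The vocabulary and the question are made up for the example, and `one_hot` is a hypothetical helper, not part of the presented model's code.

```python
def one_hot(word, vocab):
    """Return a vector of len(vocab) zeros with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# Hypothetical five-word vocabulary.
vocab = ["what", "color", "is", "the", "cat"]
question = ["what", "color", "is", "the", "cat"]
encoded = [one_hot(w, vocab) for w in question]
```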
  • 9. Question Modeling. The one-hot encoded vectors are then embedded into a vector space to obtain $Q^w = \{q^w_1, q^w_2, \ldots, q^w_T\}$. The embedded word vectors are the word-level features of the question: every word in the question has a corresponding vector that represents it.
  • 10. Question Modeling: Phrase-level Feature. To compute the phrase-level features, a 1-D CNN is applied to the word-level feature vectors with three filters: unigram, bigram, and trigram. A 1-D CNN works in the same way as a 2-D CNN. The phrase-level feature is obtained by max-pooling across the three filter outputs at each word location: $q^p_t = \max(\hat{q}^p_{1,t}, \hat{q}^p_{2,t}, \hat{q}^p_{3,t}), \quad t \in \{1, 2, \ldots, T\}$. By grouping neighbouring words, the three filters capture semantic meaning at the phrase level.
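The max-pooling step above can be sketched as follows. The sketch assumes the unigram, bigram, and trigram convolution outputs are already available as lists of $T$ vectors and takes the maximum element-wise, which is one common reading of the formula; the function name and the numbers are hypothetical.

```python
def phrase_level(unigram, bigram, trigram):
    """Element-wise max over the three n-gram outputs at each word position."""
    return [
        [max(a, b, c) for a, b, c in zip(u, v, w)]
        for u, v, w in zip(unigram, bigram, trigram)
    ]

# Toy filter outputs for a two-word question with 2-dimensional features.
uni = [[0.1, 0.5], [0.3, 0.2]]
bi = [[0.4, 0.1], [0.0, 0.9]]
tri = [[0.2, 0.2], [0.6, 0.1]]
q_p = phrase_level(uni, bi, tri)
```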
  • 11. Question Modeling: Phrase-level Feature
  • 12. Question Modeling. An LSTM embeds the phrase-level features $q^p_t$ into the sentence-level feature. Every word in the question has a corresponding vector that represents it.
  • 13. Joint Representation: Bilinear Model. The bilinear model provides a rich joint representation of two distinct input features. It uses a quadratic expansion of a linear transformation that considers every pair of features: $C_i = \sum_{j=1}^{N} \sum_{k=1}^{M} w_{ijk} x_j y_k = X^T W_i Y$. The joint embedding $C$ captures the semantic concepts of both input features ($X$ and $Y$). A joint embedding vector of size $L$ requires $L \times (N \times M)$ weight parameters. The weights therefore form a third-order tensor, which limits the model's applicability to computationally complex tasks.
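To make the parameter count concrete, the sketch below evaluates $C_i = X^T W_i Y$ directly and counts the weights of the full tensor. It is a toy illustration with plain Python lists; the function names are hypothetical.

```python
def bilinear(x, y, W):
    """Compute C_i = sum_j sum_k W[i][j][k] * x[j] * y[k] for every output i."""
    return [
        sum(W_i[j][k] * x[j] * y[k]
            for j in range(len(x))
            for k in range(len(y)))
        for W_i in W
    ]

def full_param_count(L, N, M):
    """Number of weights in the third-order tensor: L x (N x M)."""
    return L * N * M

# Toy example: N = 2, M = 3, L = 1, all weights equal to 1.
x = [1, 2]
y = [3, 4, 5]
W = [[[1, 1, 1], [1, 1, 1]]]
C = bilinear(x, y, W)  # (1 + 2) * (3 + 4 + 5) = 36
```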
  • 14. Joint Representation: Low-Rank Bilinear Model with Hadamard Product. The low-rank bilinear method reduces the rank of the weight matrix $W_i$ so that fewer parameters are needed, which acts as regularization: $C_i = X^T W_i Y = X^T U_i V_i^T Y = \mathbb{1}^T (U_i^T X \circ V_i^T Y)$, where $\mathbb{1}$ is a vector of ones and $\circ$ is the Hadamard (element-wise) product. Two third-order tensors are still needed for a feature vector whose elements are $\{C_i\}$. The order of the weight tensors is reduced by one by replacing $\mathbb{1}$ with $P \in \mathbb{R}^{d \times c}$. $U \in \mathbb{R}^{N \times d}$ and $V \in \mathbb{R}^{M \times d}$ are redefined to obtain the joint embedding feature vector $C \in \mathbb{R}^{c}$: $C = P^T (U^T X \circ V^T Y)$.
  • 15. Joint Representation: Low-Rank Bilinear Model with Hadamard Product. This restricts the rank of $W_i$ to be at most $d \le \min(N, M)$. The mechanism factors the three-dimensional weight tensor of the bilinear model into three two-dimensional weight matrices, which forces the weight tensor to be low-rank.
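The low-rank form $C = P^T (U^T X \circ V^T Y)$ can be sketched in a few lines. This is a toy pure-Python version under the stated shapes ($U$ is $N \times d$, $V$ is $M \times d$, $P$ is $d \times c$); the helper names are hypothetical, and a real model would use a tensor library.

```python
def mat_t_vec(A, x):
    """Return A^T x for a matrix A given as a list of rows."""
    rows, cols = len(A), len(A[0])
    return [sum(A[r][c] * x[r] for r in range(rows)) for c in range(cols)]

def low_rank_bilinear(x, y, U, V, P):
    """Compute C = P^T (U^T x o V^T y), with o the Hadamard product."""
    ux = mat_t_vec(U, x)                   # length d
    vy = mat_t_vec(V, y)                   # length d
    h = [a * b for a, b in zip(ux, vy)]    # element-wise product
    return mat_t_vec(P, h)                 # length c

def low_rank_param_count(N, M, d, c):
    """Weights in the three matrices U (N x d), V (M x d) and P (d x c)."""
    return d * (N + M + c)

# Toy example: N = M = d = 2, c = 1, identity projections.
U = [[1, 0], [0, 1]]
V = [[1, 0], [0, 1]]
P = [[1], [1]]
C = low_rank_bilinear([1, 2], [3, 4], U, V, P)  # 1*3 + 2*4 = 11
```

With these three matrices the parameter count grows as $d(N + M + c)$ rather than $c \times N \times M$.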
  • 16. Co-attention Mechanism. The attention mechanism produces a spatial map that highlights the image regions relevant to answering the question. Earlier attention models [Huijuan, 2016][Jin-Hwa Kim, 2017] focused on identifying where to look, that is, visual attention. This model also addresses the equally important problem of identifying which words to listen to, that is, question attention.
  • 17. Co-attention Mechanism: Visual Attention. The visual attention distribution is used to obtain the attended visual features: $\alpha^v = \mathrm{softmax}\big(P_{\alpha^v}^T (\sigma((W^v_q)^T Q) \circ \sigma((W^v_v)^T V))\big)$, where $P_{\alpha^v} \in \mathbb{R}^{d \times N}$, $\sigma$ is the hyperbolic tangent function, $W^v_q \in \mathbb{R}^{T \times 1}$, $W^v_v \in \mathbb{R}^{N \times 1}$ and $\alpha^v \in \mathbb{R}^{N}$.
  • 18. Co-attention Mechanism: Question Attention. The question attention distribution is used to obtain the attended question features: $\alpha^q = \mathrm{softmax}\big(P_{\alpha^q}^T (\sigma((W^q_v)^T V) \circ \sigma((W^q_q)^T Q))\big)$, where $P_{\alpha^q} \in \mathbb{R}^{d \times T}$, $\sigma$ is the hyperbolic tangent function, $W^q_q \in \mathbb{R}^{T \times 1}$, $W^q_v \in \mathbb{R}^{N \times 1}$ and $\alpha^q \in \mathbb{R}^{T}$.
  • 19. Attended Visual and Question Features. The attended question feature is a linear combination of the question feature vectors, weighted by the question attention. The attended visual feature is a linear combination of the visual feature vectors of the spatial regions, weighted by the visual attention: $\hat{V} = \sum_{n=1}^{N} \alpha^v_n V_n, \quad \hat{Q} = \sum_{t=1}^{T} \alpha^q_t Q_t$. This gives a fine-grained representation of the image and the question.
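A small sketch of the last two steps, turning raw attention scores into a distribution and forming the weighted sum, is given below. The raw scores are assumed to be already computed, and the function names and toy values are hypothetical.

```python
import math

def softmax(scores):
    """Normalize raw scores into a probability distribution."""
    top = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(weights, features):
    """Weighted sum of feature vectors: sum_n weights[n] * features[n]."""
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

# Two locations with equal scores receive equal attention.
alpha = softmax([0.0, 0.0])
v_hat = attend(alpha, [[1.0, 2.0], [3.0, 4.0]])
```

The same two functions apply to the question side by passing the per-word scores and the question feature vectors.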
  • 20. Answer Prediction. The VQA task is treated as a classification task. Answer prediction is based on the co-attended question and visual features: $p(a \mid V, Q; \Theta) = \mathrm{softmax}\big(P^T (\sigma(W_q^T \hat{Q}) \circ \sigma(W_v^T \hat{V}))\big)$, $\hat{a} = \arg\max_{a \in \Omega} p(a \mid V, Q; \Theta)$.
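The final $\arg\max$ over the candidate set $\Omega$ can be illustrated as below. The sketch assumes the class probabilities have already been computed; the candidate answers and probabilities are made up.

```python
def predict(probabilities, candidates):
    """Return the candidate answer with the highest probability."""
    best = max(range(len(candidates)), key=lambda i: probabilities[i])
    return candidates[best]

# Hypothetical candidate set and model output.
candidates = ["yes", "no", "2"]
answer = predict([0.1, 0.7, 0.2], candidates)
```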
  • 21. Answer Prediction: Experimental Setup. The size of the joint embedding of the visual and question features is $d$, the same as the rank $d$ in the low-rank bilinear model. The set of candidate answers is $\Omega$. The decay rate and dropout are $\alpha$ and $p$. The RMSProp optimizer is used with a base learning rate of 4e-4, a decay rate $\alpha = 0.90$, and a correction factor of 1e-8. The batch size is set to 100.
  • 22. Answer Prediction: Question Image Co-Attention with LBM Model
  • 23. Answer Prediction: Dataset. The VQA v2.0 dataset is the largest dataset for the VQA task. It comprises 248,349 questions for training, 121,512 questions for validation, and 244,302 questions for testing. By answer type, the questions fall into three categories: yes/no (binary), number (number of objects), and other (answers of one or more words). Each question has 10 human-annotated free-response answers.
  • 24. Answer Prediction: Evaluation Metric. The accuracy of a predicted answer $a$ is evaluated as follows: $\mathrm{Accuracy}(a) = \min\left(\frac{\mathrm{Count}(a)}{3}, 1\right)$, where $\mathrm{Count}(a)$ is the number of human-annotated (Amazon Mechanical Turk) answers that match the predicted answer $a$.
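The metric is simple to compute. Below is a minimal sketch that assumes exact string matching between the prediction and the human answers; the function name is hypothetical and any answer normalization is left out.

```python
def vqa_accuracy(predicted, human_answers):
    """Accuracy(a) = min(Count(a) / 3, 1) over the human-annotated answers."""
    count = sum(1 for answer in human_answers if answer == predicted)
    return min(count / 3, 1.0)

# Ten hypothetical annotations for one question.
annotations = ["yes"] * 2 + ["no"] * 8
acc_yes = vqa_accuracy("yes", annotations)  # 2 matches, partial credit
acc_no = vqa_accuracy("no", annotations)    # 8 matches, capped at 1
```

A prediction therefore receives full credit once at least three annotators gave the same answer.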
  • 25. Results. Table: Assessment of architectures on the VQA dataset.
      MODEL                 ALL     YES/NO   NUMBER   OTHERS
      With W-Att, ResNet    62.23   82.28    39.06    42.13
      With P-Att, ResNet    67.07   84.50    40.37    43.71
      With S-Att, ResNet    65.20   80.50    37.62    43.20
      With P-Att, VGG       63.79   82.73    37.92    53.46
  • 26. Results. Table: Results on the VQA v2.0 dataset and comparison with other models.
      MODEL                     ALL     YES/NO   NUMBER   OTHERS
      SMem [Huijuan, 2016]      58.24   80.80    37.53    46.32
      SAN                       58.85   79.11    36.41    46.42
      QRU [R. Li, 2016]         60.72   82.29    37.02    47.67
      HieCoAtt [J. Lu, 2016]    62.06   79.95    38.22    51.95
      MCB [Akira Fukui, 2016]   65.40   82.30    37.20    57.40
      MLB [Jin-Hwa Kim, 2017]   65.84   83.84    37.87    56.76
      With P-Att, ResNet        67.07   84.50    40.37    43.71
  • 27. Conclusion and Future Work: Conclusion. In this thesis work, a VQA model is proposed that uses a co-attention mechanism with a low-rank bilinear model (LBM). The LBM gives a richer joint representation for determining semantic objects and concepts. The co-attention mechanism exploits the natural symmetry between image and question. The experimental results show better performance than the state of the art [Jin-Hwa Kim, 2017].
  • 28. Conclusion and Future Work: Future Work. A VQA model can be developed to handle images and questions that require spatial reasoning. Multiple low-rank bilinear models (LBM) can be applied to enhance the representativeness of the co-attention mechanism. Attention at the word-embedding module may capture the informative and semantic concepts of the question words.
  • 29. References.
      Huijuan Xu and Kate Saenko. Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In ECCV, pages 451-466. Springer, 2016.
      R. Li and J. Jia. Visual question answering with question representation update (QRU). In NIPS, pages 4655-4663, 2016.
      J. Lu, J. Yang, D. Batra, and D. Parikh. Hierarchical question-image co-attention for visual question answering. In NIPS, pages 289-297, 2016.
      Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. Conference on Empirical Methods in Natural Language Processing, 2016.
      Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, and Byoung-Tak Zhang. Hadamard Product for Low-Rank Bilinear Pooling. ICLR, 2017.
      M. Opper and O. Winther. A Bayesian approach to online learning. On-Line Learning in Neural Networks. Cambridge University Press, 1999.
  • 31. The End