What’s in a Question: Using Visual Questions as a Form of Supervision
1. What’s in a Question: Using Visual Questions as a Form of Supervision
Siddha Ganju, Olga Russakovsky, Abhinav Gupta
http://sidgan.me/whats_in_a_question/
2. Questions are informative
Example question: “What breed of dog is this?”
Information from the question:
• The animal in the scene is a ‘dog’
• ‘Breed’ is a property of ‘dog’
• All ‘dogs’ in the scene are of the same ‘breed’
• Knowing the ‘breed’ may be important
Goal: quantify and utilize this information
What’s in a Question: Using Visual Questions as a Form of Supervision Siddha Ganju, Olga Russakovsky, Abhinav Gupta (CMU) http://sidgan.me/whats_in_a_question/
3. Analysis of Visual Questions (with no answers)
• Questions → image captions
• Questions → object classification
• Questions → improved VQA
Figure: an example question (“What color is the bus?”) and the objects inferred from it
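As a rough illustration of how objects might be inferred from a question alone, the sketch below matches question words against a small object vocabulary. The vocabulary, the function name `infer_objects`, and the plural handling are all assumptions made for this example; the slides do not describe the authors' actual inference procedure.

```python
import re

# Hypothetical object vocabulary; a real system would use a dataset's category list.
OBJECT_VOCAB = {"bus", "dog", "plane", "water", "dolphin"}


def infer_objects(question):
    """Return vocabulary objects mentioned in the question, in order of first mention."""
    tokens = re.findall(r"[a-z]+", question.lower())
    found = []
    for tok in tokens:
        # Crude plural handling: strip a trailing "s" only if that yields a known object.
        if tok not in OBJECT_VOCAB and tok.endswith("s") and tok[:-1] in OBJECT_VOCAB:
            tok = tok[:-1]
        if tok in OBJECT_VOCAB and tok not in found:
            found.append(tok)
    return found


print(infer_objects("What color is the bus?"))        # ['bus']
print(infer_objects("Can this plane land on water"))  # ['plane', 'water']
```

Keyword matching is only the simplest possible baseline: it cannot infer objects that are implied but not named in the question.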
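4. Improving VQA with extra questions
iBOWIMG [Zhou ArXiv15]: CNN image features + text embedding of the target question (“What is under the plane”) → multiple-choice answer scores: dolphin (0.2), yes (0.1), water (0.7)
iBOWIMG-2x: adds a second text embedding for the other questions about the image (“Can this plane land on water”, “How many planes are there”)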
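The diagram suggests that iBOWIMG-2x scores candidate answers from three inputs: image features, the target question, and the other questions about the same image. The sketch below shows one way such a forward pass could look, using binary bag-of-words vectors, a single linear layer, and a softmax. The feature sizes, the random untrained weights, and all function names are assumptions for illustration, not the authors' implementation.

```python
import math
import random


def bag_of_words(text, vocab):
    """Binary bag-of-words vector over a fixed vocabulary."""
    tokens = set(text.lower().split())
    return [1.0 if word in tokens else 0.0 for word in vocab]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_answers(image_feat, target_q, extra_qs, vocab, weights):
    """Concatenate image, target-question, and extra-question features,
    then apply one linear layer followed by a softmax over answers."""
    features = (
        image_feat
        + bag_of_words(target_q, vocab)
        + bag_of_words(" ".join(extra_qs), vocab)
    )
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


vocab = ["what", "is", "under", "the", "plane", "can", "land",
         "on", "water", "how", "many", "planes"]
answers = ["dolphin", "yes", "water"]
image_feat = [0.3, 0.8, 0.1, 0.5]  # placeholder standing in for CNN features

rng = random.Random(0)
dim = len(image_feat) + 2 * len(vocab)
# Untrained random weights: the printed probabilities are not meaningful.
weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in answers]

probs = score_answers(
    image_feat,
    "What is under the plane",
    ["Can this plane land on water", "How many planes are there"],
    vocab,
    weights,
)
for answer, p in zip(answers, probs):
    print(f"{answer}: {p:.2f}")
```

Dropping the extra-question vector from `features` gives the single-question baseline under the same assumptions.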
5. Experiment #1: With unanswered questions
Results on MSCOCO VQA 1.0
Training: 1 target question (with answer) and 2 other questions per image
Test: VQA Val set
Model                     Accuracy (%)
iBOWIMG [Zhou ArXiv15]    47.3
iBOWIMG-2x                50.4
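The training setup for this experiment (one answered target question plus two other questions per image) could be assembled roughly as follows. The data layout, the function name `build_samples`, and the toy annotations, including the answers "yes" and "1", are hypothetical; the slides do not say how the other questions were chosen, so random sampling is an assumption.

```python
import random


def build_samples(questions_by_image, num_extra=2, seed=0):
    """Build one sample per (image, target question).

    questions_by_image maps image_id -> list of (question, answer) pairs.
    Each sample keeps the target's answer and num_extra other questions
    from the same image with their answers dropped.
    """
    rng = random.Random(seed)
    samples = []
    for image_id, qa_pairs in questions_by_image.items():
        for i, (question, answer) in enumerate(qa_pairs):
            others = [q for j, (q, _) in enumerate(qa_pairs) if j != i]
            if len(others) < num_extra:
                continue  # not enough other questions for this image
            samples.append({
                "image_id": image_id,
                "target_question": question,
                "answer": answer,
                "extra_questions": rng.sample(others, num_extra),
            })
    return samples


# Hypothetical toy annotations for a single image.
data = {
    "img_001": [
        ("What is under the plane", "water"),
        ("Can this plane land on water", "yes"),
        ("How many planes are there", "1"),
    ]
}

samples = build_samples(data)
print(len(samples))  # 3
print(samples[0]["target_question"], "->", samples[0]["answer"])
```

Only the target question carries an answer in each sample, so the extra questions act purely as additional input.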
6. Experiment #2: Standard benchmark
Results on MSCOCO VQA 1.0
Accuracy (%), overall and by question type
Model                     Overall   Yes/no   Number   Word
iBOWIMG [Zhou ArXiv15]    55.7      76.5     34.9     42.6
iBOWIMG-2x                62.8      80.7     37.9     53.1
Training: All questions and answers on train+val
Test: VQA test-dev set
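To make the comparison easier to read, the absolute gains of iBOWIMG-2x over iBOWIMG can be computed directly from the reported numbers. In this small sketch the key "Overall" stands for the first accuracy column of the table.

```python
# Accuracies as reported on the slide (VQA test-dev).
results = {
    "iBOWIMG": {"Overall": 55.7, "Yes/no": 76.5, "Number": 34.9, "Word": 42.6},
    "iBOWIMG-2x": {"Overall": 62.8, "Yes/no": 80.7, "Number": 37.9, "Word": 53.1},
}

# Absolute gain in percentage points, rounded to one decimal place.
gains = {
    column: round(results["iBOWIMG-2x"][column] - results["iBOWIMG"][column], 1)
    for column in results["iBOWIMG"]
}

for column, gain in gains.items():
    print(f"{column}: +{gain} points")
```

The gains are 7.1 points overall, 4.2 on yes/no, 3.0 on number, and 10.5 on word questions, so the largest improvement is on word questions.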
7. Conclusions
• Study using visual questions themselves as a form of supervision
• Provide both qualitative and quantitative analysis of how much information is contained within the questions
• Demonstrate significant improvements over baselines on standard benchmarks