
Which visual questions are difficult to answer? Analysis with Entropy of Answer Distributions


Kento Terao, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Shin'ichi Satoh, Which visual questions are difficult to answer? Analysis with Entropy of Answer Distributions, Visual Question Answering and Dialog Workshop at CVPR 2020, June 14
https://youtu.be/g24WtI3vS1Y

Published in: Data & Analytics


  1. Which visual questions are difficult to answer? Analysis with entropy of answer distributions
     Kento Terao, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda (Hiroshima University), Shin'ichi Satoh (NII)
     Visual Question Answering and Dialog Workshop at CVPR 2020, June 14
     GitHub: https://github.com/tttamaki/vqd
     arXiv: https://arxiv.org/abs/2004.05595
  2. What is this item? Is the catcher wearing the safety gear? WHICH IS DIFFICULT TO ANSWER?
  3. Some questions are easy, some are difficult
     [Figure: answer distributions output by a VQA model. The easy question "What is the player's position behind the batter?" gives a distribution peaked at "catcher"; the difficult question "How many people?" gives a spread-out distribution over "many", "over 100", "50", "100", ...]
     Motivation: finding which questions are difficult
     Application: using the difficulty for developing new VQA models
     Contribution:
     • Providing a practical and surprisingly simple way to assess the difficulty
     • Finding question clusters that are difficult for any VQA model
  4. Related works and our key contribution
     Related works: analyzing distributions of answers given by humans
     • Estimating the number of unique answers [Gurari and Grauman, CHI 2017]
     • Predicting reasons why they disagree [Bhattacharya+, ICCV 2019]
     • Predicting entropy as annotation diversity [Yang+, HCOMP 2018]
     Example: for Q: "How many people can fit in the 2 buses?", crowd workers give the ground-truth answers 40, 80, 100, 100, 100, 100, 200, many, many, lot (# unique answers: 6; entropy: 1.61; see the entropy sketch after the transcript).
     Ours: analyzing the answer distributions of multiple VQA models (VQA model 1, 2, 3, ...)
     • No annotation of difficulty required
     • Estimating the difficulty of visual questions, even in the test set
  5. Proposed method
     1. Computing 3D entropy vectors: three models are used, Model I (image only), Model Q (question only), and Model Q+I (image and question); the Q+I baseline is Pythia v0.1 [CVPR 2018]. Each model outputs an answer distribution (dim = 3,129), and the entropy of each distribution gives one component of a 3D entropy vector (dim = 3); see the entropy-vector sketch after the transcript.
     2. K-means clustering on the 3D entropy vectors
     3. Analyzing the accuracy of VQA models for each cluster
  6. Experiments
     Dataset: VQA v2 [Goyal+, CVPR 2017]; the training set is used for training VQA models, the validation set for clustering and analysis
     Protocol (a clustering sketch follows the transcript):
     • Training the I, Q, and Q+I models on the training set
     • For each model, predicting the answer distribution of each visual question in the validation set and computing its entropy value
     • Performing k-means clustering (k = 10) on the validation set
     • Computing statistics for each cluster, and sorting clusters in order of entropy
     • Assigning questions in the test set to clusters
     Comparisons: predicting with state-of-the-art VQA models trained on the training set: BUTD [CVPR 2018], MFB [EMNLP 2016], MFH [TNNLS 2018], BAN-4/8 [NeurIPS 2018], MCAN-small/large [CVPR 2019], Pythia v0.3 [CVPR 2019]
  7. Results and observations
     • All methods show poor performance on the most difficult cluster (about 10% accuracy)
     • Cluster entropy is highly correlated with cluster accuracy: entropy increases while accuracy decreases from cluster 0 to cluster 9 (a per-cluster analysis sketch follows the transcript)
     • As the cluster difficulty increases, the answers predicted by the different methods begin to differ
  8. Examples in cluster 0: annotations agree, and VQA models agree and answer correctly (about 85% accuracy)
  9. Examples in cluster 9: visual questions are difficult to answer, even when annotations agree (about 10% accuracy)
  10. Check it out!
      GitHub: Visual Question Difficulty (VQD), https://github.com/tttamaki/vqd (clustering results and visualization code available)
      Paper: more in-depth discussion can be found on arXiv, https://arxiv.org/abs/2004.05595
      You may use the difficulty in your model for questions in the training, validation, and test sets.
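
The entropy quoted on slide 4 (6 unique answers, entropy 1.61) can be reproduced with a minimal sketch in Python; this is not the authors' code, it simply treats the ten crowd-worker answers as an empirical distribution and computes natural-log Shannon entropy.

```python
# Minimal sketch, not the authors' code: entropy of the human answer
# distribution for "How many people can fit in the 2 buses?" (slide 4).
from collections import Counter
import numpy as np

answers = ["40", "80", "100", "100", "100", "100", "200", "many", "many", "lot"]

counts = np.array(list(Counter(answers).values()), dtype=float)
probs = counts / counts.sum()              # empirical answer distribution
entropy = -np.sum(probs * np.log(probs))   # Shannon entropy in nats

print(len(counts), round(entropy, 2))      # -> 6 1.61, matching slide 4
```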
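
For step 1 on slide 5, here is a hedged sketch of how a 3D entropy vector could be formed, assuming each of Model I, Model Q, and Model Q+I exposes a softmax distribution over the 3,129 candidate answers; the names p_i, p_q, p_qi and the random toy inputs are placeholders, not the VQD repository's API.

```python
# Sketch only: one 3D entropy vector per visual question, assuming each model
# yields a softmax over the 3,129 answers (p_i, p_q, p_qi are placeholders).
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (nats) of an answer distribution p with shape (3129,)."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p))

def entropy_vector(p_i, p_q, p_qi):
    """Entropies of the Model I, Model Q, and Model Q+I outputs -> vector of dim 3."""
    return np.array([entropy(p_i), entropy(p_q), entropy(p_qi)])

# Toy usage: random distributions standing in for real model outputs.
rng = np.random.default_rng(0)
p = rng.random((3, 3129))
p /= p.sum(axis=1, keepdims=True)
print(entropy_vector(*p))   # three entropy values, one per model
```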
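
The k-means and assignment steps of the protocol on slide 6 might look roughly like this; val_vectors and test_vectors stand in for arrays of 3D entropy vectors built as in the previous sketch, and the random data only makes the snippet runnable (k = 10 follows the slide, everything else is illustrative).

```python
# Sketch of the slide 6 protocol: k-means (k=10) on validation-set entropy
# vectors, clusters relabeled by increasing mean entropy, and test questions
# assigned to the nearest centroid. The data below is random placeholder, not VQA v2.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
val_vectors = rng.random((1000, 3)) * 3.0    # placeholder 3D entropy vectors
test_vectors = rng.random((500, 3)) * 3.0    # placeholder 3D entropy vectors

kmeans = KMeans(n_clusters=10, random_state=0, n_init=10).fit(val_vectors)

# Rank clusters so cluster 0 has the lowest mean entropy and cluster 9 the highest.
mean_entropy = np.array([val_vectors[kmeans.labels_ == c].mean() for c in range(10)])
rank = np.empty(10, dtype=int)
rank[np.argsort(mean_entropy)] = np.arange(10)

val_clusters = rank[kmeans.labels_]                  # cluster id per validation question
test_clusters = rank[kmeans.predict(test_vectors)]   # nearest-centroid assignment
print(np.bincount(val_clusters), np.bincount(test_clusters))
```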
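
Finally, the per-cluster analysis behind the observations on slide 7 (mean entropy and mean accuracy per cluster, plus their correlation) could be computed as below; the arrays are synthetic placeholders rather than the paper's results, and in practice the per-question accuracy would come from the usual VQA accuracy metric.

```python
# Sketch of the slide 7 analysis with synthetic placeholder data: per-cluster
# mean entropy, per-cluster accuracy, and the correlation between the two.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
clusters = rng.integers(0, 10, size=2000)     # cluster id per question (placeholder)
entropies = clusters + rng.random(2000)       # per-question mean entropy (placeholder)
accuracy = np.clip(1.0 - 0.09 * clusters + 0.05 * rng.standard_normal(2000), 0.0, 1.0)

cluster_entropy = np.array([entropies[clusters == c].mean() for c in range(10)])
cluster_accuracy = np.array([accuracy[clusters == c].mean() for c in range(10)])

r, _ = pearsonr(cluster_entropy, cluster_accuracy)
print(cluster_accuracy.round(2))   # accuracy falls from cluster 0 to 9 in this toy data
print(round(r, 2))                 # strongly negative, mirroring the observation on slide 7
```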
