Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach, Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 9992-10002
https://openaccess.thecvf.com/content_CVPR_2020/html/Hu_Iterative_Answer_Prediction_With_Pointer-Augmented_Multimodal_Transformers_for_TextVQA_CVPR_2020_paper.html