SlideShare ist ein Scribd-Unternehmen logo
1 von 18
CRiTERIA - Comprehensive data-driven Risk and Threat Assessment Methods for the Early and Reliable Identification, Validation and Analysis of migration-related
risks - has received funding from the European Union’s Horizon 2020 research and innovation action program under grant agreement No 101021866.
Combining textual and visual features into
multiple joint latent spaces, and introducing
dual softmax re-ranking, for Ad-hoc Video
Search
Damianos Galanopoulos, Vasileios Mezaris
Information Technologies Institute-CERTH
TRECVID 2022 Workshop
December 6 - 9, 2022
Problem statement
2
● Retrieving video shots using natural-language textual queries
○ “Find a shot of a man with a white beard”
Our 2021 model 1,2,3
3
1 K. Gkountakos, D. Galanopoulos, D. Touska, K. Ioannidis, S. Vrochidis, V. Mezaris, I. Kompatsiaris, "ITI-CERTH participation in ActEV and AVS Tracks of
TRECVID 2021", Proc. TRECVID 2021 Workshop, Dec. 2021.
2 D. Galanopoulos, V. Mezaris, "Hard-negatives or Non-negatives? A hard-negative selection strategy for cross-modal retrieval using the improved
marginal ranking loss", Proc. IEEE/CVF ICCV Workshops (ICCVW), pp. 2312-2316, Oct. 2021. DOI:10.1109/ICCVW54120.2021.00261.
3 D. Galanopoulos, V. Mezaris, "Attention Mechanisms, Signal Encodings and Fusion Strategies for Improved Ad-hoc Video Search with Dual Encoding
Networks", Proc. ACM Int. Conf. on Multimedia Retrieval (ICMR 2020), Dublin, Ireland, 2020. DOI:10.1145/3372278.3390737.
● We utilize the T x V cross-modal network
● Combination of textual and visual features
● Multiple latent spaces development
● Multi-loss-based learning
● Similarity revision using a dual softmax operation
Our 2022 approach 4
4
4 D. Galanopoulos, V. Mezaris, "Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space
Learning for Text-Based Video Retrieval", Proc. ECCV 2022 Workshop on AI for Creative Video Editing and
Understanding (CVEU), Oct. 2022. https://arxiv.org/abs/2211.11351.
T x V model
Our 2022 approach
5
BoW
BERT
W2V
CLIP
ATT
CLIP
encoder
ResNet152-11k
ResNeXt-101-wsl
ViT-B/32 CLIP
Dual Softmax Inference
6
● Query-video similarities revision
● Inspired by the Dual Softmax loss
● Two approaches:
○ Evaluation with a priori knowledge of all queries
○ Query-agnostic evaluation
Dual Softmax Inference
7
● Two different background queries strategies
● AVS 2022 queries
● AVS 2019-2020-2021 queries
● Textual Features
○ Bag-of-words
○ BERT
○ Word2Vec
○ ViT-B/32 CLIP
● Textual Encoders
○ Attention-based dual encoding network (ATT)
○ ViT-B/32 CLIP encoder
Features and encoders
8
Features and encoders
9
● Visual Features
○ ResNet-152 trained on the ImageNet11k dataset
○ ResNeXt-101 re-trained by weakly supervised learning on web
images and fine-tuned on ImageNet
○ ViT-B/32 CLIP model
● Datasets:
○ Training: MSR-VTT, TGIF, ActivityNet Captions and Vatex
○ Evaluation:
■ V3C2 evaluated on TRECVID AVS 2022 queries
■ V3C1 evaluated on TRECVID AVS 2019-2021 queries
■ IACC.3 evaluated on TRECVID AVS 2016-2018 queries
● Evaluation metric:
○ Mean extended inferred average precision (MXinfAP)
Datasets and metrics
10
● ITI_CERTH.22_run_2:
○ T x V model with 2 textual encoders and 3 visual features
○ Late fusion of 6 trained models with different configurations
○ Dual softmax using all AVS2022 queries as background queries
● ITI_CERTH.22_run_1:
○ Similar to run #2
○ Dual softmax using AVS 2019-2021 queries as background
Submitted runs
11
Results
12
MXinfAP for all submitted runs and T x V baseline for the fully-
automatic AVS task.
Results
13
AVS 2022 ranking list of all submitted runs regarding the main task in
MXinfAP terms. Red bars indicate our submitted runs.
“A person is in the act of swinging”
Dual Softmax impact
14
T x V model T x V model with DS (run #2)
“A drone landing or rising from the ground”
Dual Softmax impact
15
T x V model T x V model with DS (run #2)
“A type of cloth hanging on a rack, hanger, or line”
Dual Softmax impact
16
T x V model T x V model with DS (run #2)
Conclusions
17
● We utilize the cross-modal T x V network
● We combine multiple textual and visual features
● The dual softmax operation boosts the performance
● The a priori knowledge of evaluation queries slightly benefits the
performance
● A query-agnostic dual softmax revision serves real-life scenarios,
with a small trade-off regarding the overall performance
Thank you!
18
Damianos Galanopoulos, Vasileios Mezaris
Information Technologies Institute-CERTH
dgalanop@iti.gr, bmezaris@iti.gr
www.iti.gr/~bmezaris

Weitere ähnliche Inhalte

Ähnlich wie Combining textual and visual features for Ad-hoc Video Search

State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016GeoSolutions
 
Artificial Intelligence for Vision: A walkthrough of recent breakthroughs
Artificial Intelligence for Vision:  A walkthrough of recent breakthroughsArtificial Intelligence for Vision:  A walkthrough of recent breakthroughs
Artificial Intelligence for Vision: A walkthrough of recent breakthroughsNikolas Markou
 
“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...
“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...
“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...Edge AI and Vision Alliance
 
CAR DAMAGE DETECTION USING DEEP LEARNING
CAR DAMAGE DETECTION USING DEEP LEARNINGCAR DAMAGE DETECTION USING DEEP LEARNING
CAR DAMAGE DETECTION USING DEEP LEARNINGIRJET Journal
 
labelCloud – A Lightweight Domain-Independent Labeling Tool for 3D Object Det...
labelCloud – A Lightweight Domain-Independent Labeling Tool for 3D Object Det...labelCloud – A Lightweight Domain-Independent Labeling Tool for 3D Object Det...
labelCloud – A Lightweight Domain-Independent Labeling Tool for 3D Object Det...Niklas Kühl
 
Multi-View Video Coding Algorithms/Techniques: A Comprehensive Study
Multi-View Video Coding Algorithms/Techniques: A Comprehensive StudyMulti-View Video Coding Algorithms/Techniques: A Comprehensive Study
Multi-View Video Coding Algorithms/Techniques: A Comprehensive StudyIJERA Editor
 
Video Coding Enhancements for HTTP Adaptive Streaming
Video Coding Enhancements for HTTP Adaptive StreamingVideo Coding Enhancements for HTTP Adaptive Streaming
Video Coding Enhancements for HTTP Adaptive StreamingAlpen-Adria-Universität
 
Research@Lunch_Presentation.pdf
Research@Lunch_Presentation.pdfResearch@Lunch_Presentation.pdf
Research@Lunch_Presentation.pdfVignesh V Menon
 
“Efficient Video Perception Through AI,” a Presentation from Qualcomm
“Efficient Video Perception Through AI,” a Presentation from Qualcomm“Efficient Video Perception Through AI,” a Presentation from Qualcomm
“Efficient Video Perception Through AI,” a Presentation from QualcommEdge AI and Vision Alliance
 
Paper discussion:Video-to-Video Synthesis (NIPS 2018)
Paper discussion:Video-to-Video Synthesis (NIPS 2018)Paper discussion:Video-to-Video Synthesis (NIPS 2018)
Paper discussion:Video-to-Video Synthesis (NIPS 2018)Motaz Sabri
 
CAE DESIGN ENGINEER
CAE DESIGN ENGINEERCAE DESIGN ENGINEER
CAE DESIGN ENGINEERvishwanath n
 
MediaEval 2017 - Interestingness Task: Multimodality and Deep Learning when p...
MediaEval 2017 - Interestingness Task: Multimodality and Deep Learning when p...MediaEval 2017 - Interestingness Task: Multimodality and Deep Learning when p...
MediaEval 2017 - Interestingness Task: Multimodality and Deep Learning when p...multimediaeval
 
State of GeoServer 2.10
State of GeoServer 2.10State of GeoServer 2.10
State of GeoServer 2.10Jody Garnett
 
Chainer OpenPOWER developer congress HandsON 20170522_ota
Chainer OpenPOWER developer congress HandsON 20170522_otaChainer OpenPOWER developer congress HandsON 20170522_ota
Chainer OpenPOWER developer congress HandsON 20170522_otaPreferred Networks
 
A Blind Multiple Watermarks based on Human Visual Characteristics
A Blind Multiple Watermarks based on Human Visual Characteristics A Blind Multiple Watermarks based on Human Visual Characteristics
A Blind Multiple Watermarks based on Human Visual Characteristics IJECEIAES
 
A Hybrid DWT-SVD Method for Digital Video Watermarking Using Random Frame Sel...
A Hybrid DWT-SVD Method for Digital Video Watermarking Using Random Frame Sel...A Hybrid DWT-SVD Method for Digital Video Watermarking Using Random Frame Sel...
A Hybrid DWT-SVD Method for Digital Video Watermarking Using Random Frame Sel...researchinventy
 
Crowdsourcing video annotation
Crowdsourcing video annotationCrowdsourcing video annotation
Crowdsourcing video annotationMichiel Hildebrand
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Deep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & FutureDeep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & FutureRouyun Pan
 

Ähnlich wie Combining textual and visual features for Ad-hoc Video Search (20)

State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016State of GeoServer - FOSS4G 2016
State of GeoServer - FOSS4G 2016
 
Artificial Intelligence for Vision: A walkthrough of recent breakthroughs
Artificial Intelligence for Vision:  A walkthrough of recent breakthroughsArtificial Intelligence for Vision:  A walkthrough of recent breakthroughs
Artificial Intelligence for Vision: A walkthrough of recent breakthroughs
 
“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...
“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...
“COVID-19 Safe Distancing Measures in Public Spaces with Edge AI,” a Presenta...
 
CAR DAMAGE DETECTION USING DEEP LEARNING
CAR DAMAGE DETECTION USING DEEP LEARNINGCAR DAMAGE DETECTION USING DEEP LEARNING
CAR DAMAGE DETECTION USING DEEP LEARNING
 
labelCloud – A Lightweight Domain-Independent Labeling Tool for 3D Object Det...
labelCloud – A Lightweight Domain-Independent Labeling Tool for 3D Object Det...labelCloud – A Lightweight Domain-Independent Labeling Tool for 3D Object Det...
labelCloud – A Lightweight Domain-Independent Labeling Tool for 3D Object Det...
 
A0540106
A0540106A0540106
A0540106
 
Multi-View Video Coding Algorithms/Techniques: A Comprehensive Study
Multi-View Video Coding Algorithms/Techniques: A Comprehensive StudyMulti-View Video Coding Algorithms/Techniques: A Comprehensive Study
Multi-View Video Coding Algorithms/Techniques: A Comprehensive Study
 
Video Coding Enhancements for HTTP Adaptive Streaming
Video Coding Enhancements for HTTP Adaptive StreamingVideo Coding Enhancements for HTTP Adaptive Streaming
Video Coding Enhancements for HTTP Adaptive Streaming
 
Research@Lunch_Presentation.pdf
Research@Lunch_Presentation.pdfResearch@Lunch_Presentation.pdf
Research@Lunch_Presentation.pdf
 
“Efficient Video Perception Through AI,” a Presentation from Qualcomm
“Efficient Video Perception Through AI,” a Presentation from Qualcomm“Efficient Video Perception Through AI,” a Presentation from Qualcomm
“Efficient Video Perception Through AI,” a Presentation from Qualcomm
 
Paper discussion:Video-to-Video Synthesis (NIPS 2018)
Paper discussion:Video-to-Video Synthesis (NIPS 2018)Paper discussion:Video-to-Video Synthesis (NIPS 2018)
Paper discussion:Video-to-Video Synthesis (NIPS 2018)
 
CAE DESIGN ENGINEER
CAE DESIGN ENGINEERCAE DESIGN ENGINEER
CAE DESIGN ENGINEER
 
MediaEval 2017 - Interestingness Task: Multimodality and Deep Learning when p...
MediaEval 2017 - Interestingness Task: Multimodality and Deep Learning when p...MediaEval 2017 - Interestingness Task: Multimodality and Deep Learning when p...
MediaEval 2017 - Interestingness Task: Multimodality and Deep Learning when p...
 
State of GeoServer 2.10
State of GeoServer 2.10State of GeoServer 2.10
State of GeoServer 2.10
 
Chainer OpenPOWER developer congress HandsON 20170522_ota
Chainer OpenPOWER developer congress HandsON 20170522_otaChainer OpenPOWER developer congress HandsON 20170522_ota
Chainer OpenPOWER developer congress HandsON 20170522_ota
 
A Blind Multiple Watermarks based on Human Visual Characteristics
A Blind Multiple Watermarks based on Human Visual Characteristics A Blind Multiple Watermarks based on Human Visual Characteristics
A Blind Multiple Watermarks based on Human Visual Characteristics
 
A Hybrid DWT-SVD Method for Digital Video Watermarking Using Random Frame Sel...
A Hybrid DWT-SVD Method for Digital Video Watermarking Using Random Frame Sel...A Hybrid DWT-SVD Method for Digital Video Watermarking Using Random Frame Sel...
A Hybrid DWT-SVD Method for Digital Video Watermarking Using Random Frame Sel...
 
Crowdsourcing video annotation
Crowdsourcing video annotationCrowdsourcing video annotation
Crowdsourcing video annotation
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Deep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & FutureDeep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & Future
 

Mehr von VasileiosMezaris

Multi-Modal Fusion for Image Manipulation Detection and Localization
Multi-Modal Fusion for Image Manipulation Detection and LocalizationMulti-Modal Fusion for Image Manipulation Detection and Localization
Multi-Modal Fusion for Image Manipulation Detection and LocalizationVasileiosMezaris
 
CERTH-ITI at MediaEval 2023 NewsImages Task
CERTH-ITI at MediaEval 2023 NewsImages TaskCERTH-ITI at MediaEval 2023 NewsImages Task
CERTH-ITI at MediaEval 2023 NewsImages TaskVasileiosMezaris
 
Spatio-Temporal Summarization of 360-degrees Videos
Spatio-Temporal Summarization of 360-degrees VideosSpatio-Temporal Summarization of 360-degrees Videos
Spatio-Temporal Summarization of 360-degrees VideosVasileiosMezaris
 
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...VasileiosMezaris
 
TAME: Trainable Attention Mechanism for Explanations
TAME: Trainable Attention Mechanism for ExplanationsTAME: Trainable Attention Mechanism for Explanations
TAME: Trainable Attention Mechanism for ExplanationsVasileiosMezaris
 
Explaining video summarization based on the focus of attention
Explaining video summarization based on the focus of attentionExplaining video summarization based on the focus of attention
Explaining video summarization based on the focus of attentionVasileiosMezaris
 
Explaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiersExplaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiersVasileiosMezaris
 
Learning visual explanations for DCNN-based image classifiers using an attent...
Learning visual explanations for DCNN-based image classifiers using an attent...Learning visual explanations for DCNN-based image classifiers using an attent...
Learning visual explanations for DCNN-based image classifiers using an attent...VasileiosMezaris
 
CA-SUM Video Summarization
CA-SUM Video SummarizationCA-SUM Video Summarization
CA-SUM Video SummarizationVasileiosMezaris
 
Video smart cropping web application
Video smart cropping web applicationVideo smart cropping web application
Video smart cropping web applicationVasileiosMezaris
 
PGL SUM Video Summarization
PGL SUM Video SummarizationPGL SUM Video Summarization
PGL SUM Video SummarizationVasileiosMezaris
 
Hard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal RetrievalHard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal RetrievalVasileiosMezaris
 
Misinformation on the internet: Video and AI
Misinformation on the internet: Video and AIMisinformation on the internet: Video and AI
Misinformation on the internet: Video and AIVasileiosMezaris
 
PoR_evaluation_measure_acm_mm_2020
PoR_evaluation_measure_acm_mm_2020PoR_evaluation_measure_acm_mm_2020
PoR_evaluation_measure_acm_mm_2020VasileiosMezaris
 
GAN-based video summarization
GAN-based video summarizationGAN-based video summarization
GAN-based video summarizationVasileiosMezaris
 
Migration-related video retrieval
Migration-related video retrievalMigration-related video retrieval
Migration-related video retrievalVasileiosMezaris
 
Fractional step discriminant pruning
Fractional step discriminant pruningFractional step discriminant pruning
Fractional step discriminant pruningVasileiosMezaris
 

Mehr von VasileiosMezaris (20)

Multi-Modal Fusion for Image Manipulation Detection and Localization
Multi-Modal Fusion for Image Manipulation Detection and LocalizationMulti-Modal Fusion for Image Manipulation Detection and Localization
Multi-Modal Fusion for Image Manipulation Detection and Localization
 
CERTH-ITI at MediaEval 2023 NewsImages Task
CERTH-ITI at MediaEval 2023 NewsImages TaskCERTH-ITI at MediaEval 2023 NewsImages Task
CERTH-ITI at MediaEval 2023 NewsImages Task
 
Spatio-Temporal Summarization of 360-degrees Videos
Spatio-Temporal Summarization of 360-degrees VideosSpatio-Temporal Summarization of 360-degrees Videos
Spatio-Temporal Summarization of 360-degrees Videos
 
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
Masked Feature Modelling for the unsupervised pre-training of a Graph Attenti...
 
TAME: Trainable Attention Mechanism for Explanations
TAME: Trainable Attention Mechanism for ExplanationsTAME: Trainable Attention Mechanism for Explanations
TAME: Trainable Attention Mechanism for Explanations
 
Gated-ViGAT
Gated-ViGATGated-ViGAT
Gated-ViGAT
 
Explaining video summarization based on the focus of attention
Explaining video summarization based on the focus of attentionExplaining video summarization based on the focus of attention
Explaining video summarization based on the focus of attention
 
Explaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiersExplaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiers
 
Learning visual explanations for DCNN-based image classifiers using an attent...
Learning visual explanations for DCNN-based image classifiers using an attent...Learning visual explanations for DCNN-based image classifiers using an attent...
Learning visual explanations for DCNN-based image classifiers using an attent...
 
CA-SUM Video Summarization
CA-SUM Video SummarizationCA-SUM Video Summarization
CA-SUM Video Summarization
 
Video smart cropping web application
Video smart cropping web applicationVideo smart cropping web application
Video smart cropping web application
 
PGL SUM Video Summarization
PGL SUM Video SummarizationPGL SUM Video Summarization
PGL SUM Video Summarization
 
Video Thumbnail Selector
Video Thumbnail SelectorVideo Thumbnail Selector
Video Thumbnail Selector
 
Hard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal RetrievalHard-Negatives Selection Strategy for Cross-Modal Retrieval
Hard-Negatives Selection Strategy for Cross-Modal Retrieval
 
Misinformation on the internet: Video and AI
Misinformation on the internet: Video and AIMisinformation on the internet: Video and AI
Misinformation on the internet: Video and AI
 
LSTM Structured Pruning
LSTM Structured PruningLSTM Structured Pruning
LSTM Structured Pruning
 
PoR_evaluation_measure_acm_mm_2020
PoR_evaluation_measure_acm_mm_2020PoR_evaluation_measure_acm_mm_2020
PoR_evaluation_measure_acm_mm_2020
 
GAN-based video summarization
GAN-based video summarizationGAN-based video summarization
GAN-based video summarization
 
Migration-related video retrieval
Migration-related video retrievalMigration-related video retrieval
Migration-related video retrieval
 
Fractional step discriminant pruning
Fractional step discriminant pruningFractional step discriminant pruning
Fractional step discriminant pruning
 

Kürzlich hochgeladen

preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxnoordubaliya2003
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxkumarsanjai28051
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 

Kürzlich hochgeladen (20)

preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptx
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptx
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 

Combining textual and visual features for Ad-hoc Video Search

  • 1. CRiTERIA - Comprehensive data-driven Risk and Threat Assessment Methods for the Early and Reliable Identification, Validation and Analysis of migration-related risks - has received funding from the European Union’s Horizon 2020 research and innovation action program under grant agreement No 101021866. Combining textual and visual features into multiple joint latent spaces, and introducing dual softmax re-ranking, for Ad-hoc Video Search Damianos Galanopoulos, Vasileios Mezaris Information Technologies Institute-CERTH TRECVID 2022 Workshop December 6 - 9, 2022
  • 2. Problem statement 2 ● Retrieving video shots using natural-language textual queries ○ “Find a shot of a man with a white beard”
  • 3. Our 2021 model 1,2,3 3 1 K. Gkountakos, D. Galanopoulos, D. Touska, K. Ioannidis, S. Vrochidis, V. Mezaris, I. Kompatsiaris, "ITI-CERTH participation in ActEV and AVS Tracks of TRECVID 2021", Proc. TRECVID 2021 Workshop, Dec. 2021. 2 D. Galanopoulos, V. Mezaris, "Hard-negatives or Non-negatives? A hard-negative selection strategy for cross-modal retrieval using the improved marginal ranking loss", Proc. IEEE/CVF ICCV Workshops (ICCVW), pp. 2312-2316, Oct. 2021. DOI:10.1109/ICCVW54120.2021.00261. 3 D. Galanopoulos, V. Mezaris, "Attention Mechanisms, Signal Encodings and Fusion Strategies for Improved Ad-hoc Video Search with Dual Encoding Networks", Proc. ACM Int. Conf. on Multimedia Retrieval (ICMR 2020), Dublin, Ireland, 2020. DOI:10.1145/3372278.3390737.
  • 4. ● We utilize the T x V cross-modal network ● Combination of textual and visual features ● Multiple latent spaces development ● Multi-loss-based learning ● Similarity revision using a dual softmax operation Our 2022 approach 4 4 4 D. Galanopoulos, V. Mezaris, "Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval", Proc. ECCV 2022 Workshop on AI for Creative Video Editing and Understanding (CVEU), Oct. 2022. https://arxiv.org/abs/2211.11351.
  • 5. T x V model Our 2022 approach 5 BoW BERT W2V CLIP ATT CLIP encoder ResNet152-11k ResNeXt-101-wsl ViT-B/32 CLIP
  • 6. Dual Softmax Inference 6 ● Query-video similarities revision ● Inspired by the Dual Softmax loss ● Two approaches: ○ Evaluation with a priori knowledge of all queries ○ Query-agnostic evaluation
  • 7. Dual Softmax Inference 7 ● Two different background queries strategies ● AVS 2022 queries ● AVS 2019-2020-2021 queries
  • 8. ● Textual Features ○ Bag-of-words ○ BERT ○ Word2Vec ○ ViT-B/32 CLIP ● Textual Encoders ○ Attention-based dual encoding network (ATT) ○ ViT-B/32 CLIP encoder Features and encoders 8
  • 9. Features and encoders 9 ● Visual Features ○ ResNet-152 trained on the ImageNet11k dataset ○ ResNeXt-101 re-trained by weakly supervised learning on web images and fine-tuned on ImageNet ○ ViT-B/32 CLIP model
  • 10. ● Datasets: ○ Training: MSR-VTT, TGIF, ActivityNet Captions and Vatex ○ Evaluation: ■ V3C2 evaluated on TRECVID AVS 2022 queries ■ V3C1 evaluated on TRECVID AVS 2019-2021 queries ■ IACC.3 evaluated on TRECVID AVS 2016-2018 queries ● Evaluation metric: ○ Mean extended inferred average precision (MXinfAP) Datasets and metrics 10
  • 11. ● ITI_CERTH.22_run_2: ○ T x V model with 2 textual encoders and 3 visual features ○ Late fusion of 6 trained models with different configurations ○ Dual softmax using all AVS2022 queries as background queries ● ITI_CERTH.22_run_1: ○ Similar to run #2 ○ Dual softmax using AVS 2019-2021 queries as background Submitted runs 11
  • 12. Results 12 MXinfAP for all submitted runs and T x V baseline for the fully- automatic AVS task.
  • 13. Results 13 AVS 2022 ranking list of all submitted runs regarding the main task in MXinfAP terms. Red bars indicate our submitted runs.
  • 14. “A person is in the act of swinging” Dual Softmax impact 14 T x V model T x V model with DS (run #2)
  • 15. “A drone landing or rising from the ground” Dual Softmax impact 15 T x V model T x V model with DS (run #2)
  • 16. “A type of cloth hanging on a rack, hanger, or line” Dual Softmax impact 16 T x V model T x V model with DS (run #2)
  • 17. Conclusions 17 ● We utilize the cross-modal T x V network ● We combine multiple textual and visual features ● The dual softmax operation boosts the performance ● The a priori knowledge of evaluation queries slightly benefits the performance ● A query-agnostic dual softmax revision serves real-life scenarios, with a small trade-off regarding the overall performance
  • 18. Thank you! 18 Damianos Galanopoulos, Vasileios Mezaris Information Technologies Institute-CERTH dgalanop@iti.gr, bmezaris@iti.gr www.iti.gr/~bmezaris