ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning
Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris
Problem statement
Given two arbitrary videos, calculate their similarity based on their visual content.
[Figure: a Query Video shown with three kinds of related videos: Complementary Scene Video, Duplicate Scene Video, Incident Scene Video]
Application scenario
• Video Retrieval
Video-level methods
Z. Gao et al. “ER3: A unified framework for event retrieval, recognition and recounting”. CVPR, 2017.
G. Kordopatis-Zilos et al. “Near-duplicate video retrieval with deep metric learning”. ICCVW, 2017.
Video similarity calculation disregards the spatio-temporal information of videos
Frame-level methods
Y. Jiang and J. Wang. “Partial copy detection in videos: A benchmark and an evaluation of popular methods”. Trans. on Big Data, 2016.
L. Baraldi et al. “LAMV: Learning to align and match videos with kernelized temporal layers”. CVPR, 2018.
Frame-to-frame similarity calculation disregards the spatial structure of frames
Motivation
Fine-grained similarity calculation
• Learn a video similarity function that respects:
  • Spatial structure of video frames (intra-frame relations)
  • Temporal structure of videos (inter-frame relations)
Frame-to-frame similarity
Chamfer Similarity
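As a rough illustration of Chamfer similarity at the frame level, the NumPy sketch below assumes each frame is described by a small set of region feature vectors. It computes all pairwise region similarities, takes the best match for each region of the first frame, and averages those maxima. The function names, the 9-region by 64-dimension shapes, and the random features are illustrative assumptions, not taken from the ViSiL code.

```python
import numpy as np

def chamfer_similarity(sim: np.ndarray) -> float:
    """Average, over rows, of the best-matching column in a similarity matrix."""
    return float(sim.max(axis=1).mean())

def frame_to_frame_similarity(regions_a: np.ndarray, regions_b: np.ndarray) -> float:
    """Chamfer similarity between two frames given (num_regions, dim) features."""
    # L2-normalise so that the dot product is a cosine similarity in [-1, 1].
    a = regions_a / np.linalg.norm(regions_a, axis=1, keepdims=True)
    b = regions_b / np.linalg.norm(regions_b, axis=1, keepdims=True)
    region_sim = a @ b.T  # (regions in a) x (regions in b)
    return chamfer_similarity(region_sim)

# Hypothetical inputs: 9 regions per frame, 64-dimensional features.
rng = np.random.default_rng(0)
frame_a = rng.normal(size=(9, 64))
frame_b = rng.normal(size=(9, 64))

self_sim = frame_to_frame_similarity(frame_a, frame_a)
cross_sim = frame_to_frame_similarity(frame_a, frame_b)
print(self_sim, cross_sim)
```

Because every region of a frame matches itself perfectly, comparing a frame with itself gives a similarity of 1, while unrelated random features score lower.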
Frame-to-frame similarity
[Figure: baseline frame-to-frame similarity matrix compared with the ViSiL frame-to-frame similarity matrix]
Video-to-video similarity
Video Similarity Learning network
• 4-layer CNN
• Convolutional filters capture the temporal structures in the frame-to-frame similarity matrix
Chamfer Similarity
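The final Chamfer step at the video level can be sketched in the same way: given a matrix whose rows are query frames and whose columns are database frames, take the best database match for each query frame and average. The sketch below deliberately skips the 4-layer CNN and applies the step to a hand-written toy matrix; the function name and the matrix values are assumptions for illustration only.

```python
import numpy as np

def video_to_video_similarity(frame_sim: np.ndarray) -> float:
    """Chamfer similarity over a (query frames) x (database frames) matrix."""
    best_match_per_query_frame = frame_sim.max(axis=1)
    return float(best_match_per_query_frame.mean())

# Toy matrix: 4 query frames against 3 database frames.
# The last query frame has no good match in the database video.
frame_sim = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.0, 0.3, 0.7],
    [0.1, 0.0, 0.2],
])

query_to_db = video_to_video_similarity(frame_sim)    # (0.9 + 0.8 + 0.7 + 0.2) / 4
db_to_query = video_to_video_similarity(frame_sim.T)  # (0.9 + 0.8 + 0.7) / 3
print(query_to_db, db_to_query)
```

The two directions differ (0.65 versus 0.8) because Chamfer similarity is not symmetric: the unmatched fourth query frame lowers the query-to-database score but has no effect in the other direction.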
Training ViSiL
Experimental results
• Near-Duplicate Video Retrieval (CC_WEB_VIDEO)
• Fine-grained Incident Video Retrieval (FIVR-200K)
• Action Video Retrieval (ActivityNet)
• Event-based Video Retrieval (EVVE)
Visual examples
[Figure: for each pair of query video and database video, the frame-to-frame similarity matrix and the ViSiL output video-to-video similarity. Panels: near-duplicate videos, same event videos, same action videos. Similarity values shown: 0.8, 0.5, 0.7]
Thank you!
Poster ID: No. 39
Code & models: https://github.com/MKLab-ITI/visil
Get in touch: Giorgos Kordopatis-Zilos: georgekordopatis@iti.gr / @g_kordo
With the support of: No. EP/R026424/1, No. 825297
