Near-Duplicate Video Retrieval by
Aggregating Intermediate CNN Layers
Giorgos Kordopatis-Zilos1,2, Symeon Papadopoulos1,
Ioannis Patras2 and Yiannis Kompatsiaris1
1Information Technologies Institute, CERTH, Thessaloniki, Greece
2Queen Mary University of London, Mile End Campus, UK, E1 4NS
23rd International Conference on MultiMedia Modeling
Reykjavík, Iceland, 4-6 January 2017
Problem & Motivation
• Near-Duplicate Video Retrieval (NDVR)
• Given a query video, search a video dataset to retrieve (visually)
highly similar videos
• Rank the candidate videos based on their similarity to the query
• Various applications
• content verification
• video retrieval, management and recommendation
• copyright protection
• Crucial importance of NDVR, due to the exponential growth
of video content
Near-Duplicate Videos: Definition
• Various definitions and interpretations of near-duplicate
videos exist
• Adopt definition by Wu et al. (2007)
• photometric variations: gamma, contrast, brightness, etc.
• editing operations: resize, shift, crop, flip
• insertion of patterns: caption, logo, subtitles, sliding captions, etc.
• re-encoding: video format, compression
• video modifications: frame rate, frame insertion, deletion, swap
X. Wu, A. G. Hauptmann, and C. W. Ngo. Practical elimination of near-duplicates from web video search. In
Proceedings of the 15th ACM international conference on Multimedia, pp. 218-227, 2007
Related Work
• Variety of approaches (Liu et al., 2013)
• Video-level matching: comparison of global signatures
• Global feature vectors
• Fingerprints
• Hash codes
• Frame-level matching: frames or sequences
• Local descriptors
• Spatiotemporal features
• Hybrid-level matching
• Filter-and-refine methods
• TRECVID content-based copy detection (Kraaij & Awad, 2011)
• duplicates artificially generated by standard transformations
W. Kraaij, and G. Awad. TRECVID 2011 content-based copy detection: Task overview. Proc. TRECVid 2010, 2011
J. Liu, Z. Huang, H. Cai, H. T. Shen, C. W. Ngo, and W. Wang. Near-duplicate video retrieval: Current research and
future trends. ACM Computing Surveys, vol.45, no. 4, 44, 2013
Feature Extraction (1/2)
• Employ a pre-trained CNN with 𝐿 convolutional layers
• Apply max pooling on every channel of the feature map of
each layer (Zheng et al., 2016)
$v^l_i = \max M^l(\cdot, \cdot, i), \quad i = 1, 2, \ldots, c^l, \quad l = 1, 2, \ldots, L$
• $L$ vectors generated, the $l$-th being $c^l$-dimensional
L. Zheng, Y. Zhao, S. Wang, J. Wang, and Q. Tian. Good Practice in CNN Feature Transfer. arXiv:1604.00133, 2016
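The channel-wise max pooling described on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `max_pool_layers`, the (height, width, channels) array layout and the random stand-in activations are hypothetical and not taken from the slides; a real pipeline would take the feature maps from a pre-trained CNN.

```python
import numpy as np


def max_pool_layers(feature_maps):
    """Reduce each layer's feature map to one vector by channel-wise max pooling.

    feature_maps: list of L arrays; the l-th has shape (height_l, width_l, c_l).
    Returns a list of L vectors; the l-th has length c_l.
    """
    # Take the maximum over both spatial axes, leaving one value per channel.
    return [fm.max(axis=(0, 1)) for fm in feature_maps]


# Hypothetical activations for a 2-layer network (random stand-ins, not CNN output).
rng = np.random.default_rng(0)
maps = [rng.random((13, 13, 96)), rng.random((6, 6, 256))]

layer_vectors = max_pool_layers(maps)
print([v.shape for v in layer_vectors])  # [(96,), (256,)]
```

Each layer contributes one vector whose length equals its channel count, regardless of the spatial size of its feature map.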
Feature Extraction (2/2)
• Pre-trained CNN networks from Caffe (Jia et al., 2014):
a) AlexNet, b) VGGNet, c) GoogLeNet
• Feature extraction uses the convolution layers of the
architectures
Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe:
Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM int. conference on
Multimedia, pp. 675-678, 2014
AlexNet VGGNet GoogLeNet
Vector Aggregation
Layer Aggregation
Video Indexing and Querying
• tf-idf weighting of visual words
$w_{td} = n_{td} \cdot \log\left(|D_b| / n_t\right)$
• Inverted file indexing structure for fast search
• Retrieve candidates with at least one common visual word
• Rank candidates based on cosine similarity of their tf-idf
representations
$sim(q, p) = \frac{\mathbf{w}_q \cdot \mathbf{w}_p}{\lVert \mathbf{w}_q \rVert \, \lVert \mathbf{w}_p \rVert}$
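The indexing and querying steps on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each video has already been reduced to a list of visual-word ids, and it reads the symbols as: n_td is the number of occurrences of word t in video d, |D_b| the number of indexed videos, and n_t the number of videos containing word t. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(videos):
    """videos: dict mapping video id -> list of visual-word ids."""
    counts = {vid: Counter(words) for vid, words in videos.items()}
    doc_freq = Counter(t for c in counts.values() for t in c)  # n_t
    num_videos = len(videos)                                   # |D_b|
    idf = {t: math.log(num_videos / n_t) for t, n_t in doc_freq.items()}
    # tf-idf weight per (word, video): w_td = n_td * log(|D_b| / n_t)
    weights = {vid: {t: n * idf[t] for t, n in c.items()}
               for vid, c in counts.items()}
    # Inverted file: visual word -> set of videos that contain it.
    inverted = defaultdict(set)
    for vid, c in counts.items():
        for t in c:
            inverted[t].add(vid)
    return weights, idf, inverted


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def search(query_words, weights, idf, inverted):
    """Return (video id, similarity) pairs, most similar first."""
    q = {t: n * idf.get(t, 0.0) for t, n in Counter(query_words).items()}
    # Candidates share at least one visual word with the query.
    candidates = set().union(*(inverted.get(t, set()) for t in q))
    ranked = sorted(((cosine(q, weights[v]), v) for v in candidates),
                    reverse=True)
    return [(v, s) for s, v in ranked]


# Toy collection with hypothetical visual-word ids.
videos = {"a": [1, 1, 2], "b": [1, 3], "c": [4, 4]}
weights, idf, inverted = build_index(videos)
results = search([1, 1, 2], weights, idf, inverted)
print([v for v, _ in results])  # ['a', 'b']
```

Video "c" shares no visual word with the query, so the inverted file never surfaces it and no similarity is computed for it.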
Evaluation: Dataset
• Dataset: CC_WEB_VIDEO
• Videos: 13,139 videos
• Keyframes: 397,965 images
CC_WEB_VIDEO: http://vireo.cs.cityu.edu.hk/webvideo/
Dataset Annotation
• Evaluation metrics
• precision-recall (PR)
• mean Average Precision (mAP)
$AP = \frac{1}{n} \sum_{i=0}^{n} \frac{i}{r_i}$
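A small sketch of how the AP and mAP metrics might be computed. It assumes that n is the number of relevant (near-duplicate) videos for a query and that r_i is the 1-based rank at which the i-th relevant video is retrieved; both readings are assumptions, since the slide does not define the symbols. The code sums over i = 1..n, and the function names and toy data are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """AP = (1/n) * sum of i / r_i over the relevant videos that were retrieved."""
    relevant = set(relevant_ids)
    n = len(relevant)
    if n == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, vid in enumerate(ranked_ids, start=1):
        if vid in relevant:
            hits += 1             # this is the i-th relevant video found
            total += hits / rank  # i / r_i
    return total / n


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example: relevant videos "a" and "b" are retrieved at ranks 1 and 3.
ap = average_precision(["a", "x", "b", "y"], {"a", "b"})
print(round(ap, 4))  # 0.8333
```

Relevant videos that never appear in the ranking contribute nothing to the sum but still count towards n, so missed near-duplicates lower the score.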
Dataset Examples
[Figure: query videos shown alongside their near-duplicate videos]
Results I
Impact of CNN architecture and vocabulary size
Results II
Performance using individual layers
AlexNet VGGNet GoogLeNet
Results III
• Performance per query
• Best runs
• CNN-V: Vector-based aggregation GoogLeNet
• CNN-L: Layer-based aggregation VGGNet
Lower precision in hard
queries
• query 18 (Bus uncle)
• query 22 (Numa Gary)
Evaluation: Comparison to SoA
• Color Histograms (CH) (Wu et al., 2007) - Video-level matching, color histograms
• Auto Color Correlograms (ACC) (Cai et al., 2011) - Frame-level matching,
auto-color correlograms, BoW, tf-idf weighted cosine similarity
• Local Structure (LS) (Wu et al., 2007) - Hybrid-level matching, Color Histograms,
keyframes similarity of PCA-SIFT descriptors
• Multiple Feature Hashing (MFH) (Song et al., 2013) - Video-level matching, hash
multiple features into Hamming space, combination of the keyframe hash code
to a global video representation
• Pattern-based approach (PPT) (Chou et al., 2015) - Hybrid-level matching,
pattern-based indexing tree (PI-tree), m-pattern-based dynamic programming
(mPDP), time-shift m-pattern similarity (TPS)
X. Wu, A. G. Hauptmann, and C. W. Ngo. Practical elimination of near-duplicates from web video search. In
Proceedings of the 15th ACM international conference on Multimedia, pp. 218-227, 2007
Y. Cai, L. Yang, W. Ping, F. Wang, T. Mei, X. S. Hua, and S. Li. Million-scale near-duplicate video retrieval system. In
Proceedings of the 19th ACM international conference on Multimedia, pp. 837-838, 2011
J. Song, Y. Yang, Z. Huang, H. T. Shen, and J. Luo. Effective multiple feature hashing for large-scale near-duplicate
video retrieval. IEEE Transactions on Multimedia, vol. 15, no. 8, pp. 1997-2008, 2013
C. L. Chou, H. T. Chen, and S. Y. Lee. Pattern-Based Near-Duplicate Video Retrieval and Localization on Web-Scale
Videos. IEEE Transactions on Multimedia, vol. 17, no. 3, pp. 382-395, 2015
Results IV
Comparison against existing NDVR approaches
Future Work
• Exploit the C3D features (Tran et al., 2015)
• Conduct more comprehensive evaluations
• More challenging datasets: larger scale, more similar but
non-relevant videos (distractors)
• Partial Duplicate Video Retrieval (PDVR)
• Assess the applicability of the approach on the PDVR problem
D. Tran, L. Bourdev, R. Fergus, L. Torresani and M. Paluri. Learning spatiotemporal features with 3D convolutional networks.
In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497, 2015
Thank you!
Get in touch:
• George Kordopatis-Zilos: georgekordopatis@iti.gr
• Symeon Papadopoulos: papadop@iti.gr / @sympap
With the support of:
