Similarity-based retrieval of
multimedia content
Dr. Symeon Papadopoulos
Senior Researcher, CERTH-ITI
Monday Jan 28, 2019 @ Media AUTh
Our lab
Multimedia Knowledge and
Social Media Analytics Laboratory
• Part of Information Technologies Institute (ITI) -
Centre for Research and Technology Hellas (CERTH)
• 60+ researchers (20+ post-docs)
• key areas: multimedia, social media, computer vision,
data mining, machine learning
• applications: media, security, culture, environment
• involved in 60+ projects and published 600+ papers
https://mklab.iti.gr/
Related projects
InVID (2016-2018): https://www.invid-project.eu/
WeVerify (2018-2021): https://weverify.eu/
https://www.smartinsights.com/internet-marketing-statistics/happens-online-60-seconds/
500 hours of video per min =
720,000 hours per day >
82 years of video per day!
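The arithmetic behind these figures can be checked with a few lines. This is only a sanity check of the numbers quoted above; the 365-day year is an assumption.

```python
# Upload rate quoted above: hours of video uploaded per minute.
HOURS_PER_MINUTE = 500

# 60 minutes per hour, 24 hours per day.
hours_per_day = HOURS_PER_MINUTE * 60 * 24

# Convert hours of footage to years, assuming a 365-day year.
years_per_day = hours_per_day / (24 * 365)

print(hours_per_day)            # 720000
print(round(years_per_day, 1))  # 82.2
```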
Pope Francis
Pope Benedict
2007: iPhone release
2008: Android release
2010: iPad release
http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/
Detecting disinformation
Claim:
Hurricane Irma, Sep 2017
Fact:
Hurricane Dolores, May 2016
A shark thriving in hurricanes
https://www.snopes.com/photos/animals/puertorico.asp
Memes
Similarity-based media search
Two main problems
• How to compute similarity between two items (in accordance with my needs)?
• How to search (using the above similarity function) in very large collections in reasonable time?
visual similarity
an overview of approaches
What is similar?
• Variety of definitions and understandings regarding what
can be considered to be similar
• Near-duplicate videos: definition by Wu et al. (2007)
• photometric variations: gamma, contrast, brightness, etc.
• editing operations: resize, shift, crop, flip
• insertion of patterns: caption, logo, subtitles, sliding captions, etc.
• re-encoding: video format, compression
• video modifications: frame rate, frame insertion, deletion, swap
X. Wu, A. G. Hauptmann, and C. W. Ngo. Practical elimination of near-duplicates from web video search. In
Proceedings of the 15th ACM international conference on Multimedia, pp. 218-227, 2007
Hashing
• Cryptographic or checksum hashing: MD5, SHA1
• Input: bitstream (not just images or videos)
• Output: hash code 128-bit (MD5), 160-bit (SHA1), etc.
• Property: minor changes in input can lead to completely
different hash codes
https://jenssegers.com/61/perceptual-image-hashes
Example
EA6BF04059B4CB0D889296F1788B321B
8435D4A072804237308F9566508C963C
http://onlinemd5.com/
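The property above (a minor change in the input gives a completely different hash code) can be reproduced with Python's standard hashlib module. A minimal sketch: the input strings and the helper names md5_hex and differing_bits are made up for illustration and are not taken from the slides.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the 128-bit MD5 digest as 32 upper-case hex characters."""
    return hashlib.md5(data).hexdigest().upper()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

original = b"similarity-based retrieval"
modified = b"similarity-based retrievaL"  # last character changed

h1 = md5_hex(original)
h2 = md5_hex(modified)

print(h1)
print(h2)
print(differing_bits(h1, h2), "of 128 bits differ")
```

Because the digests of near-identical inputs are unrelated, such hashes can only detect exact copies of a file, not near-duplicates.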
Perceptual hashing
• Generate a fingerprint that can be used to compare
images using the Hamming Distance
• Instance: Average Hashing (aHash)
• Reduce size → 8x8 pixels
• Reduce colour → RGB to grayscale
• Calculate average colour → among 64 grayscale values
• Compute hash → for each pixel, binary value depending
on whether it is higher or lower than average
→ 64-bit signature
aHash: example
11001001011010010011110000011000
00001000000000000000011100111111
https://jenssegers.com/61/perceptual-image-hashes
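The aHash steps can be sketched in pure Python. Assumptions: the input is already a grayscale grid (the RGB-to-grayscale step is skipped), its sides are multiples of 8, and downscaling is done by simple block averaging rather than a proper image resize. The function names are illustrative only.

```python
def block_downscale(gray, size=8):
    """Shrink a 2D grayscale grid to size x size by averaging equal blocks.

    Assumption: height and width are multiples of `size`.
    """
    bh, bw = len(gray) // size, len(gray[0]) // size
    small = []
    for r in range(size):
        row = []
        for c in range(size):
            block = [gray[y][x]
                     for y in range(r * bh, (r + 1) * bh)
                     for x in range(c * bw, (c + 1) * bw)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small

def average_hash(gray, size=8):
    """64-bit aHash: one bit per pixel, set if the pixel is above the mean."""
    pixels = [p for row in block_downscale(gray, size) for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

# Synthetic 16x16 image: dark left half, bright right half.
image = [[20] * 8 + [200] * 8 for _ in range(16)]
brighter = [[p + 30 for p in row] for row in image]   # brightness change
flipped = [row[::-1] for row in image]                # horizontal flip

print(hamming(average_hash(image), average_hash(brighter)))  # 0
print(hamming(average_hash(image), average_hash(flipped)))   # 64
```

In this toy case the brightness change leaves the hash untouched, while the flip inverts every bit.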
dHash and pHash
• dHash: Difference Hash
• same steps as aHash
• hash is generated based on whether left pixel is brighter
than the right one
• fewer false positives compared to aHash
• pHash: Perceptual Hash
• more complicated algorithm
• resize to 32x32
• DCT on luma (brightness) component
• top left 8x8 → hash by comparing to median value
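A rough sketch of the pHash steps listed above, in pure Python. Assumptions: the input is already a 32x32 grid of luma values (the resize step is skipped), the DCT is a naive unnormalised DCT-II, and the test images are random synthetic data. This illustrates the idea and does not reproduce any particular pHash library.

```python
import math
import random
import statistics

def dct_1d(values):
    """Unnormalised DCT-II of a sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]

def dct_2d(grid):
    """Separable 2D DCT: transform the rows, then the columns."""
    rows = [dct_1d(row) for row in grid]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order

def perceptual_hash(luma32, keep=8):
    """64-bit hash from the top-left keep x keep (low-frequency) coefficients."""
    coeffs = dct_2d(luma32)
    low = [coeffs[r][c] for r in range(keep) for c in range(keep)]
    median = statistics.median(low)
    bits = 0
    for value in low:
        bits = (bits << 1) | (1 if value > median else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

rng = random.Random(0)  # fixed seed so the synthetic images are reproducible
image = [[rng.randint(0, 200) for _ in range(32)] for _ in range(32)]
brighter = [[p + 10 for p in row] for row in image]
other = [[rng.randint(0, 200) for _ in range(32)] for _ in range(32)]

print(hamming(perceptual_hash(image), perceptual_hash(brighter)))
print(hamming(perceptual_hash(image), perceptual_hash(other)))
```

A uniform brightness shift mainly affects the DC coefficient, so the first distance should be at or near zero, while an unrelated image should differ in a large share of the 64 bits.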
pHash examples
Hamming distance = 0
Hamming distance = 24
Hamming distance = 29
Hamming distance = 27
https://www.phash.org/demo/ (select DCT hash)
Pixel-based similarity doesn’t
match perception
All three variations of the first image are equidistant
from it in terms of L2 pixel distance!
http://cs231n.github.io/classification/
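A toy illustration of the same effect, using made-up 64-pixel "images" rather than the pictures from the slide: two visually very different modifications end up at exactly the same L2 distance from the original.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two flattened images."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

original = [100] * 64

# Every pixel slightly brighter: the image still looks the same.
brighter = [p + 10 for p in original]

# One pixel changed drastically: a visible local defect.
defect = list(original)
defect[0] += 80

print(l2(original, brighter))  # 80.0
print(l2(original, defect))    # 80.0
```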
Global descriptors
• A single vector that attempts to capture the main
visual properties of an image, e.g. distribution of
colour, spatial layout of brightness, textures, etc.
• Popular choices include:
• GIST – spatial envelope (Oliva & Torralba, 2001)
• Color: Dominant Color, Scalable Color, Color Structure,
Color Layout Descriptor (MPEG-7, 2001)
• Texture: Texture Browsing, Homogeneous Texture, Edge
Histogram (MPEG-7, 2001)
A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation
of the spatial envelope. IJCV, 42(3):145–175, 2001
Text of ISO/IEC 15 938-3 Multimedia Content Description Interface—Part 3: Visual.
Final Committee Draft, ISO/IEC/JTC1/SC29/ WG11, Doc. N4062, Mar. 2001
GIST-based near-duplicate search
Douze, M., Jégou, H., Sandhawalia, H., Amsaleg, L., & Schmid, C. (2009, July). Evaluation of gist descriptors for web-
scale image search. In Proceedings of the ACM International Conference on Image and Video Retrieval (p. 19). ACM.
Local descriptors
• Basic scheme:
• Detect a set of features (i.e. interest points) in an image
• Extract one descriptor around each feature
• Plenty of options for both parts, e.g.:
• Feature detectors: Canny, Sobel, Harris, FAST, Laplacian
of Gaussian (LoG), Difference of Gaussians (DoG),
Determinant of Hessian (DoH), MSER
• Feature descriptors: SIFT, GLOH, SURF, ORB
• Much higher accuracy at the cost of increased
complexity
Scale-Invariant Feature Transform (SIFT)
Set of descriptors
A single descriptor
(16 histograms of 8 bins → 128 dims)
http://faculty.ucmerced.edu/mhyang/project/iccv13_exemplar/ICCV13_exemplarCut/vlfeat-0.9.14/doc/overview/sift.html
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer
vision, 60(2), 91-110.
Example: SIFT matching
https://www.cc.gatech.edu/~hays/compvision/proj2/
Bag of Visual Words (BoVW)
https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
extract a set of local features from each image
Bag of Visual Words (BoVW)
• a representative
sample of features
selected
• features are clustered
• cluster centroids (or
medoids) are
considered to be the
visual codebook
https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
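The codebook construction and the resulting image representation can be sketched as follows. Assumptions: local descriptors are shortened to 2-D points for readability (SIFT would give 128 dimensions), clustering is a plain k-means with a deterministic initialisation, and all names and sample values are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid (visual word) closest to the descriptor."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

def bovw_histogram(descriptors, codebook):
    """Count how many descriptors of an image fall on each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    return hist

# Representative sample of features pooled from many images.
sample = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
codebook = kmeans(sample, k=2)

# Descriptors extracted from one image, mapped to a visual-word histogram.
image_descriptors = [(0.1, 0.1), (5.1, 5.0), (4.8, 5.2)]
print(bovw_histogram(image_descriptors, codebook))  # [1, 2]
```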
Indexing and Querying
• tf-idf weighting of visual words
$w_{td} = n_{td} \cdot \log\left(|D_b| / n_t\right)$
• Inverted file indexing structure for fast search
• Retrieve candidates with at least one common
visual word
• Rank candidates, e.g. based on cosine similarity
of their tf-idf representations
$\mathrm{sim}(q, p) = \dfrac{\mathbf{w}_q \cdot \mathbf{w}_p}{\lVert \mathbf{w}_q \rVert \, \lVert \mathbf{w}_p \rVert}$
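The indexing and querying steps can be sketched in a few functions. Assumptions, none of them stated on the slide: n_td is read as the count of visual word t in image d, |D_b| as the number of indexed images, and n_t as the number of images containing t (the conventional idf reading). The image ids and word ids are made up.

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(images):
    """images: dict image_id -> list of visual word ids. Returns sparse tf-idf vectors."""
    n_images = len(images)
    doc_freq = Counter(t for words in images.values() for t in set(words))
    vectors = {
        image_id: {t: n_td * math.log(n_images / doc_freq[t])
                   for t, n_td in Counter(words).items()}
        for image_id, words in images.items()
    }
    return vectors, doc_freq

def build_inverted_index(images):
    """Map each visual word to the set of images that contain it."""
    index = defaultdict(set)
    for image_id, words in images.items():
        for t in words:
            index[t].add(image_id)
    return index

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def search(query_words, images):
    """Retrieve images sharing at least one visual word, ranked by cosine similarity."""
    vectors, doc_freq = tfidf_vectors(images)
    index = build_inverted_index(images)
    counts = Counter(t for t in query_words if t in doc_freq)
    q_vec = {t: n * math.log(len(images) / doc_freq[t]) for t, n in counts.items()}
    candidates = set().union(*(index[t] for t in counts))
    ranked = sorted(((cosine(q_vec, vectors[c]), c) for c in candidates), reverse=True)
    return [(c, score) for score, c in ranked]

images = {"a": [1, 1, 2], "b": [2, 3], "c": [4, 4, 5]}
for image_id, score in search([1, 2], images):
    print(image_id, round(score, 3))
```

Image "c" shares no visual word with the query, so the inverted index never returns it as a candidate and no similarity is computed for it.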
BoVW Discussion
• BoVW is a sparse representation: each image is
associated with few visual words (compared to the
whole vocabulary)
• Convenient for indexing and look-up
• Completely misses spatial layout → extensions
• Performance depends on:
• size of vocabulary
• dataset where vocabulary was learned
Neural network features
https://www.pnas.org/content/116/4/1074 (artist Lucy Reading-Ikkanda)
Popular CNN architectures
VGGNet (2014)
GoogleNet (2014)
https://cs.stanford.edu/people/karpathy/cnnembed/
video search
towards building a reverse video
search engine
From Image to Video Similarity
• A video can be considered as a richer
representation compared to images:
• set of images (frames)
• frames and motion
• frames and motion and audio
• For efficiency purposes, we typically simplify or
discard part of the information:
• frames → descriptors → average
• frames → visual words → bag of frame-words
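Both simplifications can be sketched in a few lines. Assumptions: each frame already has a fixed-length descriptor or a visual-word id, and the values below are made up for illustration.

```python
from collections import Counter

def average_descriptor(frame_descriptors):
    """Video-level vector: element-wise mean of the frame descriptors."""
    n = len(frame_descriptors)
    return [sum(dim) / n for dim in zip(*frame_descriptors)]

def bag_of_frame_words(frame_words):
    """Video-level histogram: how often each visual word occurs over the frames."""
    return Counter(frame_words)

# Three frames, each described by a 3-dimensional descriptor.
frames = [[1.0, 0.0, 2.0], [3.0, 2.0, 2.0], [2.0, 4.0, 2.0]]
print(average_descriptor(frames))  # [2.0, 2.0, 2.0]

# Five frames, each already mapped to one visual word id.
print(bag_of_frame_words([7, 7, 3, 7, 9]))
```

Either way, frame order and motion are discarded, which is the price paid for a compact, fixed-size video representation.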
Video search architecture
Video indexing calls
/index (HTTP GET request)
Add the provided video to the video index
• url: the URL of the video that is going to be indexed
• async: flag for asynchronous processing
/youtube (HTTP GET request)
Query YouTube API with either a video ID or a provided text query
and add the retrieved videos to the video index
• video_id: video ID to query YouTube API
• text: provided text to query YouTube API
• max: maximum number of videos to be added to the video index
/delete (HTTP DELETE request)
Delete the provided video from the video index
• url: the URL of the video that is going to be deleted
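A sketch of how a client might build these calls with Python's standard library. Several details are assumptions, since the slides give only paths, methods and parameter names: the host in BASE_URL is hypothetical, the async flag is encoded as true/false, and /delete takes its url as a query parameter. The requests are built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # hypothetical host, not given in the slides

def index_request(video_url, run_async=True):
    """GET /index: add one video to the index."""
    query = urlencode({"url": video_url, "async": str(run_async).lower()})
    return Request(f"{BASE_URL}/index?{query}", method="GET")

def youtube_request(text=None, video_id=None, maximum=10):
    """GET /youtube: index videos found via a video ID or a text query."""
    params = {"max": maximum}
    if video_id is not None:
        params["video_id"] = video_id
    if text is not None:
        params["text"] = text
    return Request(f"{BASE_URL}/youtube?{urlencode(params)}", method="GET")

def delete_request(video_url):
    """DELETE /delete: remove one video from the index."""
    query = urlencode({"url": video_url})
    return Request(f"{BASE_URL}/delete?{query}", method="DELETE")

req = delete_request("https://example.com/video.mp4")
print(req.get_method(), req.full_url)
```

Passing one of these objects to urllib.request.urlopen would send it, provided a service is actually listening at the assumed address.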
Video search calls
/search (HTTP GET request)
Video-level search: retrieve relevant videos by calculating the
similarity between entire videos
• url: URL of the query video
• t_sim: similarity threshold
• t_rank: rank threshold
/partial (HTTP GET request)
Shot-level search: retrieve relevant video segments from the indexed
videos in the database
• url: URL of the query video
• v_sim: video similarity threshold
• s_sim: shot similarity threshold
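The search calls can be assembled the same way. The host and the threshold values below are hypothetical; the slides do not state the value ranges of t_sim, t_rank, v_sim or s_sim, nor the response format, so the sketch stops at building the query URLs.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://localhost:8080"  # hypothetical host, not given in the slides

def build_url(path, video_url, **thresholds):
    """Build a GET URL, including only the thresholds that were provided."""
    params = {"url": video_url}
    params.update({name: value for name, value in thresholds.items() if value is not None})
    return f"{BASE_URL}{path}?{urlencode(params)}"

def search_url(video_url, t_sim=None, t_rank=None):
    """Video-level search: compare entire videos."""
    return build_url("/search", video_url, t_sim=t_sim, t_rank=t_rank)

def partial_url(video_url, v_sim=None, s_sim=None):
    """Shot-level search: find matching segments inside indexed videos."""
    return build_url("/partial", video_url, v_sim=v_sim, s_sim=s_sim)

query = "https://example.com/query.mp4"
url = search_url(query, t_sim=0.8, t_rank=100)  # illustrative threshold values
print(url)
print(parse_qs(urlsplit(url).query))
print(partial_url(query, s_sim=0.9))
```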
Combining CNNs and BoVW
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by
aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer
An improved setup
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by
aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer
Learning similarity
Before training
After training
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, October). Near-Duplicate Video Retrieval with
Deep Metric Learning. In 2017 IEEE International Conference on Computer Vision Workshop (ICCVW), (pp. 347-356). IEEE
Support for partial duplicate search
FIVR-200K
a dataset for evaluating NDVR
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, I. (2018).
FIVR: Fine-grained Incident Video Retrieval. arXiv preprint arXiv:1809.04094
FIVR-200K
• A video dataset to help research on the problem of
Fine-grained Incident Video Retrieval
• Duplicate Scene Videos (DSVs)
• Complementary Scene Videos (CSVs)
• Incident Scene Videos (ISVs)
• 225,960 videos around 4,687 news events from Jan
1st 2013 to Dec 31st 2017
Wikipedia: current events
https://en.wikipedia.org/wiki/Portal:Current_events
Dataset statistics
[Figures: number of events; number of videos; video category; video duration]
Example videos
Boston Marathon bombing
query near-duplicate
complementary view same incident
Las Vegas shootings
query near-duplicate
complementary view same incident
Our Video Search Tool
http://ndd.iti.gr/video_search/
Ideas
• Pick one video around one event between 2013
and 2017 and try to find similar versions of it
• Pick one of the clusters-events in the Browse
section and try to find some important videos that
cover the event
• Given an event of interest, identify in which sources
it is covered (language, country, type of channel)
• Add videos from a newer event and use them to
perform new searches
Source code
https://github.com/MKLab-ITI/intermediate-cnn-features
https://github.com/MKLab-ITI/ndvr-dml
Papers
• Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y.
(2017, January). Near-duplicate video retrieval by aggregating
intermediate CNN layers. In International Conference on Multimedia
Modeling (pp. 251-263). Springer
• Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y.
(2017, October). Near-Duplicate Video Retrieval with Deep Metric
Learning. In 2017 IEEE International Conference on Computer Vision
Workshop (ICCVW), (pp. 347-356). IEEE
• Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, I.
(2018). FIVR: Fine-grained Incident Video Retrieval. arXiv preprint
arXiv:1809.04094
Acknowledgements
• Giorgos Kordopatis-Zilos / near-duplicate video
retrieval, back-end development, FIVR-200K
collection and annotation
• Lazaros Apostolidis / web front-end development
• Polichronis Charitidis / FIVR-200K annotation
Thank you for your attention!
Akis Papadopoulos papadop@iti.gr
@sympap

CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015
 
Verifying Multimedia Use at MediaEval 2015
Verifying Multimedia Use at MediaEval 2015Verifying Multimedia Use at MediaEval 2015
Verifying Multimedia Use at MediaEval 2015
 

KĂźrzlich hochgeladen

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

KĂźrzlich hochgeladen (20)

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Similarity-based retrieval of multimedia content

  • 1. Similarity-based retrieval of multimedia content Dr. Symeon Papadopoulos Senior Researcher, CERTH-ITI Monday Jan 28, 2019 @ Media AUTh
  • 2. Our lab Multimedia Knowledge and Social Media Analytics Laboratory • Part of Information Technologies Institute (ITI) - Centre for Research and Technology Hellas (CERTH) • 60+ researchers (20+ post-docs) • key areas: multimedia, social media, computer vision, data mining, machine learning • applications: media, security, culture, environment • involved in 60+ projects and published 600+ papers https://mklab.iti.gr/
  • 5. 500 hours of video per min = 720,000 hours per day > 82 years of video per day!
  • 6. Pope Francis Pope Benedict 2007: iPhone release 2008: Android release 2010: iPad release http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/
  • 9. Detecting disinformation Claim: Hurricane Irma, Sep 2017 Fact: Hurricane Dolores, May 2016
  • 10. A shark thriving in hurricanes https://www.snopes.com/photos/animals/puertorico.asp
  • 12. Memes
  • 13. Similarity-based media search Two main problems • How to compute similarity between two items (in accordance with my needs)? • How to search (using the above similarity function) in very large collections in reasonable time?
  • 15. What is similar? • Variety of definitions and understandings regarding what can be considered to be similar • Near-duplicate videos: definition by Wu et al. (2007) • photometric variations: gamma, contrast, brightness, etc. • editing operations: resize, shift, crop, flip • insertion of patterns: caption, logo, subtitles, sliding captions, etc. • re-encoding: video format, compression • video modifications: frame rate, frame insertion, deletion, swap X. Wu, A. G. Hauptmann, and C. W. Ngo. Practical elimination of near-duplicates from web video search. In Proceedings of the 15th ACM international conference on Multimedia, pp. 218-227, 2007
  • 16. Hashing • Cryptographic or checksum hashing: MD5, SHA1 • Input: bitstream (not just images or videos) • Output: hash code 128-bit (MD5), 160-bit (SHA1), etc. • Property: minor changes in input can lead to completely different hash codes https://jenssegers.com/61/perceptual-image-hashes
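The property on this slide can be checked with Python's standard `hashlib` module. This is a minimal sketch: the two byte strings are made-up stand-ins for media files, and `hex_digest` and `differing_bits` are illustrative helper names, not part of the talk.

```python
import hashlib


def hex_digest(algorithm: str, data: bytes) -> str:
    """Return the hexadecimal digest of data for the named algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Made-up inputs that differ only in their last byte.
original = b"the same video file, byte for byte"
modified = b"the same video file, byte for bytf"

for name in ("md5", "sha1"):
    h1 = hex_digest(name, original)
    h2 = hex_digest(name, modified)
    # Each hex character encodes 4 bits: 32 chars = 128 bits, 40 chars = 160 bits.
    print(name, len(h1) * 4, "bits,", differing_bits(h1, h2), "bits differ")
```

Because the digests of near-identical inputs are unrelated, such hashes can only detect exact copies, which motivates the perceptual hashes that follow.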
  • 18. Perceptual hashing • Generate a fingerprint that can be used to compare images using the Hamming distance • Instance: Average Hashing (aHash) • Reduce size → 8x8 pixels • Reduce colour → RGB to grayscale • Calculate average colour → among 64 grayscale values • Compute hash → for each pixel, binary value depending on whether it is higher or lower than average → 64-bit signature
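The aHash steps can be sketched in plain Python. This sketch assumes the resize and grayscale steps have already been done (for example with an imaging library), so the input is an 8x8 grid of grayscale values; the toy "images" and the function names are illustrative only.

```python
def average_hash(gray_8x8):
    """Return the 64-bit aHash of an 8x8 grid of grayscale values (0-255)."""
    pixels = [p for row in gray_8x8 for p in row]
    if len(pixels) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(pixels) / 64
    bits = 0
    for p in pixels:
        # One bit per pixel: 1 if brighter than the average, else 0.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Toy inputs: a left-to-right gradient, a brightened copy, and a mirrored copy.
gradient = [[x * 32 for x in range(8)] for _ in range(8)]
brighter = [[min(255, p + 20) for p in row] for row in gradient]
flipped = [list(reversed(row)) for row in gradient]

print(hamming_distance(average_hash(gradient), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(gradient), average_hash(flipped)))   # 64
```

The uniform brightness change leaves the hash untouched because every pixel keeps its position relative to the mean, while mirroring inverts every bit.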
  • 20. dHash and pHash • dHash: Difference Hash • same steps as aHash • hash is generated based on whether left pixel is brighter than the right one • fewer false positives compared to aHash • pHash: Perceptual Hash • more complicated algorithm • resize to 32x32 • DCT on luma (brightness) component • top-left 8x8 → hash by comparing to median value
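A minimal dHash sketch, under one assumption that is not on the slide: the image is reduced to 9 columns by 8 rows (a common choice in open-source implementations) so that the 8 left/right comparisons per row yield exactly 64 bits. Resizing and grayscale conversion are again assumed to be done beforehand, and the toy rows are made up.

```python
def difference_hash(gray_rows):
    """Return a dHash from 8 rows of 9 grayscale values (one bit per adjacent pair)."""
    bits = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            # 1 if the left pixel is brighter than its right neighbour.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


# Toy inputs: brightness falls from left to right, then the same image brightened.
descending = [[(8 - x) * 30 for x in range(9)] for _ in range(8)]
shifted = [[p + 10 for p in row] for row in descending]
ascending = [list(reversed(row)) for row in descending]

print(difference_hash(descending) == difference_hash(shifted))  # True
print(difference_hash(ascending))                                # 0
```

Since only the sign of each local gradient is stored, a uniform brightness shift has no effect on the hash.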
  • 21. pHash examples Hamming distance = 0 Hamming distance = 24 Hamming distance = 29 Hamming distance = 27 https://www.phash.org/demo/ (select DCT hash)
  • 22. Pixel-based similarity doesn’t match perception All three variations of the first image are equidistant from it in terms of L2 pixel distance! http://cs231n.github.io/classification/
  • 23. Global descriptors • A single vector that attempts to capture the main visual properties of an image, e.g. distribution of colour, spatial layout of brightness, textures, etc. • Popular choices include: • GIST – spatial envelope (Oliva & Torralba, 2001) • Color: Dominant Color, Scalable Color, Color Structure, Color Layout Descriptor (MPEG-7, 2001) • Texture: Texture Browsing, Homogeneous Texture, Edge Histogram (MPEG-7, 2001) A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV, 42(3):145–175, 2001 Text of ISO/IEC 15938-3 Multimedia Content Description Interface - Part 3: Visual. Final Committee Draft, ISO/IEC/JTC1/SC29/WG11, Doc. N4062, Mar. 2001
  • 24. GIST-based near-duplicate search Douze, M., Jégou, H., Sandhawalia, H., Amsaleg, L., & Schmid, C. (2009, July). Evaluation of gist descriptors for web-scale image search. In Proceedings of the ACM International Conference on Image and Video Retrieval (p. 19). ACM.
  • 25. Local descriptors • Basic scheme: • Detect a set of features (i.e. interest points) in an image • Extract one descriptor around each feature • Plenty of options for both parts, e.g.: • Feature detectors: Canny, Sobel, Harris, FAST, Laplacian of Gaussian (LoG), Difference of Gaussians (DoG), Determinant of Hessian (DoH), MSER • Feature descriptors: SIFT, GLOH, SURF, ORB • Much higher accuracy at the cost of increased complexity
  • 26. Scale-Invariant Feature Transform (SIFT) Set of descriptors A single descriptor (16 histograms of 8 bins → 128 dims) http://faculty.ucmerced.edu/mhyang/project/iccv13_exemplar/ICCV13_exemplarCut/vlfeat-0.9.14/doc/overview/sift.html Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2), 91-110.
  • 28. Bag of Visual Words (BoVW) https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
  • 29. Bag of Visual Words (BoVW) https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb extract a set of local features from each image
  • 30. Bag of Visual Words (BoVW) • a representative sample of features is selected • features are clustered • cluster centroids (or medoids) are considered to be the visual codebook https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
  • 31. Bag of Visual Words (BoVW) https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb
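Once a codebook exists, turning an image into a bag of visual words amounts to nearest-centroid assignment plus counting. The sketch below assumes the codebook has already been learned by clustering; the 2-D descriptors and centroids are toy values (real SIFT descriptors have 128 dimensions), and the function names are illustrative.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the codebook centroid closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def bag_of_visual_words(descriptors, codebook):
    """Histogram of visual-word occurrences for one image."""
    return Counter(nearest_word(d, codebook) for d in descriptors)


# Toy codebook of three visual words and four local descriptors from one image.
codebook = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 0.5), (0.5, 8.0), (8.0, 1.0)]

histogram = bag_of_visual_words(descriptors, codebook)
print(dict(histogram))
```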
  • 32. Indexing and Querying • tf-idf weighting of visual words: $w_{td} = n_{td} \cdot \log(D_b / n_t)$ • Inverted file indexing structure for fast search • Retrieve candidates with at least one common visual word • Rank candidates, e.g. based on cosine similarity of their tf-idf representations: $\mathrm{sim}(q, p) = \frac{\mathbf{w}_q \cdot \mathbf{w}_p}{\lVert \mathbf{w}_q \rVert \, \lVert \mathbf{w}_p \rVert}$
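The indexing and querying steps on this slide can be sketched as follows. The slide does not define its symbols, so this sketch assumes that the term frequency is the count of a visual word in an item, that the document frequency is the number of items containing the word, and that the numerator of the logarithm is the total number of indexed items. The query is treated as one of the indexed items, and the toy bags are invented.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(bags):
    """Map {item: Counter(word -> count)} to {item: {word: tf-idf weight}}."""
    n_items = len(bags)
    doc_freq = Counter(word for bag in bags.values() for word in bag)
    return {
        item: {w: n * math.log(n_items / doc_freq[w]) for w, n in bag.items()}
        for item, bag in bags.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_inverted_index(bags):
    """Map each visual word to the set of items that contain it."""
    index = defaultdict(set)
    for item, bag in bags.items():
        for word in bag:
            index[word].add(item)
    return index


def search(query_id, bags):
    """Rank items sharing at least one visual word with the query."""
    vectors = tfidf_vectors(bags)
    index = build_inverted_index(bags)
    candidates = set().union(*(index[w] for w in bags[query_id])) - {query_id}
    ranked = [(cosine(vectors[query_id], vectors[c]), c) for c in candidates]
    return sorted(ranked, reverse=True)


bags = {
    "q": Counter({1: 2, 2: 1}),
    "a": Counter({1: 1, 2: 1}),
    "b": Counter({3: 4}),
    "c": Counter({2: 1, 3: 1}),
}
for score, item in search("q", bags):
    print(item, round(score, 3))
```

Item `b` is never scored because it shares no visual word with the query, which is where the inverted file saves work on large collections.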
  • 33. BoVW Discussion • BoVW is a sparse representation: each image is associated with few visual words (compared to the whole vocabulary) • Convenient for indexing and look-up • Completely misses spatial layout → extensions • Performance depends on: • size of vocabulary • dataset where vocabulary was learned
  • 35. Popular CNN architectures VGGNet (2014) GoogleNet (2014)
  • 37. video search towards building a reverse video search engine
  • 38. From Image to Video Similarity • A video can be considered as a richer representation compared to images: • set of images (frames) • frames and motion • frames and motion and audio • For efficiency purposes, we typically simplify or discard part of the information: • frames → descriptors → average • frames → visual words → bag of frame-words
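The first simplification (frame descriptors averaged into one video descriptor) can be sketched as below. The 3-D frame descriptors are toy values standing in for real per-frame features, and cosine similarity is an assumed choice for comparing the averaged vectors.

```python
import math


def average_descriptor(frame_descriptors):
    """Average per-frame descriptors into one video-level descriptor."""
    n = len(frame_descriptors)
    return [sum(values) / n for values in zip(*frame_descriptors)]


def cosine_similarity(u, v):
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy videos with different numbers of frames.
video_a = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
video_b = [[0.9, 0.1, 0.0], [0.9, 0.1, 0.0], [0.9, 0.1, 0.0]]
video_c = [[0.0, 0.0, 1.0]]

a, b, c = (average_descriptor(v) for v in (video_a, video_b, video_c))
print(round(cosine_similarity(a, b), 3), round(cosine_similarity(a, c), 3))
```

Averaging makes videos of any length comparable with a single vector operation, at the price of discarding frame order and motion.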
  • 40. Video indexing calls /index (HTTP GET request) Add the provided video to the video index • url: the URL of the video that is going to be indexed • async: flag for asynchronous processing /youtube (HTTP GET request) Query YouTube API with either a video ID or a provided text query and add the retrieved videos to the video index • video_id: video ID to query YouTube API • text: provided text to query YouTube API • max: maximum number of videos to be added to the video index /delete (HTTP DELETE request) Delete the provided video from the video index • url: the URL of the video that is going to be deleted
  • 41. Video search calls /search (HTTP GET request) Video-level search: retrieve relevant videos by calculating the similarity between entire videos • url: URL of the query video • t_sim: similarity threshold • t_rank: rank threshold /partial (HTTP GET request) Shot-level search: retrieve relevant video segments from the indexed videos in the database • url: URL of the query video • v_sim: video similarity threshold • s_sim: shot similarity threshold
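As an illustration of how the GET calls above might be assembled, the sketch below only builds the request URLs and sends nothing. The host `http://localhost:8080`, the example video URL, and the parameter values (`"true"`, `0.8`, `10`) are all assumptions; the slides list only the endpoint and parameter names, not a base address or value formats.

```python
from urllib.parse import urlencode

# Hypothetical host: the slides do not state where the service is deployed.
BASE_URL = "http://localhost:8080"


def build_call(endpoint, params):
    """Return the full URL for a GET call with URL-encoded query parameters."""
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


video = "https://example.com/video.mp4"  # made-up query video

# /index: add a video to the index, processed asynchronously (assumed flag value).
index_call = build_call("index", {"url": video, "async": "true"})

# /search: video-level search with assumed threshold values.
search_call = build_call("search", {"url": video, "t_sim": 0.8, "t_rank": 10})

# /partial: shot-level search with assumed threshold values.
partial_call = build_call("partial", {"url": video, "v_sim": 0.7, "s_sim": 0.9})

for call in (index_call, search_call, partial_call):
    print(call)
```

The `/delete` endpoint is omitted because it uses the HTTP DELETE method rather than GET.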
  • 42. Combining CNNs and BoVW Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer
  • 43. An improved setup Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer
  • 44. Learning similarity Before training After training Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, October). Near-Duplicate Video Retrieval with Deep Metric Learning. In 2017 IEEE International Conference on Computer Vision Workshop (ICCVW), (pp. 347-356). IEEE
  • 45. Support for partial duplicate search
  • 46. FIVR-200K a dataset for evaluating NDVR Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, I. (2018). FIVR: Fine-grained Incident Video Retrieval. arXiv preprint arXiv:1809.04094
  • 47. FIVR-200K • A video dataset to help research on the problem of Fine-grained Incident Video Retrieval • Duplicate Scene Videos (DSVs) • Complementary Scene Videos (CSVs) • Incident Scene Videos (ISVs) • 225,960 videos around 4,687 news events from Jan 1st 2013 to Dec 31st 2017
  • 49. Dataset statistics Number of events Number of videos
  • 54. Boston Marathon bombing query near-duplicate complementary view same incident
  • 55. Las Vegas shootings query near-duplicate complementary view same incident
  • 56. Our Video Search Tool http://ndd.iti.gr/video_search/
  • 57. Ideas • Pick one video around one event between 2013 and 2017 and try to find similar versions of it • Pick one of the clusters-events in the Browse section and try to find some important videos that cover the event • Given an event of interest, identify in which sources it is covered (language, country, type of channel) • Add videos from a newer event and use them to perform new searches
  • 59. Papers • Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, January). Near-duplicate video retrieval by aggregating intermediate CNN layers. In International Conference on Multimedia Modeling (pp. 251-263). Springer • Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, Y. (2017, October). Near-Duplicate Video Retrieval with Deep Metric Learning. In 2017 IEEE International Conference on Computer Vision Workshop (ICCVW), (pp. 347-356). IEEE • Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., & Kompatsiaris, I. (2018). FIVR: Fine-grained Incident Video Retrieval. arXiv preprint arXiv:1809.04094
  • 60. Acknowledgements • Giorgos Kordopatis-Zilos / near-duplicate video retrieval, back-end development, FIVR-200K collection and annotation • Lazaros Apostolidis / web front-end development • Polichronis Charitidis / FIVR-200K annotation
  • 61. Thank you for your attention! Akis Papadopoulos papadop@iti.gr @sympap