Visual Search for Musical Performances and Endoscopic Videos

Final Degree Project Dissertation
Telecommunications Engineering
Jennifer Roldán
Supervisors:
Assoc. Prof. Mathias Lux
Assoc. Prof. Xavier Giró

Outline of the Thesis
1. Introduction
i. Motivation
ii. Gantt chart. Work Plan
2. Overview. Existing Demo-Application
3. Methods
i. Global features using Late Fusion Methods
ii. Local features: SIMPLE descriptor
4. Data sets
i. Musical Performances
ii. Endoscopic Videos
5. Experiments
i. Quantitative evaluation
ii. Qualitative evaluation. Thinking-aloud test
6. Conclusions and Further Work
Sep 2014 – May 2015
Slide 2

Motivation
• Application to cover surgeons’ needs and automate data processing
• Endoscopic videos (confidential data)
• Focus of the project
• Video retrieval on demand for surgeons
• Musical performances (free data set)
• Reproducible results for evaluation
• Quantitative and qualitative studies
Slide 3

Gantt Chart. Work Plan
Slide 4
• Use of existing tools and definition of the thesis statements
• Experiments with endoscopic videos
• Two papers submitted to the 13th CBMI workshop
• Project development with the Jiku Mobile data set

Existing Demo Application
Slide 5
Fig. All results are presented in HTML5 and can be viewed in a recent version of common browsers.

Existing Demo Application
Slide 6
Published at the ACM Multimedia Open Source Competition [1]
• Open source library for CBIR
• Based on Lucene
• Java text retrieval framework
• Indexing and search
• Supports global and local features
(integrates up to 20 descriptors)
[1] Mathias Lux. LIRE: Open source image retrieval in Java. In Proceedings of the 21st ACM International Conference on Multimedia, pages 843–846. ACM, 2013.

Methodology
Slide 7
1. Previous methods in demo application:
• Global Features
i. CEDD. Color and Edge Directivity Descriptor
ii. Color Histogram.
iii. PHOG. Pyramid Histogram of Oriented Gradients
• Late Fusion Methods
2. Extend the methods to local features for retrieval
• Use an existing tool to seek better results
• SIMPLE descriptor

Method 1
Global features using Late Fusion
Fig. System architecture: feature extraction and indexing → similarity measure → fusion
Slide 8

Method 1
Global descriptors for each IRM:
1. CEDD
2. Color Histogram
3. PHOG
Slide 9

Method 1
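Normalization:
• Two different approaches
• Each result list is limited to N images:
1. rank: $R_k(n) = \dfrac{N + 1 - \mathrm{R}_k(n)}{N}$
2. score: $R_k(n) = \dfrac{\mathrm{R}_k(n) - \min(\mathrm{R}_k)}{\max(\mathrm{R}_k) - \min(\mathrm{R}_k)}$
where $\mathrm{R}_k(n)$ is the raw rank or score of image $n$ in the list of IRM $k$, and $R_k(n)$ is its normalized value.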
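The two normalizations on this slide can be sketched in a few lines of Python. This is only an illustration, not the code used in the demo application: the function names and the dictionary representation (image id mapped to raw rank or raw score) are assumptions, and so is the handling of a list whose scores are all equal.

```python
def normalize_ranks(ranks, n_limit):
    """Rank normalization: (N + 1 - rank) / N.

    `ranks` maps an image id to its 1-based rank in one result list,
    `n_limit` is N, the maximum length of each result list.
    Rank 1 maps to 1.0 and rank N maps to 1/N.
    """
    return {img: (n_limit + 1 - r) / n_limit for img, r in ranks.items()}


def normalize_scores(scores):
    """Min-max normalization: (score - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: a list with identical scores is mapped to 0.0 everywhere.
        return {img: 0.0 for img in scores}
    return {img: (s - lo) / (hi - lo) for img, s in scores.items()}


rank_list = normalize_ranks({"a": 1, "b": 2, "c": 4}, n_limit=4)
score_list = normalize_scores({"a": 2.0, "b": 4.0, "c": 6.0})
```

Both functions return values on a comparable scale, which is what allows lists produced by different descriptors to be fused afterwards.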
Slide 10

Method 1
Fusion Methods:
a. Sum:
$R_t(n) = \sum_{k=1}^{K} R_k(n) = R_1(n) + R_2(n) + \dots + R_K(n)$
b. Sum with combMNZ:
sum × number of IRMs that returned image n
Final Ranked Lists:
1. Sum (ranks)
2. Sum (scores)
3. Sum with combMNZ (ranks)
4. Sum with combMNZ (scores)
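A minimal sketch of the two fusion rules follows, assuming each IRM delivers a dictionary of already normalized values per image and that a higher fused value means a better match. The function names and the tie-breaking by image id are illustrative choices, not taken from the demo application.

```python
def fuse_sum(lists):
    """Sum fusion: add the normalized values an image receives across all lists."""
    fused = {}
    for ranked in lists:
        for img, value in ranked.items():
            fused[img] = fused.get(img, 0.0) + value
    return fused


def fuse_comb_mnz(lists):
    """combMNZ: the sum multiplied by the number of lists that returned the image."""
    sums = fuse_sum(lists)
    hits = {}
    for ranked in lists:
        for img in ranked:
            hits[img] = hits.get(img, 0) + 1
    return {img: sums[img] * hits[img] for img in sums}


def ranking(fused):
    """Final ranked list, best first (assumes higher is better; ties by image id)."""
    return sorted(fused, key=lambda img: (-fused[img], img))


# Two hypothetical IRM lists: image "b" is returned by both, image "a" by one.
irm_lists = [{"a": 1.0, "b": 0.5}, {"b": 0.25}]
by_sum = ranking(fuse_sum(irm_lists))
by_mnz = ranking(fuse_comb_mnz(irm_lists))
```

In this toy input the plain sum ranks "a" first, while combMNZ promotes "b" because two lists agree on it, which is the intended effect of the multiplier.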
Slide 11

Method 2
Local features. SIMPLE descriptor
“Searching Images with MPEG-7 (& MPEG-7-like) Powered Localized dEscriptors (SIMPLE)” [2]
SURF detector + CEDD descriptor
• Extraction of global features as local ones (image key points)
• Codebook of 512 visual words (VW) using the Bag-Of-Visual-Words (BOVW) model
• K-means clustering algorithm with a vocabulary of 512 words
Slide 12
[2] Chryssanthi Iakovidou, Nektarios Anagnostopoulos, Athanasios Ch Kapoutsis, Yiannis Boutalis, and Savvas A Chatzichristos. Searching images with MPEG-7 (& MPEG-7-like) powered localized descriptors: the SIMPLE answer to effective content based image retrieval. In 12th International Workshop on Content-Based Multimedia Indexing (CBMI), pages 1–6. IEEE, 2014.
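The Bag-Of-Visual-Words step can be illustrated with a toy Python sketch: k-means learns a codebook from local descriptors, and each image is then represented by a histogram of its nearest visual words. Everything here is a simplification for illustration. The descriptors are 2-dimensional toy vectors rather than CEDD descriptors at SURF key points, the codebook has 2 words instead of 512, and the naive initialisation with the first k descriptors is an assumption.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the visual word (cluster centre) closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def kmeans(descriptors, k, iterations=10):
    """Plain Lloyd's k-means; centres start at the first k descriptors."""
    centers = [list(d) for d in descriptors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster is empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bovw_histogram(descriptors, codebook):
    """Count how many local descriptors of one image fall on each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1
    return hist


training = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
codebook = kmeans(training, k=2)
image_hist = bovw_histogram([[0.0, 0.2], [9.0, 10.0], [10.0, 10.0]], codebook)
```

The resulting histogram is a fixed-length vector per image, so it can be indexed and compared like a global feature even though it was built from local key points.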

Data sets
Video Retrieval for two different cases
Slide 13
1. Musical Performances
2. Endoscopic Videos

1. Musical Performances
▪ Freely available data set. It allows us to compare results
▪ Jiku Mobile data set
• 473 video clips
• Mobile devices
• Multiple users
• 5 events and several performances
▪ Test
• 356 videos randomly selected
• Based on 1 frame per second
• 412 query images
Fig. Query images from the event domain
Slide 14

2. Endoscopic Videos
▪ Confidential and anonymized data
▪ Live video stream data set
• Surgeons’ recordings in HQ
• Recorded inside their subjects
• Roughly 33 hours covered
• 54 laparoscopy procedures
▪ Test
• 1,276 videos randomly selected
• Based on 5 frames per second
• 600 query images
Fig. Query images from the medical domain
Slide 15

Experiments
Video Retrieval tested by two different evaluations
Slide 16
1. Quantitative evaluation
2. Qualitative evaluation (Thinking-aloud Test)

Evaluation Social Study, at AAU
1. Quantitative study:
• Find the position in the result list of the video the query image belongs to
• Results for global features
• Results for local features
2. Qualitative study. Thinking-aloud Test
• Interface: semi-interactive web page
• Participants are researchers and non-researchers within the CODE-MM Project
• 6 volunteers for the Musical Performances test
• 2 volunteers for the Endoscopic Videos test
Slide 17

Thinking-aloud Test
• Interface: semi-interactive web page blindly labeled with 3 search engines (A, B, C)
i. Sum of ranks method and global features → Search Engine A
ii. Sum of scores method and global features → Search Engine B
iii. SIMPLE (SURF detector + CEDD descriptor) → Search Engine C
• Participants must voice their thoughts aloud
• Sessions are recorded
Slide 18

Thinking-aloud Test
Slide 19
Fig. Screenshots of the different movements of the first volunteer
Fig. Screenshot from the thinking aloud test
Fig. Interface for the thinking aloud test

Experiments
Slide 20
1. Musical Performances
2. Endoscopic Videos

Quantitative Evaluation
Benchmarking based on the set of 412 queries:
Table I. Results of the tests on where the actual source video is found in the result list. The first four columns give the four tested feature fusion approaches; the fifth gives the results using the SIMPLE-CEDD descriptors.
Slide 21
Source video of the query image ranked in the first position of the result list
• Global features: 96.6% of the queries
• Local features: 91.5% of the queries
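The quantitative measure reported here, the share of queries whose source video comes first in the result list, could be computed along these lines. The function names and the representation of a run as a (ranked video ids, source video id) pair are assumptions for illustration, and the toy data below is made up and unrelated to the reported figures.

```python
def source_rank(result_videos, source_video):
    """1-based position of the source video in a ranked result list, or None if absent."""
    for position, video in enumerate(result_videos, start=1):
        if video == source_video:
            return position
    return None


def top1_rate(runs):
    """Fraction of queries whose source video is ranked in first position.

    `runs` is a list of (ranked_video_ids, source_video_id) pairs, one per query.
    """
    hits = sum(1 for results, source in runs if source_rank(results, source) == 1)
    return hits / len(runs)


# Hypothetical toy runs with four queries.
toy_runs = [
    (["v1", "v2"], "v1"),
    (["v3", "v1"], "v1"),
    (["v2"], "v2"),
    (["v4", "v5"], "v9"),
]
rate = top1_rate(toy_runs)
```

Recording the full rank with `source_rank`, rather than only a first-position hit, also allows the distribution over later positions to be tabulated.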

Qualitative Evaluation
Overall impression
Global features (A, B)
• Search model: abstract exploratory
• Different sub-events, same viewpoint
Local features (C)
• Search model: semantically similar content
• Same performance, different viewpoints
• Good results in the earlier positions of the video
Fig. Most used query images in the user test (left to right)
Slide 22

Fig. Global features using Late Fusion vs. SIMPLE: SURF detector + CEDD
Slide 23

Experiments
Slide 24
1. Musical Performances
2. Endoscopic Videos

Endoscopic Videos
Quantitative Evaluation
Benchmarking based on the set of 600 queries:
Table II. Results of the tests on where the actual source video is found in the result list. The first four columns give the four tested feature fusion approaches; the fifth gives the results using the SIMPLE-CEDD descriptors.
Slide 25
Source video of the query image ranked in the first position of the result list
• Global features: 78.3% of the queries
• Local features: 79.8% of the queries

Endoscopic Videos
Overall impression
Global features (A, B)
• Search model: abstract exploratory
• Relevant shots in the top results (semantically dissimilar)
Local features (C)
• Search model: semantically similar content
• Same movements in surgeries
• Good results for finding the query’s source video
Fig. Shots (photos) manually taken by the surgeon in the course of the procedure.
Slide 26

Fig. Screenshots of the result presentation showing the three top videos and the query image. All results
are presented in HTML5 and can be viewed in recent browsers supporting HTML5 videos and JavaScript.
Best matching frames are indicated by triangles in the red and grey time line below the video player.
SIMPLE: SURF detector + CEDD descriptor
Slide 27

Conclusions and Further Work
An existing tool is adapted and extended for content-based video retrieval
Slide 28
• Global features: exploratory search mode
• Local features: semantically similar content
Further work:
• Ad-hoc search within surgery procedures
• Faster indexing strategies
• Fusion of local and global features
• Different implementation of the SIMPLE descriptor (Random Detector + modified-CEDD descriptor)

Appendix
Slide 29
[3] Roldan-Carlos J, Lux M, Giró-i-Nieto X, Muñoz-Trallero P, Anagnostopoulos N. Event Video Retrieval using Global and Local Descriptors in Visual Domain. In: IEEE/ACM International Workshop on Content-Based Multimedia Indexing - CBMI 2015.
[4] Roldan-Carlos J, Lux M, Giró-i-Nieto X, Muñoz-Trallero P, Anagnostopoulos N. Visual Information Retrieval in Endoscopic Video Archives. In: IEEE/ACM International Workshop on Content-Based Multimedia Indexing - CBMI 2015. Prague, Czech Republic. In press. http://arxiv.org/abs/1504.07874
Two papers were presented in the Special Session on Medical Multimedia Processing [3] [4] (acceptance rate for special sessions = 55%)

Thank you for your attention
Do you have any questions?
7 May 2015
Visual Search for Musical Performances
and Endoscopic Videos
Jennifer Roldán
