Visual Search for Musical Performances and Endoscopic Videos
1. VISUAL SEARCH
FOR MUSICAL PERFORMANCES
AND ENDOSCOPIC VIDEOS
Degree’s Final Project Dissertation
Telecommunications Engineering
Jennifer Roldán
Supervisors:
Assoc. Prof. Mathias Lux
Assoc. Prof. Xavier Giró
2. Outline of the Thesis
1. Introduction
i. Motivation
ii. Gantt chart. Work Plan
2. Overview. Existing Demo-Application
3. Methods
i. Global features using Late Fusion Methods
ii. Local features: SIMPLE descriptor
4. Data sets
i. Musical Performances
ii. Endoscopic Videos
5. Experiments
i. Quantitative evaluation
ii. Qualitative evaluation. Thinking-aloud test
6. Conclusions and Further Work
Sep 2014 – May 2015
3. Motivation
• Application to cover surgeons’ needs and
automate data processing
• Endoscopic videos (confidential data)
• Focus of the project
• Video retrieval on demand for surgeons
• Musical performances (free data set)
• Reproducible results for evaluation
• Quantitative and qualitative studies
Introduction · Overview · Methods · Data sets · Experiments · Conclusions
4. Gantt Chart. Work Plan
• Use of existing tools and definition of the thesis statements
• Experiments with endoscopic videos
• Two papers submitted to the 13th CBMI workshop
• Project development with the Jiku Mobile data set
5. Existing Demo Application
Fig. All results are presented in HTML 5 and can be viewed in a
recent version of common browsers.
6. Existing Demo Application
Published at the ACM Multimedia Open Source Competition [1]
• Open source library for CBIR
• Based on Lucene
• Java text retrieval framework
• Indexing and Search
• Supports global and local features
(integrates up to 20 descriptors)
[1] Mathias Lux. LIRE: Open source image retrieval in Java. In Proceedings of the 21st ACM International Conference on Multimedia, pages 843-846. ACM, 2013.
7. Methodology
1. Previous methods in demo application:
• Global Features
i. CEDD. Color and Edge Directivity Descriptor
ii. Color Histogram.
iii. PHOG. Pyramid Histogram of Oriented Gradients
• Late Fusion Methods
2. Extend the methods to local features for retrieval
• Use an existing tool to investigate whether results improve
• SIMPLE descriptor
8. Method 1
Global features using Late Fusion
Feature extraction and indexing · Similarity measure · Fusion
Fig. System Architecture
9. Method 1
Global features using Late Fusion
Global descriptors for each IRM:
1. CEDD
2. Color Histogram
3. PHOG
10. Method 1
Global features using Late Fusion
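Normalization:
• Two different approaches
• Result lists limited to N images; $r_k(n)$ and $s_k(n)$ denote the raw rank and raw score of image $n$ in list $k$:
1. rank: $R_k(n) = \frac{N + 1 - r_k(n)}{N}$
2. score: $R_k(n) = \frac{s_k(n) - \min(s_k)}{\max(s_k) - \min(s_k)}$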
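The two normalizations can be illustrated with a minimal Python sketch. The function names and the `n_limit` parameter (standing in for N) are hypothetical and not taken from the demo application. The sketch also assumes that raw scores are similarities, where higher is better; for distance-based scores the normalized value would have to be inverted.

```python
def rank_normalize(ranks, n_limit):
    """Map 1-based ranks to (0, 1]: rank 1 -> 1.0, rank N -> 1/N."""
    return [(n_limit + 1 - r) / n_limit for r in ranks]


def score_normalize(scores):
    """Min-max normalize the raw scores of one result list to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal: avoid dividing by zero (assumed convention).
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# A result list limited to N = 4 images.
print(rank_normalize([1, 2, 4], n_limit=4))   # [1.0, 0.75, 0.25]
print(score_normalize([2.0, 4.0, 6.0]))       # [0.0, 0.5, 1.0]
```

Both functions bring the lists of the individual IRMs onto a comparable scale, which is what makes summing them in the fusion step meaningful.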
11. Method 1
Global features using Late Fusion
Fusion Methods:
a. Sum:
$R_t(n) = \sum_{k=1}^{K} R_k(n) = R_1(n) + R_2(n) + \dots + R_K(n)$
b. Sum with combMNZ:
the sum multiplied by the number of IRMs that returned image $n$
Final Ranked Lists:
1. Sum (ranks)
2. Sum (scores)
3. Sum with combMNZ (ranks)
4. Sum with combMNZ (scores)
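As a rough illustration of the two fusion rules, the following Python sketch fuses already normalized result lists. It assumes each IRM's list is a dict mapping an image id to its normalized value. The names `comb_sum`, `comb_mnz` and `ranked` are hypothetical helpers, not part of the demo application.

```python
from collections import defaultdict


def comb_sum(lists):
    """Sum the normalized values of each image over all IRM result lists."""
    fused = defaultdict(float)
    for per_irm in lists:
        for image_id, value in per_irm.items():
            fused[image_id] += value
    return dict(fused)


def comb_mnz(lists):
    """Multiply each image's sum by the number of IRMs that returned it."""
    sums = comb_sum(lists)
    hits = defaultdict(int)
    for per_irm in lists:
        for image_id in per_irm:
            hits[image_id] += 1
    return {image_id: sums[image_id] * hits[image_id] for image_id in sums}


def ranked(fused):
    """Image ids ordered by descending fused value (ties broken by id)."""
    return sorted(fused, key=lambda image_id: (-fused[image_id], image_id))


# Image "b" is returned by both IRMs, image "a" only by the first one.
lists = [{"a": 1.0, "b": 0.5}, {"b": 0.25}]
print(ranked(comb_sum(lists)))  # ['a', 'b']
print(ranked(comb_mnz(lists)))  # ['b', 'a']
```

The toy input shows the practical difference: the plain sum favors the image with the single highest value, while combMNZ promotes the image that several IRMs agree on.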
12. Method 2
Local features. SIMPLE descriptor
“Searching Images with MPEG-7 (& MPEG-7-like) Powered Localized dEscriptors (SIMPLE)” [2]
SURF detector + CEDD descriptor
• Extraction of global features as local ones (image key points)
• Codebook of 512 visual words (VW) using the Bag-Of-Visual-Words (BOVW) model
• K-means clustering algorithm with a vocabulary of 512 words
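The Bag-Of-Visual-Words step can be sketched in plain Python. This is a toy illustration under stated assumptions, not the SIMPLE or LIRE implementation: the local descriptors are taken as given numeric vectors (in SIMPLE they would be CEDD descriptors computed around SURF key points), the clustering is a basic Lloyd-style k-means, and all function names are hypothetical. The slides use a codebook of 512 words; the example uses 2 to stay readable.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, descriptor):
    """Index of the visual word closest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], descriptor))


def kmeans(descriptors, k, iterations=10, seed=0):
    """Basic k-means: returns k cluster centers (the visual words)."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(centers, d)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bovw_histogram(codebook, descriptors):
    """Count how many of an image's descriptors fall on each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1
    return hist


# Toy training descriptors forming two obvious groups.
training = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
codebook = kmeans(training, k=2)
image_descriptors = [[0.1, 0.2], [9.0, 10.0], [10.0, 10.0]]
print(bovw_histogram(codebook, image_descriptors))
```

Each image is then represented by one fixed-length histogram, which can be indexed and compared like a global feature.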
[2] Chryssanthi Iakovidou, Nektarios Anagnostopoulos, Athanasios Ch. Kapoutsis, Yiannis Boutalis, and Savvas A. Chatzichristofis. Searching images with MPEG-7 (& MPEG-7-like) powered localized descriptors: the SIMPLE answer to effective content based image retrieval. In 12th International Workshop on Content-Based Multimedia Indexing (CBMI), pages 1-6. IEEE, 2014.
13. Data sets
Video Retrieval for two different cases
1. Musical Performances
2. Endoscopic Videos
14. Musical Performances
Freely available data set. It
allows us to compare results
Jiku Mobile data set
• 473 video clips
• Mobile devices
• Multiple users
• 5 events and several performances
Test
• 356 videos randomly selected
• Based on 1 frame per second
• 412 query images
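The slides only state that indexing is based on 1 frame per second. As a hedged sketch of what such sampling could look like, the Python function below lists the frame indices kept from a clip, assuming a constant native frame rate. The function and its parameters are hypothetical and do not describe the extraction procedure actually used.

```python
def sampled_frame_indices(total_frames, native_fps, target_fps=1.0):
    """Indices of the frames kept when sampling a clip at target_fps."""
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Sampling faster than the native rate would only repeat frames.
    target_fps = min(target_fps, native_fps)
    indices = []
    k = 0
    while True:
        index = int(k * native_fps / target_fps)
        if index >= total_frames:
            break
        indices.append(index)
        k += 1
    return indices


# A 4-second clip recorded at 25 fps, sampled at 1 frame per second.
print(sampled_frame_indices(100, 25.0))        # [0, 25, 50, 75]
# The same function covers a denser rate such as 5 frames per second.
print(sampled_frame_indices(10, 25.0, 5.0))    # [0, 5]
```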
Fig. Query images event domain
15. Endoscopic Videos
Fig. Query images, medical domain
Confidential and anonymized
data
Live video stream data set
• Surgeons’ recordings in HQ
• Recorded inside the patients’ bodies
• Roughly 33 hours of footage
• 54 laparoscopy procedures
Test
• 1,276 videos randomly selected
• Based on 5 frames per second
• 600 query images
16. Experiments
Video Retrieval tested by two different evaluations
1. Quantitative evaluation
2. Qualitative evaluation (Thinking-aloud test)
17. Evaluation Social Study, at AAU
Quantitative study:
• Find the position in the result list of the video from which the query image was taken
• Results Global Features
• Results Local Features
Qualitative study. Thinking-aloud Test
• Interface: semi-interactive web page
• Participants are researchers and non-researchers within the
CODE-MM Project
• 6 Volunteers for Musical Performances Test
• 2 Volunteers for Endoscopic Videos Test
18. Evaluation Social Study, at AAU
Thinking-aloud Test
• Interface: semi-interactive web page, blindly labeled with 3 search engines (A, B, C)
i. Search Engine A: sum of ranks method and global features
ii. Search Engine B: sum of scores method and global features
iii. Search Engine C: SIMPLE (SURF detector + CEDD descriptor)
• Participants must voice their thoughts aloud
• Sessions are recorded
19. Evaluation Social Study, at AAU
Thinking-aloud Test
Fig. Screenshots of the different movements of the first volunteer
Fig. Screenshot from the thinking aloud test
Fig. Interface for the thinking aloud test
20. Experiments
Video Retrieval tested by two different evaluations
1. Musical Performances
2. Endoscopic Videos
21. Musical Performances
Quantitative Evaluation
Benchmarking based on the set of 412 queries:
Table I. Results of the tests on where the actual source video can be found in the results. The first four columns give the four tested feature fusion approaches; the fifth gives the results for the SIMPLE-CEDD descriptors.
Source video of the query image ranked in the first position of the result list
• Global features: 96.6% of the queries
• Local features: 91.5% of the queries
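The share of queries whose source video is ranked first can be computed as in the following Python sketch. It assumes each query yields a ranked list of video ids plus the id of the video the query image was taken from. The function names and the toy data are hypothetical and do not reproduce the benchmark.

```python
def source_video_rank(result_list, source_video):
    """1-based position of the source video in the result list, or None."""
    for position, video_id in enumerate(result_list, start=1):
        if video_id == source_video:
            return position
    return None


def hit_rate_at(results, k=1):
    """Fraction of queries whose source video appears in the top k results."""
    if not results:
        raise ValueError("at least one query result is required")
    hits = 0
    for ranked_ids, source_video in results:
        rank = source_video_rank(ranked_ids, source_video)
        if rank is not None and rank <= k:
            hits += 1
    return hits / len(results)


# Four toy queries: (ranked video ids, source video of the query image).
results = [
    (["v1", "v2"], "v1"),
    (["v3", "v1"], "v1"),
    (["v2", "v5"], "v2"),
    (["v4"], "v9"),
]
print(hit_rate_at(results, k=1))  # 0.5
print(hit_rate_at(results, k=2))  # 0.75
```

With `k=1` this corresponds to the "ranked in the first position" measure reported on the slide; larger values of `k` give the share of queries answered within the top k results.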
22. Musical Performances
Qualitative Evaluation
Fig. Most used query images in the user test (left to right)
Overall impression
Global features (A, B)
• Search model: abstract exploratory
• Different sub-events, same viewpoint
Local features (C)
• Search model: semantically similar content
• Same performance, different viewpoints
• Good results in earlier video’s position
23. Musical Performances
Qualitative Evaluation
Fig. Panels: Global Features using Late Fusion | SIMPLE: SURF detector + CEDD
25. Endoscopic Videos
Quantitative Evaluation
Benchmarking based on the set of 600 queries:
Table II. Results of the tests on where the actual source video can be found in the results. The first four columns give the four tested feature fusion approaches; the fifth gives the results for the SIMPLE-CEDD descriptors.
Source video of the query image ranked in the first position of the result list
• Global features: 78.3% of the queries
• Local features: 79.8% of the queries
26. Endoscopic Videos
Qualitative Evaluation
Fig. Shots (photos) manually created by the surgeon in the course of the procedure.
Overall impression
Global features (A, B)
• Search model: abstract exploratory
• Relevant shots in the top results (semantically dissimilar)
Local features (C)
• Search model: semantically similar content
• Same movements in surgeries
• Good results for finding the query’s source video
27. Qualitative Evaluation
Fig. Screenshots of the result presentation showing the three top videos and the query image. All results
are presented in HTML5 and can be viewed in recent browsers supporting HTML5 videos and JavaScript.
Best matching frames are indicated by triangles in the red and grey time line below the video player.
SIMPLE: SURF detector + CEDD descriptor
28. Conclusions and Further Work
An existing tool is adapted and extended
for content-based video retrieval
Global features: exploratory search mode
Local features: semantically similar content
Further work:
• ad-hoc search within surgery procedures.
• faster indexing strategies
• fusion of local and global features.
• different implementation of SIMPLE descriptor (Random
Detector + modified-CEDD descriptor).
29. Appendix
[3] Roldan-Carlos J, Lux M, Giró-i-Nieto X, Muñoz-Trallero P, Anagnostopoulos N. Event Video Retrieval using Global and Local Descriptors in Visual Domain. In: IEEE/ACM International Workshop on Content-Based Multimedia Indexing - CBMI 2015.
[4] Roldan-Carlos J, Lux M, Giró-i-Nieto X, Muñoz-Trallero P, Anagnostopoulos N. Visual Information Retrieval in Endoscopic Video Archives. In: IEEE/ACM International Workshop on Content-Based Multimedia Indexing - CBMI 2015. Prague, Czech Republic. In press. http://arxiv.org/abs/1504.07874
Two papers were presented in the Special Session on Medical Multimedia Processing [3] [4] (acceptance rate for special sessions = 55%)
30. Thank you for your attention
Do you have any questions?
7 May 2015
Visual Search for
Musical Performances
and Endoscopic Videos
Jennifer Roldán
Editor's notes
HOW IT WORKS:
In surgeons' daily work, it is a challenge to find particular scenes within the videos recorded during a procedure. The aim is to:
• easily re-find these shots within video streams by visual queries, and
• summarize the video content of the surgeries.
Similar frames are shown on the timeline at the positions where the images were taken.