Fast object re-detection and localization
in video for spatio-temporal fragment
creation
Evlampios Apostolidis, Vasileios Mezaris, Ioannis Kompatsiaris
Information Technologies Institute / Centre for Research and Technology Hellas

ICME MMIX 2013, San Jose, CA, USA, July 2013

Overview
• Introduction - problem formulation
• Related work
• Baseline approach
• Proposed approach
– GPU-based processing
– Video-structure-based sampling of video frames
– Robustness to scale variations
• Experiments and results
• Conclusions

Introduction – problem formulation
• Object re-detection: a particular case of image matching
• Main goal: find instances of a specific object within a single video or a
collection of videos
– Input: object of interest + video file
– Processing: similarity estimation by means of image matching
– Output: detected instances of the object of interest

Introduction – problem formulation
Extension for interactive and linked TV
• Semi-automatic identification and annotation of object-specific spatio-temporal media fragments
– Annotate the object of interest
– Run the object re-detection algorithm
– Automatically obtain instance-based annotated video fragments
– Find related content fragments and establish links between them

[Figure: assign a label to the object of interest → instance-based annotated video fragment → links to related content]

Related work
• Extraction and matching of scale- and rotation-invariant local descriptors
is one of the most popular state-of-the-art (SoA) approaches for similarity
estimation between pairs of images
– Local feature extraction
• Edge detectors (e.g. Canny), corner detectors (e.g. Harris-Laplace)

– Local feature description
• SIFT or extensions of it, SURF, BRISK, binary descriptors such as BRIEF, …

– Matching of local descriptors
• k-Nearest Neighbor search between descriptor pairs using brute-force or hashing

– Filtering of erroneous matches
• Symmetry test between the pairs of matched descriptors
• Ratio test regarding the distances of the calculated nearest neighbors
• Geometric verification between the pair of images using RANSAC

– Extensions
• Combined use of keypoints and motion information (tracking)
• Bag-of-Words (BoW) matching for pruning
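
The matching and filtering steps listed above (brute-force 2-NN search, ratio test, symmetry test) can be sketched in a few lines. This is a minimal illustration only, assuming plain Euclidean distance on toy 2-D descriptors; the function names and the 0.8 ratio threshold are assumptions made for the example and do not come from the slides. Geometric verification with RANSAC is left out.

```python
import math


def two_nearest(query, candidates):
    """Brute-force 2-NN: return the two closest candidates as (distance, index)."""
    dists = sorted((math.dist(query, c), i) for i, c in enumerate(candidates))
    return dists[:2]


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Map each descriptor in desc_a to its nearest neighbor in desc_b,
    keeping the match only if the best distance is clearly smaller than
    the second-best one (ratio test). The 0.8 threshold is an assumed value."""
    matches = {}
    for i, d in enumerate(desc_a):
        nn = two_nearest(d, desc_b)
        if len(nn) == 2 and nn[0][0] < ratio * nn[1][0]:
            matches[i] = nn[0][1]
    return matches


def symmetric_matches(desc_a, desc_b, ratio=0.8):
    """Symmetry test: keep (i, j) only if i -> j and j -> i both survive."""
    forward = ratio_test_matches(desc_a, desc_b, ratio)
    backward = ratio_test_matches(desc_b, desc_a, ratio)
    return sorted((i, j) for i, j in forward.items() if backward.get(j) == i)


# Toy descriptors: the third object descriptor has two almost equally
# close candidates in the frame, so the ratio test rejects it as ambiguous.
object_desc = [(0.0, 0.0), (10.0, 10.0), (5.0, 5.1)]
frame_desc = [(10.1, 9.9), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2)]
good = symmetric_matches(object_desc, frame_desc)
print(good)
```

In a real pipeline the surviving pairs would then go through geometric verification, and the descriptors would be high-dimensional vectors rather than 2-D points.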

Proposed approach
• Starting from a baseline approach,
– Improve detection accuracy
– Reduce the needed processing time
• Work directions:
– GPU-based processing
– Video-structure-based sampling of frames
– Enhancing robustness to scale variations

GPU-based processing
Accelerated parts of the overall pipeline:
• Video decompression into frames
• Keypoint detection and description
• Brute-force matching and 2-NN search
• Drawing of the calculated bounding boxes (optional)

Video-structure-based sampling
• Sequential processing of video frames is replaced by a structure-based
one, using the analysis results of a shot segmentation method
Example
[Figure: check each shot (1, 2) in turn. No detection: move to the next shot. Detection: check the frames of this shot, then detect and highlight the object of interest.]
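
The control flow of the example can be sketched as follows. This is an illustration under stated assumptions, not the implemented system: the slides do not say which frames of a shot are checked first, so the evenly spaced sampling rule, the `samples` parameter, and the `contains_object` callback (standing in for the matching step) are all hypothetical.

```python
def sample_frames(start, end, samples=3):
    """Pick up to `samples` evenly spaced frame indices in [start, end].
    The sampling rule is an assumption made for this sketch."""
    if samples < 2:
        raise ValueError("samples must be at least 2")
    length = end - start + 1
    if length <= samples:
        return list(range(start, end + 1))
    step = (length - 1) / (samples - 1)
    return [start + round(k * step) for k in range(samples)]


def redetect_by_shot(shots, contains_object, samples=3):
    """Return (frames with a detection, number of detector calls)."""
    hits, calls = [], 0
    for start, end in shots:
        probes = sample_frames(start, end, samples)
        results = {f: contains_object(f) for f in probes}
        calls += len(probes)
        if not any(results.values()):
            continue  # no detection: move to the next shot
        # Detection: check every remaining frame of this shot.
        for f in range(start, end + 1):
            if f not in results:
                results[f] = contains_object(f)
                calls += 1
            if results[f]:
                hits.append(f)
    return hits, calls


# Three shots of 100 frames each; the object is visible in frames 120-180.
shots = [(0, 99), (100, 199), (200, 299)]
object_frames = set(range(120, 181))
hits, calls = redetect_by_shot(shots, lambda f: f in object_frames)
print(len(hits), calls)
```

In this toy run the detector is called 106 times instead of 300. The trade-off is that an object visible only in frames that are never sampled would be missed, which is one possible source of a small recall drop.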

Robustness to scale variations
Problem
• Major changes in scale may lead to detection failure due to the significant
limitation of the area that is used for matching
• Zoom-in case: the middle image (b) corresponds to a small upper right
area of the object O in the left one (a)
• Zoom-out case: in the right image (c) the object O occupies a very small
part of the frame
• Both cases lead to a considerable reduction of the number of matched
pairs of descriptors, and thus often to detection failure

[Figure: (a) the object O, (b) zoom-in case, (c) zoom-out case]
Robustness to scale variations
Solution
• We automatically generate a zoomed-out and a centralized zoomed-in
instance of the object O and utilize them in the matching procedure
• Zoomed-in instance:
– selection of a center-aligned sub-area of the original object O and
enlargement to the actual size of O by applying bilinear interpolation
– 70% of the original image area → 140% zoom-in factor

• Zoomed-out instance:
– shrink the original image O into a smaller one using nearest neighbor
interpolation
– the maximum zoom-out factor is determined by the restrictions of the GPU-based implementation of SURF
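
The geometry of the two generated instances can be illustrated with a short sketch. One reading that reproduces the figures on the slide is that the crop keeps about 70% of each side, so that enlarging it back to the original size gives a factor of 1/0.7 ≈ 1.43, which the slide rounds to 140%. That reading is an assumption, as are the function names and the 0.5 shrink factor used below; the bilinear enlargement itself is not implemented here.

```python
def center_crop_box(width, height, keep=0.7):
    """Return (left, top, right, bottom) of a center-aligned sub-area that
    keeps `keep` of each side (assumed reading of the 70% figure)."""
    crop_w, crop_h = round(width * keep), round(height * keep)
    left, top = (width - crop_w) // 2, (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


def zoom_factor(keep=0.7):
    """Enlargement needed to bring the cropped area back to the original size."""
    return 1.0 / keep


def shrink_nearest(image, factor):
    """Shrink a 2-D list of pixels by `factor` (0 < factor <= 1) using
    nearest neighbor sampling, as for the zoomed-out instance."""
    src_h, src_w = len(image), len(image[0])
    dst_h, dst_w = max(1, int(src_h * factor)), max(1, int(src_w * factor))
    return [
        [image[min(src_h - 1, int(y / factor))][min(src_w - 1, int(x / factor))]
         for x in range(dst_w)]
        for y in range(dst_h)
    ]


box = center_crop_box(200, 100)
factor = zoom_factor()
tiny = shrink_nearest([[r * 4 + c for c in range(4)] for r in range(4)], 0.5)
print(box, round(factor, 2), tiny)
```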

Experiments and Results
• System specifications
– Intel Core i7 processor at 3.4 GHz
– 8 GB RAM
– CUDA-enabled NVIDIA GeForce GTX 560 GPU

• Dataset
– 6 videos* of 273 minutes total duration
– 30 manually selected objects

• Ground-truth (generated via manual annotation)
– 75,632 frames contain at least one of these objects
– 333,455 frames do not include any of the selected objects

[Figure: examples of sought objects]

* The videos are episodes from the “Antiques Roadshow” of the Dutch public broadcaster AVRO (http://avro.nl/)

Experiments and Results
• Aim: quantify the improvement contributed by each extension of the baseline
approach
• Four experimental configurations:
– C1: baseline implementation
– C2: GPU-accelerated implementation
– C3: GPU-accelerated and video-structure-based sampling
implementation
– C4: complete proposed approach, which includes:
• GPU-based processing
• video-structure-based sampling
• robustness to scale variations

Experiments and Results
• Detection accuracy is expressed in terms of Precision, Recall and F-Score
• Evaluation was performed on a per-frame basis, i.e. considering the 30
selected objects and counting the number of frames where these were
correctly detected, missed, etc.
• Time efficiency was evaluated by expressing the processing time of each
configuration as a factor of the actual duration of the processed videos
• Robustness to scale variations was quantified using two specific sets of
frames where the object of interest was observed from:
– a very close viewing position (2,940 frames) and
– a very distant viewing position (4,648 frames)
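
A per-frame evaluation of this kind can be sketched as below. It is a simplified illustration for a single object, assuming detections and ground truth are given as collections of frame indices; the function names and the sample numbers are hypothetical and are not taken from the experiments.

```python
def per_frame_scores(detected, ground_truth):
    """Precision and recall from per-frame counts for one object."""
    detected, ground_truth = set(detected), set(ground_truth)
    tp = len(detected & ground_truth)   # frames correctly detected
    fp = len(detected - ground_truth)   # frames wrongly reported
    fn = len(ground_truth - detected)   # frames missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def real_time_factor(processing_seconds, video_seconds):
    """Processing time expressed as a factor of the video's duration."""
    return processing_seconds / video_seconds


# Hypothetical run: the object is visible in frames 0-99 and re-detected
# in frames 10-89; a 15-minute video is processed in 90 seconds.
precision, recall = per_frame_scores(detected=range(10, 90), ground_truth=range(0, 100))
factor = real_time_factor(processing_seconds=90.0, video_seconds=900.0)
print(precision, recall, factor)
```

A factor below 1 means the video is processed faster than real time.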

Experiments and Results
Evaluation results for configurations C1 to C4

      Precision   Recall   F-Score   Processing Time (x Real-Time)
C1    0.999       0.868    0.929     2.98-5.26
C2    0.999       0.850    0.918     0.35-1.24
C3    0.999       0.849    0.918     0.03-0.13
C4    0.999       0.872    0.931     0.03-0.19

Evaluation results for highly zoomed-out instances

      Precision   Recall   F-Score
C1    0.999       0.856    0.922
C2    0.999       0.856    0.922
C3    1.000       0.852    0.920
C4    1.000       0.992    0.996

Evaluation results for highly zoomed-in instances

      Precision   Recall   F-Score
C1    0.999       0.831    0.907
C2    0.999       0.831    0.907
C3    1.000       0.799    0.888
C4    1.000       0.914    0.955
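
The reported F-Score values can be cross-checked against the reported precision and recall, assuming the usual definition of the F-Score as the harmonic mean of the two. The sketch below only works with the rounded figures from the tables, so agreement is tested to within half a unit in the third decimal place; the table labels follow the captions in the order they appear.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (table, configuration, precision, recall, reported F-Score)
rows = [
    ("overall", "C1", 0.999, 0.868, 0.929),
    ("overall", "C2", 0.999, 0.850, 0.918),
    ("overall", "C3", 0.999, 0.849, 0.918),
    ("overall", "C4", 0.999, 0.872, 0.931),
    ("zoomed-out", "C1", 0.999, 0.856, 0.922),
    ("zoomed-out", "C2", 0.999, 0.856, 0.922),
    ("zoomed-out", "C3", 1.000, 0.852, 0.920),
    ("zoomed-out", "C4", 1.000, 0.992, 0.996),
    ("zoomed-in", "C1", 0.999, 0.831, 0.907),
    ("zoomed-in", "C2", 0.999, 0.831, 0.907),
    ("zoomed-in", "C3", 1.000, 0.799, 0.888),
    ("zoomed-in", "C4", 1.000, 0.914, 0.955),
]

# Rows whose recomputed F-Score differs from the reported one by more
# than the rounding tolerance of three decimal places.
mismatches = [
    (table, config)
    for table, config, p, r, reported in rows
    if abs(f_score(p, r) - reported) > 0.0005
]
print(mismatches)
```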
Experiments and Results
Detection accuracy
• All configurations achieved very good detection accuracy
• Configuration C4 (complete proposed approach) achieved the best results
• The algorithm performed well across a range of scales and orientations,
and under partial visibility or partial occlusion
Processing time
• The video-structure-based sampling strategy greatly reduced the required
processing time
• The algorithm needs about 10% of the video’s duration, while preserving
the same high detection accuracy as the slower configurations
Online demo available at: http://www.youtube.com/watch?v=0IeVkXRTYu8

Extensions, ideas and plans
• Recent extension: Multiple instances of an object of interest can be used
as input for more efficient re-detection of 3D objects
• Future ideas: test the algorithm’s performance as a tool for chapter
segmentation in videos where the chapters are temporally demarcated by
the presence of a specific object (e.g. a painting in a video about art)
• Future plans: evaluate the extended algorithm’s performance (detection
accuracy and time efficiency) in a new set of videos

[Figure: input and output]
Conclusions
• The proposed method can be used for fast and accurate re-detection of
pre-defined objects in videos
• The time performance of the implemented algorithm allows for real-time
processing of multi-media content
• Extended by a prior object labeling step, this technique can be seen as:
– A reliable tool for creating instance-based annotated spatio-temporal
fragments in videos
– A key enabling technology for finding similar content and establishing
links between related media fragments, thus contributing to the
realization of interactive and linked TV

Questions?
More information:
http://www.iti.gr/~bmezaris
bmezaris@iti.gr
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 

Kürzlich hochgeladen (20)

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

Fast object re-detection and localization in video for spatio-temporal fragment creation

  • 1. Fast object re-detection and localization in video for spatio-temporal fragment creation
      Evlampios Apostolidis, Vasileios Mezaris, Ioannis Kompatsiaris
      Information Technologies Institute / Centre for Research and Technology Hellas
      ICME MMIX 2013, San Jose, CA, USA, July 2013
  • 2. Overview
      • Introduction - problem formulation
      • Related work
      • Baseline approach
      • Proposed approach
          – GPU-based processing
          – Video-structure-based sampling of video frames
          – Robustness to scale variations
      • Experiments and results
      • Conclusions
  • 3. Introduction – problem formulation
      • Object re-detection: a particular case of image matching
      • Main goal: find instances of a specific object within a single video or a collection of videos
          – Input: object of interest + video file
          – Processing: similarity estimation by means of image matching
          – Output: detected instances of the object of interest
  • 4. Introduction – problem formulation
      Extension for interactive and linked TV
      • Semi-automatic identification and annotation of object-specific spatio-temporal media fragments
          – Annotate the object of interest
          – Run the object re-detection algorithm
          – Automatically obtain instance-based annotated video fragments
          – Find related content fragments and establish links between them
      [Figure: assign a label to the object of interest; instance-based annotated video fragment; links to related content]
  • 5. Related work
      • Extraction and matching of scale- and rotation-invariant local descriptors is one of the most popular state-of-the-art approaches for similarity estimation between pairs of images
          – Local feature extraction
              • Edge detectors (e.g. Canny), corner detectors (e.g. Harris-Laplace)
          – Local feature description
              • SIFT or extensions of it, SURF, BRISK, binary descriptors such as BRIEF, …
          – Matching of local descriptors
              • k-Nearest Neighbor search between descriptor pairs using brute-force or hashing
          – Filtering of erroneous matches
              • Symmetry test between the pairs of matched descriptors
              • Ratio test on the distances of the calculated nearest neighbors
              • Geometric verification between the pair of images using RANSAC
          – Extensions
              • Combined use of keypoints and motion information (tracking)
              • Bag-of-Words (BoW) matching for pruning
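The match-filtering steps listed on this slide (2-NN search, ratio test, symmetry test) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: descriptors are plain tuples of floats, the distance is Euclidean, and the threshold `ratio=0.8` and all function names are assumptions chosen for the example.

```python
import math


def two_nearest(query, candidates):
    """Brute-force 2-NN: the two closest candidates as (distance, index) pairs."""
    dists = sorted((math.dist(query, c), i) for i, c in enumerate(candidates))
    return dists[:2]


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to desc_b, keeping only unambiguous matches.

    A match survives when its best distance is clearly smaller than the
    second-best one (best < ratio * second). The value 0.8 is an assumed default.
    """
    matches = {}
    for i, d in enumerate(desc_a):
        nn = two_nearest(d, desc_b)
        if len(nn) < 2:
            continue  # the ratio test needs two neighbours
        (best, j), (second, _) = nn
        if best < ratio * second:
            matches[i] = j
    return matches


def symmetric_matches(desc_a, desc_b, ratio=0.8):
    """Symmetry test: keep (i, j) only if i -> j and j -> i are both matches."""
    ab = ratio_test_matches(desc_a, desc_b, ratio)
    ba = ratio_test_matches(desc_b, desc_a, ratio)
    return sorted((i, j) for i, j in ab.items() if ba.get(j) == i)
```

Pairs that survive both tests would then typically go to the geometric verification step (RANSAC), which is not shown here.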
  • 6. Proposed approach
      • Starting from a baseline approach:
          – Improve detection accuracy
          – Reduce the required processing time
      • Work directions:
          – GPU-based processing
          – Video-structure-based sampling of frames
          – Enhancing robustness to scale variations
  • 7. GPU-based processing
      Accelerated parts of the overall pipeline:
      • Video decompression into frames
      • Keypoint detection and description
      • Brute-force matching and 2-NN search
      • Drawing of the calculated bounding boxes (optional)
  • 8. Video-structure-based sampling
      • Sequential processing of video frames is replaced by a structure-based one, using the analysis results of a shot segmentation method
      [Figure: example. Check shot 1: no detection, move to the next shot. Check shot 2: detection, check the frames of this shot, then detect and highlight the object of interest.]
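The shot-based sampling idea can be sketched as follows. The sketch assumes five key-frames per shot (first, last and three intermediate, as stated in the editor's notes), spaces them evenly (an assumption), and takes shot boundaries as inclusive `(start, end)` frame-index pairs. The names `key_frame_indices`, `frames_to_analyse` and the `object_in_frame` callback are hypothetical stand-ins for the real detector.

```python
def key_frame_indices(start, end, n=5):
    """Evenly spaced frame indices in [start, end], always including both ends."""
    if n < 2 or end <= start:
        return [start]
    # Integer arithmetic keeps the indices exact; the set removes duplicates
    # that appear when a shot is shorter than n frames.
    return sorted({start + (k * (end - start)) // (n - 1) for k in range(n)})


def frames_to_analyse(shots, object_in_frame, n_key_frames=5):
    """Return the frames that need full analysis.

    A shot is analysed in full only if the object is found in at least one
    of its key-frames; all other shots are rejected after a few comparisons.
    """
    selected = []
    for start, end in shots:
        keys = key_frame_indices(start, end, n_key_frames)
        if any(object_in_frame(f) for f in keys):
            selected.extend(range(start, end + 1))
    return selected
```

With this scheme a shot that does not contain the object costs at most `n_key_frames` comparisons, which fits the large drop in processing time reported for configuration C3.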
  • 9. Robustness to scale variations – problem
      • Major changes in scale may lead to detection failure, because the area available for matching is significantly reduced
      • Zoom-in case: the middle image (b) corresponds to a small upper-right area of the object O shown in the left image (a)
      • Zoom-out case: in the right image (c), the object O occupies a very small part of the frame
      • Both cases considerably reduce the number of matched descriptor pairs, and thus often lead to detection failure
      [Figure: three example images (a), (b) and (c)]
  • 10. Robustness to scale variations – solution
      • A zoomed-out and a centralized zoomed-in instance of the object O are generated automatically and used in the matching procedure
      • Zoomed-in instance:
          – selection of a center-aligned sub-area of the original object O and enlargement to the actual size of O by applying bilinear interpolation
          – 70% of the original image area → 140% zoom-in factor
      • Zoomed-out instance:
          – shrink the original image O into a smaller one using nearest neighbor interpolation
          – the maximum zoom-out factor is determined by the restrictions of the GPU-based implementation of SURF
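The geometry of the two generated instances can be illustrated with a small sketch. It assumes that the 70% figure is a fraction of the image area, so each side is scaled by sqrt(0.7) ≈ 0.84. Enlarging that crop back to the original size then magnifies the area by 1/0.7 ≈ 1.43, which is close to the 140% on the slide if that figure is read as an area ratio. This reading is an assumption. Images are plain row-major lists of rows, the function names are hypothetical, and the bilinear enlargement is left to an image library.

```python
import math


def centered_crop_box(width, height, area_fraction=0.7):
    """Centre-aligned box (x0, y0, x1, y1) covering about area_fraction of the image."""
    side = math.sqrt(area_fraction)  # per-side scale for the requested area
    w, h = round(width * side), round(height * side)
    x0, y0 = (width - w) // 2, (height - h) // 2
    return x0, y0, x0 + w, y0 + h


def shrink_nearest(image, factor):
    """Downscale a row-major image (list of rows) by nearest-neighbour sampling.

    factor is the per-side scale, e.g. 0.5 halves width and height.
    """
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, int(src_h * factor))
    dst_w = max(1, int(src_w * factor))
    return [
        [
            image[min(src_h - 1, int(y / factor))][min(src_w - 1, int(x / factor))]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]
```

The zoomed-in instance would be the crop returned by `centered_crop_box`, resized back to `width` x `height` with bilinear interpolation. The zoomed-out instance is the output of `shrink_nearest`.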
  • 11. Experiments and Results
      • System specifications
          – Intel Core i7 processor at 3.4 GHz
          – 8 GB RAM
          – CUDA-enabled NVIDIA GeForce GTX 560 GPU
      • Dataset
          – 6 videos* of 273 minutes total duration
          – 30 manually selected objects
      • Ground truth (generated via manual annotation)
          – 75,632 frames contain at least one of these objects
          – 333,455 frames do not include any of the selected objects
      [Figure: examples of sought objects]
      * The videos are episodes from the “Antiques Roadshow” of the Dutch public broadcaster AVRO (http://avro.nl/)
  • 12. Experiments and Results
      • Aim: quantify the improvement that each extension of the baseline approach is responsible for
      • Four experimental configurations:
          – C1: baseline implementation
          – C2: GPU-accelerated implementation
          – C3: GPU-accelerated and video-structure-based sampling implementation
          – C4: complete proposed approach, which includes:
              • GPU-based processing
              • video-structure-based sampling
              • robustness to scale variations
  • 13. Experiments and Results
      • Detection accuracy is expressed in terms of Precision, Recall and F-Score
      • Evaluation was performed on a per-frame basis, i.e. considering the 30 selected objects and counting the number of frames where these were correctly detected, missed, etc.
      • Time efficiency was evaluated by expressing the processing time of each configuration as a factor of the actual duration of the processed videos
      • Robustness to scale variations was quantified using two specific sets of frames where the object of interest was observed from:
          – a very close viewing position (2,940 frames) and
          – a very distant viewing position (4,648 frames)
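The measures named on this slide follow the standard definitions, sketched below for per-frame counts. The function names and the true-positive, false-positive and false-negative counting are illustrative assumptions, since the slides do not give the evaluation code. As a consistency check, the precision and recall reported for configuration C1 (0.999 and 0.868) give an F-Score of about 0.929.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(tp, fp, fn):
    """Per-frame measures from counts of correct, false and missed detections."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)


def real_time_factor(processing_seconds, video_seconds):
    """Processing time as a factor of the video duration (below 1.0 is faster than real time)."""
    return processing_seconds / video_seconds
```

For example, `real_time_factor(600, 6000)` describes a 100-minute video processed in 10 minutes, i.e. a factor of 0.1.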
  • 14. Experiments and Results
      Evaluation results for configurations C1 to C4
              Precision   Recall   F-Score   Processing time (x real-time)
          C1  0.999       0.868    0.929     2.98-5.26
          C2  0.999       0.850    0.918     0.35-1.24
          C3  0.999       0.849    0.918     0.03-0.13
          C4  0.999       0.872    0.931     0.03-0.19

      Evaluation results for highly zoomed-out instances
              Precision   Recall   F-Score
          C1  0.999       0.856    0.922
          C2  0.999       0.856    0.922
          C3  1.000       0.852    0.920
          C4  1.000       0.992    0.996

      Evaluation results for highly zoomed-in instances
              Precision   Recall   F-Score
          C1  0.999       0.831    0.907
          C2  0.999       0.831    0.907
          C3  1.000       0.799    0.888
          C4  1.000       0.914    0.955
  • 15. Experiments and Results
      Detection accuracy
      • All versions exhibited very good results in terms of detection accuracy
      • Version C4 (complete proposed approach) achieved the best results
      • The algorithm performed well for a range of different scales and orientations, and under partial visibility or partial occlusion
      Processing time
      • The video-structure-based sampling strategy led to a great reduction of the required processing time
      • The algorithm needs about 10% of the video’s duration, while preserving the same high detection accuracy as the slower configurations
      Online demo available at: http://www.youtube.com/watch?v=0IeVkXRTYu8
  • 16. Extensions, ideas and plans
      • Recent extension: multiple instances of an object of interest can be used as input for more efficient re-detection of 3D objects
      • Future ideas: test the algorithm’s performance as a tool for chapter segmentation in videos where the chapters are temporally demarcated by the presence of a specific object (e.g. a painting in a video about art)
      • Future plans: evaluate the extended algorithm’s performance (detection accuracy and time efficiency) on a new set of videos
      [Figure: input and output]
  • 17. Conclusions
      • The proposed method can be used for fast and accurate re-detection of pre-defined objects in videos
      • The time performance of the implemented algorithm allows for real-time processing of multimedia content
      • Extended by a prior object labeling step, this technique can be seen as:
          – A reliable tool for creating instance-based annotated spatio-temporal fragments in videos
          – A key enabling technology for finding similar content and establishing links between related media fragments, thus contributing to the realization of interactive and linked TV

Editor's notes

  1. Five representative key-frames are used per shot (the first, the last and three intermediate ones). With this sampling strategy the algorithm analyses in full only those parts of the video (i.e. the shots) where the object appears, meaning it is visible in at least one of the shot's key-frames, and quickly rejects all remaining parts after a small number of comparisons.
  2. E.g. a highly zoomed-in instance of the object may correspond to a very small portion of the searched object O, while an instance that is seen from a very distant viewing position may occupy only a small part of the overall image.
  3. For the last column (Time), a range is reported for each configuration, since processing times can vary significantly depending on the video structure and the percentage of frames in which the sought object appears.