CUNI at MediaEval 2012: Search and Hyperlinking Task
Petra Galuščáková and Pavel Pecina
Institute of Formal and Applied Linguistics
Charles University in Prague
{galuscakova,pecina}@ufal.mff.cuni.cz
2. Search and Hyperlinking Task
● Search and Hyperlinking task
● Search Subtask
– look up the relevant segment in a set of video data
● Hyperlinking Subtask
– then possibly find other video segments related to the retrieved one
● We participated in the Search Subtask only
● Both transcripts (LIMSI and LIUM) were used
● We did not use concept recognition, shot segmentation or face
detection
3. Segmentation
● The exact relevant passage in the recording should be retrieved
→ the transcripts were first divided into segments
● The IR system was then used to retrieve from the collection of
these segments
● Two strategies for segmentation:
● Regular segmentation by time
● TextTiling
4. Regular Segmentation
● Segments of 45, 60, 90 and 120 seconds
● Segments partially overlapped
● A new segment was created every 30 seconds
● A segment was removed from the list of retrieved segments if it
partially overlapped with a higher-ranked segment
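The regular segmentation and the overlap filtering can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: the `Segment` type and the function names are hypothetical, times are in seconds, and the duration of each recording is assumed to be known.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    video: str    # hypothetical recording identifier
    start: float  # segment start in seconds
    end: float    # segment end in seconds


def regular_segments(video: str, duration: float,
                     length: float = 60.0, step: float = 30.0) -> List[Segment]:
    """Cut a recording into fixed-length windows, starting a new one every `step` seconds."""
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + length, duration)
        segments.append(Segment(video, start, end))
        if end >= duration:  # this window already reaches the end of the recording
            break
        start += step
    return segments


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two segments of the same recording share some time span."""
    return a.video == b.video and a.start < b.end and b.start < a.end


def drop_overlapping(ranked: List[Segment]) -> List[Segment]:
    """Walk the ranked list; keep a segment only if it overlaps no segment kept so far."""
    kept: List[Segment] = []
    for seg in ranked:
        if not any(overlaps(seg, k) for k in kept):
            kept.append(seg)
    return kept
```

Comparing each candidate only with the segments already kept is one reading of the filtering rule; comparing it with every higher-ranked candidate, including discarded ones, would remove more segments.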
5. TextTiling Segmentation
● TextTiling achieved good results in the Rich Speech Retrieval (RSR)
task at MediaEval 2011 [Eskevich et al., 2012].
● The transcripts were first preprocessed and sentence boundaries
(based mainly on punctuation) were marked.
● Settings used:
● average number of words in a sentence: 27
● average number of sentences in a segment: 9
● With these settings, the segments correspond more closely to
90-second segments.
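Hearst's TextTiling algorithm places segment boundaries where the lexical similarity between adjacent blocks of text drops. The sketch below is a simplified version operating on the marked sentences, not the implementation used in the experiments: each sentence becomes a bag of lowercased words (no stemming or stopword removal), blocks of `block_size` sentences on either side of each gap are compared by cosine similarity, a depth score is computed at every gap, and a boundary is kept at similarity valleys whose depth exceeds the mean minus half the standard deviation. The function names and the mapping of the reported value of 9 sentences onto `block_size` are assumptions.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev
from typing import List


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def texttiling_boundaries(sentences: List[str], block_size: int = 9) -> List[int]:
    """Return indices i such that a segment boundary is placed before sentences[i]."""
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    # Similarity at gap g (between sentence g-1 and g) of the two adjacent blocks.
    scores = []
    for g in range(1, len(bags)):
        left = sum(bags[max(0, g - block_size):g], Counter())
        right = sum(bags[g:g + block_size], Counter())
        scores.append(_cosine(left, right))
    if not scores:
        return []

    # Depth score: how far similarity falls below the nearest peak on each side.
    depths = []
    for i, s in enumerate(scores):
        l = i
        while l > 0 and scores[l - 1] >= scores[l]:
            l -= 1
        r = i
        while r < len(scores) - 1 and scores[r + 1] >= scores[r]:
            r += 1
        depths.append((scores[l] - s) + (scores[r] - s))

    cutoff = mean(depths) - pstdev(depths) / 2
    boundaries = []
    for i, (s, d) in enumerate(zip(scores, depths)):
        is_valley = ((i == 0 or scores[i - 1] >= s)
                     and (i == len(scores) - 1 or scores[i + 1] >= s))
        if is_valley and d > cutoff:
            boundaries.append(i + 1)  # gap i lies just before sentence i + 1
    return boundaries
```

The returned indices can then be mapped back to the start times of the corresponding sentences to obtain time-stamped segments.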
6. Terrier
● The Terrier information retrieval system was used
● http://terrier.org
● Wide range of retrieval (weighting) models, including language
models, and other available features
● The highest scores were achieved with the Hiemstra language
model and the TF-IDF weighting model.
● Terrier settings: Porter stemmer, stopword list, query
expansion and default parameters for both the TF-IDF model
and the Hiemstra language model
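For reference, the two kinds of weighting model can be sketched as below. The Hiemstra language model is shown in its common rank-equivalent form, score(d, q) = sum over query terms t of log(1 + (λ · tf(t,d) · |C|) / ((1 - λ) · cf(t) · |d|)), where cf(t) is the collection frequency of t and |C| the total number of tokens in the collection. The TF-IDF function is a plain textbook variant (raw tf times log(N / df)). Both are illustrative assumptions: Terrier's own TF_IDF and Hiemstra_LM implementations may differ in normalisation and defaults, and λ = 0.15 is only an illustrative value, not necessarily the one used in the experiments. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Doc = List[str]  # a segment as a list of already stemmed, stopword-filtered tokens


def collection_stats(docs: Dict[str, Doc]):
    """Collection frequency cf(t), document frequency df(t) and total token count |C|."""
    cf: Counter = Counter()
    df: Counter = Counter()
    for tokens in docs.values():
        cf.update(tokens)
        df.update(set(tokens))
    return cf, df, sum(cf.values())


def hiemstra_lm(query: Doc, doc: Doc, cf: Counter, total_tokens: int,
                lam: float = 0.15) -> float:
    """Rank-equivalent Hiemstra LM score; lam is an assumed smoothing weight."""
    if not doc:
        return 0.0
    tf = Counter(doc)
    score = 0.0
    for t in query:
        if cf[t] == 0:
            continue  # term unseen in the collection contributes nothing
        score += math.log(1 + (lam * tf[t] * total_tokens)
                          / ((1 - lam) * cf[t] * len(doc)))
    return score


def tf_idf(query: Doc, doc: Doc, df: Counter, n_docs: int) -> float:
    """Textbook TF-IDF: raw term frequency times log(N / df)."""
    tf = Counter(doc)
    return sum(tf[t] * math.log(n_docs / df[t]) for t in query if df[t] > 0)


def rank(query: Doc, docs: Dict[str, Doc], model: str = "hiemstra") -> List[str]:
    """Return segment ids sorted by descending score under the chosen model."""
    cf, df, total = collection_stats(docs)

    def score(doc_id: str) -> float:
        if model == "hiemstra":
            return hiemstra_lm(query, docs[doc_id], cf, total)
        if model == "tfidf":
            return tf_idf(query, docs[doc_id], df, len(docs))
        raise ValueError(f"unknown model: {model}")

    return sorted(docs, key=score, reverse=True)
```

In both models a segment that contains none of the query terms scores zero, so only matching segments compete for the top ranks.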
8. Results
Run  Transcript  Engine    Segments    MRR               mGAP              Mod   MASP
                           window (s)  60    30    10    60    30    10          60    30    10
-    LIMSI       Hiemstra  none        0.34  0.27  0.10  0.21  0.10  0     0.57  0     0     0
1    LIMSI       TF-IDF    90 s        0.42  0.31  0.15  0.26  0.16  0.03  0.56  0.11  0.08  0.04
2    LIUM        Hiemstra  60 s        0.38  0.34  0.19  0.26  0.17  0.03  0.50  0.11  0.11  0.06
3    LIMSI       TF-IDF    60 s        0.47  0.40  0.19  0.31  0.20  0.04  0.62  0.16  0.14  0.06
4    LIMSI       Hiemstra  90 s        0.47  0.36  0.19  0.29  0.19  0.04  0.64  0.12  0.09  0.04
5    LIMSI       Hiemstra  TextTiling  0.28  0.26  0.2   0.21  0.16  0.03  0.37  0.16  0.16  0.15
● Runs 1 and 2 were required; they used only the title field of the query
● The other three runs also used the short title field
● In all runs, metadata (description and tags) was added to each
segment.
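As a rough illustration of a window-based MRR, the sketch below counts a retrieved segment as a hit when it comes from the relevant recording and its start time lies within `window` seconds of the known relevant start time, then averages the reciprocal rank of the first hit over all queries. This hit criterion and the function names are assumptions for illustration; the official MediaEval 2012 evaluation may define the measure differently.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (video id, segment start time in seconds)


def reciprocal_rank(ranked: List[Hit], relevant: Hit, window: float) -> float:
    """1/rank of the first segment starting within `window` seconds of the relevant start."""
    video, start = relevant
    for position, (v, s) in enumerate(ranked, start=1):
        if v == video and abs(s - start) <= window:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: Dict[str, List[Hit]], qrels: Dict[str, Hit],
                         window: float) -> float:
    """Average reciprocal rank over all queries in qrels; a missing run counts as 0."""
    if not qrels:
        return 0.0
    total = sum(reciprocal_rank(runs.get(q, []), rel, window) for q, rel in qrels.items())
    return total / len(qrels)
```

Under this definition, shrinking the window can only remove hits, so the score cannot increase from 60 s to 30 s to 10 s, which is consistent with the MRR columns of the results table.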
9. Observations
● The highest MRR and mGAP scores were achieved with
regular segmentation.
● The highest MASP scores were achieved with TextTiling
segmentation.
● The differences between the TF-IDF model with 60-second
segments and the Hiemstra LM with 90-second segments are
very small for MRR and mGAP but larger for MASP.
10. Segment Length
● Shorter segments achieve higher mGAP and MASP scores; this
dependency is more pronounced for MASP
● MRR is highest for 90-second segments
● Evaluation window size: 60 seconds
11. Future Work
● In future work, we would especially like to increase the mGAP
and MASP scores
→ by improving the segmentation precision
● We would also like to use audio and visual information (e.g.,
shot segmentation)
● and to examine shorter segments
13. Conclusions
● Two types of segmentation: regular (time-based) and
TextTiling
● The Terrier IR system was used with the Hiemstra LM and the
TF-IDF weighting model
● Regular segmentation (60 and 90 seconds) achieved the highest
MRR and mGAP scores, while the TextTiling segmentation
algorithm achieved the highest MASP scores
● The dependency of the measures on segment length was
examined.