We describe details of our runs and the results obtained for the "IR for Spoken Documents (SpokenDoc) Task" at NTCIR-9. The focus of our participation in this task was the investigation of segmentation methods for dividing the manual and ASR transcripts into topically coherent segments. The underlying assumption of this approach is that these segments will capture passages in the transcript relevant to the query. Our experiments investigate two lexical-coherence-based segmentation algorithms (TextTiling, C99). These are run on the provided manual and ASR transcripts, and on the ASR transcript with stop words removed. Evaluation of the results shows that TextTiling consistently performs better than C99, both in segmenting the data into retrieval units, as evaluated using the centre-located relevant information metric, and in having higher content precision in each automatically created segment.
DCU at the NTCIR-9 SpokenDoc Passage Retrieval Task
Maria Eskevich, Gareth J.F. Jones
Centre for Digital Video Processing
Centre for Next Generation Localisation
School of Computing
Dublin City University
Ireland
December 8, 2011
Outline
Retrieval Methodology
Transcript Preprocessing
Text Segmentation
Retrieval Setup
Results
Official Metrics
uMAP
pwMAP
fMAP
Conclusions
Retrieval Methodology
[Flowchart] Transcript → Segmentation → Topically Coherent Segments → Indexing → Indexed Segments; Queries (Information Request) → Retrieval over the Indexed Segments → Retrieval Results
6 Retrieval Runs
Three transcript types, each segmented with both C99 and TextTiling (TT):
1-best ASR: ASR C99, ASR TT
1-best ASR with stop words removed: ASR NSW C99, ASR NSW TT
Manual transcript: Manual C99, Manual TT
Transcript Preprocessing
Recognize individual morphemes of the sentences:
ChaSen 2.4.0, based on the Japanese morphological analyzer JUMAN 2.0, with ipadic grammar 2.7.0
Form the text out of the base forms of the words in order to avoid stemming (see the sketch below)
Remove the stop words (SpeedBlog Japanese Stop-words) to produce the ASR NSW transcript
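A minimal sketch of the base-form extraction step, assuming the `chasen` command-line tool is installed and emits its default tab-separated output with the base form in the third column; the column layout is configurable in chasenrc, so this is an assumption to verify:

```python
# Hedged sketch: extract base forms of Japanese morphemes via the ChaSen CLI.
# Assumes default ChaSen output: tab-separated columns per morpheme with the
# base form in the third column, and "EOS" marking each sentence end.
import subprocess

def base_forms(text: str) -> list[str]:
    """Run ChaSen over `text` and return the base form of each morpheme."""
    result = subprocess.run(["chasen"], input=text,
                            capture_output=True, text=True, check=True)
    forms = []
    for line in result.stdout.splitlines():
        if line == "EOS":          # assumed sentence-end marker
            continue
        cols = line.split("\t")
        if len(cols) >= 3:
            forms.append(cols[2])  # assumed: surface, reading, base form, ...
    return forms
```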
Text Segmentation
Use of algorithms originally developed for written text:
Individual IPUs (inter-pausal units) are treated as sentences
TextTiling (sketched below):
Cosine similarities between adjacent blocks of sentences
C99:
Compute similarity between sentences using a cosine similarity measure to form a similarity matrix
Cosine scores are replaced by the rank of the score in the local region
Segmentation points are assigned using a clustering procedure
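A minimal sketch of the TextTiling-style block comparison, assuming pre-tokenized IPUs and a fixed block size; the full algorithm additionally smooths these gap scores and places boundaries at deep valleys:

```python
# Hedged sketch of TextTiling-style lexical-cohesion scoring. Low gap scores
# (low similarity between the blocks on either side of a gap) suggest topic
# boundaries; real TextTiling adds smoothing and depth scoring.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def gap_scores(ipus: list[list[str]], block: int = 10) -> list[float]:
    """Score each gap between consecutive IPUs by the similarity of the
    blocks of IPUs immediately before and after it."""
    scores = []
    for gap in range(1, len(ipus)):
        left = Counter(t for s in ipus[max(0, gap - block):gap] for t in s)
        right = Counter(t for s in ipus[gap:gap + block] for t in s)
        scores.append(cosine(left, right))
    return scores
```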
Retrieval Setup
SMART information retrieval system extended to use language modelling with a uniform document prior probability.
A query q is scored against a document d within the SMART framework in the following way:

P(q|d) = \prod_{i=1}^{n} \big( \lambda_i \, P(q_i|d) + (1 - \lambda_i) \, P(q_i) \big)

where
q = (q_1, ..., q_n) is a query comprising n query terms,
P(q_i|d) is the probability of generating the i-th query term from a given document d, estimated by maximum likelihood,
P(q_i) is the probability of generating it from the collection, estimated by document frequency.
The retrieval model used λ_i = 0.3 for all q_i, this value being optimized on the TREC-8 ad hoc dataset.
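A worked sketch of this query-likelihood score with Jelinek-Mercer smoothing at λ = 0.3, assuming simple in-memory term statistics; the actual runs used the SMART system, not this code:

```python
# Hedged sketch of the smoothed query-likelihood score, computed in log space
# to avoid underflow: log P(q|d) = sum_i log(lam*P(qi|d) + (1-lam)*P(qi)).
import math

def score(query: list[str], doc_tf: dict[str, int], doc_len: int,
          coll_df: dict[str, int], n_docs: int, lam: float = 0.3) -> float:
    s = 0.0
    for term in query:
        p_doc = doc_tf.get(term, 0) / doc_len    # maximum-likelihood estimate
        p_coll = coll_df.get(term, 0) / n_docs   # document-frequency estimate
        p = lam * p_doc + (1 - lam) * p_coll
        s += math.log(p) if p > 0 else math.log(1e-12)  # floor for OOV terms
    return s
```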
Results: Official Metrics

Transcript type   Segmentation type   uMAP     pwMAP    fMAP
BASELINE          -                   0.0670   0.0520   0.0536
manual            tt                  0.0859   0.0429   0.0500
manual            C99                 0.0713   0.0209   0.0168
ASR               tt                  0.0490   0.0329   0.0308
ASR               C99                 0.0469   0.0166   0.0123
ASR nsw           tt                  0.0312   0.0141   0.0174
ASR nsw           C99                 0.0316   0.0138   0.0120

Only the runs on the manual transcript scored above the baseline, and only on the uMAP metric
TextTiling results are consistently higher than C99 on all metrics for the manual and ASR runs
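For reference, a minimal sketch of plain mean average precision (MAP) over ranked results; the official uMAP, pwMAP, and fMAP variants differ in how each retrieved passage is credited as relevant (per utterance, by the central IPU, or by the relevant fraction), which is not reproduced here:

```python
# Hedged sketch of mean average precision (MAP) over binary relevance
# judgements; the task's uMAP/pwMAP/fMAP differ in how relevance credit
# is assigned to each retrieved passage.
def average_precision(ranked_rel: list[bool], n_relevant: int) -> float:
    """AP for one query: ranked_rel[i] is True if the result at rank i+1
    is relevant; n_relevant is the total number of relevant items."""
    hits, ap = 0, 0.0
    for i, rel in enumerate(ranked_rel):
        if rel:
            hits += 1
            ap += hits / (i + 1)   # precision at this rank
    return ap / n_relevant if n_relevant else 0.0

def mean_average_precision(runs: list[tuple[list[bool], int]]) -> float:
    """MAP over queries, each given as (ranked relevance list, #relevant)."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)
```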
Time-based Results Assessment Approach
For each run and each query: every retrieved passage containing relevant content is assigned a precision score, and these scores are averaged over the ranks with passages containing relevant content to give the average of precision per query, where:

Precision = Length of the Relevant Part / Length of the Whole Passage
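A minimal sketch of this time-based precision, assuming passages and relevant regions are given as (start, end) times in seconds and that the relevant regions are disjoint:

```python
# Hedged sketch: fraction of a retrieved passage's duration that overlaps
# the relevant regions. Assumes disjoint relevant intervals in seconds.
def passage_precision(passage: tuple[float, float],
                      relevant: list[tuple[float, float]]) -> float:
    start, end = passage
    length = end - start
    if length <= 0:
        return 0.0
    overlap = sum(max(0.0, min(end, r_end) - max(start, r_start))
                  for r_start, r_end in relevant)
    return overlap / length
```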
28. Results DCU at the NTCIR-9 SpokenDoc Passage Retrieval Task 28 / 52
Average of Precision for all passages with relevant
content
29. Results DCU at the NTCIR-9 SpokenDoc Passage Retrieval Task 29 / 52
Average of Precision for all passages with relevant
content
TextTiling algorithm has higher average of precision for all
types of transcript, i.e.topically coherent segments
are better located
Results: utterance-based MAP (uMAP)

Transcript type   Segmentation type   uMAP
BASELINE          -                   0.0670
manual            tt                  0.0859
manual            C99                 0.0713
ASR               tt                  0.0490
ASR               C99                 0.0469
ASR nsw           tt                  0.0312
ASR nsw           C99                 0.0316

The trend 'manual > ASR > ASR nsw' for both C99 and TextTiling is not supported by the averages of precision
The higher averages of precision for TextTiling segmentation over C99 are not reflected in the uMAP scores
For some of the queries, runs on C99 segmentation rank the segments with relevant content better
Relevance of the Central IPU Assessment
[Charts: number of ranks taken or not taken into account by pwMAP; average of precision for the passages at those ranks]
TextTiling has higher numbers of segments whose central IPU is relevant to the query
Overall, the number of ranks at which a segment with relevant content is retrieved is approximately the same for both segmentation techniques
Results: pointwise MAP (pwMAP)

Transcript type   Segmentation type   pwMAP
BASELINE          -                   0.0520
manual            tt                  0.0429
manual            C99                 0.0209
ASR               tt                  0.0329
ASR               C99                 0.0166
ASR nsw           tt                  0.0141
ASR nsw           C99                 0.0138

TextTiling segmentation places better topic boundaries around relevant content and has higher precision scores for the retrieved relevant passages
Average Length of Relevant Part and Segments (in seconds)
[Charts: average lengths when the center IPU is relevant and when it is not]
Center IPU is relevant: the average length of the relevant content is of the same order for both segmentation schemes, slightly higher for TextTiling
Center IPU is not relevant: the average length of the relevant content is higher for C99 segmentation; due to the poor segmentation it correlates with much longer segments
Results: fraction MAP (fMAP)

Transcript type   Segmentation type   fMAP
BASELINE          -                   0.0536
manual            tt                  0.0500
manual            C99                 0.0168
ASR               tt                  0.0308
ASR               C99                 0.0123
ASR nsw           tt                  0.0174
ASR nsw           C99                 0.0120

The average number of ranks with segments having a non-relevant center IPU is more than 5 times higher than with a relevant one
The segmentation technique with longer, poorly segmented passages (C99) has much lower precision-based scores
Conclusions
TextTiling segmentation shows better overall retrieval performance than C99:
Higher numbers of segments with higher precision
Higher precision even for the segments with a non-relevant center IPU
The high level of poor segmentation makes it harder to retrieve relevant content in the C99 runs
Removal of stop words before segmentation did not have any positive effect on the results
Thank you for your attention!
Questions?