Numerous studies have analyzed how word segmentation (WS) performance influences information retrieval (IR) for Mandarin Chinese, and they have demonstrated a non-monotonic relationship between WS accuracy and IR effectiveness. The usefulness of compound words, which have been a focus of the IR literature, is not reflected by the common WS evaluation metrics of word-based precision (P) and recall (R). This investigation proposes two alternative measurements of WS accuracy, true negative rate (TNR) and negative predictive value (NPV), which are based on negative segments annotated against four reference-corpus standards, and compares them with P and R through search engine simulation. Accuracy-controlled WS systems segment the queries for the simulation, which uses NTCIR collections and "Sogou" logs. Mean average precision (MAP) estimates the similarity of search results between the original and segmented queries. The statistics demonstrate that TNR and NPV are generally more closely correlated with MAP than P and R are.
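To make the four measures concrete, here is a minimal sketch that scores a segmentation against a reference. It rests on an assumption that is not taken from the paper: every gap between two adjacent characters is treated as a binary decision (boundary or no boundary), so the scores are boundary-level rather than the word-based P and R or the annotated negative segments used in the study. The function names `boundaries` and `boundary_scores` are hypothetical. Only the ratio definitions are standard: TNR = TN / (TN + FP) and NPV = TN / (TN + FN).

```python
from typing import Dict, List, Set


def boundaries(words: List[str]) -> Set[int]:
    """Character offsets where a word ends, excluding the end of the text."""
    cuts, pos = set(), 0
    for w in words[:-1]:
        pos += len(w)
        cuts.add(pos)
    return cuts


def boundary_scores(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Gap-level P, R, TNR and NPV (an assumed simplification, see lead-in)."""
    text = "".join(gold)
    if text != "".join(pred):
        raise ValueError("both segmentations must cover the same text")
    gaps = set(range(1, len(text)))   # every position between two characters
    g, p = boundaries(gold), boundaries(pred)
    tp = len(g & p)                   # boundary in both
    fp = len(p - g)                   # predicted boundary absent from the reference
    fn = len(g - p)                   # reference boundary that was missed
    tn = len(gaps - g - p)            # gap left unsplit by both

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "P": ratio(tp, tp + fp),
        "R": ratio(tp, tp + fn),
        "TNR": ratio(tn, tn + fp),
        "NPV": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    gold = ["上海滩", "的", "夜景"]
    pred = ["上海", "滩", "的", "夜景"]   # over-segments the compound
    print(boundary_scores(gold, pred))
```

In this example the over-segmented compound costs one false positive, which lowers P and TNR while R and NPV stay at 1.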
Evaluation via Negativa of Chinese Word Segmentation for Information Retrieval @ PACLIC 2011
1. / 36
EVALUATION
via Negativa
Mike Tian-Jian Jiang, Chen-Wei Shih, Chan-Hung Kuo,
Richard Tzong-Han Tsai, and Wen-Lian Hsu
National Tsing Hua University
Academia Sinica
Taiwan
中文分詞 (Chinese word segmentation)
INFORMATION
RETRIEVAL
4. / 36
“... the smallest free form that may be
uttered in isolation with semantic or
pragmatic content (with literal or
practical meaning) ...”
http://en.wikipedia.org/wiki/Word
5. / 36
“... the task of defining what
constitutes a ‘word’ involves
determining where one word ends
and another word begins...”
http://en.wikipedia.org/wiki/Word#Word_boundaries
6. / 36
Word Boundary?
• Phonology
• Morphology
• Orthography
• Compound? Multi-word expression?
• Multi-word vs. multiword vs. multi word
• CJKV?
• Multi-character expression?
7. / 36
What is a Word?
to computational linguistics
8. / 36
Standard de jure?
• Academia Sinica Balanced Corpus
• Chinese Treebank of the University of Pennsylvania
• City University of Hong Kong
• Microsoft Research Asia
• Peking University
9. / 36
... then match standards
The higher the accuracy, the better the communication?
10. / 36
What is a Word?
to computational linguistics applications
12. / 36
Standard de facto?
• Word n-gram
• Character n-gram
• Hybrid
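The three indexing units listed above can be illustrated with a short sketch. The function names are hypothetical, and the "hybrid" variant shown (word unigrams plus character bigrams) is only one possible combination, chosen here as an assumption; the slides do not specify which hybrid scheme is meant.

```python
from typing import List


def char_ngrams(text: str, n: int = 2) -> List[str]:
    """Overlapping character n-grams; no segmenter is needed."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(words: List[str], n: int = 1) -> List[str]:
    """n-grams over segmenter output, joined without spaces."""
    return ["".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def hybrid_terms(words: List[str], n: int = 2) -> List[str]:
    """Assumed hybrid: word unigrams plus character n-grams, deduplicated in order."""
    terms = word_ngrams(words, 1) + char_ngrams("".join(words), n)
    return list(dict.fromkeys(terms))


if __name__ == "__main__":
    segmented = ["上海", "滩"]
    print(word_ngrams(segmented))          # ['上海', '滩']
    print(char_ngrams("".join(segmented)))  # ['上海', '海滩']
    print(hybrid_terms(segmented))         # ['上海', '滩', '海滩']
```

Character n-grams sidestep the segmentation standard entirely, which is why they serve as a common baseline when the effect of WS accuracy on retrieval is in question.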
13. / 36
Monotonic or not?
better WS results yield better IR outcomes?
14. / 36
Is it finite?
How to evaluate WS-to-application influence?
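One way to quantify the WS-to-IR influence is mean average precision (MAP), which the abstract uses to compare search results for original and segmented queries. The sketch below makes an assumption about how that comparison is set up: the documents returned for the original query are treated as the relevant set, and the ranked list returned for the segmented query is scored against it. The function names and document identifiers are hypothetical.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average of precision values at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: List[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of average precision over (segmented-query ranking, reference set) pairs."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Reference set: results for the original query (assumed setup).
    reference = {"d1", "d2"}
    # Ranking returned for the segmented version of the same query.
    segmented_ranking = ["d1", "d3", "d2"]
    print(average_precision(segmented_ranking, reference))
```

A segmented query that retrieves the same documents in the same order scores 1.0; the score drops as reference documents are pushed down the ranking or lost.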
15. / 36
Via Negativa
“It describes God by saying what he is not, rather than what he is, because as
finite beings we can not recognize God's attributes in any real and full sense
and because God is beyond what our language can positively describe.”
http://www.blackwellreference.com/public/tocnode?id=g9781405106795_chunk_g978140510679515_ss1-58
http://www.blackmetal.com/scans0710/teratism-via-negativa.jpg
29. / 36
Pragmatical WS
Accuracy-controlled systems on different standards
CRF trained on 1, 1/2, 1/4, ..., 1/16384 of the Bakeoff 2005 data
http://scifun.files.wordpress.com/2010/07/1278929569066.jpg
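The halving series 1, 1/2, 1/4, ..., 1/16384 can be generated as below. This is only a sketch: the slide does not say how the subsets were drawn, so taking a prefix of the corpus is an assumption, and `halving_fractions` and `training_subsets` are hypothetical names. Since 16384 is 2^14, the series has 15 levels.

```python
from fractions import Fraction
from typing import Dict, List, Sequence


def halving_fractions(largest_denominator: int = 16384) -> List[Fraction]:
    """Return 1, 1/2, 1/4, ... down to 1/largest_denominator."""
    fracs, d = [], 1
    while d <= largest_denominator:
        fracs.append(Fraction(1, d))
        d *= 2
    return fracs


def training_subsets(sentences: Sequence[str]) -> Dict[Fraction, Sequence[str]]:
    """Map each fraction to a corpus prefix (prefix sampling is an assumption).

    At least one sentence is kept so that the smallest subsets are not empty.
    """
    n = len(sentences)
    return {f: sentences[: max(1, int(n * f))] for f in halving_fractions()}


if __name__ == "__main__":
    corpus = [f"sentence {i}" for i in range(32768)]  # placeholder data
    for frac, subset in training_subsets(corpus).items():
        print(frac, len(subset))
```

Training one model per subset gives a family of segmenters whose accuracy varies in a controlled way, which is what allows WS scores to be correlated with retrieval outcomes.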
34. / 36
Discussion
• 上海滩 (the Bund of Shanghai)
• MSR: 上海滩,上海 / 滩,上 / 海 / 滩
• PKU: 上海滩,上海 / 滩,上 / 海滩
• May be caused by ...
• Standard differences?
• Lexicon disappearances?
35. / 36
Concerns
• Accuracy-controlled WS systems other than CRF?
• The same training data, different standards?
• Conventional/comparative IR experiments?
• Lucene? Lemur/Indri?
• TREC and NTCIR?
• Silver standards?
• Relaxation of negative patterns?
• Graphical or n-best list output of WS?
• Oracle precision, recall, TNR, NPV, etc.?
• Applications other than IR?
• Out-of-vocabulary?