10. Experiment 2
• Build a random corpus, using the character proportions
of the Reuters corpus as the generation probabilities
char  prob      char  prob      char  prob
_     0.2186    i     0.0568    r     0.0560
a     0.0646    j     0.0016    s     0.0591
b     0.0119    k     0.0054    t     0.0694
c     0.0292    l     0.0360    u     0.0213
d     0.0331    m     0.0205    v     0.0090
e     0.0885    n     0.0575    w     0.0101
f     0.0176    o     0.0566    x     0.0025
g     0.0139    p     0.0198    y     0.0116
h     0.0270    q     0.0016    z     0.0007
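
The generation step can be sketched roughly as follows. This is a minimal illustration, not the code used for the experiment: it assumes that "_" in the table stands for the space character (the word separator), that characters are drawn independently of one another, and that words are obtained by splitting on whitespace. The names `CHAR_PROBS`, `generate_random_corpus`, and `rank_frequency` are hypothetical.

```python
import random
from collections import Counter

# Character probabilities copied from the table above.
# Assumption: "_" denotes the space character.
CHAR_PROBS = {
    "_": 0.2186, "a": 0.0646, "b": 0.0119, "c": 0.0292, "d": 0.0331,
    "e": 0.0885, "f": 0.0176, "g": 0.0139, "h": 0.0270, "i": 0.0568,
    "j": 0.0016, "k": 0.0054, "l": 0.0360, "m": 0.0205, "n": 0.0575,
    "o": 0.0566, "p": 0.0198, "q": 0.0016, "r": 0.0560, "s": 0.0591,
    "t": 0.0694, "u": 0.0213, "v": 0.0090, "w": 0.0101, "x": 0.0025,
    "y": 0.0116, "z": 0.0007,
}


def generate_random_corpus(n_chars, seed=0):
    """Draw n_chars characters independently according to CHAR_PROBS."""
    rng = random.Random(seed)
    chars = list(CHAR_PROBS)
    weights = list(CHAR_PROBS.values())
    # random.choices treats the weights as relative, so the rounded
    # values (which sum to 0.9999) need no explicit normalization.
    sampled = rng.choices(chars, weights=weights, k=n_chars)
    return "".join(sampled).replace("_", " ")


def rank_frequency(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    return Counter(text.split()).most_common()


if __name__ == "__main__":
    corpus = generate_random_corpus(1_000_000)
    # Print the ten most frequent "words" with their ranks.
    for rank, (word, freq) in enumerate(rank_frequency(corpus)[:10], start=1):
        print(rank, word, freq)
```

Plotting rank against frequency on log-log axes for the output of `rank_frequency` would then allow a comparison with the same plot for the real corpus.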