nl190segment
Bayesian Unsupervised Word Segmentation with Hierarchical Language Modeling
(original Japanese title: ベイズ階層言語モデルによる教師なし形態素解析)

Daichi Mochihashi, Takeshi Yamada, Naonori Ueda
NTT Communication Science Laboratories
Hikaridai 2-4, Keihanna Science City, Kyoto, Japan 619-0237
daichi@cslab.kecl.ntt.co.jp {yamada,ueda}@cslab.kecl.ntt.co.jp

Overview
This paper proposes an unsupervised morphological analyzer and language model that require neither supervised data nor a dictionary and can be applied to any language. The observed character string is regarded as the output of a probabilistic model that integrates character n-grams and word n-grams within a nonparametric Bayesian framework, and the hidden "words" are estimated iteratively using MCMC and dynamic programming. The proposed method can also be viewed as a way to build, directly from the raw character strings of any language and with no prior knowledge, an n-gram language model that has no unknown words and is smoothed as accurately as Kneser-Ney. Experiments on Japanese, including spoken language and classical text, and on standard datasets for Chinese word segmentation confirm the effectiveness and efficiency of the proposed method.
Keywords: morphological analysis, word segmentation, language model, nonparametric Bayesian methods, MCMC

Abstract
This paper proposes a novel unsupervised morphological analyzer for arbitrary languages that needs neither supervised segmentation nor a dictionary. Assuming a string to be the output of a nonparametric Bayesian hierarchical n-gram language model of words and characters, "words" are iteratively estimated during inference by a combination of MCMC and efficient dynamic programming. This model can also be considered a method for learning an accurate n-gram language model directly from characters without any "word" information.
Keywords: Word segmentation, Language Modeling, Nonparametric Bayes, MCMC

1 Introduction

Morphological analysis of Japanese is said to have reached an accuracy of 99% or higher [1], but is this really the case? Every current high-accuracy morphological analyzer is built by machine learning or by rules from hand-made supervised data, and that data consists almost entirely of newspaper articles. In spoken language and in the colloquial Japanese found in blogs and similar sources, new words and new expressions appear one after another and the criteria for word segmentation are ambiguous, so accurate morphological analysis is difficult. Even when supervised data are created by hand, building and maintaining them is enormously costly, and there is no guarantee that they are "correct" in any sense (note 1).

Note 1: The problem is the same for newspaper articles: "correct" data are inherently not unique. Several part-of-speech systems and tagging standards therefore exist, and supervised learning cannot escape this arbitrariness.

Moreover, classical texts and unknown languages have no supervised data to begin with, so morphological analysis of them has been impossible. Figure 1 shows part of the opening of The Tale of Genji analyzed with MeCab [2]. As results such as 「た｜ぐひなしと」 and 「なほ｜に｜ほ｜は｜し｜さ」 show, an analyzer based on supervised learning from modern text cannot segment such sentences appropriately.

In many applications, such as kana-kanji conversion, statistical machine translation, and speech recognition, the result of morphological analysis is used as input to the n-gram or other language model employed there. Conventional morphological analysis, which rests on hand-made supervised data, has the further problem that it does not optimize this downstream performance. From a scientific or computational-linguistic standpoint as well, it is desirable that basic units such as "words" can be derived from the statistical properties hidden in language data, even when the language is unknown.

Based on these considerations, this paper proposes an unsupervised morphological analyzer and language model, built on nonparametric Bayesian methods, that can estimate "words" for any language from the observed character strings alone, without using any dictionary or supervised data. The proposed method can also be regarded as a way to learn a language model directly from the character strings of an arbitrary language; learning proceeds by repeatedly improving the word segmentation with an efficient MCMC method during inference.
世に｜た｜ぐひなしと｜見｜たてまつり｜た｜まひ｜、｜名高う｜お｜はする｜宮｜の｜御｜容貌｜に｜も｜、｜なほ｜に｜ほ｜は｜し｜さ｜はたと｜へ｜む｜方｜なく｜、｜うつくし｜げ｜なる｜を｜、｜世｜の｜人｜光る｜君｜と｜聞こ｜ゆ｜。｜藤｜壼｜なら｜びたまひて｜、｜御｜おぼえ｜も｜とりどり｜なれ｜ば｜、｜かか｜やく｜日｜の｜宮｜と｜聞こ｜ゆ｜。···
Figure 1: Analysis of The Tale of Genji by MeCab.

In the end, the optimal word segmentation of the training data and a language model are obtained, and unseen data can also be analyzed by running the Viterbi algorithm with the learned language model.

Because learning is unsupervised, the training data can in principle be increased without limit, there are no "unknown words", and domain adaptation is easy. Supervised data can also be incorporated as prior knowledge.

In what follows, Section 2 formulates unsupervised morphological analysis and reviews related work. Section 3 presents the language model obtained by further hierarchizing a hierarchical Bayesian n-gram model into characters and words, and Section 4 describes a learning method that combines MCMC with dynamic programming. Section 5 demonstrates the effectiveness of the method through word segmentation experiments on Japanese newspaper articles, spoken language, and classical text, as well as on Chinese and English. Section 6 gives a discussion and concludes.

2 What Is Unsupervised Morphological Analysis?

Given a natural-language character string s = c_1 c_2 \cdots c_N, unsupervised morphological analysis can be regarded as the problem of finding the word sequence \hat{w} that maximizes the probability p(w|s) of a word sequence w = w_1 w_2 \cdots w_M obtained by segmenting s (note 2):

  \hat{w} = \operatorname{argmax}_{w} \, p(w \mid s)    (1)

This amounts to seeking "the word segmentation that is most natural as language". "Morphological analysis" often includes part-of-speech tagging of w. However, determining parts of speech arguably requires syntactic analysis, and many tasks such as n-gram modeling and statistical machine translation need only the word segmentation. In this paper, "morphological analysis" therefore refers to its most basic form, word segmentation (note 3).

Note 2: Generalized, this formulation can be seen as statistical translation; when s is a hiragana string it is equivalent to kana-kanji conversion.
Note 3: For simple unsupervised part-of-speech induction, one can run an HMM after segmenting words with the proposed method [4].

The probability p(w|s) in (1) can be computed with a language model, and if a word dictionary and a language model exist, the maximizing \hat{w} can be obtained by applying the Viterbi algorithm over the possible word combinations.

In unsupervised morphological analysis, however, the words themselves are unknown. [3][5] relax this constraint somewhat: they score the word-likeness of unknown words with a character n-gram and, given a word list, alternately optimize the segmentation by (1) and the language model. They nevertheless still require a segmented corpus or a word list.

  s = 彼 女 の 言 っ た 言 葉 は ···
  z = 0 1 1 1 0 1 0 1 1 ···
  w = 彼女 の 言 った 言葉 は ···
Figure 2: Word segmentation and the latent variable vector z.

Such resources cannot in principle be prepared for an unknown language. Even for known languages the "correct" word segmentation is not unique [6]; for spoken or colloquial language, for instance, it is very hard to determine what should count as a "word". Furthermore, the set of words is not finite: text contains a large number of "unknown words" not covered by any existing word list, and handling them has become a major problem in morphological analysis [7].

Viewed purely as a statistical machine learning problem, suppose each character c_i of s has a latent variable z_i that is 1 when a word boundary immediately follows c_i and 0 otherwise. Then w can be identified with the latent variable vector z = z_1 z_2 \cdots z_N, and (1) becomes the learning problem of finding

  \hat{z} = \operatorname{argmax}_{z} \, p(z \mid s)    (2)

This is a variant of the HMM with z as the hidden state, known as a semi-Markov model or segment model [8]. Because the number of possible z for each sentence s is exponential, efficient learning is required.

A recent simple approach repeatedly chunks characters using MDL as the criterion [9]. As a more Bayesian approach, [10] uses a word bigram model based on the hierarchical Dirichlet process (HDP) and updates the z_i one character at a time with a Gibbs sampler.

However, these methods change the word segmentation at one position at a time and therefore require an enormous amount of computation. In addition, the different z_i are highly correlated in word segmentation, so convergence is extremely slow, and the methods could be applied only to very small corpora. The approach of [10] can also consider only word bigrams, and its model was introduced as an auxiliary device for segmentation, so it has no criterion for what is "word-like".

In contrast, this paper shows how to directly optimize the performance of a character-word hierarchical n-gram language model and the word segmentation based on it, and proposes an efficient learning method for this purpose that combines dynamic programming with MCMC.

The proposed method is based on HPYLM, a Bayesian model of n-gram language models. We therefore first describe HPYLM, and then show that hierarchizing it into characters and words yields a language model that handles any language and any unknown word and can perform morphological analysis.
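As a small illustration of the correspondence between a segmentation w and the boundary vector z used in (2), the following Python sketch converts between the two representations for the example of Figure 2. The function names are hypothetical and chosen only for this illustration; they are not part of the paper.

```python
def boundaries_to_words(s, z):
    """Split string s after every position i with z[i] == 1."""
    if len(s) != len(z):
        raise ValueError("s and z must have the same length")
    words, start = [], 0
    for i, flag in enumerate(z):
        if flag == 1:
            words.append(s[start:i + 1])
            start = i + 1
    if start < len(s):          # trailing characters without a closing boundary
        words.append(s[start:])
    return words


def words_to_boundaries(words):
    """Inverse mapping: z_i = 1 exactly at the last character of each word."""
    z = []
    for w in words:
        z.extend([0] * (len(w) - 1) + [1])
    return z


def count_segmentations(n):
    """Number of ways to segment n characters (the final boundary is fixed)."""
    return 2 ** (n - 1) if n > 0 else 1


s = "彼女の言った言葉は"
z = [0, 1, 1, 1, 0, 1, 0, 1, 1]
words = boundaries_to_words(s, z)
```

For a sentence of N characters the number of candidate vectors z grows as 2^(N-1), which is why changing one z_i at a time is slow and a more efficient sampler is needed.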
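Figure 3: Bayesian learning of an n-gram language model. (a) Hierarchical generation of the n-gram distributions G_n by the Pitman-Yor process. (b) An equivalent representation using the CRP: each word in the training data is regarded as a "customer" and is added one at a time to the corresponding context node.

3 From HPYLM to NPYLM

3.1 HPYLM: a Bayesian n-gram language model

To perform morphological analysis with a language model, we need a way to assign a probability to every possible word segmentation. Heuristic methods have conventionally been used for this, such as introducing a special token UNK for unknown words [3]. Using n-gram models based on the Dirichlet process and its generalization, the Pitman-Yor process, makes a theoretically transparent and precise model possible. We describe it briefly.

The Pitman-Yor (PY) process is a stochastic process that generates a random discrete probability distribution G resembling a probability distribution G_0 called the base measure. It is written as

  G \sim \mathrm{PY}(G_0, d, \theta) .    (3)

Here d is the discount parameter and \theta is a parameter of the PY process that controls how similar G is to G_0 on average. When d = 0, PY(G_0, 0, \theta) coincides with the Dirichlet process DP(\theta).

Suppose now that we have a unigram distribution G_1 = { p(\cdot) }. The bigram distribution G_2 = { p(\cdot|v) } with word v as context differs from G_1 but can be expected to reflect G_1, for example for high-frequency words. We can therefore assume that G_2 is generated by a PY process with G_1 as its base measure, G_2 \sim \mathrm{PY}(G_1, d, \theta). Similarly, the trigram distribution G_3 = { p(\cdot|v'v) } can be generated with the bigram distribution as base measure, G_3 \sim \mathrm{PY}(G_2, d, \theta), so that G_1, G_2, G_3 form a tree structure as in Figure 3(a).

In practice G can be integrated out. The n-gram language model based on the hierarchical Pitman-Yor process (HPYLM) can then be represented by a hierarchical CRP (Chinese restaurant process), as in Figure 3(b). In this CRP each word of the training data is called a "customer" and is added one at a time to the leaf of the tree that corresponds to its n-gram context. For example, if the training data for a trigram model contains the sentence 「彼 は 行く」, the four customers "彼", "は", "行く", "$" are added to the leaves corresponding to their two preceding words, that is, to the contexts "$ $", "$ 彼", "彼 は", "は 行く". Here "$" is a word of length 0 that marks the sentence boundary required by the language model.

Adding a customer for word w to node h means incrementing the corresponding n-gram count c(w|h) by one. In the same sense as back-off, however, this w may actually have been generated from the (n-1)-gram at the parent node, which uses the context h' that is one word shorter (note 4). In that case a copy of customer w is also added to the parent h' as a "proxy customer". This addition is applied recursively, so every word type always has at least one corresponding customer at the unigram level, that is, at the root node (Figure 3(b)).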
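The bookkeeping described above, one customer per word with its preceding n-1 words as context, can be sketched as follows. This is only an illustration of how the (context, customer) pairs of the 「彼 は 行く」 example arise and how a context backs off to its parent; the function names are hypothetical and the sketch does not implement the CRP seating itself.

```python
def ngram_customers(words, n=3, boundary="$"):
    """Return (context, word) pairs for an n-gram model.

    The sentence is padded with n-1 boundary symbols on the left and
    one boundary symbol on the right, as in the trigram example.
    """
    padded = [boundary] * (n - 1) + list(words) + [boundary]
    return [
        (tuple(padded[i - (n - 1):i]), padded[i])
        for i in range(n - 1, len(padded))
    ]


def backoff_contexts(context):
    """Contexts from the leaf up to the root: drop the earliest word each time."""
    context = tuple(context)
    return [context[i:] for i in range(len(context) + 1)]


pairs = ngram_customers(["彼", "は", "行く"], n=3)
chain = backoff_contexts(("彼", "は"))
```

A proxy customer sent from the leaf ("彼", "は") would be added to ("は",) and, if necessary, to the root context (), which is the unigram node.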
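Let t_{hw} be the number of times, out of the count c(w|h), that w is estimated to have been generated from the parent node. The n-gram probability p(w|h) in HPYLM can then be expressed hierarchically in terms of the (n-1)-gram probability p(w|h') as

  p(w \mid h) = \frac{c(w|h) - d \cdot t_{hw}}{\theta + c(h)} + \frac{\theta + d \cdot t_{h\cdot}}{\theta + c(h)} \, p(w \mid h')    (4)

where t_{h\cdot} = \sum_w t_{hw} and c(h) = \sum_w c(w|h).

In general t_{hw} is of the order of the logarithm of c(w|h) [11]. If t_{hw} is always set to 1, (4) coincides with Kneser-Ney smoothing [12], which shows that HPYLM is a more precise Bayesian model of the Kneser-Ney n-gram. Learning uses MCMC: customers are repeatedly selected at random, removed, and added again, which optimizes t_{hw}. For details such as the estimation of d and \theta, see [11].

Note 4: If c(w|h) = 0 originally, a count would otherwise have been generated from an event of probability 0, so the first occurrence is always generated from the parent. From the second occurrence onward this is not necessarily so.

3.2 Hierarchizing HPYLM

For word unigrams, p(w|h') in (4) becomes the zero-gram, the prior probability of a word. How should it be given?

If the vocabulary is finite, it can simply be set to 1/|V| (V being the vocabulary). In morphological analysis, however, the vocabulary is infinite, and any substring may be a word. At the same time, the spellings that can be words in a language are by no means random. As in [3], we therefore give the prior probability of a word by a character n-gram over its spelling and compute it as

  G_0(w) = p(c_1 c_2 \cdots c_k)    (5)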
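A single level of the recursion in (4) can be written down directly. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the counts c(w|h), the table counts t_hw, and the parent probability are passed in as plain numbers, and the function name is hypothetical.

```python
def hpy_prob(c_hw, t_hw, c_h, t_h, d, theta, parent_prob):
    """Predictive probability p(w|h) of equation (4).

    c_hw, t_hw  : count and table count of w in context h
    c_h, t_h    : totals over all words in context h
    parent_prob : p(w|h') from the shorter context
    """
    denom = theta + c_h
    if denom == 0:
        # Empty context with theta = 0: fall back entirely to the parent.
        return parent_prob
    return (c_hw - d * t_hw) / denom + (theta + d * t_h) / denom * parent_prob


# Toy context with two observed words and a uniform parent over four words.
counts = {"a": 3, "b": 1}
tables = {"a": 2, "b": 1}
vocab = ["a", "b", "c", "d"]
c_h, t_h = sum(counts.values()), sum(tables.values())
probs = {
    w: hpy_prob(counts.get(w, 0), tables.get(w, 0), c_h, t_h,
                d=0.5, theta=1.0, parent_prob=0.25)
    for w in vocab
}
```

With t_hw = 1 for every observed word and theta = 0, the same function reduces to the interpolated Kneser-Ney form (c - d)/c(h) + d * N / c(h) * p(w|h'), where N is the number of distinct words seen after h.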
Here c_1 \cdots c_k is the spelling of word w as a character string. p(c_1 \cdots c_k) is computed in the same way by a character HPYLM (note 5). To avoid dependence on the character n-gram order n, this work uses a variable-length \infty-gram language model [13] as the character model. The word unigram distribution G_1 is then likewise generated from a PY process, G_1 \sim \mathrm{PY}(G_0, d_0, \theta_0), with the word prior G_0 of (5) as its base measure.

Note 5: As the final base measure G_0 of the character HPYLM we use a uniform prior over the possible character set of the target language (6879 characters for JIS X0208).

As shown in Figure 4, this is a hierarchical n-gram model in which a character HPYLM is embedded in the base measure of the word HPYLM. We call it the Nested Pitman-Yor Language Model (NPYLM) (note 6). In this model, words are first generated without limit by the character n-gram, and strings are generated by combining these words with the word n-gram. Our goal is to estimate the hidden "words" from the observed strings alone and to obtain the word model and the character model simultaneously.

Note 6: Strictly speaking, this is not nested in the sense of the Nested Dirichlet Process [14]; we use the name because it is intuitive.

Because (5) assigns probability to every spelling, G_0 and the distributions G_1, G_2, \cdots generated from it are all countably infinite-dimensional. Even so, the n-gram probabilities are obtained by straightforwardly applying (4) and (5) on the basis of the CRP. The word n-gram probability in the NPYLM constructed this way always reflects the spelling probability of the word computed by the character n-gram, so the two are integrated in a transparent way.

In practice, (5) alone makes the probability of long words too small, so we apply a further correction so that word length follows a Poisson distribution. Section 4.3 describes this in detail.

CRP representation. In NPYLM the word model and the character model are not independent but are connected through the CRP. When a word w newly appears in the unigram of the word HPYLM, or when the corresponding variable t_{\epsilon w} increases by one, w has been generated from the base measure of the unigram, that is, from the character HPYLM. The "sentence" obtained by decomposing w into the characters c_1 \cdots c_k is therefore added to the character HPYLM as data. Conversely, when w disappears from the unigram or t_{\epsilon w} decreases by one, the corresponding data have become invalid and are removed from the character HPYLM.

All of this happens, as in an ordinary HPYLM, when words are randomly removed and re-added within MCMC. Here, however, the words are unknown, so a sentence must first be decomposed into words. We do this efficiently by dynamic programming and learn the whole model by combining it with MCMC, as explained next.

Figure 4: Hierarchical CRP representation of NPYLM.
4 Learning

The simplest way to obtain the word segmentation w of each sentence, that is z, is Gibbs sampling: pick at random a z_i corresponding to one character from z_1, ..., z_D, sample whether it is 1 or 0 using the probabilities given by the language model, update the language model with the result, and repeat. After enough sampling, z converges to a sample from the true distribution (2) [15].

However, this method repeats sampling for every character of the training data. As noted in Section 2, it is extremely inefficient for word segmentation in particular (note 7), and convergence is difficult without annealing [10]. It also looks only at the relation between adjacent words and therefore cannot consider anything beyond bigrams.

Note 7: In [16] this method is called "Direct Gibbs".

4.1 Blocked Gibbs Sampler

Instead, we sample the word segmentation w of each sentence efficiently by dynamic programming. Because w, that is z, is sampled as a whole, this is a blocked Gibbs sampler [15]; the algorithm is shown in Figure 5.

At first the words are unknown, so the whole string s is treated as a single "word" and passed to the character model as is. From the second iteration onward, the data from the old segmentation are removed from the language model, a new segmentation w(s) of s is sampled from p(w|s), and the language model is updated. This operation is repeated for all sentences in random order, alternately optimizing the word segmentation and the language model based on it.

  for j = 1 \cdots J do
    for s in randperm(s_1, \cdots, s_D) do
      if j > 1 then
        Remove customers of w(s) from \Theta
      end if
      Draw w(s) according to p(w|s, \Theta)
      Add customers of w(s) to \Theta
    end for
    Sample hyperparameters of \Theta
  end for
Figure 5: Blocked Gibbs sampler for NPYLM \Theta.
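The control flow of Figure 5 can be sketched as a short Python loop. Everything about the model here is an assumption made for illustration: `ToyModel` is a hypothetical stand-in that only keeps word counts and proposes random segmentations, whereas the actual NPYLM would add and remove CRP customers and sample w(s) from p(w|s, Θ).

```python
import random
from collections import Counter


class ToyModel:
    """Hypothetical stand-in for the language model Theta (not NPYLM)."""

    def __init__(self):
        self.counts = Counter()

    def add(self, words):
        self.counts.update(words)

    def remove(self, words):
        self.counts.subtract(words)
        self.counts += Counter()      # drop entries whose count fell to zero

    def sample_segmentation(self, s, rng):
        # Placeholder: split at each position with probability 0.5.
        words, start = [], 0
        for i in range(1, len(s)):
            if rng.random() < 0.5:
                words.append(s[start:i])
                start = i
        words.append(s[start:])
        return words

    def sample_hyperparameters(self, rng):
        pass                          # nothing to resample in the toy model


def blocked_gibbs(sentences, model, num_iters, rng):
    """Sentence-wise blocked Gibbs sampling, following Figure 5."""
    segs = {}
    for _ in range(num_iters):
        order = list(range(len(sentences)))
        rng.shuffle(order)            # random permutation of the sentences
        for idx in order:
            if idx in segs:           # equivalent to "if j > 1"
                model.remove(segs[idx])
            segs[idx] = model.sample_segmentation(sentences[idx], rng)
            model.add(segs[idx])
        model.sample_hyperparameters(rng)
    return [segs[i] for i in range(len(sentences))]


sentences = ["abcab", "cab"]
model = ToyModel()
result = blocked_gibbs(sentences, model, num_iters=5, rng=random.Random(0))
```

The point of the sketch is the invariant maintained by the loop: at any time the model contains exactly the customers of the current segmentation of every sentence visited so far.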
When several segmentations are possible, as with 「京都大学」, considering both 「京都大学」 and 「京都 大学」 probabilistically avoids falling into local optima and yields a better model. Figure 8 shows how the word segmentation w(s) is stochastically improved at each Gibbs iteration on the Kyoto Corpus.

  1    神戸では異人館 街の 二十棟 が破損した 。
  2    神戸 では 異人館 街の 二十棟 が破損した 。
  10   神戸 では 異人館 街の 二十棟 が破損した 。
  50   神戸 で は異人 館 街 の 二 十 棟 が 破損 し た 。
  100  神戸 で は 異 人館 街 の 二 十 棟 が 破損 し た 。
  200  神戸 で は 異人館 街 の 二 十 棟 が 破損 し た 。
Figure 8: Gibbs sampling iterations and the improvement of the word segmentation w(s). w(s) is stochastic and is not necessarily the maximum-likelihood solution.

4.2 Forward filtering-Backward sampling

How, concretely, can w(s) be sampled? By applying the Forward filtering-Backward sampling method known from Bayesian learning of HMMs [16], this can be done in essentially the same way as sampling parse trees of a PCFG by MCMC [17].

Forward filtering. For this purpose we introduce, in the bigram case, the forward probability \alpha[t][k]. \alpha[t][k] is the probability that the substring c_1 \cdots c_t of s is generated with its last k characters forming a word (Figure 6). It is marginalized over all possible earlier segmentations by the recursion

  \alpha[t][k] = \sum_{j=1}^{t-k} p(c_{t-k+1}^{t} \mid c_{t-k-j+1}^{t-k}) \cdot \alpha[t-k][j]    (6)

where \alpha[0][0] = 1 and we write c_n \cdots c_m = c_n^m.

Figure 6: Computing the forward probability \alpha[t][k] by marginalizing over the possible preceding segmentations j.

This relation holds for the following reason. Keeping the binary variable sequence z_1 \cdots z_N is equivalent to keeping, at each time t, the distance q_t to the nearest word boundary on the left. Hence

  \alpha[t][k] = p(c_1^t, q_t = k)    (7)
             = \sum_j p(c_{t-k+1}^{t}, c_1^{t-k}, q_t = k, q_{t-k} = j)    (8)
             = \sum_j p(c_{t-k+1}^{t} \mid c_1^{t-k}, q_{t-k} = j) \, p(c_1^{t-k}, q_{t-k} = j)    (9)
             = \sum_j p(c_{t-k+1}^{t} \mid c_{t-k-j+1}^{t-k}) \, \alpha[t-k][j]    (10)

where (9) uses the conditional independence of q_t and q_{t-k}.

  for t = 1 to N do
    for k = max(1, t-L) to t do
      Compute \alpha[t][k] according to (6).
    end for
  end for
  Initialize t \leftarrow N, i \leftarrow 0, w_0 \leftarrow $
  while t > 0 do
    Draw k \propto p(w_i \mid c_{t-k+1}^{t}, \Theta) \cdot \alpha[t][k]
    Set w_i \leftarrow c_{t-k+1}^{t}
    Set t \leftarrow t - k, i \leftarrow i + 1
  end while
  Return w = w_i, w_{i-1}, \cdots, w_1.
Figure 7: Forward-Backward sampling of the word segmentation w (bigram case).
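The recursion (6) and the sampling loop of Figure 7 can be sketched in Python for the bigram case. The sketch makes several assumptions that are not from the paper: `bigram_prob(word, prev)` is a hypothetical callable standing in for p(word | prev, Θ), the word length k runs over 1..min(t, L), and the sentence start is represented by the context "$".

```python
import random


def forward_filter(s, L, bigram_prob):
    """alpha[t][k]: probability of s[:t] with its last k characters as a word."""
    N = len(s)
    alpha = [[0.0] * (L + 1) for _ in range(N + 1)]
    alpha[0][0] = 1.0
    for t in range(1, N + 1):
        for k in range(1, min(t, L) + 1):
            word = s[t - k:t]
            if t == k:
                # The word starts the sentence; its context is the boundary "$".
                alpha[t][k] = bigram_prob(word, "$") * alpha[0][0]
            else:
                alpha[t][k] = sum(
                    bigram_prob(word, s[t - k - j:t - k]) * alpha[t - k][j]
                    for j in range(1, min(t - k, L) + 1)
                )
    return alpha


def backward_sample(s, L, bigram_prob, alpha, rng):
    """Sample a segmentation from the end of the sentence backwards."""
    words = []
    t, following = len(s), "$"        # the sentence always ends with "$"
    while t > 0:
        ks = list(range(1, min(t, L) + 1))
        weights = [bigram_prob(following, s[t - k:t]) * alpha[t][k] for k in ks]
        k = rng.choices(ks, weights=weights)[0]
        following = s[t - k:t]
        words.append(following)
        t -= k
    words.reverse()
    return words


def toy_bigram(word, prev):
    """Hypothetical context-free stand-in for p(word | prev, Theta)."""
    return 0.5 if word == "$" else 0.1 ** len(word)


s = "abc"
alpha = forward_filter(s, L=3, bigram_prob=toy_bigram)
sample = backward_sample(s, 3, toy_bigram, alpha, random.Random(0))
```

With the toy probabilities every segmentation of "abc" has weight 0.1^3, so alpha[3][1] collects the two segmentations that end in a one-character word and alpha[3][2], alpha[3][3] one each, which gives a quick sanity check on the recursion.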
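Backward sampling. Once the forward probability table \alpha[t][k] has been computed, possible word segmentations can be sampled backwards from the end of the sentence. \alpha[N][k] is the probability that the last k characters of the string c_1^N form a word, and the special word $ always exists at the end of the sentence. We can therefore sample k with probability proportional to p($ \mid c_{N-k+1}^{N}) \cdot \alpha[N][k] and thereby determine the last word. The word before it can be sampled in the same way so that it precedes the word just determined, and this is repeated until the beginning of the string is reached (Figure 7).

Trigrams. For simplicity the bigram case was described above. In the trigram case the forward probability is \alpha[t][k][j] (note 8): the probability that the string c_1^t is generated with its last k characters, and the j characters before them, forming words. The Forward-Backward algorithm becomes complicated and is omitted here, but it can be derived in the same way as the Viterbi algorithm for second-order HMMs [19].

Note 8: In theory 4-grams and beyond are also possible, but they become far too complicated while the gain is expected to be small. Particle MCMC [18] looks promising for such cases, but in preliminary experiments it was not as efficient as dynamic programming.

Computational cost. With N the length of the string, the cost of this algorithm per sentence is O(NL^2) for bigrams and O(NL^3) for trigrams, where L is the maximum possible word length (\le N).

4.3 Word model and Poisson correction

This model is a natural Bayesian hierarchical n-gram model, but in practice (5) alone makes the probability of words with long spellings, such as katakana words, too small [3]. Word length roughly follows a Poisson distribution, so to correct for this we rewrite (5) as

  p(c_1 \cdots c_k) = p(c_1 \cdots c_k, k \mid \Theta)    (11)
                    = \frac{p(c_1 \cdots c_k, k \mid \Theta)}{p(k \mid \Theta)} \, \mathrm{Po}(k \mid \lambda)    (12)

Here p(k|\Theta) is the probability that a word of length k is generated from the character n-gram model \Theta.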
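The correction (12) only rescales the spelling probability by the ratio of a Poisson length probability to the length probability implied by the character model. A minimal numeric sketch, assuming the two probabilities p(c_1...c_k, k|Θ) and p(k|Θ) have already been computed elsewhere and are passed in as numbers (the function names and the example values are hypothetical):

```python
import math


def poisson_pmf(k, lam):
    """Po(k | lambda) = exp(-lambda) * lambda^k / k!"""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def corrected_word_prob(p_spelling, p_length, k, lam):
    """Equation (12): p(c_1..c_k, k | Theta) / p(k | Theta) * Po(k | lambda)."""
    return p_spelling / p_length * poisson_pmf(k, lam)


# Illustrative numbers only: a 4-character word whose spelling has
# probability 1e-6 under the character model, where length-4 words
# have total probability 1e-3, with lambda = 4.
p = corrected_word_prob(p_spelling=1e-6, p_length=1e-3, k=4, lam=4.0)
```

The ratio p_spelling / p_length is the probability of the spelling given its length, so after the correction the total probability mass assigned to words of length k is Po(k|λ) rather than whatever the character model happens to imply.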
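In [3] and elsewhere it is computed as p(k|\Theta) = (1 - p($))^{k-1} p($), but this is correct only in the unigram case. In this work we generate words at random from \Theta by Monte Carlo sampling and estimate the exact value (note 9).

Note 9: On current computers this computation finishes in a few seconds.

Estimating \lambda. The parameter \lambda of the Poisson distribution Po(k|\lambda) in (12) is likewise not a constant. We place a gamma prior on it,

  p(\lambda) = \mathrm{Ga}(a, b) = \frac{b^a}{\Gamma(a)} \lambda^{a-1} e^{-b\lambda}    (13)

and estimate it automatically from the data. a and b are hyperparameters chosen so that p(\lambda) is nearly uniform. Let W be the vocabulary obtained by word segmentation and let |\cdot| be the function that returns the length of a word. The posterior of \lambda is then

  p(\lambda \mid W) \propto p(W \mid \lambda) \, p(\lambda)
    = \prod_{w \in W} \left( e^{-\lambda} \frac{\lambda^{|w|}}{|w|!} \right)^{t(w)} \cdot \frac{b^a}{\Gamma(a)} \lambda^{a-1} e^{-b\lambda}
    = \mathrm{Ga}\left( a + \sum_{w \in W} t(w)|w|, \; b + \sum_{w \in W} t(w) \right)    (14)

Here t(w) is the number of times the word w is estimated to have been generated from the character HPYLM, that is, t_{\epsilon w} in the word unigram. Because the length distribution differs by word type, for example between katakana words and kanji words [3], we use a different \lambda for each word type (note 10) and sample \lambda from (14) at every Gibbs iteration.

Note 10: Nine word types are used: Latin letters, digits, symbols, hiragana, katakana, kanji, kanji-hiragana mixtures, kanji-katakana mixtures, and others. The implementation works in Unicode and uses ICU [20] to determine character types, so it does not depend on the language.

5 Experiments

5.1 English phonemic transcript data

For comparison with [10], the most direct prior work, we first ran experiments on the English phonemic data used in [10]. This dataset consists of 9,790 phonemic transcripts derived from the CHILDES database (note 11). Sentences are quite short, 9.79 characters on average, so we set L = 4 in the experiments.

Note 11: The data are available, together with an implementation and an evaluation program, from http://homepages.inf.ed.ac.uk/sgwater/.

Table 1 shows the results after 200 Gibbs iterations. Precision (P), recall (R), and F-measure (F) are all substantially higher than in [10], which demonstrates the high performance of the proposed method. On the other hand, the corresponding values for the lexicon obtained by segmentation (LP, LR, LF) are not necessarily higher.

  Model    P     R     F     LP    LR    LF
  NPYLM    74.8  75.2  75.0  47.8  59.7  53.1
  HDP      61.9  47.6  53.8  57.0  57.5  57.2
Table 1: Performance comparison on the English phonemic data. NPYLM is the proposed method. The "HDP" results are quoted from [10].

Table 2 shows the computation time needed to obtain the results of Table 1. The number of iterations for [10] is the one reported in that paper. Convergence of MCMC is not uniquely defined, but inference has clearly become far more efficient.

  Model    Computation time     Iterations
  NPYLM    17 minutes           200
  HDP      10 hours 55 minutes  20000
Table 2: Computation required for the results of Table 1. In practice NPYLM nearly converged after 50 iterations, about 4 minutes.

5.2 Japanese and Chinese corpora

Next, as realistic standard corpora, we used the public Chinese word segmentation datasets of SIGHAN Bakeoff 2005 [22] and the Kyoto Corpus.

Chinese. To compare with [21], the latest unsupervised result (which uses the closed data of Bakeoff 2006), we used the sets common to both: the Microsoft Research Asia (MSR) set for simplified Chinese and the City University of Hong Kong (CITYU) set for traditional Chinese. For each, 50,000 sentences were selected at random as training data, and the bundled evaluation data were used for testing.

Japanese. From version 4 of the Kyoto Corpus, 1,000 randomly selected sentences were used as evaluation data and the remaining 37,400 sentences as training data.

In all cases the training data are raw strings with all spaces removed. We set L = 4 for Chinese and L = 8 for Japanese.

The original data comprise about 37,000 sentences for the Kyoto Corpus, 86,000 for MSR, and 53,000 for CITYU, but because the proposed method is unsupervised, the training data can in principle be increased without limit. To examine this effect, we also ran experiments in which the same amount of training data was added: for the Kyoto Corpus from the Mainichi newspaper of fiscal 1996 (note 12), for MSR from its unused portion and the PKU set, and for CITYU from the Sinica set.

Note 12: A fiscal year close to that of the Kyoto Corpus (Mainichi newspaper, fiscal 1995) was used.

  Model    MSR            CITYU          Kyoto
  NPY(2)   0.802 (51.9)   0.824 (126.5)  0.621 (23.1)
  NPY(3)   0.807 (48.8)   0.817 (128.3)  0.666 (20.6)
  NPY(+)   0.804 (38.8)   0.823 (126.0)  0.682 (19.1)
  ZK08     0.667 (-)      0.692 (-)      -
Table 3: Agreement with the gold standard (F-measure) and, in parentheses, per-character perplexity. NPY(2) and NPY(3) are the word bigram and trigram NPYLM; NPY(+) is NPY(3) with the training data doubled. ZK08 is the best value in [21]. An \infty-gram is used for the character model.

  Model    MSR            CITYU          Kyoto
  Semi     0.893 (48.8)   0.895 (124.6)  0.914 (20.3)
  Sup      0.945 (81.5)   0.941 (194.8)  0.971 (21.3)
Table 4: Accuracy of semi-supervised and supervised learning. Semi-supervised learning uses 10,000 sentences of supervised data.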
Figure 9: Morphological analysis of the Kyoto Corpus (NPY(3+)). [Segmented Japanese sample text garbled in extraction.]

Results. Figure 9 shows an example analysis of the Kyoto Corpus test data after 400 Gibbs iterations, and Table 3 gives the numerical results (note 13). The F-measure on the Kyoto Corpus is not as high as intuition would suggest. We attribute this to the different treatment of inflectional endings compared with the "gold" corpus, and to the fact that the proposed method appropriately joins idiomatic expressions such as 「に近い」 and proper nouns such as 「海軍基地」 and 「清水宏保」. On the other hand, low-frequency words are sometimes attached to particles because data are scarce; this calls for training the character model in advance or for more data.

Note 13: For Japanese the search range is wide (L = 8), so once the combinations are taken into account the problem is considerably harder than for Chinese.

For Chinese, the results on both sets greatly exceed the best values of the heuristic method [21], which shows the effectiveness of the proposed method based on a precise probabilistic model. For Chinese there is no large difference between bigrams and trigrams, but for Japanese trigrams perform considerably better. Although it does not appear in Table 3, the per-word perplexity falls sharply from 336.1 (bigram) to 154.0 (trigram). This means that trigrams capture the complex relations between Japanese words and produce more accurate predictions and a shorter word segmentation (the average word length in the training data goes from 2.02 to 1.80).

Semi-supervised learning. The proposed method is a fully generative model, so it supports semi-supervised and supervised learning as well as unsupervised learning. It suffices to fix the word segmentation w(s) in the algorithm of Figure 5 to the supervised one. Table 4 shows the accuracy when 10,000 sentences of the usual training data are treated as supervised, and when all of them are. With full supervision the accuracy is about 97% for Japanese and 94% for Chinese. With semi-supervision, about one fifth of the data being supervised, accuracy reaches about 90% for both Japanese and Chinese.

Note, however, that for unsupervised learning a high agreement with a human segmentation is not necessarily the "correct answer". In fact, the per-character perplexity on the test data is much better for unsupervised and semi-supervised learning than when the gold segmentation is used, which shows that a hand-made segmentation is not necessarily optimal as a language model.

Figure 10: Unsupervised morphological analysis of The Tale of Genji. [Segmented classical Japanese sample text garbled in extraction.]

5.3 Spoken language corpus

The proposed method should be especially effective where the criteria for words are ambiguous, as in spoken language and the colloquial text found in blogs. To examine this, we ran experiments on the "dialogue" part of the Corpus of Spontaneous Japanese [23] (CSJ), as in [9]. Preprocessing differs from [9], where for example the sentence is not used as a unit, but the transcripts used for training and evaluation are the same. Because this dataset is quite small, with 6405 training sentences and 322 test sentences, we also ran an experiment that adds 50,000 sentences from outside the "dialogue" part as training data.

Figure 11 shows an example of the word segmentation and Table 5 compares per-character perplexity. Expressions peculiar to conversation such as 「っていうの」, as well as fillers, are recognized without supervision, and in per-character perplexity the method outperforms the use of CSJ short units (note 14).

  NPY(2)  NPY(2+)  NPY(3)  NPY(3+)  Short unit (+)
  16.8    13.9     21.2    18.1     14.9
Table 5: Per-character test-set perplexity on CSJ. + denotes the case with additional training data.

Note 14: We believe bigrams perform better because fillers were retained for the sake of comparison and because trigrams are not a suitable source of information when data are scarce.

Figure 11: Morphological analysis of the Corpus of Spontaneous Japanese. [Segmented Japanese sample text garbled in extraction.]

5.4 Classical Japanese and Western languages

The proposed method needs no supervised data and learns all parameters from data, so it can be applied to any language. In particular, morphological analysis of classical or undeciphered text becomes fully possible for the first time with this method. Figure 10 shows an example analysis of the opening of The Tale of Genji. As with modern text, low-frequency words are sometimes attached to particles, but although no classical grammar or supervised data were given at all, the segmentation obtained is highly appropriate in most cases.
8.
lastly,shepicturedtoherselfhowthissamelittlesisterofhe
rswould,intheafter-time,beherselfagrownwoman;andh
owshewouldkeep,throughallherriperyears,thesimplean
dlovingheartofherchildhood:andhowshewouldgathera
boutherotherlittlechildren,andmaketheireyesbrightan
deagerwithmanyastrangetale,perhapsevenwiththedre
amofwonderlandoflongago:andhowshewouldfeelwitha
lltheirsimplesorrows,andfindapleasureinalltheirsimple
joys,rememberingherownchild-life,andthehappysumm
erdays.

(a) Training data (excerpt).

last ly , she pictured to herself how this same little sister of her s would , inthe after - time , be herself agrown woman ; and how she would keep , through allher ripery ears , the simple and loving heart of her child hood : and how she would gather about her other little children ,and make theireyes bright and eager with many a strange tale , perhaps even with the dream of wonderland of longago : and how she would feel with all their simple sorrow s , and find a pleasure in all their simple joys , remember ing her own child - life , and thehappy summerday s .

(b) Word segmentation result. No dictionary was used at all.

Figure 12: Word segmentation of "Alice in Wonderland".

For low-frequency words in classical text as well, further improvement can be expected by giving headwords of classical Japanese to the character model in advance.

Finally, the proposed method can be applied as is not only to East Asian languages but also to Western languages and Arabic. Figure 12 shows the training text of "Alice in Wonderland" with all spaces removed, together with the word segmentation inferred from it. Although this training text is very small (1,431 sentences, 115,961 characters), a surprisingly accurate word segmentation is obtained without supervision. Note also that suffixes are separated automatically, as in last-ly and her-s. Such results should be especially useful for analyzing languages rich in inflection and compounding, such as German and Finnish.

6 Discussion and conclusion

In this work we proposed a language model and morphological analyzer that discovers the "words" hidden in any language from character strings. It uses a language model that further nests a Bayesian n-gram language model based on the hierarchical Pitman-Yor process into character and word levels, combined with MCMC and dynamic programming. The proposed method can also be viewed as an unsupervised counterpart of forward-backward algorithms such as CRFs in discriminative modeling, and it should provide a basis for integration with discriminative learning, as in semi-supervised part-of-speech tagging with CRF+HMM [24]. At the same time, we would also like to pursue higher accuracy by making the language model itself more sophisticated, for instance with more advanced word models or hidden states.

Acknowledgments

We thank Vikash Mansinghka (MIT), who motivated this work; Satoru Takabayashi (Google), for useful advice on the implementation; and Yusuke Matsubara (University of Tokyo), for details of the experimental data.

References

[1] Taku Kudo, Kaoru Yamamoto, and Yuji Matsumoto. Japanese morphological analysis using Conditional Random Fields (in Japanese). IPSJ SIG Technical Report NL-161, pages 89–96, 2004.
[2] Taku Kudo. MeCab: Yet Another Part-of-Speech and Morphological Analyzer. http://mecab.sourceforge.net/.
[3] Masaaki Nagata. Vocabulary acquisition from text based on expected values of word occurrence frequencies (in Japanese). IPSJ Journal, 40(9):3373–3386, 1999.
[4] Sharon Goldwater and Tom Griffiths. A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging. In Proceedings of ACL 2007, pages 744–751, 2007.
[5] Hirofumi Yamamoto and Genichiro Kikui. Sentence segmentation by unsupervised learning (in Japanese). In Proceedings of the 8th Annual Meeting of the Association for Natural Language Processing (NLP2002), pages 579–582, 2002.
[6] Taku Kudo. Generalization of word segmentation using morpheme marginal probabilities and its applications (in Japanese). In Proceedings of the Annual Meeting of the Association for Natural Language Processing (NLP-2005), 2005.
[7] Tetsuji Nakagawa and Yuji Matsumoto. Chinese and Japanese word segmentation using word-level and character-level information (in Japanese). IPSJ Journal, 46(11):2714–2727, 2005.
[8] Kevin Murphy. Hidden semi-Markov models (segment models), 2002. http://www.cs.ubc.ca/~murphyk/Papers/segment.pdf.
[9] Yusuke Matsubara, Tomoyoshi Akiba, and Jun'ichi Tsujii. Word segmentation of spoken Japanese based on the minimum description length principle (in Japanese). In Proceedings of the 13th Annual Meeting of the Association for Natural Language Processing (NLP2007), 2007.
[10] Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Contextual Dependencies in Unsupervised Word Segmentation. In Proceedings of ACL/COLING 2006, pages 673–680, 2006.
[11] Yee Whye Teh. A Bayesian Interpretation of Interpolated Kneser-Ney. Technical Report TRA2/06, School of Computing, NUS, 2006.
[12] Reinhard Kneser and Hermann Ney. Improved backing-off for m-gram language modeling. In Proceedings of ICASSP, volume 1, pages 181–184, 1995.
[13] Daichi Mochihashi and Eiichiro Sumita. Variable-order n-gram language model based on Pitman-Yor processes (in Japanese). IPSJ SIG Technical Report 2007-NL-178, pages 63–70, 2007.
[14] Abel Rodriguez, David Dunson, and Alan Gelfand. The Nested Dirichlet Process. Journal of the American Statistical Association, 103:1131–1154, 2008.
[15] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman & Hall / CRC, 1996.
[16] Steven L. Scott. Bayesian Methods for Hidden Markov Models. Journal of the American Statistical Association, 97:337–351, 2002.
[17] Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. Bayesian Inference for PCFGs via Markov Chain Monte Carlo. In Proceedings of HLT/NAACL 2007, pages 139–146, 2007.
[18] Arnaud Doucet, Christophe Andrieu, and Roman Holenstein. Particle Markov Chain Monte Carlo. In submission, 2009.
[19] Yang He. Extended Viterbi algorithm for second order hidden Markov process. In Proceedings of ICPR 1988, pages 718–720, 1988.
[20] ICU: International Components for Unicode. http://site.icu-project.org/.
[21] Hai Zhao and Chunyu Kit. An Empirical Comparison of Goodness Measures for Unsupervised Chinese Word Segmentation with a Unified Framework. In Proceedings of IJCNLP 2008, 2008.
[22] Tom Emerson. SIGHAN Bakeoff 2005, 2005. http://www.sighan.org/bakeoff2005/.
[23] National Institute for Japanese Language. Corpus of Spontaneous Japanese, 2008. http://www.kokken.go.jp/katsudo/seika/corpus/.
[24] Jun Suzuki, Akinori Fujino, and Hideki Isozaki. Semi-Supervised Structured Output Learning Based on a Hybrid Generative and Discriminative Approach. In Proceedings of EMNLP-CoNLL 2007, pages 791–800, 2007.